On-prem inference reduces the exposure of prompt and response data — but only if the model you are running is the model you think you are running. Model weights download from Hugging Face, container images pull from a registry, tokenizers and safety classifiers install via pip. Every one of those is a supply-chain link, and any compromised link is a backdoor that sits inside the most trusted part of the system. The defenses below mirror what is already standard for software supply chains (SBOM, signing, provenance), adapted for ML artifacts.
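To make the load-time threat concrete: Python's pickle protocol lets any object schedule a callable to run during deserialization via __reduce__, which is exactly how a tampered checkpoint executes its payload. A minimal stdlib sketch (the callable here is a harmless os.getcwd; an attacker's would not be):

```python
import os
import pickle


class Tampered:
    """Stands in for a malicious object embedded in a checkpoint."""

    def __reduce__(self):
        # Tells pickle: on load, call os.getcwd() with no arguments.
        # An attacker substitutes os.system and a shell command here.
        return (os.getcwd, ())


payload = pickle.dumps(Tampered())
result = pickle.loads(payload)  # runs os.getcwd() during deserialization
# The loaded object is the string os.getcwd() returned, not a Tampered
# instance: the deserializer executed attacker-chosen code.
```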
PyTorch .pt checkpoints are pickle archives, and unpickling executes arbitrary code, so a tampered checkpoint runs its payload the moment the model loads. The safetensors format exists specifically to eliminate this class of attack.

A Software Bill of Materials lists every component in a deployment with version and hash. Extend the traditional SBOM (SPDX or CycloneDX) to cover the ML-specific items: model weights, tokenizer files, and safety classifiers, each pinned by version and hash.
Generate the SBOM at build time, sign it alongside the image, and verify it during deployment. CycloneDX has a dedicated ML profile for exactly this.
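A sketch of what the ML extension can look like, assuming CycloneDX 1.5's machine-learning-model component type; the names, versions, and hash here are placeholders, not a real inventory:

```python
import json

# Minimal CycloneDX-style BOM fragment: an ML artifact listed alongside an
# ordinary software dependency, both pinned. Values are illustrative.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "llama-3.1-8b-legal",
            "version": "2024-06-01",
            "hashes": [{"alg": "SHA-256", "content": "ab12...fe"}],
        },
        {
            "type": "library",
            "name": "tokenizers",
            "version": "0.19.1",
        },
    ],
}
bom_json = json.dumps(bom, indent=2)
```

The point of putting weights in the same document as pip dependencies is that one verification step at deploy time covers both.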
A model card captures intended use, training data summary, evaluation results, and known limitations. For supply-chain purposes, treat the card and the weights it describes as first-class signed artifacts:

- Sign the weights and ship a detached signature alongside them (model.safetensors.sig). Verify on load; refuse to initialize if the signature fails or is missing.
- Treat .pt / .pkl / .bin checkpoints as untrusted code unless they come from a signed pipeline you control.

All of these checks run with cosign:

# Verify a container image is signed by the expected builder identity.
cosign verify ghcr.io/acme/inference:v1.7.0 \
  --certificate-identity-regexp="https://github.com/acme/.+/release.yml@.*" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com"

# Verify a model-weight blob's detached signature.
cosign verify-blob \
  --certificate model.safetensors.cert \
  --signature model.safetensors.sig \
  --certificate-identity "release-bot@acme.com" \
  --certificate-oidc-issuer "https://accounts.google.com" \
  model.safetensors

# Verify SLSA v1 provenance attestation; the output is a DSSE envelope,
# so decode the payload before extracting the predicate.
cosign verify-attestation --type slsaprovenance1 \
  --certificate-identity-regexp=".*" \
  --certificate-oidc-issuer-regexp=".*" \
  ghcr.io/acme/inference:v1.7.0 \
  | jq -r '.payload' | base64 -d | jq '.predicate.buildDefinition'
Runtime hook that refuses to start inference if weights fail verification:
import hashlib
import subprocess
import sys


def expected_sha256(model_name: str) -> str:
    # Pinned at build time; shipped inside the container, signed alongside it.
    return {
        "llama-3.1-8b-legal": "ab12cd34...fe",
    }[model_name]


def verify_weights(path: str, model_name: str) -> None:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() != expected_sha256(model_name):
        sys.exit("weights hash mismatch; refusing to start")

    # Independent cosign verification of the detached signature.
    r = subprocess.run(
        ["cosign", "verify-blob", "--certificate", f"{path}.cert",
         "--signature", f"{path}.sig",
         "--certificate-identity", "release-bot@acme.com",
         "--certificate-oidc-issuer", "https://accounts.google.com", path],
        capture_output=True,
    )
    if r.returncode != 0:
        sys.exit(f"cosign verification failed: {r.stderr.decode()}")
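The same fail-closed pattern extends to the smaller artifacts (tokenizer files, safety-classifier checkpoints). A self-contained sketch, with hypothetical file and function names, that checks every entry of a pinned hash manifest and reports what failed:

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def failed_artifacts(manifest_path: Path, root: Path) -> list[str]:
    """Names from the manifest whose on-disk hash does not match the pin."""
    manifest = json.loads(manifest_path.read_text())
    return [
        name
        for name, digest in manifest.items()
        if sha256_file(root / name) != digest
    ]
```

At startup, treat a non-empty return value the same way the weights check does: log the names and exit before the model process ever serves a request.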