On-prem inference reduces the exposure of prompt and response data — but only if the model you are running is the model you think you are running. Model weights download from Hugging Face, container images pull from a registry, tokenizers and safety classifiers install via pip. Every one of those is a supply-chain link, and any compromised link is a backdoor that sits inside the most trusted part of the system. The defenses below mirror what is already standard for software supply chains (SBOM, signing, provenance), adapted for ML artifacts.
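To make the load-time threat concrete: Python's pickle protocol lets any object schedule a callable to run during deserialization via __reduce__, which is exactly how a tampered checkpoint executes its payload. A minimal stdlib sketch (the callable here is a harmless os.getcwd; an attacker's would not be):

```python
import os
import pickle


class Tampered:
    """Stands in for a malicious object embedded in a checkpoint."""

    def __reduce__(self):
        # Tells pickle: on load, call os.getcwd() with no arguments.
        # An attacker substitutes os.system and a shell command here.
        return (os.getcwd, ())


payload = pickle.dumps(Tampered())
result = pickle.loads(payload)  # runs os.getcwd() during deserialization
# The loaded object is the string os.getcwd() returned, not a Tampered
# instance: the deserializer executed attacker-chosen code.
```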
PyTorch .pt checkpoints are pickle archives, and unpickling executes arbitrary code, so a tampered checkpoint runs its payload the moment the model loads. The safetensors format exists specifically to eliminate this class of attack.

A Software Bill of Materials lists every component in a deployment with version and hash. Extend the traditional SBOM (SPDX or CycloneDX) to cover the ML-specific items: model weights, tokenizer files, and safety classifiers, each pinned by version and hash.
Generate the SBOM at build time, sign it alongside the image, and verify it during deployment. CycloneDX has a dedicated ML profile for exactly this.
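A sketch of what the ML extension can look like, assuming CycloneDX 1.5's machine-learning-model component type; the names, versions, and hash here are placeholders, not a real inventory:

```python
import json

# Minimal CycloneDX-style BOM fragment: an ML artifact listed alongside an
# ordinary software dependency, both pinned. Values are illustrative.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "llama-3.1-8b-legal",
            "version": "2024-06-01",
            "hashes": [{"alg": "SHA-256", "content": "ab12...fe"}],
        },
        {
            "type": "library",
            "name": "tokenizers",
            "version": "0.19.1",
        },
    ],
}
bom_json = json.dumps(bom, indent=2)
```

The point of putting weights in the same document as pip dependencies is that one verification step at deploy time covers both.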
A model card captures intended use, training data summary, evaluation results, and known limitations. For supply-chain purposes, treat the card and the weights it describes as first-class signed artifacts:

- Sign the weights and ship a detached signature alongside them (model.safetensors.sig). Verify on load; refuse to initialize if the signature fails or is missing.
- Treat .pt / .pkl / .bin checkpoints as untrusted code unless they come from a signed pipeline you control.

All of these checks run with cosign:

# Verify a container image is signed by the expected builder identity.
cosign verify ghcr.io/acme/inference:v1.7.0 \
  --certificate-identity-regexp="https://github.com/acme/.+/release.yml@.*" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com"

# Verify a model-weight blob's detached signature.
cosign verify-blob \
  --certificate model.safetensors.cert \
  --signature model.safetensors.sig \
  --certificate-identity "release-bot@acme.com" \
  --certificate-oidc-issuer "https://accounts.google.com" \
  model.safetensors

# Verify SLSA v1 provenance attestation; the output is a DSSE envelope,
# so decode the payload before extracting the predicate.
cosign verify-attestation --type slsaprovenance1 \
  --certificate-identity-regexp=".*" \
  --certificate-oidc-issuer-regexp=".*" \
  ghcr.io/acme/inference:v1.7.0 \
  | jq -r '.payload' | base64 -d | jq '.predicate.buildDefinition'
Runtime hook that refuses to start inference if weights fail verification:
import hashlib
import subprocess
import sys


def expected_sha256(model_name: str) -> str:
    # Pinned at build time; shipped inside the container, signed alongside it.
    return {
        "llama-3.1-8b-legal": "ab12cd34...fe",
    }[model_name]


def verify_weights(path: str, model_name: str) -> None:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() != expected_sha256(model_name):
        sys.exit("weights hash mismatch; refusing to start")

    # Independent cosign verification of the detached signature.
    r = subprocess.run(
        ["cosign", "verify-blob", "--certificate", f"{path}.cert",
         "--signature", f"{path}.sig",
         "--certificate-identity", "release-bot@acme.com",
         "--certificate-oidc-issuer", "https://accounts.google.com", path],
        capture_output=True,
    )
    if r.returncode != 0:
        sys.exit(f"cosign verification failed: {r.stderr.decode()}")
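The same fail-closed pattern extends to the smaller artifacts (tokenizer files, safety-classifier checkpoints). A self-contained sketch, with hypothetical file and function names, that checks every entry of a pinned hash manifest and reports what failed:

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def failed_artifacts(manifest_path: Path, root: Path) -> list[str]:
    """Names from the manifest whose on-disk hash does not match the pin."""
    manifest = json.loads(manifest_path.read_text())
    return [
        name
        for name, digest in manifest.items()
        if sha256_file(root / name) != digest
    ]
```

At startup, treat a non-empty return value the same way the weights check does: log the names and exit before the model process ever serves a request.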