
Securing The AI/LLM Supply Chain

Published: September 18, 2025 | By: Debarshi Das
Ideal for: AI Engineers, Security Leaders, Security Champions


Modern AI is not one product. It is a daisy chain of artifacts you do not fully control. Attackers know this. Finance knows this when the bill spikes at 3 a.m. Compliance knows this when your “private” AI syncs logs to a US region. Treat the AI/LLM stack like any other high-risk supply chain. Then make it boring.

TL;DR

  • Require provenance for everything. Models, containers, drivers, runners, weights. No provenance, no prod.
  • Mandate signed artifacts and verify at deploy time. Containers and weights must be signed and pinned.
  • Inventory models like software. SBOMs for runners and MLBOMs for models are table stakes.
  • Isolate inference. Multi-tenant by default is a lie. Assume noisy neighbors and shared-memory accidents.
  • Align to a risk framework and a control library. Use NIST AI RMF and OWASP LLM Top 10 to keep the board honest.

Table of Contents

  1. You don’t know what’s in your model
  2. Containers pulled from “:latest”
  3. Runner roulette: Triton, vLLM, TGI without guardrails
  4. Unsigned, unpinned weights
  5. No MLBOMs
  6. GPU and driver drift
  7. Shadow data flows and “private” that isn’t
  8. Secrets and tokens in inference configs
  9. Telemetry that blinds you
  10. Minimal viable governance

The LLM Stack At A Glance

If any link in that chain is unverifiable, your “AI” is just shadow IT with GPUs.

1. You Don’t Know What’s In Your Model

Problem
Teams ship “Ferret-something-13B-instruct” with a README and vibes. No documented training data lineage, fine-tuning recipe, or license constraints. That is not defensible in front of an audit or in incident response.

Fix

  • Demand a model card with data sources, intended use, evals, and known limits. Make it a gate (see the sketch after this list).
  • Require provenance attestation for model artifacts. Track who built it, when, with what inputs. Adopt SLSA-style provenance.
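
To make the model-card gate concrete, here is a minimal CI sketch, assuming the card ships as a model_card.json next to the artifact; the required field names are illustrative, not a standard.

    # Minimal sketch of a model-card gate for CI. Assumes the card ships as
    # model_card.json next to the artifact; the field names are illustrative.
    import json
    import sys
    from pathlib import Path

    REQUIRED_FIELDS = [
        "model_name",
        "data_sources",       # training and fine-tuning data lineage
        "intended_use",
        "license",
        "evaluations",        # eval suites and scores
        "known_limitations",
    ]

    def missing_fields(card_path: str) -> list[str]:
        """Return required fields that are absent or empty in the model card."""
        card = json.loads(Path(card_path).read_text())
        return [f for f in REQUIRED_FIELDS if not card.get(f)]

    if __name__ == "__main__":
        missing = missing_fields(sys.argv[1] if len(sys.argv) > 1 else "model_card.json")
        if missing:
            print(f"Model card gate failed, missing: {missing}")
            sys.exit(1)   # no model card, no prod
        print("Model card gate passed")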

2. Containers Pulled From :latest

Problem
docker pull org/runner:latest in CI is how you import tomorrow’s zero-day. If you wouldn’t ship a bank core on :latest, don’t ship inference on it either.

Fix

  • Pin digests, not tags (see the check after this list).
  • Sign and verify images with Cosign. Enforce verification at admission.
  • Keep SBOMs alongside images and scan them continuously. SLSA gives you the scaffolding.
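
A hedged sketch of the digest-pinning half of this, assuming Kubernetes-style manifests under a deploy/ directory; Cosign signing and verification would run alongside it in CI and at admission.

    # Sketch of a CI check that rejects image references not pinned to a digest.
    # Assumes Kubernetes-style manifests under ./deploy; adapt to your layout.
    import re
    import sys
    from pathlib import Path

    IMAGE_LINE = re.compile(r"image:\s*(\S+)")          # e.g. image: org/runner:latest
    DIGEST_REF = re.compile(r"@sha256:[0-9a-f]{64}")    # immutable digest reference

    def unpinned_images(manifest_dir: str = "deploy") -> list[str]:
        bad = []
        for path in Path(manifest_dir).rglob("*.y*ml"):
            for line in path.read_text().splitlines():
                match = IMAGE_LINE.search(line)
                if match:
                    ref = match.group(1).strip("'\"")
                    if not DIGEST_REF.search(ref):
                        bad.append(f"{path}: {ref}")
        return bad

    if __name__ == "__main__":
        offenders = unpinned_images()
        if offenders:
            print("Unpinned image references:")
            print("\n".join(offenders))
            sys.exit(1)
        print("All image references are pinned to digests")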

3. Runner Roulette: Triton, vLLM, TGI Without Guardrails

Problem
Runners are complex, fast-moving, and security-sensitive. Defaults can expose internal ports, shared memory, or debugging endpoints. Python backends, RDMA paths, and tensor caches become attack surfaces in a hurry.

Fix

  • Maintain a runner matrix with explicit hardening: network exposure, auth, TLS, shared memory, and backends enabled (sketched after this list).
  • Subscribe to runner security advisories. Treat them like kernel advisories. Patch timelines in hours, not weeks.
  • Prefer isolated VPCs or service meshes for intra-runner traffic. No public ingress without auth.
  • Performance flags are not security controls.
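
One way to make the runner matrix enforceable is to keep it as data and diff deployed configs against it. The runner names and expected settings below are illustrative examples, not vendor defaults.

    # Illustrative runner hardening matrix, kept as data so drift can be diffed.
    # Runner names and expected settings are examples, not vendor defaults.
    EXPECTED = {
        "triton": {"public_ingress": False, "tls": True, "python_backend": False},
        "vllm":   {"public_ingress": False, "tls": True, "python_backend": False},
        "tgi":    {"public_ingress": False, "tls": True, "python_backend": False},
    }

    def violations(runner: str, deployed: dict) -> list[str]:
        """Compare a deployed runner config to the hardening matrix."""
        return [
            f"{key}: expected {want}, got {deployed.get(key)}"
            for key, want in EXPECTED.get(runner, {}).items()
            if deployed.get(key) != want
        ]

    if __name__ == "__main__":
        # Example: a vLLM deployment that accidentally exposes a public port.
        print(violations("vllm", {"public_ingress": True, "tls": True, "python_backend": False}))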

4. Unsigned, Unpinned Weights

Problem
Weights are executable data. Treat them like code. Pulling arbitrary safetensors from the internet without checks is supply-chain roulette.

Fix

  • Require checksums and signatures for weight files. Verify before first load and at startup (see the verification sketch after this list).
  • Store weights in a private registry or artifact store with access control and version pinning.
  • Attach and review the model card and license with the artifact.
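
A minimal verification sketch, assuming a pinned manifest that maps each weight file to its SHA-256; signature verification (for example, a Sigstore-based flow) would sit on top of this.

    # Sketch of startup-time weight verification against a pinned manifest,
    # assuming a manifest of the form {"model.safetensors": "<sha256 hex>", ...}.
    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_weights(weights_dir: str, manifest_path: str) -> None:
        manifest = json.loads(Path(manifest_path).read_text())
        for name, expected in manifest.items():
            actual = sha256_of(Path(weights_dir) / name)
            if actual != expected:
                raise RuntimeError(f"Checksum mismatch for {name}: {actual} != {expected}")

    # Call verify_weights("weights/", "weights.manifest.json") before the first
    # load and on every process start; refuse to serve if it raises.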

5. No MLBOMs

Problem
You SBOM the app but not the model. Without a Machine Learning Bill of Materials (MLBOM), you cannot answer what changed between last week’s good and today’s bad. You lack lineage, license state, and exact artifact fingerprints for the model and its transforms.

What an MLBOM must capture:

  • Base model family, checkpoint name, and exact hash of weight files
  • Tokenizer version and vocab hash
  • Core architecture params and precision choices
  • Training and fine-tuning datasets as versioned references with license and PII handling notes
  • Training recipe: seed, epochs, batch size, LR schedule, augmentations
  • Post-training transforms: distillation, quantization method, kernel choices, and their hashes
  • Adapter metadata: LoRA IDs, ranks, alphas, merge commits
  • Safety and guardrail configs: system prompts, policy packs, classifiers
  • Eval suites and scores with dataset versions
  • Serving profile: runner name and version, tensor parallelism, cache settings
  • Critical runtime deps: CUDA, cuDNN, framework versions
  • Ownership, provenance, and export-control status

Fix 

  • Make MLBOM a gate. No MLBOM, no prod.  
  • Generate MLBOMs in the training and packaging pipelines (a minimal record sketch follows this list). Update on every transform. Never hand-edit.
  • Store MLBOMs alongside weights in the artifact registry. Version and keep immutable.
  • Sign MLBOMs and weights. Verify signatures and model hashes at deploy through admission policy.
  • Emit MLBOM ID, model hash, and runner digest in response headers or logs for traceability. Alert on drift.
  • Join MLBOM with container SBOM so incident response can walk the chain from request to runner to weights to data.
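
A sketch of what a pipeline-generated MLBOM record might look like. The schema and field names are illustrative; in practice, map them onto your registry’s format (CycloneDX has ML-BOM support, for instance).

    # Illustrative MLBOM record emitted by a packaging pipeline. The schema is an
    # example, not a standard; map it onto your registry's format.
    import hashlib
    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def file_hash(path: str) -> str:
        # Fine for a sketch; chunk the read for multi-gigabyte weight files.
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    def build_mlbom(weights_path: str, tokenizer_path: str) -> dict:
        return {
            "mlbom_id": "mlbom-2025-09-18-001",                  # assigned by the pipeline
            "created": datetime.now(timezone.utc).isoformat(),
            "base_model": {"family": "example-13b-instruct",
                           "weights_sha256": file_hash(weights_path)},
            "tokenizer": {"version": "v3", "vocab_sha256": file_hash(tokenizer_path)},
            "training": {"datasets": ["dataset-a@v4"], "seed": 1337, "epochs": 3},
            "transforms": [{"type": "quantization", "method": "int8"}],
            "serving": {"runner": "vllm", "runner_digest": "sha256:..."},
            "runtime_deps": {"cuda": "12.4", "cudnn": "9.x"},
            "owner": "ml-platform-team",
        }

    if __name__ == "__main__":
        mlbom = build_mlbom("weights/model.safetensors", "tokenizer/tokenizer.json")
        Path("mlbom.json").write_text(json.dumps(mlbom, indent=2))   # store next to the weights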

6. GPU and Driver Drift

Problem
CUDA, drivers, and runners form a compatibility triangle. Unplanned upgrades cause undefined behavior and open CVEs.

Fix

  • Freeze a tested driver-CUDA-runner matrix and enforce via admission policy (see the sketch after this list).
  • Track CVEs for CUDA, cuDNN, and the runner images. Map them back to SBOM.
  • Blue-green upgrades only. Validate perf and correctness before traffic shift.
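
A sketch of enforcing the frozen matrix, assuming the admission hook or node startup script can read the driver, CUDA, and runner versions; the version triples below are placeholders, not recommendations.

    # Sketch of a frozen driver-CUDA-runner compatibility check, run by an
    # admission hook or node startup script. Version triples are placeholders.
    ALLOWED_TRIPLES = {
        ("550.90", "12.4", "vllm:0.5"),
        ("550.90", "12.4", "triton:24.06"),
    }

    def admit(driver: str, cuda: str, runner: str) -> bool:
        """Only admit workloads whose driver/CUDA/runner triple was tested together."""
        return (driver, cuda, runner) in ALLOWED_TRIPLES

    if __name__ == "__main__":
        print(admit("550.90", "12.4", "vllm:0.5"))    # True: tested combination
        print(admit("555.10", "12.5", "vllm:0.5"))    # False: unplanned drift, reject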

7. Shadow Data Flows and “Private” That Isn’t

Problem
Telemetry sinks, inference logs, and third-party connectors quietly move prompts and outputs out of region. Compliance will call this material risk.

Fix

  • Produce a data-flow diagram per model with explicit destinations and retention. Block egress by default.
  • Disable verbose request logging on runners unless isolated and redacted (a redaction sketch follows this list).
  • Contractually lock residency and backup locations with vendors.
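
If redacted logging is unavoidable, a filter in front of the log sink is one option. The sketch below uses illustrative regex patterns; real PII detection needs more than regexes, and redaction does not replace blocking egress by default.

    # Illustrative redaction filter applied before inference logs reach a sink.
    # The patterns are examples; real PII detection needs more than regexes.
    import logging
    import re

    REDACTIONS = [
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
        (re.compile(r"\b\d{13,19}\b"), "<possible-card-number>"),
    ]

    class RedactingFilter(logging.Filter):
        def filter(self, record: logging.LogRecord) -> bool:
            msg = record.getMessage()
            for pattern, replacement in REDACTIONS:
                msg = pattern.sub(replacement, msg)
            record.msg, record.args = msg, ()
            return True

    logger = logging.getLogger("inference")
    logger.addFilter(RedactingFilter())
    # Anything logged through "inference" is now redacted before it is emitted.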

8. Secrets and Tokens In Inference Configs

Problem
Bearer tokens and API keys live in YAML, env files, and Helm charts. Runners start with read-write storage tokens they do not need.

Fix

  • Mount short-lived credentials from KMS or secret manager at runtime. Rotate automatically.
  • Apply least-privilege IAM scoped to read-only weights and read-only logs.
  • Ban secrets in images with pre-push scans (see the scanner sketch after this list).
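
A minimal pre-push scan sketch; the token patterns are illustrative, and a dedicated scanner such as gitleaks or trufflehog will cover far more.

    # Minimal pre-push secret scan over config files. The patterns are illustrative;
    # a dedicated scanner (gitleaks, trufflehog) covers far more.
    import re
    import sys
    from pathlib import Path

    TOKEN_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key id shape
        re.compile(r"(?i)(api[_-]?key|token)\s*[:=]\s*['\"][^'\"]{16,}"),  # generic key/token assignment
    ]

    def scan(paths: list[str]) -> list[str]:
        hits = []
        for p in paths:
            text = Path(p).read_text(errors="ignore")
            for pattern in TOKEN_PATTERNS:
                if pattern.search(text):
                    hits.append(f"{p}: matches {pattern.pattern}")
        return hits

    if __name__ == "__main__":
        findings = scan(sys.argv[1:])
        if findings:
            print("\n".join(findings))
            sys.exit(1)   # block the push and rotate anything that leaked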

9. Telemetry That Blinds You

Problem
You see QPS and latency. You do not see model provenance at runtime, token source, or cross-region egress. You cannot answer “what model served this answer and with which weights.”

Fix

  • Add runtime attestations: emit model hash, container digest, and runner version per request (sketched after this list).
  • Join metrics to cost. Alert on cost-per-token and egress per tenant.
  • Add security signals: failed signature verification, unsigned artifact load attempts, and runner capability changes.
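
A minimal sketch of per-response attestation metadata, assuming the verified values are injected into the environment at deploy time; the header names are illustrative.

    # Sketch of per-response attestation metadata. Header names are illustrative;
    # values are injected at deploy time from the verified artifacts.
    import os

    def attestation_headers() -> dict[str, str]:
        return {
            "X-Model-Hash": os.environ.get("MODEL_SHA256", "unknown"),
            "X-Container-Digest": os.environ.get("IMAGE_DIGEST", "unknown"),
            "X-Runner-Version": os.environ.get("RUNNER_VERSION", "unknown"),
            "X-MLBOM-ID": os.environ.get("MLBOM_ID", "unknown"),
        }

    # Attach these as response headers or structured log fields so incident
    # response can answer "which weights served this request" without guesswork.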

10. Minimal Viable Governance

Problem
AI councils produce PDFs. Attackers produce shells. Map frameworks to controls and ship.

Fix

  • Adopt NIST AI RMF as the risk spine and map to concrete controls in your platform backlog.
  • Use OWASP LLM Top 10 as the threat checklist for application and data risks around the runner.
  • Borrow SLSA for provenance and integrity across the pipeline. Target meaningful levels.

What Good Looks Like In Practice

  • Every artifact signed. Images with Cosign. Weights with a vetted signing flow. Verification enforced by the admission controller.
  • Provenance everywhere. SLSA-style attestations emitted by CI for images and by training pipelines for models. Store alongside the artifact.
  • Hardened runners. Patched within hours, private networks, TLS on node-to-node, Python backends disabled unless required, shared memory scoped and monitored. Treat runner issues like kernel CVEs.
  • Inventory and SBOMs. One registry of truth. You can answer “what model, what runner, what hash” for any request in seconds.
  • Data discipline. No log egress by default. Residency guarantees in contracts and in code.

Quick Wins This Quarter

  1. Block :latest. Require digests and Cosign verification at deploy.
  2. Ban unsigned weights. Centralize model storage with pinned versions and checksums.
  3. Patch runners on a 24-hour SLA and subscribe to their advisories.
  4. Ship a minimal model card template and make it a gate.
  5. Emit runtime attestations: model hash, container digest, runner version on every response.

Final Thought

AI deployment is supply-chain security with bigger bills. Make every component verifiable, patchable, and auditable. If you cannot prove where a model came from, what it runs on, and what it talks to, you are not operating AI. You are operating luck. 

For teams ready to move beyond luck and implement real-world, auditable controls across the LLM supply chain, AppSecEngineer offers hands-on labs, expert-led courses, and bootcamps covering model provenance, artifact signing, SBOM/MLBOM generation, and LLM-specific risk management. Accelerate the journey from theory to practice and make AI security a repeatable and measurable discipline with AppSecEngineer as a training partner.

Debarshi Das

Blog Author
Debarshi is a Security Engineer and Vulnerability Researcher who focuses on breaking and securing complex systems at scale. He has hands-on experience taming SAST, DAST, and supply chain security tooling in chaotic, enterprise codebases. His work involves everything from source-to-sink triage in legacy C++ to fuzzing, reverse engineering, and building agentic pipelines for automated security testing. He’s delivered online trainings for engineers and security teams, focusing on secure code review, vulnerability analysis, and real-world exploit mechanics. If it compiles, runs in production, or looks like a bug bounty target, chances are he’s analyzed it, broken it, or is currently threat modeling it.