
Securing The AI/LLM Supply Chain

Published: September 18, 2025 | By: Debarshi Das
Ideal for: AI Engineers, Security Leaders, Security Champions


Modern AI is not one product. It is a daisy chain of artifacts you do not fully control. Attackers know this. Finance knows this when the bill spikes at 3 a.m. Compliance knows this when your “private” AI syncs logs to a US region. Treat the AI/LLM stack like any other high-risk supply chain. Then make it boring.

TL;DR

  • Require provenance for everything. Models, containers, drivers, runners, weights. No provenance, no prod.
  • Mandate signed artifacts and verify at deploy time. Containers and weights must be signed and pinned.
  • Inventory models like software. SBOMs for runners and MLBOMs for models are table stakes.
  • Isolate inference. Multi-tenant by default is a lie. Assume noisy neighbors and shared-memory accidents.
  • Align to a risk framework and a control library. Use NIST AI RMF and OWASP LLM Top 10 to keep the board honest.

Table of Contents

  1. You don’t know what’s in your model
  2. Containers pulled from “:latest”
  3. Runner roulette: Triton, vLLM, TGI without guardrails
  4. Unsigned, unpinned weights
  5. No MLBOMs
  6. GPU and driver drift
  7. Shadow data flows and “private” that isn’t
  8. Secrets and tokens in inference configs
  9. Telemetry that blinds you
  10. Minimal viable governance

The LLM Stack At A Glance

If any link in that chain is unverifiable, your “AI” is just shadow IT with GPUs.

1. You Don’t Know What’s In Your Model

Problem
Teams ship “Ferret-something-13B-instruct” with a README and vibes. No documented training data lineage, fine-tuning recipe, or license constraints. That is not defensible in front of an audit or in incident response.

Fix

  • Demand a model card with data sources, intended use, evals, and known limits. Make it a gate (see the sketch after this list).
  • Require provenance attestation for model artifacts. Track who built it, when, with what inputs. Adopt SLSA-style provenance.
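
To make the model-card gate concrete, here is a minimal CI sketch, assuming the card ships as a model_card.json next to the artifact; the required field names are illustrative, not a standard.

    # Minimal sketch of a model-card gate for CI. Assumes the card ships as
    # model_card.json next to the artifact; the field names are illustrative.
    import json
    import sys
    from pathlib import Path

    REQUIRED_FIELDS = [
        "model_name",
        "data_sources",       # training and fine-tuning data lineage
        "intended_use",
        "license",
        "evaluations",        # eval suites and scores
        "known_limitations",
    ]

    def missing_fields(card_path: str) -> list[str]:
        """Return required fields that are absent or empty in the model card."""
        card = json.loads(Path(card_path).read_text())
        return [f for f in REQUIRED_FIELDS if not card.get(f)]

    if __name__ == "__main__":
        missing = missing_fields(sys.argv[1] if len(sys.argv) > 1 else "model_card.json")
        if missing:
            print(f"Model card gate failed, missing: {missing}")
            sys.exit(1)   # no model card, no prod
        print("Model card gate passed")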

2. Containers Pulled From :latest

Problem
docker pull org/runner:latest in CI is how you import tomorrow’s zero-day. If you wouldn’t ship a bank core on :latest, don’t ship inference on it either.

Fix

  • Pin digests, not tags (see the check after this list).
  • Sign and verify images with Cosign. Enforce verification at admission.
  • Keep SBOMs alongside images and scan them continuously. SLSA gives you the scaffolding.
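
A hedged sketch of the digest-pinning half of this, assuming Kubernetes-style manifests under a deploy/ directory; Cosign signing and verification would run alongside it in CI and at admission.

    # Sketch of a CI check that rejects image references not pinned to a digest.
    # Assumes Kubernetes-style manifests under ./deploy; adapt to your layout.
    import re
    import sys
    from pathlib import Path

    IMAGE_LINE = re.compile(r"image:\s*(\S+)")          # e.g. image: org/runner:latest
    DIGEST_REF = re.compile(r"@sha256:[0-9a-f]{64}")    # immutable digest reference

    def unpinned_images(manifest_dir: str = "deploy") -> list[str]:
        bad = []
        for path in Path(manifest_dir).rglob("*.y*ml"):
            for line in path.read_text().splitlines():
                match = IMAGE_LINE.search(line)
                if match:
                    ref = match.group(1).strip("'\"")
                    if not DIGEST_REF.search(ref):
                        bad.append(f"{path}: {ref}")
        return bad

    if __name__ == "__main__":
        offenders = unpinned_images()
        if offenders:
            print("Unpinned image references:")
            print("\n".join(offenders))
            sys.exit(1)
        print("All image references are pinned to digests")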

3. Runner Roulette: Triton, vLLM, TGI Without Guardrails

Problem
Runners are complex, fast-moving, and security-sensitive. Defaults can expose internal ports, shared memory, or debugging endpoints. Python backends, RDMA paths, and tensor caches become attack surfaces in a hurry.

Fix

  • Maintain a runner matrix with explicit hardening: network exposure, auth, TLS, shared memory, and backends enabled (sketched after this list).
  • Subscribe to runner security advisories. Treat them like kernel advisories. Patch timelines in hours, not weeks.
  • Prefer isolated VPCs or service meshes for intra-runner traffic. No public ingress without auth.
  • Performance flags are not security controls.
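
One way to make the runner matrix enforceable is to keep it as data and diff deployed configs against it. The runner names and expected settings below are illustrative examples, not vendor defaults.

    # Illustrative runner hardening matrix, kept as data so drift can be diffed.
    # Runner names and expected settings are examples, not vendor defaults.
    EXPECTED = {
        "triton": {"public_ingress": False, "tls": True, "python_backend": False},
        "vllm":   {"public_ingress": False, "tls": True, "python_backend": False},
        "tgi":    {"public_ingress": False, "tls": True, "python_backend": False},
    }

    def violations(runner: str, deployed: dict) -> list[str]:
        """Compare a deployed runner config to the hardening matrix."""
        return [
            f"{key}: expected {want}, got {deployed.get(key)}"
            for key, want in EXPECTED.get(runner, {}).items()
            if deployed.get(key) != want
        ]

    if __name__ == "__main__":
        # Example: a vLLM deployment that accidentally exposes a public port.
        print(violations("vllm", {"public_ingress": True, "tls": True, "python_backend": False}))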

4. Unsigned, Unpinned Weights

Problem
Weights are executable data. Treat them like code. Pulling arbitrary safetensors from the internet without checks is supply-chain roulette.

Fix

  • Require checksums and signatures for weight files. Verify before first load and at startup (see the verification sketch after this list).
  • Store weights in a private registry or artifact store with access control and version pinning.
  • Attach and review the model card and license with the artifact.
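
A minimal verification sketch, assuming a pinned manifest that maps each weight file to its SHA-256; signature verification (for example, a Sigstore-based flow) would sit on top of this.

    # Sketch of startup-time weight verification against a pinned manifest,
    # assuming a manifest of the form {"model.safetensors": "<sha256 hex>", ...}.
    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_weights(weights_dir: str, manifest_path: str) -> None:
        manifest = json.loads(Path(manifest_path).read_text())
        for name, expected in manifest.items():
            actual = sha256_of(Path(weights_dir) / name)
            if actual != expected:
                raise RuntimeError(f"Checksum mismatch for {name}: {actual} != {expected}")

    # Call verify_weights("weights/", "weights.manifest.json") before the first
    # load and on every process start; refuse to serve if it raises.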

5. No MLBOMs

Problem
You SBOM the app but not the model. Without a Machine Learning Bill of Materials (MLBOM), you cannot answer what changed between last week’s good and today’s bad. You lack lineage, license state, and exact artifact fingerprints for the model and its transforms.

What an MLBOM must capture:

  • Base model family, checkpoint name, and exact hash of weight files
  • Tokenizer version and vocab hash
  • Core architecture params and precision choices
  • Training and fine-tuning datasets as versioned references with license and PII handling notes
  • Training recipe: seed, epochs, batch size, LR schedule, augmentations
  • Post-training transforms: distillation, quantization method, kernel choices, and their hashes
  • Adapter metadata: LoRA IDs, ranks, alphas, merge commits
  • Safety and guardrail configs: system prompts, policy packs, classifiers
  • Eval suites and scores with dataset versions
  • Serving profile: runner name and version, tensor parallelism, cache settings
  • Critical runtime deps: CUDA, cuDNN, framework versions
  • Ownership, provenance, and export-control status

Fix 

  • Make MLBOM a gate. No MLBOM, no prod.  
  • Generate MLBOMs in the training and packaging pipelines (a minimal record sketch follows this list). Update on every transform. Never hand-edit.
  • Store MLBOMs alongside weights in the artifact registry. Version and keep immutable.
  • Sign MLBOMs and weights. Verify signatures and model hashes at deploy through admission policy.
  • Emit MLBOM ID, model hash, and runner digest in response headers or logs for traceability. Alert on drift.
  • Join MLBOM with container SBOM so incident response can walk the chain from request to runner to weights to data.
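
A sketch of what a pipeline-generated MLBOM record might look like. The schema and field names are illustrative; in practice, map them onto your registry’s format (CycloneDX has ML-BOM support, for instance).

    # Illustrative MLBOM record emitted by a packaging pipeline. The schema is an
    # example, not a standard; map it onto your registry's format.
    import hashlib
    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def file_hash(path: str) -> str:
        # Fine for a sketch; chunk the read for multi-gigabyte weight files.
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    def build_mlbom(weights_path: str, tokenizer_path: str) -> dict:
        return {
            "mlbom_id": "mlbom-2025-09-18-001",                  # assigned by the pipeline
            "created": datetime.now(timezone.utc).isoformat(),
            "base_model": {"family": "example-13b-instruct",
                           "weights_sha256": file_hash(weights_path)},
            "tokenizer": {"version": "v3", "vocab_sha256": file_hash(tokenizer_path)},
            "training": {"datasets": ["dataset-a@v4"], "seed": 1337, "epochs": 3},
            "transforms": [{"type": "quantization", "method": "int8"}],
            "serving": {"runner": "vllm", "runner_digest": "sha256:..."},
            "runtime_deps": {"cuda": "12.4", "cudnn": "9.x"},
            "owner": "ml-platform-team",
        }

    if __name__ == "__main__":
        mlbom = build_mlbom("weights/model.safetensors", "tokenizer/tokenizer.json")
        Path("mlbom.json").write_text(json.dumps(mlbom, indent=2))   # store next to the weights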

6. GPU and Driver Drift

Problem
CUDA, drivers, and runners form a compatibility triangle. Unplanned upgrades cause undefined behavior and open CVEs.

Fix

  • Freeze a tested driver-CUDA-runner matrix and enforce via admission policy (see the sketch after this list).
  • Track CVEs for CUDA, cuDNN, and the runner images. Map them back to SBOM.
  • Blue-green upgrades only. Validate perf and correctness before traffic shift.
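
A sketch of enforcing the frozen matrix, assuming the admission hook or node startup script can read the driver, CUDA, and runner versions; the version triples below are placeholders, not recommendations.

    # Sketch of a frozen driver-CUDA-runner compatibility check, run by an
    # admission hook or node startup script. Version triples are placeholders.
    ALLOWED_TRIPLES = {
        ("550.90", "12.4", "vllm:0.5"),
        ("550.90", "12.4", "triton:24.06"),
    }

    def admit(driver: str, cuda: str, runner: str) -> bool:
        """Only admit workloads whose driver/CUDA/runner triple was tested together."""
        return (driver, cuda, runner) in ALLOWED_TRIPLES

    if __name__ == "__main__":
        print(admit("550.90", "12.4", "vllm:0.5"))    # True: tested combination
        print(admit("555.10", "12.5", "vllm:0.5"))    # False: unplanned drift, reject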

7. Shadow Data Flows and “Private” That Isn’t

Problem
Telemetry sinks, inference logs, and third-party connectors quietly move prompts and outputs out of region. Compliance will call this material risk.

Fix

  • Produce a data-flow diagram per model with explicit destinations and retention. Block egress by default.
  • Disable verbose request logging on runners unless isolated and redacted (a redaction sketch follows this list).
  • Contractually lock residency and backup locations with vendors.
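
If redacted logging is unavoidable, a filter in front of the log sink is one option. The sketch below uses illustrative regex patterns; real PII detection needs more than regexes, and redaction does not replace blocking egress by default.

    # Illustrative redaction filter applied before inference logs reach a sink.
    # The patterns are examples; real PII detection needs more than regexes.
    import logging
    import re

    REDACTIONS = [
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
        (re.compile(r"\b\d{13,19}\b"), "<possible-card-number>"),
    ]

    class RedactingFilter(logging.Filter):
        def filter(self, record: logging.LogRecord) -> bool:
            msg = record.getMessage()
            for pattern, replacement in REDACTIONS:
                msg = pattern.sub(replacement, msg)
            record.msg, record.args = msg, ()
            return True

    logger = logging.getLogger("inference")
    logger.addFilter(RedactingFilter())
    # Anything logged through "inference" is now redacted before it is emitted.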

8. Secrets and Tokens In Inference Configs

Problem
Bearer tokens and API keys live in YAML, env files, and Helm charts. Runners start with read-write storage tokens they do not need.

Fix

  • Mount short-lived credentials from KMS or secret manager at runtime. Rotate automatically.
  • Apply least-privilege IAM scoped to read-only weights and read-only logs.
  • Ban secrets in images with pre-push scans (see the scanner sketch after this list).
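
A minimal pre-push scan sketch; the token patterns are illustrative, and a dedicated scanner such as gitleaks or trufflehog will cover far more.

    # Minimal pre-push secret scan over config files. The patterns are illustrative;
    # a dedicated scanner (gitleaks, trufflehog) covers far more.
    import re
    import sys
    from pathlib import Path

    TOKEN_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key id shape
        re.compile(r"(?i)(api[_-]?key|token)\s*[:=]\s*['\"][^'\"]{16,}"),  # generic key/token assignment
    ]

    def scan(paths: list[str]) -> list[str]:
        hits = []
        for p in paths:
            text = Path(p).read_text(errors="ignore")
            for pattern in TOKEN_PATTERNS:
                if pattern.search(text):
                    hits.append(f"{p}: matches {pattern.pattern}")
        return hits

    if __name__ == "__main__":
        findings = scan(sys.argv[1:])
        if findings:
            print("\n".join(findings))
            sys.exit(1)   # block the push and rotate anything that leaked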

9. Telemetry That Blinds You

Problem
You see QPS and latency. You do not see model provenance at runtime, token source, or cross-region egress. You cannot answer “what model served this answer and with which weights.”

Fix

  • Add runtime attestations: emit model hash, container digest, and runner version per request (sketched after this list).
  • Join metrics to cost. Alert on cost-per-token and egress per tenant.
  • Add security signals: failed signature verification, unsigned artifact load attempts, and runner capability changes.
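
A minimal sketch of per-response attestation metadata, assuming the verified values are injected into the environment at deploy time; the header names are illustrative.

    # Sketch of per-response attestation metadata. Header names are illustrative;
    # values are injected at deploy time from the verified artifacts.
    import os

    def attestation_headers() -> dict[str, str]:
        return {
            "X-Model-Hash": os.environ.get("MODEL_SHA256", "unknown"),
            "X-Container-Digest": os.environ.get("IMAGE_DIGEST", "unknown"),
            "X-Runner-Version": os.environ.get("RUNNER_VERSION", "unknown"),
            "X-MLBOM-ID": os.environ.get("MLBOM_ID", "unknown"),
        }

    # Attach these as response headers or structured log fields so incident
    # response can answer "which weights served this request" without guesswork.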

10. Minimal Viable Governance

Problem
AI councils produce PDFs. Attackers produce shells. Map frameworks to controls and ship.

Fix

  • Adopt NIST AI RMF as the risk spine and map to concrete controls in your platform backlog.
  • Use OWASP LLM Top 10 as the threat checklist for application and data risks around the runner.
  • Borrow SLSA for provenance and integrity across the pipeline. Target meaningful levels.

What Good Looks Like In Practice

  • Every artifact signed. Images with Cosign. Weights with a vetted signing flow. Verification enforced by the admission controller.
  • Provenance everywhere. SLSA-style attestations emitted by CI for images and by training pipelines for models. Store alongside the artifact.
  • Hardened runners. Patched within hours, private networks, TLS on node-to-node, Python backends disabled unless required, shared memory scoped and monitored. Treat runner issues like kernel CVEs.
  • Inventory and SBOMs. One registry of truth. You can answer “what model, what runner, what hash” for any request in seconds.
  • Data discipline. No log egress by default. Residency guarantees in contracts and in code.

Quick Wins This Quarter

  1. Block :latest. Require digests and Cosign verification at deploy.
  2. Ban unsigned weights. Centralize model storage with pinned versions and checksums.
  3. Patch runners on a 24-hour SLA and subscribe to their advisories.
  4. Ship a minimal model card template and make it a gate.
  5. Emit runtime attestations: model hash, container digest, runner version on every response.

Final Thought

AI deployment is supply-chain security with bigger bills. Make every component verifiable, patchable, and auditable. If you cannot prove where a model came from, what it runs on, and what it talks to, you are not operating AI. You are operating luck. 

For teams ready to move beyond luck and implement real-world, auditable controls across the LLM supply chain, AppSecEngineer offers hands-on labs, expert-led courses, and bootcamps covering model provenance, artifact signing, SBOM/MLBOM generation, and LLM-specific risk management. Accelerate the journey from theory to practice and make AI security a repeatable and measurable discipline with AppSecEngineer as a training partner.

Debarshi Das

Blog Author
Debarshi is a Security Engineer and Vulnerability Researcher who focuses on breaking and securing complex systems at scale. He has hands-on experience taming SAST, DAST, and supply chain security tooling in chaotic, enterprise codebases. His work involves everything from source-to-sink triage in legacy C++ to fuzzing, reverse engineering, and building agentic pipelines for automated security testing. He’s delivered online trainings for engineers and security teams, focusing on secure code review, vulnerability analysis, and real-world exploit mechanics. If it compiles, runs in production, or looks like a bug bounty target, chances are he’s analyzed it, broken it, or is currently threat modeling it.