AI Security

June 6, 2026 - 11 min read

AI Security is where traditional DevSecOps meets nondeterministic systems, autonomous tools, MCP access, and agent workflows that can act on production context. I care about this because agentic DevOps is only useful if the agent can be trusted inside real operational boundaries.

Core Defensive Patterns

The defensive model cannot rely on model judgment alone. LLMs can reason well and still miss policy, misread intent, or over-trust a tool result, so the outer system needs deterministic controls.

The pattern I want to preserve is hybrid governance: scanners, IAM boundaries, approval gates, and policy-as-code wrap the agent from the outside, while contextual reasoning and runbook interpretation stay inside it.

  • Hybrid governance: Use SAST, secret scanning, dependency checks, IAM boundaries, and deployment gates as hard controls that an agent cannot waive by explanation.
  • Decision-time guidance: Inject narrow constraints when an agent is about to call a tool, open a ticket, change infrastructure, or run a command.
  • Governance-first DevOps: Treat agents as production actors with least privilege, audit logs, and clear approval boundaries.
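
A minimal sketch of a hard control that an agent cannot waive by explanation. The check names and the deployment_gate function are hypothetical, chosen only for illustration. The agent's justification is accepted so it can be logged, but it never enters the decision.

```python
# Hypothetical set of hard controls; real pipelines would define their own.
REQUIRED_CHECKS = frozenset({"sast", "secret_scan", "dependency_check"})


def deployment_gate(results: dict[str, bool], agent_justification: str = "") -> bool:
    """Return True only if every required check is present and passed.

    agent_justification is deliberately ignored when deciding. It exists
    so the caller can store it in an audit log next to the outcome.
    """
    # A missing check counts as a failure, so the gate is deny-by-default.
    return all(results.get(name, False) for name in REQUIRED_CHECKS)
```

A failed secret scan blocks the deployment even when the agent argues that the finding is a false positive. Overriding it would need a separate, human-owned path outside this function.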

Agentic DevOps Risk

The risk changes when automation becomes conversational. A CI job usually follows a predefined path, but an agent can choose tools, interpret logs, and chain actions in ways the platform owner did not pre-plan.

This makes prompts, tool descriptions, MCP server scopes, local credentials, and runbook quality part of the security model. Bad context can become bad infrastructure.

  • Implementation note: Record every tool call with input, output, decision reason, and approval state so the agent can be audited like a deployment system.
  • Failure mode: An agent with broad token access can convert a harmless diagnostic task into a privileged operational action.
  • Cross-link: MCP Runbook Safety is the child page for token scopes, approval gates, and tool-use constraints.
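
The implementation note above can be sketched as a small audit record. This is an assumption about shape, not a prescribed schema: ToolCallRecord, record_tool_call, and the approval state names are hypothetical, and the log is an in-memory list standing in for whatever append-only store is actually used.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical approval states for illustration.
APPROVAL_STATES = frozenset({"not_required", "pending", "approved", "denied"})


@dataclass(frozen=True)
class ToolCallRecord:
    tool: str
    tool_input: dict  # must be JSON-serializable
    tool_output: str
    decision_reason: str
    approval_state: str
    timestamp: str


def record_tool_call(log: list[str], tool: str, tool_input: dict, tool_output: str,
                     decision_reason: str, approval_state: str) -> ToolCallRecord:
    """Append one JSON line per tool call and return the stored record."""
    if approval_state not in APPROVAL_STATES:
        raise ValueError(f"unknown approval state: {approval_state}")
    record = ToolCallRecord(
        tool=tool,
        tool_input=tool_input,
        tool_output=tool_output,
        decision_reason=decision_reason,
        approval_state=approval_state,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record
```

One JSON line per call keeps the trail easy to ship into the same log pipeline that already holds deployment events.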

Threat Model for Agentic Systems

I am treating an agent as a semi-autonomous operator with access to context, tools, credentials, and sometimes production-adjacent systems. That means the threat model has to include prompt injection, unsafe tool selection, stale documentation, over-broad MCP servers, hidden credential exposure, and mistaken confidence from the model.

The important shift is that an agent can combine individually safe actions into an unsafe chain. Reading a log, opening a runbook, and preparing a Terraform change are each low-risk in isolation; chained together, they become an operational workflow that needs policy and review.

For this wiki, I want each agentic pattern to carry a security note: what it can read, what it can write, what evidence it uses, what approval it needs, and how the action can be reversed.

  • Prompt injection: Treat external pages, logs, issues, and docs as untrusted input that can try to change the agent's instructions.
  • Tool confusion: Separate diagnostic tools from mutating tools so the agent cannot escalate a troubleshooting request into a deployment.
  • Credential boundary: Keep agent credentials short-lived, least-privileged, and scoped to the exact environment being inspected.
  • Audit trail: Store tool input, tool output, model reasoning summary, approval state, and resulting diff for each important action.
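
One way to reason about unsafe chains is to evaluate the sequence rather than each step. The sketch below assumes a single, deliberately simple rule that I am choosing for illustration: any mutating action that follows a read of untrusted content needs review. The Action enum and chain_needs_review are hypothetical names, and a real policy would likely carry more rules than this one.

```python
from enum import Enum


class Action(Enum):
    READ_TRUSTED = "read_trusted"      # e.g. an internal, reviewed runbook
    READ_UNTRUSTED = "read_untrusted"  # e.g. external pages, logs, issues
    MUTATE = "mutate"                  # e.g. preparing or applying a change


def chain_needs_review(actions: list[Action]) -> bool:
    """Return True if a mutating step happens after untrusted input was read."""
    tainted = False
    for action in actions:
        if action is Action.READ_UNTRUSTED:
            # From here on, the agent's context may contain injected instructions.
            tainted = True
        elif action is Action.MUTATE and tainted:
            return True
    return False
```

The order matters: a mutation that happens before any untrusted read is not flagged by this rule, while the same mutation after reading an external log is.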

Controls Map

The controls I care about are layered. The model can explain and propose, but enforcement should live in IAM, policy-as-code, CI checks, environment protections, and human review. That way safety does not depend on what the model remembers or is told in its context window.

A useful agentic DevSecOps system should have read-only defaults, explicit write escalation, deployment environment separation, deny-by-default tool routing, and rollback instructions generated before any change is applied.

This gives me a concrete review checklist when I evaluate new tools: can it constrain tool access, log decisions, cite evidence, handle conflicting instructions, and stop when the request becomes unsafe?

  • Read first: Logs, metrics, code, docs, and tickets are the safest starting surface for useful automation.
  • Write later: Mutating actions should require a generated plan, a diff, policy checks, and an approval boundary.
  • Rollback: Any change proposal should include how to undo it and what signals confirm success or failure.
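
The read-first, write-later split can be sketched as a deny-by-default router. Everything named here is an assumption for illustration: the tool names in TOOL_MODES, the ChangeProposal fields, and the three return values. Policy checks are represented by a single boolean that some external system would set.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tool registry. Anything not listed is denied.
TOOL_MODES = {
    "read_logs": "read",
    "query_metrics": "read",
    "apply_terraform": "write",
}


@dataclass(frozen=True)
class ChangeProposal:
    plan: str
    diff: str
    rollback: str
    policy_checks_passed: bool
    approved_by: Optional[str] = None


def route(tool: str, proposal: Optional[ChangeProposal] = None) -> str:
    """Return "allow", "deny", or "needs_approval" for a requested tool call."""
    mode = TOOL_MODES.get(tool)
    if mode is None:
        return "deny"  # unknown tools are never routed
    if mode == "read":
        return "allow"  # read-only is the default surface
    if proposal is None:
        return "deny"  # writes require an explicit proposal
    complete = all(text.strip() for text in (proposal.plan, proposal.diff, proposal.rollback))
    if not (complete and proposal.policy_checks_passed):
        return "deny"
    return "allow" if proposal.approved_by else "needs_approval"
```

A write request with a plan, a diff, and a rollback but no approver comes back as "needs_approval", which is the point where a human review step would attach.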

Questions to Revisit

The unsolved part is how much autonomy is safe for recurring operational work. A read-only SRE assistant is different from an agent that can change Terraform, rotate secrets, or modify production workloads.

The wiki should keep collecting examples where agentic workflows improve speed without weakening accountability.

  • Question: What should be the minimum evidence required before an agent can recommend an infrastructure change?
  • Question: Which actions should always require human approval even if the model is confident?
  • Question: How should AI security checks fit into existing DevSecOps controls instead of becoming a parallel toolchain?
