Case Pattern: AI Agent Gains Read/Write Access to a Sensitive Internal AI System
What the McKinsey / Lilli / CodeWall Incident Reveals About Action Governance
This is a governance pattern, not a breach post-mortem. We use public reporting on the McKinsey / Lilli red-team incident to show a failure class that will keep repeating wherever AI agents can autonomously discover, access, and modify sensitive systems without a pre-execution authority gate.
The Five Layers of AI Governance (Control Stack)
Most “AI governance” talk collapses into vibes. This pattern doesn’t. There are five distinct control layers:
- Formation (data + prompting + constraints on what's proposed)
- Behavior (model/agent reasoning and tool choice)
- Commit (pre-execution authority gate)
- During (in-execution limits: circuit breakers, rate/volume caps, session constraints)
- After (monitoring, invariants, forensics, reconciliation)
If someone tells you they “do AI governance” and can’t tell you which of these they cover, you don’t have a governance solution. You have a feature.
Note: Above this stack sits Policy & Ownership (boards, GRC, risk appetite). These five layers are the runtime control stack that enforces and evidences those policies.
1. The Incident (From Public Reports)
In March 2026, public reporting described a controlled red-team exercise in which CodeWall’s autonomous security agent targeted McKinsey’s internal generative-AI platform, Lilli, and reportedly achieved production read/write access in roughly two hours.
The public narrative aligns on the core sequence:
- CodeWall pointed an autonomous offensive agent at McKinsey’s internal chatbot platform as part of a controlled test.
- The agent reportedly discovered exposed API documentation and unauthenticated endpoints.
- It then identified a SQL injection path that allegedly enabled read and write access to Lilli’s production database.
- Public reporting says the accessible data set included millions of chatbot messages, hundreds of thousands of files, tens of thousands of user accounts, and writable system prompts.
- McKinsey stated it patched the exposed endpoints quickly, took the development environment offline, and found no evidence that client data or confidential information was accessed by the researcher or any unauthorized third party.
Sources & Unknowns
- Sources used: Inc. (Mar. 10, 2026) and The Register (Mar. 9, 2026), including CodeWall’s claims and McKinsey’s public response as quoted there.
- Unknown / not claimed here: whether any real client confidential data was actually accessed outside the controlled test; the exact exploit payloads/prompts used; whether the reported counts (46.5M messages, 728k files, 57k accounts, 95 system prompts) were independently verified beyond CodeWall’s reporting; the full post-remediation control stack McKinsey put in place.
Strip away the cyber-drama and we’re left with a clear pattern:
An autonomous agent achieved high-risk read/write access to a sensitive internal AI system because nothing was structurally responsible for deciding: “Is this specific action allowed to execute at all, under this authority, against this data plane, right now?”
2. What Actually Failed
Failure Class: Data Leakage — Agentic Read/Write Access Without Authority-at-Action
Most people will frame this as either:
- an “AI hacking got faster” story, or
- a plain old application security story (“this was just SQL injection”).
Both are incomplete.
If you zoom out, the stack did three things correctly:
- Discovery: the agent found exposed documentation and reachable endpoints.
- Capability: the agent could analyze the application flow and identify an exploitable pattern.
- Execution: the system accepted high-risk read/write operations once the vulnerability chain was found.
What was missing was a fourth job:
Authority at the moment of action — an enforceable, pre-execution decision about whether this actor, using this path, may read or write this class of sensitive data, at this scope, under these conditions, right now.
Discovery said: “This endpoint exists.”
Capability said: “This action path is technically possible.”
Execution said: “The database will accept it.”
Nothing said: “You are not allowed to read or rewrite this system’s core data and prompts under this authority.”
This is not just an AppSec problem. It’s an Action Governance™ problem.
3. Why Traditional Controls Don’t Catch This
Traditional controls focus on authentication, exposed endpoints, query sanitization, and incident response. Those matter. But they answer a different question:
“Can this request technically reach the system?”
Action Governance answers the question that decides the blast radius:
“Even if this path exists, should this actor ever be allowed to read or write this class of data in this environment?”
Security tools are not designed to fully encode:
- data-scope authority (which records may ever be read or exported),
- write authority over system prompts and behavioral controls,
- blast-radius thresholds (query volume, file count, prompt mutation), or
- hard “refuse” conditions tied to actor, scope, and sensitivity.
That missing layer is where pre-execution authority gates live.
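As an illustration, the four authority dimensions listed above can be encoded as declarative policy data that a gate evaluates before anything executes. A minimal sketch in Python; every field name, target, and threshold here is hypothetical, not a real product schema:

```python
# Hypothetical encoding of pre-execution authority rules.
# All keys, targets, and thresholds are illustrative only.
AUTHORITY_POLICY = {
    # Data-scope authority: which record classes may ever be read or exported.
    "data_scope": {
        "chatbot_messages": {"read": "explicit_grant", "export": "never"},
        "user_accounts":    {"read": "explicit_grant", "export": "never"},
    },
    # Write authority over system prompts and behavioral controls.
    "write_targets": {
        "system_prompts": {"allowed_paths": ["reviewed_deploy_pipeline"]},
    },
    # Blast-radius thresholds: query volume, file count, prompt mutation.
    "blast_radius": {
        "max_rows_per_query": 1_000,
        "max_files_per_session": 50,
        "max_prompt_mutations": 0,
    },
    # Hard "refuse" conditions tied to actor, scope, and sensitivity.
    "hard_refuse": [
        "unauthenticated_actor",
        "bulk_export_of_sensitive_records",
        "prompt_mutation_from_runtime_path",
    ],
}
```

The design point is that these rules are data, owned by the client's policy stack, rather than logic buried inside an application.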
4. How a Pre-Execution Authority Gate Changes the Story
Imagine the same system operating behind a SEAL-style pre-execution authority gate. We don’t assume away the bug. We don’t pretend vulnerabilities disappear. We add a governance checkpoint between “request can be made” and “sensitive read/write actually executes.”
4.1. Intent to Act
Before sensitive actions hit the data plane, the system must submit a structured "intent to act" payload.
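One hypothetical shape for such a payload, sketched in Python; every field name and value here is illustrative, not a real schema:

```python
# Hypothetical "intent to act" payload submitted before execution.
# Field names, targets, and values are illustrative only.
intent = {
    "actor": {
        "id": "agent-7f3",               # who is acting
        "type": "autonomous_agent",
        "auth": "service_token",
    },
    "action": {
        "verb": "read",                  # read | write | export
        "target": "lilli.prod.chat_messages",   # hypothetical data-plane name
        "rows_requested": 2_000_000,     # declared blast radius
    },
    "path": "public_api_endpoint",       # how the request reached the system
    "data_class": "sensitive_internal",  # classification of the target data
    "urgency": "none",
    "delegation": None,                  # no human delegated this action
    "consent": None,
    "timestamp": "2026-03-07T14:02:11Z",
}
```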
The gate doesn’t try to out-hack the hacker. It asks one question:
“Is this actor allowed to perform this class of bulk read/write action against this sensitive AI system, at this scope, right now?”
4.2. Deterministic Outcomes (Only Three)
Example policy rules (simple but decisive):
- No bulk read/export of sensitive chatbot data without explicit, scoped authority.
- No write access to system prompts from untrusted or indirect execution paths.
- Any action above a blast-radius threshold must refuse or require supervised override.
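The rules above can be sketched as a deterministic evaluation that returns exactly one of three outcomes before anything executes. A minimal sketch, assuming an intent-to-act payload dict; all field names, reason codes, and thresholds are hypothetical:

```python
# Minimal sketch of a deterministic three-outcome gate.
# Field names, reason codes, and thresholds are illustrative only.
APPROVE, REFUSE, SUPERVISED_OVERRIDE = "approve", "refuse", "supervised_override"

BULK_READ_THRESHOLD = 10_000  # hypothetical blast-radius limit


def evaluate(intent: dict) -> tuple[str, str]:
    """Return (outcome, reason_code) before the action touches the data plane."""
    action = intent["action"]

    # Rule: no write access to system prompts from untrusted or indirect paths.
    if action["verb"] == "write" and action["target"].endswith("system_prompts"):
        if intent["path"] != "reviewed_deploy_pipeline":
            return REFUSE, "PROMPT_WRITE_UNTRUSTED_PATH"

    # Rule: no bulk read/export of sensitive data without explicit, scoped authority.
    if action["verb"] in ("read", "export") and intent["data_class"] == "sensitive_internal":
        if action.get("rows_requested", 0) > BULK_READ_THRESHOLD:
            if intent.get("delegation") is None:
                return REFUSE, "BULK_READ_NO_AUTHORITY"
            # Above-threshold but explicitly delegated: escalate, don't silently pass.
            return SUPERVISED_OVERRIDE, "BULK_READ_DELEGATED"

    return APPROVE, "IN_POLICY"
```

Note that the gate never returns a "warning": every verdict is one of the three outcomes, and every verdict carries a reason code that can be sealed into the evidence record.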
4.3. Sealed Evidence Instead of Forensics
When the gate refuses or escalates, it emits a sealed, tenant-owned artifact.
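One way to make "sealed" concrete: sign each verdict record under a tenant-held key so the tenant can later prove the record was not altered. A minimal sketch using an HMAC; key handling, storage, and the record schema are hypothetical simplifications of what a production system would need:

```python
import hashlib
import hmac
import json

# Sketch: seal a verdict record with an HMAC under a tenant-held key.
# In practice the key would live in tenant-managed KMS, not in code.
TENANT_KEY = b"tenant-held-secret"


def seal(record: dict) -> dict:
    """Produce a tamper-evident artifact from a verdict record."""
    body = json.dumps(record, sort_keys=True).encode()
    sig = hmac.new(TENANT_KEY, body, hashlib.sha256).hexdigest()
    return {"record": record, "seal": sig}


def verify(artifact: dict) -> bool:
    """Tenant-side check that the artifact has not been altered."""
    body = json.dumps(artifact["record"], sort_keys=True).encode()
    expected = hmac.new(TENANT_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, artifact["seal"])
```

Because the tenant holds the key, verification does not depend on trusting the vendor's logs: any modified record fails `verify`.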
Stored under the client’s control, that artifact becomes:
- proof that the system refused an unsafe high-blast-radius action, and
- a live signal: “An agent attempted unauthorized bulk read/write access on a sensitive AI platform.”
Worst-case becomes:
“An agent attempted mass read/write access; the gate refused and here’s the record.”
Not:
“We’re reconstructing how millions of records may have been exposed after the fact.”
5. Why This Pattern Matters Beyond McKinsey
Swap nouns; the structure stays identical:
- reading or rewriting a legal knowledge base tied to live matters
- bulk-exporting healthcare records or modifying clinical prompts
- reading confidential M&A files or poisoning internal decision support
- rewriting approval logic inside finance workflows
- accessing regulated client records through an internal AI layer
The point: in high-stakes systems, “reachable” is not the same as “authorized.”
5A) The Cost of Not Having the Gate (Risk P&L)
You don’t need perfect breach math to see the failure class. You need an evidence surface.
Important: We don’t speculate about incident damages. We show what you can prove you prevented once a gate exists.
Without a gate, your risk P&L is invisible:
- you argue after the fact about what may have been touched,
- you rely on raw logs and forensic interpretation,
- you cannot prove prevention — only patching and recovery.
With a gate, you get a measurable risk ledger:
- Prevented mass-access events: every refusal is a near-miss captured before high-sensitivity exposure.
- Controlled exceptional access: every supervised override is explicit, scoped, and recorded.
- Policy adherence over time: drift becomes visible through reason codes and policy versions.
- Audit defensibility: “we can prove we refused unsafe high-blast-radius reads/writes under defined policy.”
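Those four ledger views fall out of simple aggregation over sealed verdicts. A sketch in Python; the record shapes and reason codes are illustrative:

```python
from collections import Counter

# Sketch: a refusal ledger is just the stream of sealed verdicts.
# Record shapes, reason codes, and policy versions are illustrative.
ledger = [
    {"outcome": "refuse", "reason": "BULK_READ_NO_AUTHORITY", "policy_version": "v3"},
    {"outcome": "supervised_override", "reason": "BULK_READ_DELEGATED", "policy_version": "v3"},
    {"outcome": "approve", "reason": "IN_POLICY", "policy_version": "v3"},
]

# Prevented mass-access events: every refusal is a captured near-miss.
prevented = sum(1 for e in ledger if e["outcome"] == "refuse")

# Controlled exceptional access: every override is explicit and recorded.
overrides = sum(1 for e in ledger if e["outcome"] == "supervised_override")

# Policy adherence over time: drift shows up in reason codes per policy version.
drift = Counter((e["policy_version"], e["reason"]) for e in ledger)
```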
Reframe: This is not “we hope our AI layer is secure.” This is “we can prove we refused unsafe read/write actions, and here is the record.”
6. The Executive Takeaway (GC / CISO / Board)
The only question that matters:
“If an AI-enabled system or agent can read, write, or poison sensitive data at machine speed, what is the last line of defense?”
If the honest answer is endpoint auth, patching, or “we’d catch it in the logs,” you are in the same failure class.
A pre-execution authority gate doesn’t replace AppSec. It makes sensitive actions governable — and creates evidence you own.
How this strengthens your “During” & “After” stack
- During: rate limits, circuit breakers, and query thresholds get cleaner triggers when authority and blast radius are explicit.
- After: monitoring and forensics get the one thing they can't reconstruct later: the moment of authority (who was allowed, at what scope, under which policy version, and why).
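For instance, a "During"-layer circuit breaker gets a clean trigger once the gate has made the approved scope explicit: it trips on the scope the gate actually granted, not on a guess inferred from raw traffic. A minimal sketch with hypothetical thresholds:

```python
# Sketch of a "During"-layer circuit breaker fed by an explicit approved scope.
# Thresholds and field names are illustrative only.
class CircuitBreaker:
    def __init__(self, max_rows: int = 10_000):
        self.max_rows = max_rows   # absolute in-execution cap
        self.rows_seen = 0
        self.tripped = False

    def observe(self, rows_returned: int, approved_scope: int) -> None:
        """Track rows flowing during execution; trip past the tighter limit."""
        self.rows_seen += rows_returned
        # Trip on either the absolute cap or the scope the gate approved,
        # whichever is smaller.
        if self.rows_seen > min(self.max_rows, approved_scope):
            self.tripped = True
```

The design point: without an explicit approved scope from the Commit layer, the breaker can only use blunt global thresholds; with it, enforcement tightens to what was actually authorized.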
7. Quick Diagnostic (5 Questions)
- Where is the pre-execution authority gate?
  Show me the exact service that can return "refuse" before the operation touches live systems.
- What happens when an authorized identity makes an out-of-policy request?
  Silent pass, soft warning, or hard refusal with a record?
- Who owns the authority rules?
  Are they derived from your policy / GRC / identity stack, or reinvented inside a vendor product?
- What is your evidence surface?
  Can you produce a sealed, tenant-owned artifact per governed decision — or just raw logs?
- What's the worst failure mode?
  Silent bypass that executes, or documented refusal that frustrates someone but saves the system?
If you can’t point to a clear pre-execution authority gate with sealed artifacts, you don’t have action governance.
You have hope wrapped in dashboards.
8. Where Thinking OS™ Fits
- A sealed pre-execution authority gate wired in front of high-risk actions (file / send / approve / move).
- Evaluates who / where / what / urgency / delegation / consent for each high-risk action.
- Returns only three outcomes: ✅ Approve · ❌ Refuse · 🟧 Supervised Override.
- Emits a sealed, tenant-owned artifact for every verdict.
We don’t stop agents from thinking. We stop unauthorized actions from existing.
Before your AI can delete, file, approve, or move anything that matters:
where is the gate — and who owns the proof it said NO?
Design Partner Offer (Confidential)
We’ll map your top 2-3 high-risk actions and produce a sample refusal ledger (what would have been blocked, what would require override) using synthetic data under NDA.