Palo Alto Networks just named AI agents the #1 insider threat for 2026.

According to Gartner, 40% of enterprise applications will integrate AI agents by the end of this year — up from less than 5% in 2025. That's not a gradual shift. That's a security surface explosion.

But here's what the headlines aren't telling you: the biggest AI security problems aren't the ones vendors are racing to solve. They're the ones nobody wants to touch.

I've been following a cybersecurity consultant who advises financial firms with 20,000+ employees on AI security. Last week, they published a breakdown of what's actually blocking enterprise AI adoption — and it's not prompt injection or model jailbreaks.

It's the unsexy stuff. Authorization that doesn't scale. Data you can't debug. Telemetry that can't tell humans from agents.

These are the problems that don't have clean solutions. And enterprises aren't building them — they're waiting for someone else to figure it out.

Let me walk you through what's actually broken.

The Authorization Problem Nobody Can Solve

Here's the first problem: fine-grained authorization in RAG systems is practically impossible.

Traditional access control is straightforward. Bob has access to Folder A but not Folder B. When Bob requests a file, the system checks his permissions and either grants or denies access. Simple.
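The traditional model really is that simple — a membership check against an explicit permission set. A minimal sketch (all names here are illustrative, not from any particular system):

```python
# Minimal sketch of traditional access control: a static ACL lookup.
# ACL and can_access are hypothetical names for illustration.
ACL = {
    "bob": {"folder_a"},                  # Bob can read Folder A only
    "alice": {"folder_a", "folder_b"},
}

def can_access(user: str, folder: str) -> bool:
    """Grant or deny based on an explicit permission set."""
    return folder in ACL.get(user, set())

print(can_access("bob", "folder_a"))  # True
print(can_access("bob", "folder_b"))  # False
```

One lookup, one answer, and the answer is always current because the ACL is the single source of truth. That property is exactly what breaks once the data is copied into an AI index.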

Now add AI to the equation.

You want to build a RAG system that queries your document store. Bob asks the AI a question, and it searches your embedded documents for relevant context. But how do you ensure the AI doesn't return information from documents Bob isn't authorized to see?

There are two approaches, and both are flawed.

Approach 1: Replicate permissions into your index. You embed all your documents and include the permission structure. The AI knows who's asking and filters results accordingly.

The problem? Cache invalidation. When Bob's permissions change, how long until your index reflects that? Minutes? Hours? In enterprise environments with complex permission structures, this lag is unacceptable.

In a financial firm, a 10-minute lag means a departing trader could query the AI for confidential portfolio strategies they no longer have rights to see. That's not a theoretical risk — it's a lawsuit waiting to happen.
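To make the failure mode concrete, here is a minimal sketch of Approach 1, with permissions stored as metadata on each indexed document (the names and the trivial "retrieval" are illustrative — a real system would rank by embedding similarity):

```python
# Sketch of Approach 1: permissions replicated into the index as metadata
# captured at embedding time. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class IndexedDoc:
    doc_id: str
    text: str
    allowed_users: set = field(default_factory=set)  # permission SNAPSHOT

INDEX = [
    IndexedDoc("d1", "Q3 portfolio strategy", {"alice"}),
    IndexedDoc("d2", "Cafeteria menu", {"alice", "bob"}),
]

def retrieve(user: str, query: str) -> list:
    # Real systems rank candidates by vector similarity; here we only show
    # the permission filter. The filter consults the snapshot baked into
    # the index, not the live ACL — if the user's rights changed after
    # indexing, results are wrong until the index is rebuilt or patched.
    return [d.doc_id for d in INDEX if user in d.allowed_users]

print(retrieve("bob", "strategy"))  # ['d2'] — stale if Bob's rights just changed
```

The filter itself is trivial; the hard part is everything around it — propagating every permission change into the index fast enough that the snapshot never lies.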

Approach 2: Check permissions at query time. Your index contains everything. The AI retrieves what it thinks is relevant, then a middleware layer checks the original data store to verify Bob should see each result.

This is architecturally cleaner, but it means your AI is querying data without restrictions first — and then filtering. Security teams hate this because the unrestricted query happens before the permission check.
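A minimal sketch of Approach 2 makes the security team's objection visible in the code itself: the unrestricted retrieval runs first, and the live permission check only trims the output afterward. (`source_system_allows` stands in for a call to the real data store's ACL API; everything here is illustrative.)

```python
# Sketch of Approach 2: retrieve without restriction, then verify each hit
# against the live source-of-record permissions before returning it.
LIVE_PERMISSIONS = {("bob", "d2"), ("alice", "d1"), ("alice", "d2")}

def source_system_allows(user: str, doc_id: str) -> bool:
    # Stand-in for a live ACL lookup against the original data store.
    return (user, doc_id) in LIVE_PERMISSIONS

def retrieve_unrestricted(query: str) -> list:
    # The index returns everything relevant; permissions are not checked here.
    return ["d1", "d2"]

def answer(user: str, query: str) -> list:
    hits = retrieve_unrestricted(query)          # unrestricted query runs FIRST
    return [d for d in hits if source_system_allows(user, d)]  # filtered after

print(answer("bob", "strategy"))  # ['d2']
```

The results are always fresh, because the check hits the live ACL — but every query momentarily holds data the user was never cleared to see, and every retrieved document costs an extra round-trip to the source system.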

Neither approach scales cleanly. And here's the uncomfortable reality: most enterprises aren't solving this. They're limiting their AI projects to data that everyone can access anyway.

The consultant put it bluntly: "Enterprises have no interest in solving this problem themselves. They're waiting for other people to solve it."

The Testing Problem You Can't Solve

Here's the second problem: you can't properly test AI systems with classified data.

This sounds obvious when you say it out loud, but the implications are brutal.

When you build an AI system that will eventually handle classified information, you need to test it. You need to debug edge cases. You need to see how it behaves when things go wrong.

But you can't use real classified data for testing — that's a security violation. So you use synthetic data, redacted samples, or carefully controlled test sets.

The problem? Synthetic data doesn't trigger the same edge cases. It doesn't have the same statistical distribution. It doesn't surface the weird corner cases that only appear with real-world sensitive information.
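The gap is structural: a synthetic generator can only produce the shapes its author already imagined. A toy sketch (all names hypothetical) of why a green test suite on synthetic records proves so little:

```python
# Sketch of the testing gap: synthetic stand-ins for classified records.
# Passing these tests says little about production behavior, because the
# generator only emits well-formed data its author anticipated.
import random

def synthetic_record(seed: int) -> dict:
    rng = random.Random(seed)
    return {
        "client": f"CLIENT-{rng.randint(1000, 9999)}",  # always clean, regular IDs
        "note": rng.choice(["routine rebalance", "quarterly review"]),
    }

def redact(record: dict) -> dict:
    # Toy redaction logic under test.
    return {**record, "client": "REDACTED"}

# The suite only ever sees well-formed records...
for i in range(100):
    assert redact(synthetic_record(i))["client"] == "REDACTED"
# ...so malformed IDs, client names embedded in free-text notes, and the
# other corner cases real classified data contains are never exercised
# before production.
```

Every assertion passes, and none of them tell you what the system does the first time a real record breaks an assumption the generator baked in.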

The consultant described it as an existential problem: "We simply do not know what our systems will do with actual classified data until they're in production."

That's not a testing gap. That's flying blind.

In traditional software, you test exhaustively before deployment. With AI systems handling classified data, your first real test is production. And by then, the damage is done.

This isn't a problem vendors are racing to solve. It's a problem vendors don't talk about because there's no good answer.

The Audit Trail That's Already Broken

Here's the third problem — and this one is getting worse by the month.

Your security telemetry can no longer tell the difference between a human and an AI agent.

Traditional security monitoring assumes a clean model: a user logs in, takes actions, and those actions are logged. When something suspicious happens, you trace back to the user who did it. Simple chain of custody.

Now introduce AI agents.

An employee asks Claude to "review these documents and send a summary to the team." The agent accesses files, reads them, maybe writes to other systems, and sends an email. Your SIEM sees all of this — but it looks like the user did it.

The consultant was blunt: "We simply do not know if an action has been performed by a user or an agent anymore."

This isn't theoretical. Agentic browsers are already deployed. Tools like MCP (Model Context Protocol) are designed to give AI agents access to your systems. Every new integration makes the telemetry problem worse.

For compliance, this is a disaster. Your audit trail — the thing regulators rely on to prove who did what — is becoming meaningless. Was it Bob? Or was it Bob's AI agent acting on Bob's behalf? Or was it Bob's AI agent acting in a way Bob didn't intend?

For a regulator demanding proof for a SOX audit, "we're not sure" is a catastrophic answer.

The logs don't know. And neither do you.
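The attribution gap fits in a few lines. Two SIEM-style events — one from Bob, one from Bob's agent — are byte-for-byte identical because the agent authenticates as Bob. An explicit provenance field (not a standard today; the field names below are hypothetical) is one way to make the distinction auditable:

```python
# Sketch of the attribution gap: a human action and an agent action
# produce indistinguishable log events when the agent acts as the user.
import json

human_event = {"user": "bob", "action": "file.read", "target": "q3_report.pdf"}
agent_event = {"user": "bob", "action": "file.read", "target": "q3_report.pdf"}

print(human_event == agent_event)  # True — the log cannot tell them apart

# Agent-aware telemetry would carry provenance on every event.
# These field names are illustrative, not an existing standard:
attributed = {
    "actor_type": "agent",
    "agent_id": "claude-session-1234",  # hypothetical session identifier
    "on_behalf_of": "bob",
    "action": "file.read",
    "target": "q3_report.pdf",
}
print(json.dumps(attributed, indent=2))
```

Until something like that provenance layer exists end to end — identity provider, agent runtime, and SIEM all carrying it — the audit trail answers "who was logged in," not "who decided."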

Why Nobody's Building The Fix

Here's what's most troubling: enterprises aren't trying to solve these problems.

They're waiting.

The consultant saw the same pattern across multiple financial institutions: nobody is trying to solve these problems in-house. Everyone is waiting for someone else to do it first.

The assumption is that Microsoft, Google, or AWS will eventually figure it out. That some vendor will release a product that magically handles fine-grained RAG authorization. That the industry will converge on standards for agentic telemetry.

Maybe they will. Eventually.

But in the meantime, shadow AI proliferates. Employees use ChatGPT for sensitive work because the official tools are too restrictive. Teams build RAG systems on datasets that "everyone can access anyway" — until someone's permissions change and they suddenly can't.

The gap between what enterprises need and what they're building grows wider every month.

And the longer they wait for someone else to solve it, the more risk accumulates in the system.

A Path Forward

Waiting for a vendor is not a strategy. These foundational problems of authorization, testing, and auditing require a new architectural approach. This is why we built Nexusdesk.

For enterprises feeling stuck: Don't wait. Securely deploy Claude in your own VPC, where you control the permissions, the data, and every single audit log. We're onboarding pilot partners now.

Request a Pilot: [email protected]

For individuals: The privacy gaps are real. Our browser extension puts you back in control, securing your chats across all major AI platforms.

What This Means For You

These problems won't be magically solved in 2026. Maybe not in 2027 either.

If you're evaluating AI vendors — or building AI systems internally — here are the questions that matter:

For RAG systems:

- How do you handle document-level permissions?

- What happens when a user's access changes?

- Can you show me the architecture diagram?

For agentic tools:

- Can you prove what actions were taken by users vs agents?

- How do you separate agent actions in your audit logs?

- What's your telemetry strategy for MCP integrations?

For any AI system handling sensitive data:

- What's your testing methodology?

- How do you validate behavior with data you can't test against?

- What are your blast radius containment strategies?

The red flags: vague answers, "we're working on it," deflection to policies instead of architecture.

The architecture test: if they can't show you the logs, they don't have them.

The Hard Problems Won't Wait

These are genuinely hard problems without clean solutions. I'm not going to pretend otherwise.

But "hard" isn't an excuse for "ignore."

The organizations that will survive this transition aren't the ones waiting for perfect solutions. They're the ones building architecture that limits blast radius when things go wrong. Because things will go wrong.

Not waiting for perfect. Building for resilience.

---

For enterprise teams: Stop waiting for vendors. Deploy Claude in your own VPC where you control permissions, data flow, and every audit log. We're onboarding pilot partners — free implementation, full support.

Request a Pilot: [email protected]

For individual users: Take control of your AI privacy. Our browser extension protects your conversations across ChatGPT, Claude, Gemini, and more.

Questions? Hit reply. I read every response.

Thanks for reading.

If this was useful, share it with someone who needs to hear it.

— Anson Zeall

CEO/Founder, Nexusdesk
