You're running AI agents. Here's how to make that safe.
Five products. One event stream. Complete governance from discovery to chaos testing.
One event. Five lenses.
Every AI tool call generates a ToolGuardEvent — a structured record of what was requested, what happened, and what was decided. That single event flows through every product in the stack. No data silos. No integration glue. One stream of truth.
- ·Warden reads `governance_score` to benchmark your posture
- ·SharkRouter generates the event — `policy_verdict`, `pii_detected`, every field
- ·Inspect correlates `agent` across all events to build the census
- ·Assurance checks `verified` — did the output actually do what it claimed?
- ·Gulliver forges adversarial events to see if `chaos_tested` holds under attack
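SharkRouter produces the event; the other four lenses consume it. A minimal sketch of that fan-out, with field names copied from the sample event on this page (illustrative only — the real consumer APIs are not shown here):

```python
import json

# One ToolGuardEvent, abbreviated. Field names mirror the sample event;
# the actual schema may carry more fields.
event = json.loads("""{
  "agent": "sales-copilot",
  "pii_detected": ["email", "phone"],
  "policy_verdict": "ALLOW_WITH_REDACTION",
  "governance_score": 91,
  "verified": true,
  "chaos_tested": true
}""")

# Each consuming lens is just a predicate over the same record.
lenses = {
    "warden":    lambda e: e["governance_score"] >= 80,  # posture benchmark
    "inspect":   lambda e: e["agent"] is not None,       # census correlation
    "assurance": lambda e: e["verified"],                # outcome check
    "gulliver":  lambda e: e["chaos_tested"],            # survived chaos runs
}

results = {name: check(event) for name, check in lenses.items()}
print(results)
```

One record, four verdicts — no per-product schema, no transformation layer between them.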
{
"event_id": "tge_8f2a...",
"timestamp": "2026-04-08T14:23:01Z",
"agent": "sales-copilot",
"tool": "database_query",
"action": "SELECT * FROM customers WHERE region = 'EU'",
"pii_detected": ["email", "phone"],
"policy_verdict": "ALLOW_WITH_REDACTION",
"governance_score": 91,
"verified": true,
"chaos_tested": true
}

"What do I have?"
Before you govern anything, you need to know what’s running. Warden scans your AI environment in 30 seconds and scores it across 17 governance dimensions. No API key. No signup. No SharkRouter required.
- ·MCP server inventory — discovers every tool your agents can access
- ·Policy gap analysis — identifies ungoverned actions and tools
- ·Environment exposure — finds leaked secrets and API keys
- ·Benchmark scoring — compares your stack against 17 market vendors
Replaces: Manual security audits and spreadsheet-based compliance checklists
$ pip install warden-ai
$ warden scan

Warden v4.5 — AI Governance Scanner
══════════════════════════════════
Scanning 17 governance dimensions...

CORE GOVERNANCE       ██████████░░  22/25
ADVANCED CONTROLS     █████████░░░  12/15
ECOSYSTEM             ████████░░░░   8/10
UNIQUE CAPABILITIES   █████████░░░   9/10
─────────────────────────────────────
TOTAL SCORE           91 / 100  ★★★★★
─────────────────────────────────────
→ 2 gaps found. Run `warden fix` for remediation.
Great — you know exactly what’s running. Which raises an uncomfortable question: who’s controlling all of it?
"Can I see what happened?"
ToolGuard sees the agents passing through it. But what about shadow AI? Rogue API calls? Third-party SaaS agents your team signed up for last Tuesday? Inspect watches the entire building — governed and ungoverned.
- ·Agent census — discovers every AI agent, classifies as governed or ungoverned
- ·Behavioral forensics — correlates cross-agent behavior, detects anomalies
- ·Hallucination vs. attack — 3-way confidence scoring discriminates causes
- ·Compliance evidence — death certificates, causal chains, audit passports
Replaces: Splunk queries + custom dashboards + hoping someone notices the shadow AI
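The governed/ungoverned/shadow split above boils down to set membership: where has this agent been seen? A toy sketch of that classification (invented agent names and sources — not Inspect's actual logic):

```python
from collections import Counter

# Hypothetical observation sources, narrowest to broadest.
routed = {"sales-copilot", "support-bot"}               # seen in ToolGuardEvents
registered = routed | {"sales-gpt-3", "hr-assistant"}   # known to the org
observed = registered | {"intern-bot"}                  # everything on the wire

def classify(agent: str) -> str:
    """Governed if SharkRouter routes it; ungoverned if merely registered;
    shadow if it appears in traffic but nowhere else."""
    if agent in routed:
        return "governed"
    if agent in registered:
        return "ungoverned"
    return "shadow"

census = Counter(classify(a) for a in observed)
print(dict(sorted(census.items())))
```

The point of the structure: shadow AI is defined negatively, so the census is only as good as the broadest traffic capture feeding `observed`.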
$ shark-inspect census

Agent Census — 2026-04-08 14:30 UTC
══════════════════════════════════════
DISCOVERED AGENTS: 23
├─ Governed (SharkRouter): 19 ✓
├─ Ungoverned: 3 ⚠
└─ Shadow AI: 1 ✗

⚠ ALERT: 4 agents operating outside governance
┌─────────────────────────────────────────────┐
│ sales-gpt-3    │ Ungoverned │ 847 calls/day │
│ hr-assistant   │ Ungoverned │ 203 calls/day │
│ intern-bot     │ Shadow AI  │  12 calls/day │
│ test-agent-v2  │ Ungoverned │   4 calls/day │
└─────────────────────────────────────────────┘
→ Run `shark-inspect govern sales-gpt-3` to bring under policy
Now you can see every agent — governed or not. But visibility is just data. When an agent claims it completed a task, how do you know it’s telling the truth?
"Did it actually work?"
An agent says it fixed accessibility on 11 pages. Your linter passes. Your build is green. But did it actually fix anything? Assurance doesn’t check code — it checks outcomes. Eight independent strategies prove the AI output actually works.
- ·Page walker — renders pages and inspects actual UI state
- ·API prober — validates endpoints return correct data
- ·Behavioral comparator — verifies state changes match intent
- ·Interaction verifier — clicks buttons, fills forms, proves functionality
Replaces: Green CI badges, “looks good to me” reviews, and hoping the agent didn’t lie
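The aggregation rule is strict: one failed claim blocks the whole output. A minimal sketch of that verdict logic, with illustrative claim data (the `Claim` type and strategy names are assumptions, not Assurance's real API):

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str       # what the agent said it did
    strategy: str   # which verification strategy checked it
    passed: bool    # did independent verification confirm it?

# Made-up results in the spirit of the demo below.
claims = [
    Claim("ARIA labels added to all forms", "page_walker", False),
    Claim("Skip-to-content links added", "interaction_verifier", False),
    Claim("Alt text present on images", "page_walker", True),
    Claim("Contrast meets WCAG AA", "page_walker", True),
]

failed = [c for c in claims if not c.passed]
# Conjunction over claims: any failure blocks the agent's output.
verdict = "BLOCKED" if failed else "VERIFIED"
print(verdict, f"— {len(failed)}/{len(claims)} claims failed")
```

Strictness is the design choice: a partially true claim is treated as a lie, and the work goes back to the agent.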
$ shark-assurance verify --intent "Fix accessibility on 11 pages"

Verifying 4 claims...

CLAIM 1: "ARIA labels added to all forms"
├─ Strategy: page_walker
├─ Pages checked: 11
├─ Result: 2/11 modified, 9 UNTOUCHED
└─ Verdict: ✗ FAIL

CLAIM 2: "Skip-to-content links added"
├─ Strategy: interaction_verifier
├─ Links found: 2/11
└─ Verdict: ✗ FAIL

══════════════════════════════════
OVERALL: ✗ BLOCKED — 2/4 claims failed
Agent output rejected. Returning to agent.
══════════════════════════════════
Your agents are verified. Your outputs are real. But you built all of this assuming your threat model is complete — what if it isn’t?
"Can it be broken?"
Named after Swift’s traveler who discovered that every society has cracks. Gulliver deploys 20–150 autonomous agents — honest, malicious, hallucinating, edge-case — and swarms your governance stack. If there’s a hole, Gulliver finds it before real attackers do. Modeled on attack patterns from Google DeepMind’s Agent Traps research and OWASP Top 10 for LLMs.
- ·Scenario engine — YAML-driven presets from quick scan to hostile takeover
- ·Agent swarm — honest + malicious + hallucinating + edge-case agents
- ·Confusion matrix — TP/TN/FP/FN scoring per guard, per attack type
- ·Compliance mapping — EU AI Act, OWASP Top 10 for LLMs coverage reports
Replaces: Annual penetration tests and crossing your fingers between them
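The confusion-matrix scoring is standard detection arithmetic: for each guard, a malicious agent blocked is a true positive, a bypass is a false negative, an honest agent blocked is a false positive. A toy sketch with made-up run data shaped like the demo below (41 of 42 attacks caught, all honest agents allowed):

```python
# Each run is (is_attack, was_blocked). Invented data for illustration.
runs = [(True, True)] * 41 + [(True, False)] + [(False, False)] * 87

tp = sum(1 for attack, blocked in runs if attack and blocked)        # attacks caught
fn = sum(1 for attack, blocked in runs if attack and not blocked)    # bypasses
tn = sum(1 for attack, blocked in runs if not attack and not blocked)  # honest allowed
fp = sum(1 for attack, blocked in runs if not attack and blocked)    # honest blocked

recall = tp / (tp + fn)  # fraction of attacks the guard stopped
print(f"TP={tp} FN={fn} TN={tn} FP={fp} recall={recall:.3f}")
```

Tracking FP alongside FN matters: a guard that blocks everything scores a perfect recall while breaking every honest agent, so both error types are reported per guard, per attack type.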
$ gulliver swarm --scenario hostile_takeover --agents 150

Gulliver v1.0 — AI Security Chaos Testing
═══════════════════════════════════════════
Deploying swarm...
├─ Honest agents: 87
├─ Malicious agents: 42
└─ Edge-case agents: 21

Running 7 attack vectors...
├─ Prompt injection: 0/42 bypassed ✓
├─ Tool poisoning: 0/42 bypassed ✓
├─ PII exfiltration: 1/42 bypassed ✗ FOUND
├─ Privilege escalation: 0/42 bypassed ✓
├─ Policy circumvention: 0/42 bypassed ✓
├─ Data corruption: 0/42 bypassed ✓
└─ Supply chain: 0/42 bypassed ✓

⚠ 1 BREACH FOUND — auto-generating fingerprint...
→ PII exfiltration via base64-encoded email in tool arg
→ Fingerprint FP-0847 created → policy patched
→ Re-run: 0/42 bypassed ✓

RESULT: 1 gap found, auto-remediated. Score: 98 → 100
Five questions. One platform.
Every CISO asks these questions in sequence. SharkRouter is the only platform that answers all five — with one event stream, one data model, and zero integration glue.