How We Fixed the 2/100 Problem — C# Scanner + Coverage Gating
We scanned a well-designed C#/.NET agent orchestrator and got 2/100 UNGOVERNED. That wasn't the project's fault — it was ours. This is the inside story of the two bugs we shipped in Warden v1.7.0 to fix it, and how VigIA-Orchestrator went from 2/100 to 61/100 PARTIAL (the first framework in our gallery above the UNGOVERNED threshold).
A few weeks ago we ran Warden against a C#/.NET agent orchestrator called VigIA-Orchestrator. The result was 2/100 UNGOVERNED.
That was wrong. And the way it was wrong taught us something we had to fix before shipping v1.7.0 of the gallery.
This post is the inside story: what the bug was, why it was actually two bugs, how we fixed both, and what the new VigIA score (61/100 PARTIAL — the first framework in the gallery above the UNGOVERNED threshold) actually measures.
The Setup
VigIA is not a Python project. It's a C#/.NET agent orchestrator built on Microsoft.Extensions.AI, using IChatClient, [KernelFunction] registration, explicit Result<T, E> error handling, InvariantEnforcer patterns, AuthorizationPolicyBuilder policies, CancellationToken propagation, ImmutableDictionary state, and FSM-guarded state transitions with an IChatClient abstraction for post-execution verification.
If you read that sentence and thought that sounds well-governed, you're right. VigIA was designed with governance surfaces in mind from day one. Its author has been doing exactly the work that Warden is supposed to reward — explicit invariants, typed errors, immutable state, structured outputs, auth policies.
We scanned it.
Score: 2 / 100 UNGOVERNED
Findings: 1,847
Level: UNGOVERNED (< 33)
Two out of a hundred. For a project that is visibly better-governed than every Python framework in our existing gallery.
The First Bug: The Scanner Couldn't See C#
Warden v1.6.0 had twelve scan layers. All twelve were written against Python — ast parsing for Python files, regex patterns targeting def, async def, @decorator, from x import y, FastAPI / LangChain / LlamaIndex / PydanticAI call patterns, os.environ, python-dotenv, setup.py, pyproject.toml, requirements.txt.
Against a repository that is 100% .cs files, .csproj files, and appsettings.json, every one of those scanners had nothing to match. No Python files meant no tool registrations detected, no credential patterns detected, no auth decorators detected, no input validators detected, no policy hooks detected.
So D1 Tool Inventory: 0/25. D3 Policy Coverage: 0/20. D4 Credential Management: 0/20. D7 Human-in-the-Loop: 0/15. D8 Agent Identity: 0/15. D14 Compliance Maturity: 0/10. D17 Adversarial Resilience: 0/10.
Not because VigIA didn't have these surfaces — it had more of them than most of the Python projects we'd scanned — but because Warden literally could not see them.
This was embarrassingly predictable. We just hadn't hit it until we pointed the scanner at a non-Python project.
The fix for Bug #1: Layer 13, second batch — a C#/.NET scanner.
Layer 13 now detects:
| Governance surface | Detected via |
|---|---|
| Tool / function registration (D1) | [KernelFunction] attributes, IChatClient method registration, Microsoft.Extensions.AI tool declarations |
| Policy coverage (D3) | AuthorizationPolicyBuilder, [Authorize(Policy=...)], InvariantEnforcer.Require(...), Result<T, E>.Ensure(...) |
| Credential management (D4) | DefaultAzureCredential, IHttpClientFactory (to detect that HttpClients aren't being constructed with embedded secrets), managed identity patterns |
| Human-in-the-loop (D7) | Approval gate patterns, CancellationToken propagation, explicit user-confirmation flows |
| Agent identity (D8) | IChatClient abstraction, typed agent contexts, readonly record struct immutable identities |
| Compliance maturity (D14) | ChatResponseFormat.CreateJsonSchemaFormat for structured outputs, ImmutableDictionary state, FSM-guarded state transitions, audit sink wiring |
| Adversarial resilience (D17) | Result<T, E> vs exception-based flow, invariant enforcement, CancellationToken cooperation, schema-validated outputs |
These aren't superficial pattern matches. They correspond to the same governance surfaces the Python scanners look for — a central tool registry, a policy enforcement hook, a credential boundary, an approval gate, a stable agent identity, a compliance-ready output schema, a trap-resistant control flow. The C#/.NET scanner just recognizes them in the idioms a .NET engineer actually writes.
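To make the table concrete, here is a minimal sketch of what idiom detection for a C# file can look like. The pattern set and dimension names are illustrative assumptions, not Warden's actual rule set; the point is only the shape: each .NET idiom maps to the governance dimension it evidences.

```python
import re

# Illustrative patterns only -- not Warden's actual rules.
# Each maps a .NET governance idiom to the dimension it evidences.
CSHARP_SURFACE_PATTERNS = {
    "D1_tool_inventory": re.compile(r"\[KernelFunction[^\]]*\]"),
    "D3_policy_coverage": re.compile(
        r"AuthorizationPolicyBuilder|\[Authorize\(Policy\s*="
    ),
    "D4_credential_management": re.compile(
        r"DefaultAzureCredential|IHttpClientFactory"
    ),
    "D8_agent_identity": re.compile(r"readonly\s+record\s+struct"),
}

def detect_surfaces(source: str) -> set[str]:
    """Return the dimensions for which this C# source shows evidence."""
    return {
        dim for dim, pattern in CSHARP_SURFACE_PATTERNS.items()
        if pattern.search(source)
    }
```

A file containing `[KernelFunction("search")]` would register evidence for D1; a file with no recognized idioms registers nothing, which is exactly the "undetected = 0" behavior discussed below.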
With Layer 13 on, VigIA's dimension scores started moving. D1 went from 0/25 to 12/25 (48%). D3 jumped to 14/20 (70%). D4 went to 13/20 (65%). D14 went to 8/10 (80%). D17 hit 8/10 (80%).
That was progress. The score was still wrong.
The Second Bug: Coverage Failure As Compliance Failure
Even with the C#/.NET scanner detecting real signal, VigIA's score was still absurdly low — somewhere in the low teens. The reason was the second bug, and it was much more interesting than the first.
Warden's scoring model has a core invariant: absence is not compliance. If you don't detect a control, you score 0 for that control. This prevents the failure mode where a vendor claims "we comply with X, Y, Z" based on an auditor never finding any evidence either way. Undetected = 0. Non-negotiable.
But there's a sharp corner to that rule. "We didn't find the control, so you get 0 points" silently assumes the scanner could have found the control. The failure mode is: we didn't find the Python control in a repository that has no Python at all, and we still counted the maximum Python-dimension points in the denominator.
Concretely: D2 Risk Detection looks for Python-specific risk classifiers (risk_score, classify_risk, policy_engine.evaluate()). D9 Threat Detection looks for Python trap defense patterns (detect_prompt_injection, sanitize_tool_result). D10 Prompt Security looks for llm_guard, promptguard, presidio imports. D16 Data Flow Governance looks for Python taint tracking libraries. None of those will ever fire on C#/.NET code, because none of those imports can exist in a C#/.NET codebase. You'd need a separate .NET-idiom scanner for each one — and we didn't have those yet.
For LangChain or PydanticAI, a 0/20 on D2 means Python code that should have a risk classifier doesn't have one. That's a real finding.
For VigIA, a 0/20 on D2 meant the scanner was looking for a Python pattern in code that isn't Python. That's not a finding. That's coverage failure being misreported as compliance failure. Adding those zeros into the denominator was turning "we can't see this language yet" into "this project scored 0 on governance dimensions it was never evaluated on."
The result was the 2/100.
The fix for Bug #2: absence-vs-coverage gating with file_counts.
Two halves:
(a) Finding-emission gating. The scanners that emit absence-based findings — trap_defense_scanner and audit_scanner — now receive a file_counts kwarg passed in via functools.partial at the dispatch layer. Before emitting a Python-specific absence finding, they check whether file_counts["python"] > 0. If the repo has zero Python files, the finding is simply not emitted. No noise. No fake CRITICALs claiming that a C#/.NET project is missing presidio-analyzer.
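In Python terms, the emission gate is roughly this shape. The function and field names here are illustrative assumptions, not Warden's actual API; what matters is that file_counts is bound once at dispatch via functools.partial, and a zero count short-circuits the absence findings entirely.

```python
from functools import partial

def emit_python_absence_findings(findings, *, file_counts):
    """Emit Python-specific absence findings only if Python files exist.

    Sketch of the gating described above; names are assumptions,
    not Warden's real scanner API.
    """
    if file_counts.get("python", 0) == 0:
        # No Python in the repo: absence is a coverage gap, not a finding.
        return []
    return [f for f in findings if f["kind"] == "absence"]

# At the dispatch layer, bind the repo's file counts once:
file_counts = {"python": 0, "csharp": 214}
scanner = partial(emit_python_absence_findings, file_counts=file_counts)

scanner([{"kind": "absence", "rule": "missing presidio-analyzer"}])  # -> []
```

The same bound scanner run against a repo with a nonzero Python count emits its absence findings normally, which is how the invariant survives the gate.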
(b) Denominator exclusion at the scoring layer. scoring/engine.py now reads file_counts when computing the normalized score. If a dimension is exclusively wired to a language whose file count is zero, that dimension's max is excluded from the denominator — not zeroed in the numerator. The effect: a project is never penalized on the maximum for surfaces the scanner cannot see yet.
Crucially, the invariant holds. Absence is still not compliance. If Warden detects a C#/.NET project does contain some Python files — say, a build script or a data-science notebook — then the Python dimensions re-enter the denominator for that project. Coverage gating is not an escape hatch. It's a guarantee that "we don't cover this language" is reported honestly instead of laundered into a compliance score.
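The denominator exclusion can be sketched in a few lines. Again, the function and parameter names are assumptions for illustration, not the real scoring/engine.py interface; the logic is the part that matters: a dimension wired exclusively to a language with zero files is dropped from the maximum, while every reachable dimension still scores 0 when undetected.

```python
def normalized_score(dim_scores, dim_maxes, dim_language, file_counts):
    """Score as a percent of the *reachable* maximum.

    Sketch of the coverage gate: a dimension wired exclusively to a
    language with zero files is excluded from the denominator entirely,
    not zeroed in the numerator. Names are illustrative assumptions.
    """
    raw, effective_max = 0, 0
    for dim, max_pts in dim_maxes.items():
        lang = dim_language.get(dim)            # None = language-agnostic
        if lang is not None and file_counts.get(lang, 0) == 0:
            continue                            # coverage gate: drop from max
        effective_max += max_pts
        raw += dim_scores.get(dim, 0)           # undetected still scores 0
    return round(100 * raw / effective_max) if effective_max else 0
```

Plugging in the VigIA numbers from the rescore below the same way (92 raw over a 150 effective maximum) yields 61, while counting the gated 85 points in the denominator would have dragged the score back toward the old, wrong number.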
Six regression tests across test_trap_defense.py and test_audit.py cover both halves. The fix is in commit 6a6144f.
The Rescore
With both fixes landed, we re-ran VigIA against v1.7.0.
Score: 61 / 100 PARTIAL
Raw: 92 / 150 effective (85 max excluded via coverage gate)
Findings: 1
Level: PARTIAL (60 ≤ score < 80)
(The full threshold table: UNGOVERNED below 33, AT_RISK from 33 to 59, PARTIAL from 60 to 79, GOVERNED at 80 and above. VigIA is the first framework in the gallery above the UNGOVERNED threshold, and the first above AT_RISK as well; it lands in the PARTIAL band at 61.)
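The band mapping is simple enough to state as one function. This is a sketch from the published thresholds, not Warden's actual code:

```python
def governance_level(score: int) -> str:
    """Map a 0-100 Warden score to its governance band (sketch)."""
    if score < 33:
        return "UNGOVERNED"
    if score < 60:
        return "AT_RISK"
    if score < 80:
        return "PARTIAL"
    return "GOVERNED"

governance_level(61)  # -> "PARTIAL"
governance_level(2)   # -> "UNGOVERNED"
```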
Dimension breakdown:
| Dimension | Raw / Max | Percent |
|---|---|---|
| D1 Tool Inventory | 12 / 25 | 48% |
| D2 Risk Detection | — | gated |
| D3 Policy Coverage | 14 / 20 | 70% |
| D4 Credential Management | 13 / 20 | 65% |
| D5 Log Hygiene | 4 / 10 | 40% |
| D6 Framework Coverage | 2 / 5 | 40% |
| D7 Human-in-the-Loop | 10 / 15 | 67% |
| D8 Agent Identity | 10 / 15 | 67% |
| D9 Threat Detection | — | gated |
| D10 Prompt Security | — | gated |
| D11 Cloud / Platform | 6 / 10 | 60% |
| D12 LLM Observability | 5 / 10 | 50% |
| D13 Data Recovery | — | gated |
| D14 Compliance Maturity | 8 / 10 | 80% |
| D15 Post-Exec Verification | — | gated |
| D16 Data Flow Governance | — | gated |
| D17 Adversarial Resilience | 8 / 10 | 80% |
(Gated dimensions are ones where the coverage gate excluded the maximum from the denominator because VigIA contains zero Python files and we don't yet have .NET-idiom scanners for those surfaces. They are not scored as 0 — they are not scored at all.)
Every dimension Warden could see on VigIA scored non-zero. The highest ones are exactly the places the VigIA author invested: explicit auth policies (D3 70%), managed-identity-style credential handling (D4 65%), typed error flow that makes adversarial failure modes visible (D17 80%), and structured outputs with immutable state (D14 80%).
One finding. Compare that to LangChain's 1,516, CrewAI's 2,171, or Semantic Kernel's 1,708. The difference is not that VigIA has 1,515 fewer bugs than LangChain — it's that when Warden actually has a scanner that understands the language, and the coverage gate stops the scanner from screaming about absences it can't see, the honest finding count drops to what's actually wrong.
Why This Matters Beyond VigIA
There's a general lesson here that applies to every static analyzer, not just Warden.
A scanner that can't express what it doesn't cover will lie about what it does cover.
If your scanner has a Python-specific detector for risk classification and you point it at a Go codebase, you have three options:
- Emit 0/20 anyway. This is what Warden v1.6.0 did. It treats every non-Python project as UNGOVERNED by construction. It's wrong in exactly the way VigIA was wrong.
- Pretend the dimension doesn't exist. This is what a lot of tools implicitly do — they run whichever scanners match and report only on those. It's honest, but it destroys comparability across projects, and it makes the top-line score meaningless.
- Gate on coverage: exclude unreachable dimensions from the denominator, don't pretend the dimension doesn't exist, and track the gap as a known coverage hole. This is what Warden v1.7.0 does. The score stays comparable. The honesty of the "undetected = 0" rule is preserved. And the operator of the C#/.NET codebase gets a score that reflects what was actually measured.
Option 3 is the only one that composes. As we add a Go scanner, a TypeScript scanner, a Rust scanner, the coverage gate generalizes — each new language detector expands what Warden can see, and the dimensions that were previously gated for that language re-enter the denominator. The scoring model doesn't have to change. The fix is the same shape for every future language.
The Gallery Rebuild
We rebuilt the full gallery against v1.7.0 and pushed it to sharkrouter.github.io/warden. Every Python target re-ran with identical scoring (the coverage gate only fires when a language's file count is zero, and every Python framework in the gallery has >0 Python files, so their scores are unchanged from the v1.6.0 run).
VigIA was added as the eleventh target and lands at 61/100 PARTIAL — the first framework above the UNGOVERNED threshold, and the first above AT_RISK as well.
That's not an endorsement of .NET over Python. Every Python framework in the gallery could climb into the PARTIAL band — and higher — by investing in the same surfaces VigIA invested in: a central tool registry with inspectable metadata (D1), explicit policy enforcement hooks on every tool call (D3), structured error handling that makes failure modes inspectable (D17), and typed outputs with a verification boundary (D14). Nothing about Python prevents it. The scanner no longer prevents it either.
Try It
Warden v1.7.0 is live on PyPI. The C#/.NET scanner and the coverage gate are both on by default:
```
pip install warden-ai
warden scan ./your-dotnet-agent --format html
```
If your project is C#/.NET, the report now tells you which .NET idioms Warden detected and which dimensions were gated for coverage reasons. If your project is mixed (Python + .NET, say), both scanners run in parallel and only the genuinely absent surfaces score zero.
The full VigIA before/after reports are public at sharkrouter.github.io/warden/vigia-orchestrator/. Every dimension, every finding, every raw number is reproducible by running warden scan against the VigIA-Orchestrator repo yourself.
The commit that landed this is 6a6144f — Layer 13 C#/.NET scanner + absence-vs-coverage gating in one bundle because the two fixes only make sense together. If we'd shipped either half alone, VigIA would still be wrong: Layer 13 without coverage gating would have scored it around 40 (Python absences still dragging the denominator), and coverage gating without Layer 13 would have gated away most of the denominator and reported a score based on nothing. Both halves shipped, or neither.
