Methodology

How Klaimee certifies AI agents

Transparent, repeatable, carrier-accepted risk assessment for autonomous AI agents. Built for enterprise procurement, risk officers, and legal teams.

Our approach

We assess AI agents the way insurance underwriters assess risk, not the way security auditors check boxes. Every dimension of our audit maps to real-world liability exposure.

Traditional security audits ask "is it secure?" We ask "what happens when it fails, who's liable, and how much does it cost?" This is the difference between a penetration test and a risk certification.

Our methodology is designed to be transparent. Enterprise legal can audit our audit. Every score maps to specific test results with reproduction steps. No black boxes.

Five audit dimensions

Each dimension is assessed independently. The composite score is a weighted average across all five.
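As a minimal sketch of the weighted average described above: the actual weights are not stated here, so the equal weights below are a hypothetical placeholder, and the dimension keys and the function name composite_score are illustrative only.

```python
# Hypothetical relative weights: equal weighting is an assumption for
# illustration only; the actual weights are not published in this document.
WEIGHTS = {
    "human_interactions": 1.0,
    "model_architecture": 1.0,
    "technology_stack": 1.0,
    "scope_of_actions": 1.0,
    "failure_modes": 1.0,
}


def composite_score(scores, weights=WEIGHTS):
    """Return the weighted average of per-dimension scores (each 0-100)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    total_weight = sum(weights.values())
    weighted_sum = sum(scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


example = {
    "human_interactions": 90,
    "model_architecture": 80,
    "technology_stack": 70,
    "scope_of_actions": 60,
    "failure_modes": 50,
}
print(composite_score(example))  # 70.0 with equal weights
```

Dividing by the total weight means the weights do not need to sum to 1, so a dimension can be emphasized by raising its weight alone.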

01

Human Interactions & Processes

What we assess

Escalation paths, human-in-the-loop workflows, override mechanisms, approval gates, fallback procedures, incident response playbooks.

Why it matters

If no human can intervene when the agent makes a mistake, the liability exposure is uncapped. We assess whether humans can take control when it matters.

Assessment criteria
  • Escalation paths are defined, documented, and tested
  • Human override is available for all high-impact actions
  • Fallback procedures exist for agent failure scenarios
  • Incident response roles and timelines are documented

02

Model Architecture

What we assess

Foundation model choice, fine-tuning approach, guardrails configuration, prompt engineering quality, context window management, output filtering.

Why it matters

Architecture choices determine the ceiling of agent reliability. A poorly configured guardrail is worse than no guardrail because it creates false confidence.

Assessment criteria
  • Guardrails are active and cannot be bypassed through prompt manipulation
  • Output filtering prevents hallucinated data from reaching end users
  • Context management prevents cross-session information leakage
  • Model selection is appropriate for the risk level of the use case

03

Technology Stack

What we assess

Infrastructure security, API authentication, data pipelines, monitoring and logging, alerting systems, deployment practices, dependency management.

Why it matters

A reliable agent on insecure infrastructure is an insecure agent. We assess the full stack, not just the model.

Assessment criteria
  • API endpoints are authenticated and rate-limited
  • Monitoring covers all agent actions with an audit trail
  • Alerting triggers on anomalous behavior within a documented, acceptable latency
  • Deployment process includes rollback capability

04

Scope of Actions

What we assess

Permissions model, write access boundaries, external integrations, autonomous decision scope, confirmation gates for high-impact operations.

Why it matters

The breadth of what an agent can do defines the breadth of what it can break. We map every action the agent can take and assess whether appropriate controls exist.

Assessment criteria
  • Permissions follow the principle of least privilege
  • Write operations above a defined value threshold require confirmation
  • External integrations are scoped and auditable
  • Autonomous decisions are bounded by documented rules
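The criteria above can be illustrated with a small sketch of a permission check combined with a confirmation gate. Everything here is hypothetical: the WriteAction type, the threshold value, and the returned status strings are assumptions made for illustration, not part of the audit itself.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real deployment would define this per action type.
CONFIRMATION_THRESHOLD = 1_000.0


@dataclass(frozen=True)
class WriteAction:
    name: str     # identifier of the operation, e.g. "issue_refund"
    value: float  # monetary or impact value of the operation


def requires_confirmation(action, threshold=CONFIRMATION_THRESHOLD):
    """High-value writes need explicit human confirmation."""
    return action.value >= threshold


def execute(action, allowed_actions, confirmed=False):
    """Return the outcome of an attempted write as a status string."""
    # Least privilege: anything not explicitly granted is denied.
    if action.name not in allowed_actions:
        return "denied"
    # Confirmation gate: hold high-value writes until a human approves.
    if requires_confirmation(action) and not confirmed:
        return "pending_confirmation"
    return "executed"


allowed = {"issue_refund"}
print(execute(WriteAction("issue_refund", 50.0), allowed))    # executed
print(execute(WriteAction("issue_refund", 5_000.0), allowed)) # pending_confirmation
print(execute(WriteAction("delete_account", 0.0), allowed))   # denied
```

Checking permissions before the threshold keeps the two controls independent: an action outside the agent's scope is rejected even if a human would have confirmed it.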

05

Failure Modes

What we assess

Hallucination patterns, data leakage vectors, prompt injection resistance, crisis handling, edge case behavior, graceful degradation.

Why it matters

We test how the agent fails, not just how it works. Real-world incidents come from edge cases, not happy paths.

Assessment criteria
  • Agent does not fabricate data when source information is unavailable
  • Cross-tenant data isolation holds under targeted probing
  • Prompt injection attempts are detected and blocked
  • Agent escalates to human when facing out-of-scope requests

Scoring methodology

Each dimension receives a letter grade. The composite score determines certification eligibility and insurance premium tier.

Grade | Score Range | Meaning
A | 85-100 | Excellent controls. Minimal residual risk. Eligible for preferred insurance rates.
B | 70-84 | Strong controls with minor gaps. Standard insurance eligibility. Remediation recommendations provided.
C | 55-69 | Adequate controls with notable gaps. Conditional certification. Insurance available with specific exclusions.
D | 40-54 | Weak controls. Certification denied until remediation complete. Insurance subject to significant restrictions.
F | 0-39 | Critical failures. Not certifiable. Immediate remediation required before re-assessment.

How scores map to insurance

Certified agents (grade B or above) qualify for Klaimee-backed liability insurance. The composite score directly determines the premium tier: higher score, lower premium. Agents with grade C receive conditional certification with specific remediation steps and insurance with targeted exclusions. Grades D and F are not certifiable until remediation is complete.
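The mapping from composite score to grade and certification outcome can be sketched as a simple band lookup. The band boundaries come from the grade table; the function names and status strings are illustrative, and treating a fractional score such as 84.5 as belonging to the lower band (B) is an assumption, since the table lists whole-number ranges only.

```python
# Lower bound of each band, taken from the grade table; checked highest first.
GRADE_BANDS = [(85, "A"), (70, "B"), (55, "C"), (40, "D"), (0, "F")]

# Certification outcomes as described in the text; the label strings are
# illustrative, not official terminology.
CERTIFICATION_STATUS = {
    "A": "certified",
    "B": "certified",
    "C": "conditional",
    "D": "not certifiable until remediated",
    "F": "not certifiable until remediated",
}


def grade_for(score):
    """Map a composite score (0-100) to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade
    raise AssertionError("unreachable: the last band starts at 0")


def certification_status(score):
    """Map a composite score to its certification outcome."""
    return CERTIFICATION_STATUS[grade_for(score)]


print(grade_for(72), certification_status(72))  # B certified
print(grade_for(58), certification_status(58))  # C conditional
```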

Data handling & security

How we treat your data during and after the certification process.

Zero data retention

All customer data, system prompts, and agent configurations are deleted after the audit completes. Nothing is stored.

Isolated environments

Each audit runs in a sandboxed environment. No cross-tenant data exposure. Your data never touches another customer's audit.

Encryption

All data is encrypted in transit (TLS 1.3) and at rest (AES-256). API communications use authenticated endpoints.

No production access required

Basic certification works from system prompt and configuration review. Direct agent testing is optional and scoped.

Security practices documentation is available on request. Contact us for detailed security questionnaire responses.

Team credentials

The team behind Klaimee combines insurance operations expertise, enterprise engineering, and startup execution.

Insurance operations

5+ years leading operations at a global insurance provider. Deep understanding of carrier requirements, underwriting processes, and enterprise procurement.

Enterprise strategy

Corporate strategy background with hands-on engineering. We build products that speak the language of procurement, legal, and risk teams.

Y Combinator P26

Part of the current YC batch. Access to the world's strongest network of startup founders and enterprise technology advisors.

Enterprise inquiries

Contact us for detailed methodology discussions, security questionnaires, or carrier partnership inquiries.