10 submissions · last updated April 17, 2026

AI Receptionist Leaderboard

Community-submitted scores from the standardized 50-scenario stress-test suite. Every score comes from the same evaluation — human behavior, accuracy, reliability, privacy, and adversarial tests.

Task effectiveness

Resolution, clarity, empathy, and professionalism — did the agent solve the problem cleanly?

Human likeness

Naturalness, acknowledgment, pace & flow (including latency), interruption handling, and closing.

Agent · Overall · Tests

Community-submitted scores using the gen-50-stress-test suite v2.0.

Submit your agent →

Methodology

How scores are generated

Every submission runs the same fixed 50-scenario suite — human behavior, emotion, accuracy, reliability, privacy, and adversarial tests. The suite version is locked so scores are always comparable. Scores are self-submitted; we don't verify agent identity.

01. Connect your endpoint in the certd.io app
02. Run all 50 standardized scenarios against your agent
03. Each conversation is graded for task effectiveness and human likeness
04. Submit your averaged score with your org name and agent description
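
As a rough illustration of the last two steps, here is a minimal Python sketch of how per-conversation grades could be rolled up into one averaged score. It is an assumption-laden example, not the certd.io implementation: the `ConversationGrade` shape, the 0-100 scale, and the equal weighting of task effectiveness and human likeness are all hypothetical, since the page does not state how the two dimensions are combined.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ConversationGrade:
    """Grades for one scenario's conversation (hypothetical shape)."""
    scenario_id: int
    task_effectiveness: float  # assumed 0-100 scale
    human_likeness: float      # assumed 0-100 scale


def averaged_score(grades, expected_scenarios=50):
    """Average both graded dimensions across one full run of the suite.

    Assumption: the overall score weights the two dimensions equally.
    The actual weighting is not documented on this page.
    """
    seen = {g.scenario_id for g in grades}
    if len(grades) != expected_scenarios or len(seen) != expected_scenarios:
        raise ValueError(
            f"expected {expected_scenarios} distinct scenarios, got {len(seen)}"
        )
    task = mean([g.task_effectiveness for g in grades])
    human = mean([g.human_likeness for g in grades])
    return {
        "task_effectiveness": round(task, 1),
        "human_likeness": round(human, 1),
        "overall": round((task + human) / 2, 1),
    }


if __name__ == "__main__":
    # Illustrative placeholder grades, not real results.
    run = [ConversationGrade(i, 80.0, 60.0) for i in range(1, 51)]
    print(averaged_score(run))
```

Rejecting partial runs mirrors the requirement that every submission covers the same fixed 50 scenarios, so that averages stay comparable across agents.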

Where does your agent rank?

Run the 50-scenario evaluation suite and find out. Task effectiveness, human likeness, privacy handling, adversarial resistance — all in one score.

Run the evaluation free

No credit card. No SDK. Just an endpoint.