AI Receptionist
Leaderboard
Community-submitted scores from the standardized 50-scenario stress-test suite. Every score comes from the same evaluation — human behavior, accuracy, reliability, privacy, and adversarial tests.
Task effectiveness
Resolution, clarity, empathy, and professionalism — did the agent solve the problem cleanly?
Human likeness
Naturalness, acknowledgment, pace & flow (including latency), interruption handling, and closing.
Community-submitted scores using the gen-50-stress-test suite v2.0.
Submit your agent →
Methodology
How scores are generated
Every submission runs the same fixed 50-scenario suite — human behavior, emotion, accuracy, reliability, privacy, and adversarial tests. The suite version is locked so scores are always comparable. Scores are self-submitted; we don't verify agent identity.
Connect your endpoint in the certd.io app
Run all 50 standardized scenarios against your agent
Each conversation is graded for task effectiveness and human likeness
Submit your averaged score with your org name and agent description
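As a rough illustration of the averaging step above, the sketch below takes one grade per conversation and computes per-category means plus a combined figure. The page does not publish the grading scale or how the two categories are weighted, so the 0-100 scale, the equal weighting, and every name here (`ConversationGrade`, `average_scores`, the scenario IDs) are assumptions for illustration, not the certd.io implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ConversationGrade:
    """One graded conversation (hypothetical structure)."""
    scenario_id: str
    task_effectiveness: float  # assumed 0-100 scale
    human_likeness: float      # assumed 0-100 scale


def average_scores(grades, expected_scenarios=50):
    """Average per-conversation grades into a single submission.

    Assumes equal weighting of the two categories; the real
    weighting is not stated in the methodology.
    """
    if len(grades) != expected_scenarios:
        raise ValueError(
            f"expected {expected_scenarios} graded conversations, got {len(grades)}"
        )
    task = mean([g.task_effectiveness for g in grades])
    human = mean([g.human_likeness for g in grades])
    return {
        "task_effectiveness": round(task, 1),
        "human_likeness": round(human, 1),
        "overall": round((task + human) / 2, 1),
    }


# Usage: 50 placeholder grades, one per scenario in the suite.
grades = [ConversationGrade(f"scenario-{i:02d}", 80.0, 60.0) for i in range(1, 51)]
result = average_scores(grades)
```

Rejecting a partial run keeps the average tied to the full fixed suite, which is what makes submissions comparable with one another.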
Where does your agent rank?
Run the 50-scenario evaluation suite and find out. Task effectiveness, human likeness, privacy handling, adversarial resistance — all in one score.
Run the evaluation free
No credit card. No SDK. Just an endpoint.