10 submissions · last updated April 17, 2026

AI Receptionist
Leaderboard

Community-submitted scores from the standardized 50-scenario stress-test suite. Every score comes from the same evaluation, which covers human behavior, emotion, accuracy, reliability, privacy, and adversarial tests.

Task effectiveness

Resolution, clarity, empathy, and professionalism — did the agent solve the problem cleanly?

Human likeness

Naturalness, acknowledgment, pace & flow (including latency), interruption handling, and closing.

Agent · Overall · Task eff. · Human-like · Tests

Vapi Voice Agent (GPT-4o)

Vapi

Retell Receptionist v2

Retell AI

Bland Conversational

Bland AI

Synthflow Receptionist

Synthflow AI

Air Autonomous

Air AI

PolyAI Voice Assistant

Community-submitted scores using the gen-50-stress-test suite v2.0.

Methodology

How scores are generated

Every submission runs the same fixed 50-scenario suite — human behavior, emotion, accuracy, reliability, privacy, and adversarial tests. The suite version is locked so scores are always comparable. Scores are self-submitted; we don't verify agent identity.

1. Connect your endpoint in the certd.io app.

2. Run all 50 standardized scenarios against your agent.

3. Each conversation is graded for task effectiveness and human likeness.

4. Submit your averaged score with your org name and agent description.
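
As a rough illustration of the averaging in the last step, here is a minimal Python sketch. It is not the certd.io implementation. The `ScenarioGrade` type, the `summarize` function, the 0-100 scale, and the rule that the overall score is the plain mean of the two dimensions are all assumptions made for this example; the page does not say how the overall score is derived.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScenarioGrade:
    """Grade for one conversation (hypothetical structure, assumed 0-100 scale)."""
    scenario_id: str
    task_effectiveness: float
    human_likeness: float


def summarize(grades: list[ScenarioGrade]) -> dict:
    """Average per-conversation grades into one leaderboard-style row."""
    if not grades:
        raise ValueError("no graded conversations to average")
    task = mean(g.task_effectiveness for g in grades)
    human = mean(g.human_likeness for g in grades)
    return {
        "task_effectiveness": round(task, 1),
        "human_likeness": round(human, 1),
        # Assumption: overall is the unweighted mean of the two dimensions.
        "overall": round((task + human) / 2, 1),
        "tests": len(grades),
    }


if __name__ == "__main__":
    sample = [
        ScenarioGrade("scenario-01", 80.0, 70.0),
        ScenarioGrade("scenario-02", 90.0, 60.0),
    ]
    print(summarize(sample))
```

In a full run, `grades` would hold one entry per scenario, so `tests` would be 50.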

Where does your agent rank?

Run the 50-scenario evaluation suite and find out. Task effectiveness, human likeness, privacy handling, adversarial resistance — all in one score.

Run the evaluation free

No credit card. No SDK. Just an endpoint.
