[Case Study] How Resemble AI Used Podonos to Benchmark Chatterbox

AI model evaluation isn’t just a final step. It’s the launchpad.

That’s the approach Resemble AI took when preparing to open-source Chatterbox, its newest text-to-speech (TTS) model. Determined to release not only a high-performing model but also one backed by transparent benchmarks, the Resemble AI team used Podonos’ evaluation solution to put Chatterbox in the ring. See the evaluation report.

[Eleven Labs vs. Chatterbox by Resemble AI]

The Challenge: Proving Readiness Before Open Source

Many voice AI companies say, “Our model sounds great. Just listen to our samples.” But without accurate measurements, such claims remain subjective and rely heavily on guesswork. In practice, the quality of most models is “guesstimated,” which introduces significant bias. By conducting in-depth analysis and publishing evaluation reports, teams can provide a clear, data-driven picture of how well their models actually perform.

For Resemble AI, it wasn’t enough to say that Chatterbox performed well compared to others. Internally, they believed the model could compete with leading alternatives like Eleven Labs. But without third-party evaluation, it would be hard to establish trust with the broader AI community.

The Solution: Fast, Automated, Human-Centric Evaluation on Podonos

Podonos provided the ideal platform to benchmark Chatterbox. With our evaluation service, Resemble AI was able to:

Compare Chatterbox head-to-head against Eleven Labs in a controlled A/B test
Evaluate on real-world use cases and nuanced prompts
Receive detailed feedback from diverse, trusted evaluators
Get results in under 12 hours

Podonos' workflow made it easy to set up, customize, and launch the evaluation process without the usual headaches of contractor management, pipeline setup, and manual analysis.


The Outcome: Data-Backed Confidence to Go Open

The results spoke for themselves. Chatterbox earned competitive marks against Eleven Labs, with clear strengths in naturalness, validating the decision to release the model.

Armed with this data, Resemble AI confidently open-sourced Chatterbox on both GitHub and Hugging Face, inviting the global AI community to explore, adopt, and build upon its work.

Launch with Confidence

Evaluating AI models is one of the most critical steps in improving their performance. In Resemble AI’s case, fast and accurate evaluation not only helped improve the model’s quality but also built trust through Podonos’ transparent and reliable benchmarking. They didn’t just release a TTS model. They released it with credibility.
