Evaluate and compare the naturalness and overall quality of speech synthesis from Resemble AI Chatterbox and ElevenLabs.
This AB test assesses how well Resemble AI Chatterbox and ElevenLabs generate natural, high-quality speech. Both systems produce audio clips from 7- to 20-second inputs and identical text (zero-shot, with no prompt engineering or additional audio processing).
Participants will listen to paired samples and rate them based on the following criteria:
Naturalness:
How human-like and natural the voice sounds, taking into account pauses as well as any word skipping or hallucinations.
Overall Quality:
A holistic impression of the audio, including pronunciation clarity, smoothness, prosody, and tone consistency.
The input script content includes:
All samples are generated in a zero-shot setting, without fine-tuning or style conditioning.
Target participants:
Professionals working in voice AI, TTS users, and product managers designing audio experiences.
Participants will rate each sample pair, and the average scores for each criterion will be compared to determine which system performs better in naturalness and overall quality.
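The averaging step can be sketched as follows. This is a minimal illustration, not the evaluation platform's actual analysis: the `PairRating` record, the criterion keys, the numeric rating scale, and the generic system labels are all hypothetical assumptions, and the sample numbers in the usage example are made up purely to show how the functions behave.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairRating:
    """One participant's scores for one sample pair (hypothetical data layout)."""
    pair_id: str
    criterion: str   # assumed keys, e.g. "naturalness" or "overall_quality"
    score_a: float   # score for system A's clip (numeric scale is an assumption)
    score_b: float   # score for system B's clip


def average_scores(ratings):
    """Group ratings by criterion and return {criterion: (mean_a, mean_b)}."""
    by_criterion = {}
    for r in ratings:
        scores_a, scores_b = by_criterion.setdefault(r.criterion, ([], []))
        scores_a.append(r.score_a)
        scores_b.append(r.score_b)
    return {c: (mean(a), mean(b)) for c, (a, b) in by_criterion.items()}


def better_system(means, name_a="A", name_b="B"):
    """For each criterion, name the system with the higher mean, or 'tie'."""
    result = {}
    for criterion, (mean_a, mean_b) in means.items():
        if mean_a > mean_b:
            result[criterion] = name_a
        elif mean_b > mean_a:
            result[criterion] = name_b
        else:
            result[criterion] = "tie"
    return result


if __name__ == "__main__":
    # Illustrative, made-up ratings; not results from this test.
    sample = [
        PairRating("p1", "naturalness", 4, 3),
        PairRating("p2", "naturalness", 5, 4),
        PairRating("p1", "overall_quality", 3, 4),
    ]
    means = average_scores(sample)
    print(means)
    print(better_system(means))
```

Note that this compares means only; it does not test whether a difference between the two systems is statistically significant.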