Chatterbox Turbo vs ElevenLabs Turbo v2.5
Evaluation of Chatterbox Turbo vs ElevenLabs Turbo v2.5
Description
No description
Evaluation
Created
December 11th, 2025, 22:26:46
Evaluated
December 12th, 2025, 08:16:45
Stimulus
Language
🇺🇸 English - United States
Source
Audio
Type
Double audio evaluation with reference
Count
150
Chatterbox Turbo: 50, ElevenLabs Turbo v2.5: 50, Reference Voices (Target Speaker Samples): 50
Evaluator
# of votes per query
50
Valid evaluators
Rejected evaluators
Total responses
Evaluator information
Questions (1)
Question 1
Title
How strongly do you prefer one sample over the other?
Description
(optional)
Use the evaluation guidelines to decide which sample is better overall. Focus on script alignment, pronunciation accuracy, naturalness, prosody, audio quality, paralinguistic correctness, and similarity to the reference speaker.
A is better
B is better
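The page shows the A/B question but not how votes are aggregated. As a rough illustration only, the sketch below summarizes binary "A is better" / "B is better" votes for one query as a preference share with a Wilson score interval. The function name `preference_summary`, the binary vote encoding, and the example votes are all assumptions made for illustration; they are not taken from this evaluation or from the platform's own analysis.

```python
import math
from collections import Counter


def preference_summary(votes, z=1.96):
    """Summarize binary A/B votes as the share preferring A.

    Hypothetical helper: returns the share for A and an approximate
    95% Wilson score interval (z = 1.96 by default).
    """
    counts = Counter(votes)
    n = counts["A"] + counts["B"]
    if n == 0:
        raise ValueError("no A/B votes to summarize")
    p = counts["A"] / n
    # Wilson score interval for a binomial proportion.
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return {"n": n, "share_a": p, "low": center - half, "high": center + half}


# Made-up votes for a single query; NOT data from this evaluation.
example_votes = ["A"] * 30 + ["B"] * 20
summary = preference_summary(example_votes)
print(summary)
```

If the interval for a query or for the pooled votes excludes 0.5, that would suggest a preference beyond chance under this simple model. The sketch ignores rater effects and any strength-of-preference scale.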
Instructions (2)
Instruction 1
Title
Description
(optional)
The audio files may contain subtle static noise, artifacts, or click noises that are difficult to detect without headphones or in a noisy environment.
Instruction 2
Title
Description
(optional)
Use the reference audio as a baseline for speaker similarity and audio quality. The reference audio will not match the text script and should not be used for script validation. The text script is the single source of truth for evaluating whether the synthetic sample is correct.

Evaluate each synthetic sample on the following dimensions:

### 1. Text Alignment & Completeness

Ensure the spoken content matches the text script exactly:

- No missing words
- No added words or hallucinations
- No reordering or skipped segments

If the audio contains more or less speech than the script, the sample is incorrect.

### 2. Paralinguistic Tags (e.g., [laugh], [cough])

If the script includes a tag, verify that the corresponding sound:

- Is present
- Occurs in the correct location
- Sounds natural and human-like

Missing or unnatural paralinguistic sounds should be marked incorrect.

### 3. Pronunciation Accuracy

Check whether all words, especially names, places, and foreign terms, are pronounced clearly and correctly.

### 4. Naturalness & Prosody

Evaluate how human-like the speech sounds. Consider rhythm, pitch, pacing, pausing, and intonation. Flag overly long or short pauses, robotic delivery, monotone pitch, or unnatural emphasis.

Prosody (the “melody” of the speech) should reflect the meaning, punctuation, and structure of the text. Assess whether the intonation, emphasis, and pauses feel natural and appropriate for what is being said.

### 5. Audio Quality

Assess clarity and cleanliness of the signal. Note background noise, clicks, distortions, artifacts, or broken speech fragments.

### 6. Speaker Similarity & Timbre

Compare the synthetic audio to the reference sample for vocal character, tone, and timbre consistency (warmth, brightness, harshness, nasalness, etc.). Semantic differences do not matter.

### 7. Overall Impression

Use all the criteria above (text alignment, pronunciation, naturalness, prosody, audio quality, speaker similarity, and correctness of paralinguistic tags) to determine which sample provides the better overall listening experience. Your preference should be based on these quality and correctness factors.
Valid evaluators
50
Rejected evaluators
11
Evaluator information
Gender: Female: 25, Male: 24, Other: 1
United States: 50
Total responses
2,500
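The counts on this page appear to be mutually consistent, and the arithmetic can be checked with a short sketch. It assumes that one query consists of one Chatterbox Turbo sample, one ElevenLabs Turbo v2.5 sample, and one reference sample, and that every query received the configured number of votes. The page does not state either point explicitly, and the variable names are illustrative.

```python
# Values copied from the page.
stimuli_per_group = {
    "Chatterbox Turbo": 50,
    "ElevenLabs Turbo v2.5": 50,
    "Reference Voices (Target Speaker Samples)": 50,
}
votes_per_query = 50
reported_stimulus_count = 150
reported_total_responses = 2500

# Assumption: one query pairs one sample from each model with one reference,
# so the number of queries equals the number of samples per model.
queries = stimuli_per_group["Chatterbox Turbo"]

total_stimuli = sum(stimuli_per_group.values())
expected_responses = queries * votes_per_query

print(total_stimuli == reported_stimulus_count)
print(expected_responses == reported_total_responses)
```

Under these assumptions, 50 queries with 50 votes each account for the 2,500 total responses, and the three groups of 50 files account for the stimulus count of 150.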