Chatterbox Turbo vs ElevenLabs Turbo v2.5

Evaluation of Chatterbox Turbo vs ElevenLabs Turbo v2.5

Evaluation

Created: December 11th, 2025, 22:26:46
Evaluated: December 12th, 2025, 08:16:45

Stimulus

Language: 🇺🇸 English - United States
Source: Audio
Type: Double audio evaluation with reference
Count: 150

Chatterbox Turbo: 50, ElevenLabs Turbo v2.5: 50, Reference Voices (Target Speaker Samples): 50

Evaluator

# of votes per query: 50
Valid evaluators: 50
Rejected evaluators: 11
Total responses: 2,500

Evaluator information

Gender: Female: 25, Male: 24, Other: 1

United States: 50
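
The reported counts can be cross-checked with a little arithmetic. The sketch below copies the figures from the Stimulus and Evaluator summaries; the one assumption, not stated on the page, is that each query presents one sample from each system together with one reference, so the number of queries equals the per-system sample count.

```python
# Figures copied from the Stimulus and Evaluator summaries.
stimuli = {
    "Chatterbox Turbo": 50,
    "ElevenLabs Turbo v2.5": 50,
    "Reference Voices (Target Speaker Samples)": 50,
}
gender = {"Female": 25, "Male": 24, "Other": 1}
votes_per_query = 50
reported_stimuli = 150
reported_evaluators = 50
reported_responses = 2500

# Assumption: one query = one sample per system plus one reference,
# so the query count equals the per-system sample count.
queries = stimuli["Chatterbox Turbo"]

total_stimuli = sum(stimuli.values())
total_evaluators = sum(gender.values())
total_responses = queries * votes_per_query

print("stimuli match:", total_stimuli == reported_stimuli)
print("evaluators match:", total_evaluators == reported_evaluators)
print("responses match:", total_responses == reported_responses)
```

Under that assumption, all three reported totals agree with the breakdowns.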

Instructions

Use the reference audio as a baseline for speaker similarity and audio quality. The reference audio will not match the text script and should not be used for script validation.
The text script is the single source of truth for evaluating whether the synthetic sample is correct.
Evaluate each synthetic sample on the following dimensions:

### 1.	Text Alignment & Completeness
Ensure the spoken content matches the text script exactly:
- No missing words
- No added words or hallucinations
- No reordering or skipped segments

If the audio contains more or less speech than the script, the sample is incorrect.
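
This evaluation relies on human listeners, but the same check can be approximated automatically when a transcript of the synthetic audio is available. The following is a minimal sketch, not part of the evaluation itself: it assumes a transcript string obtained elsewhere (for example from a speech recognizer), and the helper names `normalize` and `alignment_issues` are hypothetical. It uses the standard-library `difflib.SequenceMatcher` to compare word sequences.

```python
import difflib
import re


def normalize(text):
    """Lowercase, drop bracketed tags such as [laugh], and split into words."""
    text = re.sub(r"\[[^\]]*\]", " ", text)
    return re.findall(r"[a-z0-9']+", text.lower())


def alignment_issues(script, transcript):
    """List the word-level differences between the script and a transcript."""
    ref = normalize(script)
    hyp = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    issues = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":      # words in the script that were not spoken
            issues.append(("missing", ref[i1:i2]))
        elif op == "insert":    # spoken words that are not in the script
            issues.append(("added", hyp[j1:j2]))
        elif op == "replace":   # script words spoken as something else
            issues.append(("changed", ref[i1:i2], hyp[j1:j2]))
    return issues


script = "Hello there, how are you today?"
print(alignment_issues(script, "hello there how are you"))
print(alignment_issues(script, "Hello there, how are you today?"))
```

An empty list means the word sequences match. Reordered segments surface as a combination of `missing` and `added` entries, and recognizer errors can produce false alarms, so a result like this would only flag samples for a human to re-check.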

### 2.	Paralinguistic Tags (e.g., [laugh], [cough])
If the script includes a tag, verify that the corresponding sound:
- Is present
- Occurs in the correct location
- Sounds natural and human-like

Mark the sample incorrect if a paralinguistic sound is missing, misplaced, or unnatural.
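
To know which sounds to listen for and where, the tags can be pulled out of the script together with their position. This is an illustrative sketch only: it assumes tags are written in square brackets as in the examples above, and the helper name `expected_tags` is hypothetical.

```python
import re

# Assumption: a tag is a short bracketed label such as [laugh] or [cough].
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]", re.IGNORECASE)


def expected_tags(script):
    """Return (tag, number of spoken words before it) for each tag in the script."""
    tags = []
    for match in TAG_PATTERN.finditer(script):
        # Count only spoken words, so earlier tags are blanked out first.
        preceding = TAG_PATTERN.sub(" ", script[:match.start()])
        tags.append((match.group(1).lower(), len(preceding.split())))
    return tags


print(expected_tags("Well, [laugh] that was unexpected. [cough] Sorry."))
```

Each pair tells the listener which sound to expect and after how many spoken words it should occur. Whether the sound is natural still has to be judged by ear.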

### 3.	Pronunciation Accuracy
Check whether all words—especially names, places, and foreign terms—are pronounced clearly and correctly.

### 4.	Naturalness & Prosody
Evaluate how human-like the speech sounds. Consider rhythm, pitch, pacing, pausing, and intonation, and flag overly long or short pauses, robotic delivery, monotone pitch, or unnatural emphasis. Prosody (the “melody” of the speech) should reflect the meaning, punctuation, and structure of the text.

### 5.	Audio Quality
Assess clarity and cleanliness of the signal. Note background noise, clicks, distortions, artifacts, or broken speech fragments.

### 6.	Speaker Similarity & Timbre
Compare the synthetic audio to the reference sample for vocal character, tone, and timbre consistency (warmth, brightness, harshness, nasality, etc.). Differences in what is said do not matter, because the reference audio does not follow the text script.

### 7.	Overall Impression
Use all the criteria above—text alignment, pronunciation, naturalness, prosody, audio quality, speaker similarity, and correctness of paralinguistic tags—to determine which sample provides the better overall listening experience.
Your preference should be based on these quality and correctness factors.

Questions

Question 1
