Description

Objective

Evaluate and compare the naturalness and overall quality of speech synthesis from Resemble AI Chatterbox and ElevenLabs.

Test Description

This A/B test assesses how well Resemble AI Chatterbox and ElevenLabs generate natural, high-quality speech. Both systems produce audio clips from 7- to 20-second inputs and identical text (zero-shot, with no prompt engineering or additional audio processing).

Participants will listen to paired samples and rate them based on the following criteria:

Naturalness:
How human-like and natural the voice sounds, taking into account pausing, skipped words, and hallucinated content.
Overall Quality:
A holistic impression of the audio, including pronunciation clarity, smoothness, prosody, and tone consistency.

The input script content includes:

Conversational speech
Narrative passages
Emotionally expressive lines

All samples are generated in a zero-shot setting, without fine-tuning or style conditioning.

Target Audience

Professionals working in voice AI, TTS users, and product managers designing audio experiences.

Output & Analysis

Each sample pair will be rated by participants. The average scores will be analyzed to determine which system performs better in terms of naturalness and overall quality.
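The averaging step described above could be sketched as follows. This is a minimal illustration, not the test's actual analysis pipeline: the 1-to-5 rating scale, the tuple layout, the neutral system labels "A" and "B", and the function names `average_scores` and `better_system` are all assumptions, and the ratings shown are made-up placeholders rather than results.

```python
from collections import defaultdict
from statistics import mean


def average_scores(ratings):
    """Return {(system, criterion): mean score} over all ratings.

    Each rating is a tuple (pair_id, system, criterion, score).
    The score scale (assumed 1-5 here) is hypothetical.
    """
    buckets = defaultdict(list)
    for _pair_id, system, criterion, score in ratings:
        buckets[(system, criterion)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def better_system(averages, criterion):
    """Return the system with the higher mean for a criterion, or None on a tie."""
    scores = {s: v for (s, c), v in averages.items() if c == criterion}
    best = max(scores.values())
    winners = [s for s, v in scores.items() if v == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Placeholder ratings for illustration only; not real test data.
    example = [
        (1, "A", "naturalness", 4),
        (1, "B", "naturalness", 3),
        (2, "A", "naturalness", 2),
        (2, "B", "naturalness", 3),
        (1, "A", "overall_quality", 5),
        (1, "B", "overall_quality", 3),
    ]
    averages = average_scores(example)
    for criterion in ("naturalness", "overall_quality"):
        print(criterion, better_system(averages, criterion))
```

A plain mean comparison like this ignores rater variance and sample size, so a significance test on the paired ratings would be a reasonable next step before declaring a winner.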

Instructions 1-8: audio samples (untitled)

Question 1

Audio samples 1-8 (untitled)