Preference Evaluation
This experiment measures listener preference between the baseline SpeechTokenizer and our proposed tokenizer (the Proposed Neural Audio Codec).
Question 1. Listen to the voice samples and rate which one is more natural
Comparison (Model A vs. Model B), 3-point scale
Overall mean score: -0.66, on a scale from -1 (Proposed Neural Audio Codec, Model B) to +1 (SpeechTokenizer, Model A)
Answers (1,000 responses):
Score | Answer | Responses
1 | SpeechTokenizer (Model A) is more natural | 68 (6.80%)
0 | Neutral | 206 (20.60%)
-1 | Proposed Neural Audio Codec (Model B) is more natural | 726 (72.60%)
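The overall score is the mean of all responses on the 3-point scale, so it can be recomputed from the answer counts alone. A minimal Python sketch, assuming each response is coded +1 (Model A), 0 (neutral) or -1 (Model B) as in the table above; the function names `mean_score` and `shares` are illustrative:

```python
def mean_score(n_a: int, n_neutral: int, n_b: int) -> float:
    """Mean on the 3-point scale: +1 = Model A preferred, 0 = neutral, -1 = Model B preferred."""
    total = n_a + n_neutral + n_b
    # Neutral answers add 0 to the sum but still count toward the total
    return (n_a - n_b) / total


def shares(n_a: int, n_neutral: int, n_b: int) -> list[str]:
    """Percentage of responses in each answer category, formatted like the dashboard."""
    total = n_a + n_neutral + n_b
    return [f"{n / total:.2%}" for n in (n_a, n_neutral, n_b)]


# Question 1 answer counts: 68 (Model A), 206 (neutral), 726 (Model B)
print(f"{mean_score(68, 206, 726):.2f}")  # -0.66
print(shares(68, 206, 726))  # ['6.80%', '20.60%', '72.60%']
```

The same function returns -0.43 and -0.62 for the Question 2 and Question 3 counts, matching the overall scores reported for those questions.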
Deep analysis: per-file scores (range -1 to +1)
Each file was presented as three clips: SpeechTokenizer, Proposed Neural Audio Codec, and Target Voice. x̅ is the mean score, CI the confidence-interval half-width (±), and σx̅ the standard error of the mean.

Filename | x̅ | CI | σx̅
326_221_to_p293_236.wav | -0.85 | 0.17 | 0.08
270_283_to_p293_219.wav | -0.40 | 0.32 | 0.15
280_393_to_p300_048.wav | -1.00 | 0.00 | 0.00
272_006_to_p303_144.wav | -0.90 | 0.14 | 0.07
300_125_to_p362_195.wav | 0.50 | 0.39 | 0.18
277_010_to_p272_119.wav | -0.65 | 0.27 | 0.13
345_004_to_p284_022.wav | -0.90 | 0.14 | 0.07
311_366_to_p251_303.wav | -0.65 | 0.27 | 0.13
Per-file results: page 1 of 7 shown.
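The per-file statistics can be reproduced from a file's individual ratings. The sketch below assumes 20 ratings per file and a two-sided 95% Student-t interval (critical value t(0.975, 19) ≈ 2.093). Neither setting is stated on this page; both are assumptions that happen to be consistent with the reported figures. The ratings list is hypothetical: individual responses are not shown here, and 17 × -1 plus 3 × 0 is simply one combination that reproduces the first row. The name `file_stats` is illustrative.

```python
import math
import statistics

# Assumed: two-sided 95% Student-t critical value for 19 degrees of freedom (n = 20 ratings)
T_CRIT_95_DF19 = 2.093


def file_stats(ratings: list[int], t_crit: float = T_CRIT_95_DF19) -> tuple[float, float, float]:
    """Return (mean, CI half-width, standard error) for one file's ratings in {-1, 0, 1}.

    t_crit must correspond to len(ratings) - 1 degrees of freedom.
    """
    mean = statistics.mean(ratings)
    se = statistics.stdev(ratings) / math.sqrt(len(ratings))  # sample SD / sqrt(n)
    return mean, t_crit * se, se


# Hypothetical ratings for one file: 17 responses chose Model B (-1), 3 were neutral (0)
ratings = [-1] * 17 + [0] * 3
mean, ci, se = file_stats(ratings)
print(f"mean={mean:.2f}  CI=+/-{ci:.2f}  SE={se:.2f}")  # mean=-0.85  CI=+/-0.17  SE=0.08
```

A file where every response is -1 has zero spread, which is why 280_393_to_p300_048.wav shows a CI and σx̅ of 0.00.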
Question 2. Listen to the voice samples and rate which one has a voice closer to the reference audio
Comparison (Model A vs. Model B), 3-point scale
Overall mean score: -0.43, on a scale from -1 (Proposed Neural Audio Codec, Model B) to +1 (SpeechTokenizer, Model A)
Answers (1,000 responses):
Score | Answer | Responses
1 | SpeechTokenizer (Model A)'s voice is more similar | 112 (11.20%)
0 | Neutral | 344 (34.40%)
-1 | Proposed Neural Audio Codec (Model B)'s voice is more similar | 544 (54.40%)
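One common way to check whether a preference like this exceeds chance is a sign test: neutral answers are dropped, and the remaining votes are compared against a fair coin, Binomial(n, 0.5). This test is not part of the reported evaluation. The sketch below is only an illustration, and it treats every response as independent, which overstates the evidence when the same listeners and files contribute many ratings. It uses only the Python standard library; the name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(k_a: int, k_b: int) -> float:
    """Exact two-sided sign test p-value for k_a vs. k_b non-neutral votes under H0: p = 0.5."""
    n = k_a + k_b
    k = min(k_a, k_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), using exact integer arithmetic before the final division
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # The null distribution is symmetric, so the two-sided p-value doubles the smaller tail
    return min(1.0, 2 * lower_tail)


# Question 2: 112 votes for Model A, 544 for Model B (the 344 neutral answers are ignored)
p = sign_test_p(112, 544)
print(f"p = {p:.3g}")
```

Dropping ties is the standard convention for the sign test; it means the 34.40% neutral share does not affect the p-value.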
Deep analysis: per-file scores (range -1 to +1)
Clips per file: SpeechTokenizer, Proposed Neural Audio Codec, Target Voice.

Filename | x̅ | CI | σx̅
326_221_to_p293_236.wav | -0.70 | 0.22 | 0.11
270_283_to_p293_219.wav | -0.15 | 0.35 | 0.17
280_393_to_p300_048.wav | -0.90 | 0.14 | 0.07
272_006_to_p303_144.wav | -0.65 | 0.27 | 0.13
300_125_to_p362_195.wav | 0.40 | 0.35 | 0.17
277_010_to_p272_119.wav | -0.20 | 0.39 | 0.19
345_004_to_p284_022.wav | -0.75 | 0.21 | 0.10
311_366_to_p251_303.wav | -0.55 | 0.24 | 0.11
Per-file results: page 1 of 7 shown.
Question 3. Listen to the voice samples and rate which one is more intelligible
Comparison (Model A vs. Model B), 3-point scale
Overall mean score: -0.62, on a scale from -1 (Proposed Neural Audio Codec, Model B) to +1 (SpeechTokenizer, Model A)
Answers (1,000 responses):
Score | Answer | Responses
1 | SpeechTokenizer (Model A) is more intelligible | 82 (8.20%)
0 | Neutral | 218 (21.80%)
-1 | Proposed Neural Audio Codec (Model B) is more intelligible | 700 (70.00%)
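The page reports intervals for each file but none for the overall score. As a rough sketch, one can be derived from the answer counts with a normal approximation (z = 1.96 for 95%). This assumes all 1,000 responses are independent, which is optimistic for a listening test where each listener rates many files, so the resulting interval likely understates the true uncertainty. The helper name `overall_ci` is hypothetical.

```python
import math


def overall_ci(n_a: int, n_neutral: int, n_b: int, z: float = 1.96) -> tuple[float, float]:
    """Mean score and normal-approximation CI half-width from 3-point answer counts."""
    n = n_a + n_neutral + n_b
    mean = (n_a - n_b) / n
    # Each +1 or -1 answer contributes 1 to the sum of squares; neutral answers contribute 0
    sum_sq = n_a + n_b
    sample_var = (sum_sq - n * mean**2) / (n - 1)
    se = math.sqrt(sample_var / n)
    return mean, z * se


# Question 3 answer counts: 82 (Model A), 218 (neutral), 700 (Model B)
mean, half_width = overall_ci(82, 218, 700)
print(f"{mean:.3f} +/- {half_width:.3f}")  # -0.618 +/- 0.039
```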
Deep analysis: per-file scores (range -1 to +1)
Clips per file: SpeechTokenizer, Proposed Neural Audio Codec, Target Voice.

Filename | x̅ | CI | σx̅
326_221_to_p293_236.wav | -0.75 | 0.21 | 0.10
270_283_to_p293_219.wav | -0.50 | 0.32 | 0.15
280_393_to_p300_048.wav | -1.00 | 0.00 | 0.00
272_006_to_p303_144.wav | -0.85 | 0.17 | 0.08
300_125_to_p362_195.wav | 0.50 | 0.36 | 0.17
277_010_to_p272_119.wav | -0.35 | 0.31 | 0.15
345_004_to_p284_022.wav | -0.70 | 0.27 | 0.13
311_366_to_p251_303.wav | -0.55 | 0.24 | 0.11
Per-file results: page 1 of 7 shown.
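Because a positive x̅ means listeners favored SpeechTokenizer (Model A), the page-1 rows can be cross-checked across the three questions. A small Python sketch; the values are transcribed from the per-file results of each question (page 1 of 7 only, so 8 files), and the name `files_favoring_model_a` is hypothetical:

```python
# Per-file mean scores (x̅) for the 8 files on page 1, transcribed from the per-file results.
# Tuple order: (Question 1 naturalness, Question 2 voice similarity, Question 3 intelligibility)
PAGE1_MEANS: dict[str, tuple[float, float, float]] = {
    "326_221_to_p293_236.wav": (-0.85, -0.70, -0.75),
    "270_283_to_p293_219.wav": (-0.40, -0.15, -0.50),
    "280_393_to_p300_048.wav": (-1.00, -0.90, -1.00),
    "272_006_to_p303_144.wav": (-0.90, -0.65, -0.85),
    "300_125_to_p362_195.wav": (0.50, 0.40, 0.50),
    "277_010_to_p272_119.wav": (-0.65, -0.20, -0.35),
    "345_004_to_p284_022.wav": (-0.90, -0.75, -0.70),
    "311_366_to_p251_303.wav": (-0.65, -0.55, -0.55),
}


def files_favoring_model_a(means: dict[str, tuple[float, ...]]) -> list[str]:
    """Files whose mean is positive (Model A preferred) in every question."""
    return [name for name, scores in means.items() if all(s > 0 for s in scores)]


print(files_favoring_model_a(PAGE1_MEANS))  # ['300_125_to_p362_195.wav']
```

Only one of the eight visible files leans toward SpeechTokenizer, and it does so consistently in all three questions; the remaining six pages of files are not covered by this check.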