Evaluation of Speaker Similarity and Audio Quality in Generated Speech - Korean

Overall

Answers

5. Very similar - the overall flow and manner of speaking are nearly identical
4. Similar - mostly similar, with differences only in the details
3. Moderate - some parts are similar, but the flow or manner of speech differs
2. Different - several differences in speech flow, speed, intonation, and so on
1. Very different - sounds as if spoken in an entirely different manner

Model	5	4	3	2	1
Model A	121 (4.03%)	619 (20.63%)	733 (24.43%)	871 (29.03%)	656 (21.87%)
Model B	62 (2.07%)	346 (11.53%)	529 (17.63%)	1148 (38.27%)	915 (30.50%)
Model C	134 (4.47%)	742 (24.73%)	852 (28.40%)	803 (26.77%)	469 (15.63%)
Model D	197 (6.57%)	825 (27.50%)	834 (27.80%)	781 (26.03%)	363 (12.10%)
Supertone	1511 (50.37%)	959 (31.97%)	381 (12.70%)	124 (4.13%)	25 (0.83%)

Tags	Model A	Model B	Model C	Model D	Supertone
All tags	121 (4.03%)	62 (2.07%)	134 (4.47%)	197 (6.57%)	1511 (50.37%)
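The answer counts listed per model for this question can be condensed into a single mean score per model. The following is a minimal sketch, not part of the original report. It assumes that the five counts given for each model correspond to answers 5 down to 1, in the order the answers are listed; the names `counts` and `mean_score` are hypothetical.

```python
# Answer counts per model for the first question (similarity of speaking flow).
# Assumption: each list is ordered from answer 5 down to answer 1.
counts = {
    "Model A": [121, 619, 733, 871, 656],
    "Model B": [62, 346, 529, 1148, 915],
    "Model C": [134, 742, 852, 803, 469],
    "Model D": [197, 825, 834, 781, 363],
    "Supertone": [1511, 959, 381, 124, 25],
}


def mean_score(counts_5_to_1):
    """Return the count-weighted mean of the scores 5, 4, 3, 2, 1."""
    total = sum(counts_5_to_1)
    weighted = sum(score * n for score, n in zip(range(5, 0, -1), counts_5_to_1))
    return weighted / total


for model, model_counts in counts.items():
    print(f"{model}: n={sum(model_counts)}, mean={mean_score(model_counts):.2f}")
```

The same function applies unchanged to the count lists of the other questions, since every question uses a five-point scale.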

Deep analysis

Generated audio	Reference audio	Model	Tags (generated and reference)	Statistics
character-delee_f_c_happy_n0.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	3.60 0.50 0.24
character-delee_f_c_script1.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	1.93 0.74 0.34
character-delee_f_c_happy_script1.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	1.33 0.34 0.16
character-delee_f_c_t1.wav	character-delee_f_c_happy.wav	Model A	kor delee female child happy character	2.33 0.65 0.30
character-delee_f_c_p1.wav	character-delee_f_c_happy.wav	Model B	kor delee female child happy character	1.60 0.46 0.21
character-delee_f_c_happy_n1.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.60 0.28 0.13
character-delee_f_c_script2.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	2.00 0.63 0.29
character-delee_f_c_happy_script2.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	2.60 0.72 0.34

Overall

Answers

5. Very similar - the timbre and mood of the voice are nearly identical
4. Similar - mostly similar, with one or two subtle differences
3. Moderate - broadly similar, but a difference in tone is noticeable
2. Different - a clear difference in vocal tone
1. Very different - sounds like a completely different person's voice

Model	5	4	3	2	1
Model A	97 (3.23%)	340 (11.33%)	582 (19.40%)	819 (27.30%)	1162 (38.73%)
Model B	89 (2.97%)	508 (16.93%)	891 (29.70%)	790 (26.33%)	722 (24.07%)
Model C	118 (3.93%)	542 (18.07%)	868 (28.93%)	848 (28.27%)	624 (20.80%)
Model D	175 (5.83%)	757 (25.23%)	855 (28.50%)	743 (24.77%)	470 (15.67%)
Supertone	1449 (48.30%)	900 (30.00%)	442 (14.73%)	150 (5.00%)	59 (1.97%)

Tags	Model A	Model B	Model C	Model D	Supertone
All tags	97 (3.23%)	89 (2.97%)	118 (3.93%)	175 (5.83%)	1449 (48.30%)

Deep analysis

Generated audio	Reference audio	Model	Tags (generated and reference)	Statistics
character-delee_f_c_happy_n0.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.00 0.42 0.20
character-delee_f_c_script1.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	1.87 0.62 0.29
character-delee_f_c_happy_script1.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	1.73 0.61 0.28
character-delee_f_c_t1.wav	character-delee_f_c_happy.wav	Model A	kor delee female child happy character	2.13 0.59 0.27
character-delee_f_c_p1.wav	character-delee_f_c_happy.wav	Model B	kor delee female child happy character	2.00 0.59 0.28
character-delee_f_c_happy_n1.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.73 0.25 0.12
character-delee_f_c_script2.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	2.00 0.55 0.26
character-delee_f_c_happy_script2.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	2.20 0.56 0.26

Overall

Answers

5. Very natural - nothing sounds awkward, and intonation and rhythm are very natural
4. Natural - mostly natural and smooth, with slight awkwardness noticed once or twice
3. Moderate - generally natural, with no major problems
2. Unnatural - awkward rhythm or stress appears often, and overall naturalness is reduced
1. Very unnatural - intonation, rhythm, and so on are very unnatural

Model	5	4	3	2	1
Model A	367 (12.23%)	1019 (33.97%)	855 (28.50%)	605 (20.17%)	154 (5.13%)
Model B	149 (4.97%)	596 (19.87%)	818 (27.27%)	1090 (36.33%)	347 (11.57%)
Model C	358 (11.93%)	883 (29.43%)	822 (27.40%)	732 (24.40%)	205 (6.83%)
Model D	412 (13.73%)	982 (32.73%)	722 (24.07%)	721 (24.03%)	163 (5.43%)
Supertone	1488 (49.60%)	1059 (35.30%)	330 (11.00%)	119 (3.97%)	4 (0.13%)

Tags	Model A	Model B	Model C	Model D	Supertone
All tags	367 (12.23%)	149 (4.97%)	358 (11.93%)	412 (13.73%)	1488 (49.60%)

Deep analysis

Generated audio	Reference audio	Model	Tags (generated and reference)	Statistics
character-delee_f_c_happy_n0.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	3.60 0.66 0.31
character-delee_f_c_script1.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	2.87 0.69 0.32
character-delee_f_c_happy_script1.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	1.47 0.35 0.17
character-delee_f_c_t1.wav	character-delee_f_c_happy.wav	Model A	kor delee female child happy character	2.53 0.51 0.24
character-delee_f_c_p1.wav	character-delee_f_c_happy.wav	Model B	kor delee female child happy character	1.73 0.44 0.21
character-delee_f_c_happy_n1.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.60 0.28 0.13
character-delee_f_c_script2.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	2.40 0.50 0.24
character-delee_f_c_happy_script2.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	3.00 0.51 0.24

Overall

Answers

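5. Very accurate - no pronunciation mistakes at all, and phrasing breaks and syllable boundaries are also accurate
4. Accurate - accurate overall, with minor pronunciation or phrasing mistakes in one or two words
3. Acceptable - some words are pronounced awkwardly, or the timing of phrasing breaks is inaccurate
2. Inaccurate - pronunciation errors or misplaced phrasing breaks occur several times
1. Very inaccurate - hard to understand because of mispronunciation, and the sentence is mispronounced throughout

Model	5	4	3	2	1
Model A	946 (31.53%)	1195 (39.83%)	650 (21.67%)	180 (6.00%)	29 (0.97%)
Model B	595 (19.83%)	994 (33.13%)	822 (27.40%)	502 (16.73%)	87 (2.90%)
Model C	911 (30.37%)	1124 (37.47%)	717 (23.90%)	228 (7.60%)	20 (0.67%)
Model D	989 (32.97%)	1172 (39.07%)	636 (21.20%)	188 (6.27%)	15 (0.50%)
Supertone	1855 (61.83%)	850 (28.33%)	249 (8.30%)	44 (1.47%)	2 (0.07%)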

Tags	Model A	Model B	Model C	Model D	Supertone
All tags	946 (31.53%)	595 (19.83%)	911 (30.37%)	989 (32.97%)	1855 (61.83%)

Deep analysis

Generated audio	Reference audio	Model	Tags (generated and reference)	Statistics
character-delee_f_c_happy_n0.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.33 0.40 0.19
character-delee_f_c_script1.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	3.40 0.46 0.21
character-delee_f_c_happy_script1.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	2.80 0.60 0.28
character-delee_f_c_t1.wav	character-delee_f_c_happy.wav	Model A	kor delee female child happy character	4.00 0.42 0.20
character-delee_f_c_p1.wav	character-delee_f_c_happy.wav	Model B	kor delee female child happy character	2.60 0.58 0.27
character-delee_f_c_happy_n1.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.80 0.23 0.11
character-delee_f_c_script2.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	3.47 0.41 0.19
character-delee_f_c_happy_script2.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	3.93 0.39 0.18

Overall

Answers

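5. Very suitable - emotion and intent are conveyed clearly and fit the content and mood of the script very well
4. Suitable - emotion and intent are conveyed well overall, with inappropriate emotion on one or two words
3. Moderate - captures the intended expression to some degree, but emotion or nuance is somewhat lacking
2. Unsuitable - the emotion conveyed generally does not match the script
1. Very unsuitable - the emotional expression is entirely different from what the script intends

Model	5	4	3	2	1
Model A	579 (19.30%)	1126 (37.53%)	876 (29.20%)	346 (11.53%)	73 (2.43%)
Model B	324 (10.80%)	827 (27.57%)	994 (33.13%)	672 (22.40%)	183 (6.10%)
Model C	550 (18.33%)	978 (32.60%)	913 (30.43%)	434 (14.47%)	125 (4.17%)
Model D	593 (19.77%)	979 (32.63%)	894 (29.80%)	419 (13.97%)	115 (3.83%)
Supertone	1679 (55.97%)	980 (32.67%)	295 (9.83%)	45 (1.50%)	1 (0.03%)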

Tags	Model A	Model B	Model C	Model D	Supertone
All tags	579 (19.30%)	324 (10.80%)	550 (18.33%)	593 (19.77%)	1679 (55.97%)

Deep analysis

Generated audio	Reference audio	Model	Tags (generated and reference)	Statistics
character-delee_f_c_happy_n0.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.20 0.43 0.20
character-delee_f_c_script1.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	2.80 0.82 0.38
character-delee_f_c_happy_script1.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	1.60 0.46 0.21
character-delee_f_c_t1.wav	character-delee_f_c_happy.wav	Model A	kor delee female child happy character	3.27 0.49 0.23
character-delee_f_c_p1.wav	character-delee_f_c_happy.wav	Model B	kor delee female child happy character	2.13 0.41 0.19
character-delee_f_c_happy_n1.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.67 0.27 0.13
character-delee_f_c_script2.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	2.87 0.41 0.19
character-delee_f_c_happy_script2.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	3.13 0.55 0.26

Overall

Answers

5. Very clean - no distracting noise or mechanical sound at all; the audio quality is very clean
4. Clean - faint noise or mechanical sound is heard briefly once or twice
3. Moderate - some noise or distortion is present but not very distracting
2. Distracting - noise appears frequently, or speech quality drops sharply in certain segments
1. Very distracting - the audio quality itself is seriously flawed throughout

Model	5	4	3	2	1
Model A	794 (26.47%)	1003 (33.43%)	667 (22.23%)	397 (13.23%)	139 (4.63%)
Model B	796 (26.53%)	1035 (34.50%)	770 (25.67%)	313 (10.43%)	86 (2.87%)
Model C	940 (31.33%)	1096 (36.53%)	601 (20.03%)	261 (8.70%)	102 (3.40%)
Model D	1116 (37.20%)	1198 (39.93%)	516 (17.20%)	131 (4.37%)	39 (1.30%)
Supertone	1831 (61.03%)	912 (30.40%)	210 (7.00%)	43 (1.43%)	4 (0.13%)

Tags	Model A	Model B	Model C	Model D	Supertone
All tags	794 (26.47%)	796 (26.53%)	940 (31.33%)	1116 (37.20%)	1831 (61.03%)

Deep analysis

Generated audio	Reference audio	Model	Tags (generated and reference)	Statistics
character-delee_f_c_happy_n0.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.80 0.31 0.14
character-delee_f_c_script1.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	4.20 0.56 0.26
character-delee_f_c_happy_script1.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	4.27 0.44 0.21
character-delee_f_c_t1.wav	character-delee_f_c_happy.wav	Model A	kor delee female child happy character	2.13 0.46 0.22
character-delee_f_c_p1.wav	character-delee_f_c_happy.wav	Model B	kor delee female child happy character	2.73 0.61 0.28
character-delee_f_c_happy_n1.wav	character-delee_f_c_happy.wav	Supertone	kor delee female child happy character	4.87 0.19 0.09
character-delee_f_c_script2.mp3	character-delee_f_c_happy.wav	Model C	kor delee female child happy character	3.00 0.66 0.31
character-delee_f_c_happy_script2.mp3	character-delee_f_c_happy.wav	Model D	kor delee female child happy character	4.40 0.28 0.13
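
The per-item rows of this last table can be rolled up by model. The following is a rough sketch, not part of the original report. It assumes that the first of the three values in the Statistics column is the item's mean rating, which the source does not label; the names `rows`, `by_model`, and `averages` are hypothetical, and the second and third statistics are left out because their meaning is not stated.

```python
from collections import defaultdict
from statistics import mean

# (model, first value of the Statistics column) for each listed item in the
# audio-quality table. Assumption: the first value is the item's mean rating.
rows = [
    ("Supertone", 4.80),
    ("Model C", 4.20),
    ("Model D", 4.27),
    ("Model A", 2.13),
    ("Model B", 2.73),
    ("Supertone", 4.87),
    ("Model C", 3.00),
    ("Model D", 4.40),
]

# Group the item-level values by model.
by_model = defaultdict(list)
for model, value in rows:
    by_model[model].append(value)

# Average the item-level values for each model.
averages = {model: mean(values) for model, values in by_model.items()}

# Print models from highest to lowest average.
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {avg:.2f} over {len(by_model[model])} item(s)")
```

Only eight items are listed here, and Model A and Model B each appear once, so these averages describe the listed items rather than the full evaluation.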