Naturalness
Evaluate the naturalness of your speech and audio.
Intro
One of the most popular measures for synthesized speech is naturalness: how natural the synthesized speech sounds. One of the most popular methods for evaluating the naturalness of speech/audio is the mean opinion score (MOS). Its scale typically ranges from 1 (lowest naturalness, like an old robot) to 5 (highest naturalness, like a human) in steps of 1, which is a five-point Likert scale. With podonos, you can evaluate the naturalness of your speech/audio in a fully managed way.
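To make the scale concrete, here is a minimal sketch of how a MOS is conventionally computed: the arithmetic mean of the individual ratings. The function name `mean_opinion_score` and the sample ratings are hypothetical, chosen only for illustration. This is not podonos code.

```python
VALID_SCORES = {1, 2, 3, 4, 5}  # five-point scale in steps of 1


def mean_opinion_score(ratings):
    """Return the MOS (arithmetic mean) of ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in VALID_SCORES:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one audio clip.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # prints 4.0
```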
Example
Our first example uses AWS Polly to generate a synthesized human voice and podonos to evaluate it. Of course, you can use your own TTS (text-to-speech) model, or even your own voice. Here is a code example that you can execute immediately.
Ok, let’s go line by line.
Create a Client
Let’s first create a new instance of Client.
Create an Evaluator
Then, create a new instance of Evaluator:
Add files
Now, add every synthesized speech file to the evaluator.
Close
Finally, close the Evaluator object.
With this, you can evaluate the naturalness of two synthesized human voices via podonos on a 5-point MOS scale with real human evaluators. By default, 10 people evaluate the naturalness of each audio file, and their results are then analyzed. Once these steps finish, you can check the status in your Workspace.
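This page does not describe how the ratings are analyzed. As a rough illustration of what such an analysis can look like, the sketch below summarizes 10 ratings per audio file as a MOS with an approximate 95% confidence interval. The file names, the ratings, and the `summarize` helper are all hypothetical, and the normal approximation (z = 1.96) is an assumption that is only rough for 10 ratings.

```python
import math
from statistics import mean, stdev


def summarize(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = mean(ratings)
    # Standard error of the mean, scaled by the normal quantile z.
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical data: 10 evaluators per audio file.
ratings_by_file = {
    "voice_a.wav": [4, 4, 5, 3, 4, 4, 5, 4, 3, 4],
    "voice_b.wav": [3, 2, 3, 4, 3, 3, 2, 3, 4, 3],
}

for name, ratings in ratings_by_file.items():
    mos, half_width = summarize(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

If the intervals of two voices do not overlap, the difference in naturalness is unlikely to be rating noise alone. With more evaluators per file, the intervals narrow.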