Intro
Naturalness, that is, how natural synthesized speech sounds, is one of the most popular measures of synthesized speech. A widely used method for evaluating the naturalness of speech/audio is the mean opinion score (MOS), which averages the scores that listeners give. Its scale typically ranges from 1 (lowest naturalness, like an old robot) to 5 (highest naturalness, like a human) in steps of 1, which is known as a five-point Likert scale. With podonos, you can evaluate the naturalness of your speech/audio in a fully managed way.
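For reference, the MOS of a single audio clip is the arithmetic mean of its listener ratings. The sketch below illustrates this in plain Python under that assumption. It is not part of the podonos API, and the function name `mean_opinion_score` and the sample ratings are hypothetical.

```python
# Valid scores on a five-point Likert scale: 1 (robotic) to 5 (human-like).
LIKERT_SCORES = {1, 2, 3, 4, 5}


def mean_opinion_score(ratings: list[int]) -> float:
    """Return the MOS: the arithmetic mean of integer ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    invalid = [r for r in ratings if r not in LIKERT_SCORES]
    if invalid:
        raise ValueError(f"ratings must be integers from 1 to 5, got {invalid}")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical ratings from five listeners for a single audio clip.
    print(mean_opinion_score([4, 5, 3, 4, 4]))  # prints 4.0
```

Rejecting out-of-range values up front keeps a typo such as a 0 or a 6 from silently shifting the average.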
Example
Our first example uses AWS Polly to generate a synthesized human voice and podonos to evaluate it. Of course, you can use your own TTS (text-to-speech) model, or even your own voice. Here is a code example that you can run immediately.
With this, you can have real human evaluators on podonos rate the naturalness of two synthesized human voices on the 5-point MOS scale.
By default, 10 human evaluators rate the naturalness of each audio file, and their results are then analyzed.
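To illustrate what such an analysis might involve, here is a minimal sketch that summarizes hypothetical ratings from 10 evaluators per audio file as a MOS with an approximate 95% confidence interval. It assumes a normal approximation for the interval and is not necessarily how podonos analyzes results. The function `summarize_mos` and the audio IDs `voice_a` and `voice_b` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_mos(
    ratings_by_audio: dict[str, list[int]], confidence: float = 0.95
) -> dict[str, tuple[float, float]]:
    """Map each audio ID to (MOS, half-width of a normal-approximation CI)."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    summary = {}
    for audio_id, ratings in ratings_by_audio.items():
        if len(ratings) < 2:
            raise ValueError(f"{audio_id}: at least 2 ratings are needed for a CI")
        mos = float(mean(ratings))
        # Standard error of the mean, scaled by the critical value.
        half_width = z * stdev(ratings) / sqrt(len(ratings))
        summary[audio_id] = (mos, half_width)
    return summary


if __name__ == "__main__":
    # Hypothetical ratings from 10 evaluators for each of two voices.
    ratings = {
        "voice_a": [4, 5, 4, 4, 3, 5, 4, 4, 5, 4],
        "voice_b": [3, 3, 4, 2, 3, 3, 4, 3, 2, 3],
    }
    for audio_id, (mos, hw) in summarize_mos(ratings).items():
        print(f"{audio_id}: MOS = {mos:.2f} ± {hw:.2f}")
```

With only 10 ratings per file, a Student's t interval (t ≈ 2.26 for 9 degrees of freedom) would be somewhat wider than this normal approximation, so treat the printed ranges as optimistic.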
Once these steps finish, you can check the status in your Workspace.

