Quality
Evaluate the overall quality of your speech/audio.
Intro
One of the most common questions in speech/audio evaluation concerns the quality of the generated output. Quality is not the same thing as naturalness or intelligibility; rather, it is an overall judgment connected with many aspects, including all of those mentioned above.
One of the most widely used quality evaluation methods for speech/audio is the mean opinion score (MOS). Its scale typically ranges from 1 (lowest quality) to 5 (highest quality, comparable to a human) in increments of 1, i.e., a five-point Likert scale. Through Podonos, you can evaluate the overall quality of your speech/audio in a fully managed service.
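As a concrete illustration (this is not part of the Podonos API; the function and variable names are ours), a MOS is simply the arithmetic mean of the five-point ratings collected for an utterance:

```python
# Minimal illustration of computing a mean opinion score (MOS)
# from five-point Likert ratings.

def mean_opinion_score(ratings):
    """Average a list of 1-5 ratings into a MOS."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return sum(ratings) / len(ratings)

# Ten evaluators rate one utterance:
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4]
print(mean_opinion_score(ratings))  # 4.0
```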
Quality Mean Opinion Score Measurement
As one way of measuring quality, we demonstrate the evaluation of a synthesized human voice with additive noise. Below is an executable code example:
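The sketch below prepares the two stimuli for such a comparison by mixing white Gaussian noise into a clean signal at a chosen signal-to-noise ratio. The signal (a pure tone standing in for a synthesized voice sample), the SNR value, and all names are illustrative assumptions; in practice you would load your own audio and submit both versions through the Podonos SDK.

```python
# Sketch: create a noisy copy of a clean signal so the original and the
# degraded version can be compared in a quality MOS test.
# The tone, SNR, and names are illustrative assumptions.
import numpy as np

def add_white_noise(signal, snr_db):
    """Return signal mixed with white Gaussian noise at the given SNR (dB)."""
    rng = np.random.default_rng(0)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# A 1-second 440 Hz tone stands in for a synthesized voice sample.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clean, snr_db=10.0)
```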
With this, you can evaluate the overall quality of both the original audio and the version with additive noise.
In-depth Quality Measurement
Another way of evaluating speech quality is to follow the ITU-T P.808 recommendation. It specifies 1) how to qualify the evaluators, 2) how to train them, and 3) how to collect and analyze the evaluation results. Setting up such a system and running the evaluation yourself is a demanding process. With Podonos, you can set up the whole evaluation with a few lines of code.
Example
In this example, let’s assume you are developing a new speech enhancement algorithm called MNSE (My New Speech Enhancement). We will use mnse as the name of your package.
Here is a code example that you can immediately execute.
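Because the real mnse package does not exist here and a full P.808 run requires a Podonos API key, the sketch below substitutes a trivial moving-average filter for mnse and only prepares the noisy/enhanced file pair; the actual submission step is indicated in comments. Every name in this block is a placeholder assumption.

```python
# Hypothetical sketch of preparing inputs for a P.808 evaluation of MNSE.
# "mnse_enhance" is a stand-in (a naive moving-average smoother), not the
# real mnse package. The Podonos upload step requires an API key and is
# therefore shown only as comments.
import numpy as np

def mnse_enhance(noisy, window=5):
    """Stand-in for mnse: a naive moving-average smoother."""
    kernel = np.ones(window) / window
    return np.convolve(noisy, kernel, mode="same")

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t)            # stand-in clean speech
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)
enhanced = mnse_enhance(noisy)

# In the real workflow, you would save `noisy` and `enhanced` as WAV files
# and submit both to a P.808 evaluation through the Podonos SDK.
```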
Behind the scenes
In addition to selecting a proper group of evaluators, ITU-T P.808 requires additional steps to ensure that the evaluation environment is appropriate and that evaluators conduct each session in an appropriate manner.
Evaluator qualification
Following ITU-T P.808, we qualify evaluators by reviewing their hearing devices, native language, age, gender distribution, hearing ability, and geographic location. Evaluators who are disqualified are stopped from continuing the evaluation session.
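An illustrative-only sketch of this screening step is shown below, in the spirit of ITU-T P.808. The field names and thresholds are our assumptions, not the actual Podonos criteria:

```python
# Illustrative evaluator screening. Field names and thresholds are
# assumptions, not the actual Podonos or P.808 criteria.

def is_qualified(evaluator):
    """Return True if the evaluator passes the basic screening checks."""
    return (
        evaluator.get("uses_headphones", False)
        and evaluator.get("native_language") == "en"
        and 18 <= evaluator.get("age", 0) <= 65
        and evaluator.get("passed_hearing_test", False)
    )

candidates = [
    {"uses_headphones": True, "native_language": "en", "age": 30,
     "passed_hearing_test": True},
    {"uses_headphones": False, "native_language": "en", "age": 30,
     "passed_hearing_test": True},  # screened out: no headphones
]
qualified = [c for c in candidates if is_qualified(c)]
```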
Evaluation with gold references
While your audio files are being evaluated, we automatically inject gold references (so-called anchor or hidden questions) for which the correct responses are known. If an evaluator answers these incorrectly, their evaluation results are automatically rejected afterwards.
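The filtering logic can be pictured as follows. This is a simplified sketch, not the Podonos implementation; the tolerance and the question/score names are assumptions:

```python
# Illustrative gold-reference filtering: a session is rejected when its
# answers on hidden anchor questions deviate too far from the known
# responses. The tolerance and names are assumptions.

def passes_gold_checks(responses, gold, max_abs_error=1):
    """Accept a session only if every gold answer is within tolerance."""
    return all(abs(responses[q] - expected) <= max_abs_error
               for q, expected in gold.items())

gold = {"anchor_1": 5, "anchor_2": 1}  # known high/low quality clips
session_good = {"anchor_1": 4, "anchor_2": 1, "item_7": 3}
session_bad = {"anchor_1": 2, "anchor_2": 4, "item_7": 3}  # rejected
```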
Reliability evaluation
Once the evaluation session is done, we automatically compute the overall reliability of each evaluation. Evaluations that are significantly less reliable than the others are flagged and excluded.
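One common way to picture such a reliability check (a sketch only; the actual Podonos computation and threshold are not disclosed here and the 0.5 cutoff is our assumption) is to correlate each evaluator's ratings against the mean ratings of all other evaluators and flag those below a threshold:

```python
# Illustrative reliability check: each evaluator's ratings are correlated
# against the mean ratings of everyone else; low-correlation evaluators
# are flagged. The 0.5 threshold and the data are assumptions.
import statistics

ratings = {  # evaluator -> per-item 1-5 scores
    "a": [5, 4, 3, 2, 1],
    "b": [5, 5, 3, 2, 1],
    "c": [1, 2, 3, 4, 5],  # inconsistent with the others
    "d": [4, 4, 3, 2, 2],
}

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def unreliable(ratings, threshold=0.5):
    """Flag evaluators whose ratings correlate poorly with the rest."""
    flagged = []
    for name, scores in ratings.items():
        others = [s for n, s in ratings.items() if n != name]
        mean_others = [statistics.fmean(col) for col in zip(*others)]
        if pearson(scores, mean_others) < threshold:
            flagged.append(name)
    return flagged
```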