Documentation Index
Fetch the complete documentation index at: https://podonos.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
The six pillars
Acoustic environment
Measure ambient noise. Auto-detect headphone vs. earphone vs. speaker. No self-report.
Fatigue management
45–60 min session cap. Mandatory mid-session break.
Minimum-listen requirement
Every query’s audio must play to completion before a rating can be submitted.
Attention tests
Embedded throughout the session. Pattern of failures triggers automatic rejection.
Reliability scoring
Per-evaluator consistency check after the session. Unreliable evaluators are dropped and replaced.
Automatic audio review
Files screened for playability, corruption, length mismatches, and silent or voiceless content.
Acoustic environment detection
Most platforms ask “do you have headphones?” and “are you in a quiet room?” and trust the answer. We do not.
- Ambient noise level. We measure the background noise level through the device and reject sessions above a threshold. Quiet rooms pass; cafés do not.
- Headphone vs. earphone vs. speaker detection. Stereo and binaural cues are played back and the response pattern is analyzed. Speakers leak crosstalk in ways headphones do not — we detect that automatically.
- Continuous monitoring. The check runs at session start and re-runs throughout. If conditions degrade mid-session, we flag it.
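The ambient-noise gate above can be sketched as an RMS threshold on a short capture buffer. This is a minimal illustration, assuming normalized float samples; the `-50 dBFS` cutoff and the function names are invented for the sketch, not the actual measurement path.

```python
import math

NOISE_FLOOR_DBFS = -50.0  # illustrative threshold; quiet rooms sit well below this

def rms_dbfs(samples):
    """RMS level of normalized samples (floats in [-1, 1]) in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * math.log10(rms)

def session_environment_ok(noise_samples, threshold_dbfs=NOISE_FLOOR_DBFS):
    """Accept the session only if measured ambient noise stays under the threshold."""
    return rms_dbfs(noise_samples) < threshold_dbfs
```

Re-running the same function on buffers captured mid-session gives the continuous-monitoring behavior: a session that passed at the start can still be flagged later.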
Fatigue management
Listening attentively is exhausting. Our internal experiments confirm what the literature reports: evaluator accuracy degrades sharply past about an hour of continuous evaluation work.
Hard cap per session
No evaluator works longer than 60 minutes in a single Podonos session. Most sessions land between 45 and 60 minutes.
Smart splitting
If your evaluation is too large for one session, our assignment algorithm splits it into subsessions automatically — sized to the audio length and query count — and recruits more evaluators to cover the work.
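Under the hood this is a packing problem. Below is a greedy sketch that considers only total audio length; the real assignment algorithm also weighs query count and evaluator supply, so treat the function and cap as illustrative.

```python
def split_into_subsessions(query_durations_s, cap_s=45 * 60):
    """Greedily pack queries (in order) into subsessions whose total
    audio length stays under cap_s seconds."""
    subsessions, current, total = [], [], 0.0
    for d in query_durations_s:
        if current and total + d > cap_s:
            subsessions.append(current)
            current, total = [], 0.0
        current.append(d)
        total += d
    if current:
        subsessions.append(current)
    return subsessions
```

Each resulting subsession is then assigned to its own evaluator slot, which is why larger evaluations recruit more evaluators.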
Mandatory mid-session break
Partway through a session, evaluators are required to take a short break. We play a calming nature video with ambient sound. They cannot skip it. This reliably restores attention for the second half.
Minimum-listen requirement
The rating UI enforces a minimum-listen rule on every query: a query cannot be submitted until all of its audio has been played to completion. This is the most basic engagement defense in the platform — it eliminates click-through-without-listening at the mechanical level.
Attention tests
Embedded throughout each session are attention checks designed to look like normal queries. They verify the evaluator is actually listening, has the headphones on, and is reading instructions before clicking.
- Distribution. Sprinkled throughout, not bunched at the start.
- Threshold. A single missed test is not a rejection — the threshold is calibrated against the base rate of legitimate confusion. A pattern of misses is rejection.
- Clean output. Ratings from a rejected evaluator are stripped from the final aggregation.
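One way to calibrate a rejection threshold against the base rate of legitimate confusion is a binomial tail test: reject only when the observed misses would be very unlikely for an honest, occasionally-confused evaluator. The base rate and significance level below are invented for illustration; the actual calibration is not specified here.

```python
from math import comb

def rejection_threshold_exceeded(misses, total_tests, base_rate=0.05, alpha=0.01):
    """Reject when the probability of seeing this many misses (or more)
    under the legitimate-confusion base rate falls below alpha."""
    tail = sum(
        comb(total_tests, k) * base_rate**k * (1 - base_rate) ** (total_tests - k)
        for k in range(misses, total_tests + 1)
    )
    return tail < alpha
```

Note how this matches the stated policy: one miss out of ten is well within the base rate and is not a rejection, while a pattern of misses is.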
Post-evaluation reliability loop
When a session ends, the data is not yet final. Podonos runs a post-evaluation reliability loop before any number reaches your report.
What we compute
For every evaluator in the cohort, we score:
- Inter-evaluator agreement on the same queries.
- Consistency on repeated and near-duplicate items embedded in the session.
- Variance pattern across the session — sudden flips suggest a tired or distracted evaluator.
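A toy version of a per-evaluator score combining the first two signals, agreement with the cohort and consistency on repeated items, on a 5-point scale. The formula, the equal weighting, and the data shapes are assumptions for illustration only.

```python
from statistics import median

def reliability_score(evaluator_ratings, cohort_ratings, duplicate_pairs):
    """evaluator_ratings: {query_id: rating}
    cohort_ratings: {query_id: [ratings from all evaluators]}
    duplicate_pairs: [(query_id_a, query_id_b)] repeated/near-duplicate items."""
    # Agreement: mean absolute distance from the cohort median, inverted
    # onto [0, 1] for a 5-point scale (max possible distance is 4).
    diffs = [
        abs(r - median(cohort_ratings[q]))
        for q, r in evaluator_ratings.items()
        if q in cohort_ratings
    ]
    agreement = 1.0 - (sum(diffs) / len(diffs)) / 4.0 if diffs else 0.0

    # Consistency: how closely the evaluator rates repeated items the same way.
    gaps = [
        abs(evaluator_ratings[a] - evaluator_ratings[b])
        for a, b in duplicate_pairs
        if a in evaluator_ratings and b in evaluator_ratings
    ]
    consistency = 1.0 - (sum(gaps) / len(gaps)) / 4.0 if gaps else 1.0

    return 0.5 * agreement + 0.5 * consistency
```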
The loop
Drop
Remove all data from evaluators whose reliability falls below the threshold. Their votes never reach your final report.
Backfill
Recruit fresh evaluators to replace the dropped slots so your votes-per-query target is restored.
Repeat
Re-compute reliability with the new cohort. If the bar is not yet met, drop and backfill again.
When the cohort reliability falls below the bar, we add 20% to 40% more evaluators in each backfill round and re-run the loop. The exact top-up depends on how far the cohort sits from the bar — a small gap needs a small top-up, a larger gap needs more. Every vote in your final aggregation comes from an evaluator who passed the reliability gate, in a cohort whose aggregate reliability cleared the bar.
Automatic audio review
Before evaluators ever see your files, every uploaded audio file runs through a pipeline of automated checks:
Playability check
Does the file decode cleanly? Are the headers valid? Is the codec supported by the playback device profile we ship?
Corruption detection
We scan for truncated streams, header/payload mismatches, and zero-byte regions that indicate a broken upload or render.
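A naive version of the zero-byte scan: flag long runs of zero bytes in the raw file. The run-length cutoff is invented, and legitimately silent PCM can also contain zero runs, so a real pipeline would combine this with format-aware decoding rather than rely on it alone.

```python
def find_zero_runs(data: bytes, min_run=4096):
    """Return (start, end) offsets of zero-byte runs at least min_run
    bytes long, a common signature of a broken upload or render."""
    runs, start = [], None
    for i, b in enumerate(data):
        if b == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(data) - start >= min_run:
        runs.append((start, len(data)))
    return runs
```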
Metadata vs. actual length
We compare the duration declared in the file metadata against the actual decoded length. Mismatches frequently indicate truncation, codec issues, or generation failures upstream.
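For PCM WAV specifically, the declared-versus-actual comparison can be done directly on the RIFF chunk headers. This sketch assumes uncompressed WAV with nothing after the data chunk; real pipelines decode each supported codec to get the true length.

```python
import struct

def wav_length_mismatch(path, tolerance_bytes=0):
    """Compare the data-chunk size declared in a WAV header against the
    bytes actually present in the file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
            return True  # not a RIFF/WAVE file at all
        while True:
            chunk = f.read(8)
            if len(chunk) < 8:
                return True  # ran out of file before finding a data chunk
            cid, size = chunk[:4], struct.unpack("<I", chunk[4:])[0]
            if cid == b"data":
                actual = len(f.read())  # bytes remaining after the header
                return abs(actual - size) > tolerance_bytes
            f.seek(size + (size & 1), 1)  # chunks are word-aligned
```

A truncated upload shows up immediately: the header still declares the full data size, but fewer bytes follow it.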
Silent or voiceless content
We detect files that contain no audio at all, no speech (only background or instrumental), or unusually long silence — common failure modes for TTS pipelines.
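A first-pass detector for the no-audio case: windowed RMS over 16-bit mono PCM, flagging files where no window rises above a small threshold. The threshold and window size are illustrative, and distinguishing "no speech" (music or background only) from real speech requires a voice-activity model on top of this.

```python
import struct

def is_effectively_silent(pcm_bytes, sample_width=2, window=1600,
                          rms_threshold=100.0):
    """True if no window of 16-bit PCM ever exceeds rms_threshold:
    no audio at all, or silence throughout."""
    n = len(pcm_bytes) // sample_width
    samples = struct.unpack("<%dh" % n, pcm_bytes[: n * sample_width])
    for i in range(0, len(samples), window):
        chunk = samples[i : i + window]
        rms = (sum(s * s for s in chunk) / len(chunk)) ** 0.5
        if rms > rms_threshold:
            return False  # found a window with real signal
    return True
```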

