# In-Session Quality Control

> Six layers of automated quality control that run during every Podonos evaluation.

## The six pillars

<CardGroup cols={3}>
  <Card title="Acoustic environment" icon="volume-low">
    Measure ambient noise. Auto-detect headphone vs. earphone vs. speaker. No self-report.
  </Card>

  <Card title="Fatigue management" icon="battery-quarter">
    Sessions capped at 60 minutes; most run 45–60. Mandatory mid-session break.
  </Card>

  <Card title="Minimum-listen requirement" icon="play">
    Every query's audio must play to completion before a rating can be submitted.
  </Card>

  <Card title="Attention tests" icon="eye">
    Embedded throughout the session. Pattern of failures triggers automatic rejection.
  </Card>

  <Card title="Reliability scoring" icon="ranking-star">
    Per-evaluator consistency check after the session. Unreliable evaluators are dropped and replaced.
  </Card>

  <Card title="Automatic audio review" icon="file-audio">
    Files screened for playability, corruption, length mismatches, and silent or voiceless content.
  </Card>
</CardGroup>

## Acoustic environment detection

Most platforms ask "do you have headphones?" and "are you in a quiet room?" and trust the answer. We do not.

* **Ambient noise level.** We measure the background noise level through the device and reject sessions above a threshold. Quiet rooms pass; cafés do not.
* **Headphone vs. earphone vs. speaker detection.** We play stereo and binaural cues and analyze the response pattern. Speakers leak crosstalk in ways headphones do not, and we detect that automatically.
* **Continuous monitoring.** The check runs at session start and re-runs throughout. If conditions degrade mid-session, we flag it.
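
As a rough illustration of the ambient-noise gate, the sketch below computes the RMS level of a short microphone capture in dBFS and compares it with a cutoff. The function names and the `-50 dBFS` threshold are assumptions made for this example, not the Podonos measurement or threshold. Note also that dBFS is relative to the device's full scale, not a calibrated sound pressure level.

```python
import math
from typing import Sequence

# Hypothetical cutoff chosen for illustration only.
NOISE_THRESHOLD_DBFS = -50.0


def rms_dbfs(samples: Sequence[float]) -> float:
    """Return the RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def room_is_quiet(samples: Sequence[float],
                  threshold_dbfs: float = NOISE_THRESHOLD_DBFS) -> bool:
    """Pass the capture only if its RMS level is at or below the threshold."""
    return rms_dbfs(samples) <= threshold_dbfs
```

Re-running the same check on fresh captures at intervals would approximate the continuous monitoring described above.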

<Tip>
  This is one of the most expensive defenses we run, and it eliminates a class of bias that destroys MOS-style evaluations: a listener on cheap speakers in a noisy room cannot reliably distinguish a high-fidelity render from a low-fidelity one.
</Tip>

## Fatigue management

Listening attentively is exhausting. Our internal experiments confirm what the literature reports: evaluator accuracy degrades sharply past about an hour of continuous evaluation work.

<Steps>
  <Step title="Hard cap per session">
    No evaluator works longer than 60 minutes in a single Podonos session. Most sessions land between 45 and 60 minutes.
  </Step>

  <Step title="Smart splitting">
    If your evaluation is too large for one session, our assignment algorithm splits it into subsessions automatically — sized to the audio length and query count — and recruits more evaluators to cover the work.
  </Step>

  <Step title="Mandatory mid-session break">
    Partway through a session, evaluators are required to take a short break. We play a calming nature video with ambient sound. They cannot skip it. This reliably restores attention for the second half.
  </Step>

  <Step title="One session per evaluator per evaluation">
    An evaluator participates in exactly one session per evaluation. They cannot return later for "round two", so fatigue and memory effects do not accumulate.
  </Step>
</Steps>
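
The splitting step can be pictured as a simple packing problem. The sketch below is a greedy approximation under stated assumptions, not the Podonos assignment algorithm: each query costs its audio duration plus a fixed rating overhead (`RATING_OVERHEAD_SECONDS`, a made-up value), and queries are packed in order until the 60-minute cap would be exceeded. It ignores the time taken by the mid-session break.

```python
from typing import List

SESSION_CAP_SECONDS = 60 * 60    # hard cap per session, as stated above
RATING_OVERHEAD_SECONDS = 10.0   # assumption: time spent rating after listening


def split_into_subsessions(audio_seconds: List[float],
                           cap: float = SESSION_CAP_SECONDS,
                           overhead: float = RATING_OVERHEAD_SECONDS) -> List[List[int]]:
    """Greedily pack query indices into subsessions that each fit under the cap."""
    subsessions: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for index, duration in enumerate(audio_seconds):
        cost = duration + overhead
        if cost > cap:
            raise ValueError(f"query {index} alone exceeds the session cap")
        # Start a new subsession when this query would push us past the cap.
        if current and used + cost > cap:
            subsessions.append(current)
            current, used = [], 0.0
        current.append(index)
        used += cost
    if current:
        subsessions.append(current)
    return subsessions
```

Under these assumptions, 500 queries of 20 seconds each cost 30 seconds apiece, so 120 fit in one session and the evaluation splits into five subsessions.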

## Minimum-listen requirement

The rating UI enforces a minimum-listen rule on every query: a query cannot be submitted until all of its audio has been played to completion. This is the most basic engagement defense in the platform — it eliminates click-through-without-listening at the mechanical level.
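
A minimal sketch of such a gate, assuming the player reports a completion event per audio clip. The class and method names are hypothetical and only illustrate the rule that submission stays locked until every clip in the query has finished playing.

```python
from typing import Iterable, Set


class QueryPlaybackGate:
    """Tracks which audio clips of one query still need to be played to the end."""

    def __init__(self, audio_ids: Iterable[str]) -> None:
        self._pending: Set[str] = set(audio_ids)

    def mark_completed(self, audio_id: str) -> None:
        # Called when the player reports that a clip reached its end.
        self._pending.discard(audio_id)

    def can_submit(self) -> bool:
        # The rating may be submitted only when nothing is left to play.
        return not self._pending
```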

## Attention tests

Embedded throughout each session are attention checks designed to look like normal queries. They verify the evaluator is actually listening, has the headphones on, and is reading instructions before clicking.

* **Distribution.** Sprinkled throughout, not bunched at the start.
* **Threshold.** A single missed test does not trigger rejection, because the threshold is calibrated against the base rate of legitimate confusion. A pattern of misses does.
* **Clean output.** Ratings from a rejected evaluator are stripped from the final aggregation.
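
One way to calibrate a threshold against a base rate is a binomial tail test. The sketch below assumes an attentive evaluator misses each check independently with a fixed probability, and rejects only when the observed number of misses would be very unlikely under that model. The 5% base miss rate and the 1% cutoff are illustrative values, not Podonos parameters.

```python
from math import comb


def prob_at_least(misses: int, checks: int, base_miss_rate: float) -> float:
    """Probability that an attentive evaluator misses at least `misses` of `checks`."""
    return sum(
        comb(checks, k) * base_miss_rate ** k * (1 - base_miss_rate) ** (checks - k)
        for k in range(misses, checks + 1)
    )


def should_reject(misses: int, checks: int,
                  base_miss_rate: float = 0.05, alpha: float = 0.01) -> bool:
    """Reject when this many misses is implausible for an attentive evaluator."""
    return prob_at_least(misses, checks, base_miss_rate) < alpha
```

With these illustrative numbers, one miss in ten checks does not trigger rejection, while four misses do.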

## Post-evaluation reliability loop

When a session ends, the data is not yet final. Podonos runs a post-evaluation reliability loop before any number reaches your report.

### What we compute

For every evaluator in the cohort, we score:

* **Inter-evaluator agreement** on the same queries.
* **Consistency** on repeated and near-duplicate items embedded in the session.
* **Variance pattern** across the session — sudden flips suggest a tired or distracted evaluator.

These signals combine into a per-evaluator reliability coefficient. The cohort as a whole also gets an aggregate reliability coefficient, which must clear a bar before your evaluation is released.
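
To make the first signal concrete, the sketch below scores inter-evaluator agreement as the Pearson correlation between one evaluator's ratings and the mean rating of everyone else on the same queries (a leave-one-out comparison). This is one common choice, assumed here for illustration. The source does not specify the formula Podonos uses or how the three signals are weighted.

```python
from statistics import mean
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # evaluator id -> query id -> score


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either side has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def leave_one_out_agreement(ratings: Ratings, evaluator: str) -> float:
    """Correlate one evaluator's scores with the mean of the other evaluators."""
    own: List[float] = []
    others: List[float] = []
    for query, score in ratings[evaluator].items():
        peer_scores = [r[query] for name, r in ratings.items()
                       if name != evaluator and query in r]
        if peer_scores:
            own.append(score)
            others.append(mean(peer_scores))
    if len(own) < 2:
        return 0.0  # not enough shared queries to measure agreement
    return pearson(own, others)
```

An evaluator who rates in the opposite direction from the rest of the cohort gets a negative score under this measure, which makes them easy to separate from evaluators who merely disagree on a few items.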

### The loop

<Steps>
  <Step title="Compute">
    Score every evaluator's reliability and the cohort-level reliability coefficient.
  </Step>

  <Step title="Drop">
    Remove all data from evaluators whose reliability falls below the threshold. Their votes never reach your final report.
  </Step>

  <Step title="Backfill">
    Recruit fresh evaluators to fill the slots left by dropped evaluators so your votes-per-query target is restored.
  </Step>

  <Step title="Repeat">
    Re-compute reliability with the new cohort. If the bar is not yet met, drop and backfill again.
  </Step>

  <Step title="Release">
    The loop terminates only when the reliability bar is met. Final aggregated numbers are computed exclusively from evaluators who passed.
  </Step>
</Steps>

<Note>
  When the cohort reliability falls below the bar, we add 20% to 40% more evaluators in each backfill round and re-run the loop. The exact top-up depends on how far the cohort sits from the bar — a small gap needs a small top-up, a larger gap needs more. Every vote in your final aggregation comes from an evaluator who passed the reliability gate, in a cohort whose aggregate reliability cleared the bar.
</Note>
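
The loop and the top-up rule can be sketched together. Everything below is a simplified model under loud assumptions: each evaluator already has a reliability coefficient, cohort reliability is stood in for by the mean of the surviving coefficients, the top-up grows linearly from 20% to 40% with the relative gap to the bar, and `max_rounds` is a safety guard added for the example. The `recruit` callable is hypothetical and returns new evaluators with their coefficients.

```python
import math
from statistics import mean
from typing import Callable, Dict


def top_up_fraction(cohort_reliability: float, bar: float) -> float:
    """Return 0 when the bar is met, else a value in [0.20, 0.40] that grows with the gap."""
    if cohort_reliability >= bar:
        return 0.0
    gap = min(1.0, (bar - cohort_reliability) / bar)  # assumes bar > 0
    return 0.20 + 0.20 * gap


def reliability_loop(cohort: Dict[str, float],
                     recruit: Callable[[int], Dict[str, float]],
                     evaluator_threshold: float,
                     cohort_bar: float,
                     max_rounds: int = 10) -> Dict[str, float]:
    """Drop, backfill, and repeat until the cohort clears the bar."""
    target_size = len(cohort)
    for _ in range(max_rounds):
        # Drop: keep only evaluators at or above the per-evaluator threshold.
        kept = {e: r for e, r in cohort.items() if r >= evaluator_threshold}
        # Compute: mean of survivors as a stand-in for cohort reliability.
        cohort_reliability = mean(kept.values()) if kept else 0.0
        if len(kept) >= target_size and cohort_reliability >= cohort_bar:
            return kept  # Release: only evaluators who passed remain.
        # Backfill: refill dropped slots, plus a top-up when the bar is unmet.
        missing = max(0, target_size - len(kept))
        extra = math.ceil(len(kept) * top_up_fraction(cohort_reliability, cohort_bar))
        cohort = {**kept, **recruit(missing + extra)}
    raise RuntimeError("reliability bar not met within max_rounds")
```

In this model a cohort of 20 sitting at 0.76 against a bar of 0.80 gets a top-up fraction of about 21%, which rounds up to five additional evaluators.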

## Automatic audio review

Before evaluators ever see your files, every uploaded audio file runs through a pipeline of automated checks:

<AccordionGroup>
  <Accordion title="Playability check" icon="circle-play">
    Does the file decode cleanly? Are the headers valid? Is the codec supported by the playback device profile we ship?
  </Accordion>

  <Accordion title="Corruption detection" icon="triangle-exclamation">
    We scan for truncated streams, header/payload mismatches, and zero-byte regions that indicate a broken upload or render.
  </Accordion>

  <Accordion title="Metadata vs. actual length" icon="ruler">
    We compare the duration declared in the file metadata against the actual decoded length. Mismatches frequently indicate truncation, codec issues, or generation failures upstream.
  </Accordion>

  <Accordion title="Silent or voiceless content" icon="volume-xmark">
    We detect files that contain no audio at all, no speech (only background or instrumental), or unusually long silence — common failure modes for TTS pipelines.
  </Accordion>
</AccordionGroup>

<Warning>
  Files that fail audio review are surfaced back to you in the Workspace before evaluation begins, so a broken render does not waste evaluator budget.
</Warning>
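
For a sense of what such checks involve, the sketch below runs three of them on a 16-bit PCM WAV payload using only Python's standard `wave` module: whether the file decodes, whether the frame count declared in the header matches what can actually be read, and whether the content is effectively silent. It is an illustration under narrow assumptions, not the Podonos pipeline. The `silence_peak` value is arbitrary, other codecs are out of scope, and detecting speech versus non-speech would need a voice activity detector that is not shown.

```python
import array
import io
import wave
from typing import List


def review_wav(data: bytes, silence_peak: int = 64) -> List[str]:
    """Return a list of problems found in a 16-bit PCM WAV; empty means it passed."""
    if not data:
        return ["empty file"]
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            declared = wav.getnframes()   # frame count taken from the header
            width = wav.getsampwidth()
            channels = wav.getnchannels()
            frames = wav.readframes(declared)
    except (wave.Error, EOFError) as exc:
        return [f"not decodable: {exc}"]
    if width != 2:
        return [f"unsupported sample width: {width} bytes"]

    problems: List[str] = []
    frame_size = width * channels
    actual = len(frames) // frame_size    # frames that were really present
    if actual != declared:
        problems.append(
            f"length mismatch: header declares {declared} frames, decoded {actual}"
        )
    samples = array.array("h")
    samples.frombytes(frames[: actual * frame_size])
    if not samples or max(abs(s) for s in samples) < silence_peak:
        problems.append("silent content")
    return problems
```

Returning a list of problems rather than a single pass or fail flag makes it straightforward to surface every issue for a file at once.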
