Editing Ai Audio Speech (section)

== <span style="color: #FFFFFF;">Evaluating</span> ==
ASR evaluation:
# '''WER on diverse benchmarks''': LibriSpeech (read speech), Switchboard (telephone conversational), CHiME-4 (noisy), AMI (meeting), and target domain.
# '''Latency''': Real-time factor (RTF): if RTF=0.1, model processes 10 seconds of audio per second of compute — production systems typically require RTF<1.
# '''Fairness''': WER disaggregated by accent, dialect, gender, and age — ASR systems frequently exhibit higher WER for underrepresented groups. TTS evaluation: MOS (Mean Opinion Score) from human listeners; naturalness and intelligibility scores separately.
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">