WCAG Success Criterion 1.4.7: Low or No Background Audio (Level AAA)
WCAG Success Criterion 1.4.7, a Level AAA criterion, addresses the need for clear, understandable speech in prerecorded audio-only content. It ensures that background sounds do not obscure foreground speech or make it difficult to discern, thereby enhancing accessibility for a wide range of users.
What is Success Criterion 1.4.7?
This criterion applies to prerecorded audio-only content that contains primarily speech in the foreground, and that is not an audio CAPTCHA, an audio logo, or vocalization intended primarily as musical expression (such as singing or rapping). For such content, at least one of the following must be true: the audio contains no background sounds; the background sounds can be turned off; or the background sounds are at least 20 decibels lower than the foreground speech, with the exception of occasional sounds that last for only one or two seconds. A 20 dB difference is substantial, equivalent to approximately one-quarter of the perceived loudness of the speech content, making the speech much more prominent.
It’s important to note that this requirement applies specifically to prerecorded audio-only content, not live audio, and there are specific exceptions where the 20 dB rule does not apply, which we will detail below.
Why does this criterion matter? (Accessibility Impact)
Clarity of speech is fundamental for effective communication, and background audio can significantly impede this. This criterion is crucial for several user groups:
- Users with Hearing Impairments: Even with hearing aids or cochlear implants, differentiating speech from background noise can be extremely challenging. Reducing background audio allows these users to better focus on and understand the spoken content.
- Users with Cognitive Disabilities: Individuals with cognitive or learning disabilities (e.g., ADHD, autism spectrum disorder) may find it difficult to filter out extraneous sounds and concentrate on the primary audio source. Excessive background noise can lead to sensory overload, distraction, or an inability to process information.
- Users in Noisy Environments: Anyone listening to content in a less-than-ideal environment (e.g., public transport, an open-plan office) benefits when speech is clearly distinguishable, even without a disability.
- Users with Limited Language Proficiency: For those listening in a non-native language, clearer speech can aid comprehension significantly.
By minimizing background audio, websites and applications create a more inclusive and less fatiguing listening experience for everyone.
Understanding the Requirements: The 20 dB Rule
The core of SC 1.4.7 is the requirement that background sounds be at least 20 decibels lower than the speech content. This measurement is typically based on the Root Mean Square (RMS) level of the audio. RMS provides a good approximation of the perceived loudness of a sound over time.
- Prerecorded Audio-Only Content: The criterion formally applies to prerecorded audio-only content, such as podcasts, audio guides, audio advertisements, and audio-only e-learning modules. Applying the same practice to the soundtracks of videos and documentaries is good practice, even though synchronized media falls outside this criterion’s normative scope.
- Speech Content: The rule applies when the foreground content is primarily human speech intended to convey information.
- 20 Decibels (dB): The decibel is a logarithmic unit. A 20 dB reduction means the background sound’s power is 100 times less than the speech’s power, or its amplitude is 10 times less. In practice, the background should be subtle: clearly audible only when speech is absent, and barely perceptible while speech is present.
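To make the arithmetic concrete, here is a minimal sketch in plain JavaScript (the sample buffers are hypothetical; real metering tools do this for you) showing how RMS levels and the speech-to-background difference in decibels can be computed:
// Minimal sketch: RMS levels and their difference in decibels.
// Assumes speechSamples and backgroundSamples are Float32Arrays of
// audio samples in the range [-1, 1] (hypothetical input buffers).
function rms(samples) {
  let sumOfSquares = 0;
  for (const s of samples) sumOfSquares += s * s;
  return Math.sqrt(sumOfSquares / samples.length);
}

// Amplitude ratio to decibels: dB = 20 * log10(a1 / a2)
function dbDifference(speechSamples, backgroundSamples) {
  return 20 * Math.log10(rms(speechSamples) / rms(backgroundSamples));
}

// The 20 dB clause is met when dbDifference(...) >= 20, i.e. the speech
// RMS amplitude is at least 10x the background's (a 100x power ratio).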
Exceptions to the 20 dB Rule
There are specific situations where the 20 dB requirement does not apply:
- The audio is a sound logo or jingle (an audio logo), or the background sound is an occasional sound that lasts only one or two seconds.
Explanation: Short, recognizable audio cues such as brand jingles or stingers are brief enough not to interfere with extended speech and are often essential for brand identity; the 20 dB clause likewise excepts occasional one-to-two-second sounds.
- The content is not primarily speech, or the vocalization is intended primarily as musical expression (e.g., a music performance, singing or rapping, an ambient soundscape, or sound effects in a dramatic work).
Explanation: This exception covers content where the sound itself is the primary content or an essential artistic element, and speech is either absent or secondary to it. Examples include a music video, an instrumental track, or a nature documentary where the sounds of the environment are the focus. If such content does contain foreground speech (e.g., a narrator in a nature documentary), the speech-to-background ratio still applies to that narration.
- The background sounds are incidental, meaning they are a natural part of the environment and were not intentionally added or enhanced to create a mood or atmosphere.
Explanation: This covers unavoidable background noise captured during recording, such as traffic during an outdoor interview or subtle room tone. These sounds are not mixed in for artistic effect; the key is that they are not amplified or made more prominent than they naturally were.
- The background sounds are part of an audio CAPTCHA.
Explanation: Audio CAPTCHAs deliberately obscure or distort speech to prevent automated bots from solving them; the primary purpose is security, not clear communication.
Practical Guidelines for Compliance
Achieving compliance with SC 1.4.7 requires attention during content creation, audio editing, and final production.
For Content Creators and Speakers:
- Record in Quiet Environments: Minimize ambient noise during recording sessions. Use high-quality microphones that focus on the speaker’s voice.
- Consistent Volume: Encourage speakers to maintain a relatively consistent speaking volume to make audio leveling easier in post-production.
- Scripting: If background music or sound effects are planned, consider where speech will occur and how the background audio will be managed during those segments.
For Audio Editors and Engineers:
- Use Digital Audio Workstations (DAWs): Software such as Audacity, Adobe Audition, Logic Pro, Pro Tools, or Reaper provides precise control over audio levels.
- Volume Automation/Ducking: Apply volume automation (envelopes) or side-chain compression (ducking) to automatically lower background music or effects whenever speech is present.
- Measure RMS Levels: Use metering tools within your DAW to measure the RMS levels of both speech and background audio, and aim for the background to stay at least 20 dB below the speech’s RMS level (a windowed check is sketched after this list).
- Equalization (EQ): Use EQ to carve out frequencies where speech resides, further helping speech to stand out from background elements.
- Monitor with Different Devices: Listen to the final mix on various devices (headphones, laptop speakers, phone speakers) and in different environments to catch potential issues.
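As a rough illustration of the measurement step, the sketch below checks 200 ms windows of time-aligned speech and background stems and flags any speech-active window where the separation drops below 20 dB. This is a sketch under assumptions: the stems are hypothetical Float32Arrays, the speech gate and window length are arbitrary choices, and rms is the helper from the earlier snippet.
// Sketch: windowed compliance check over time-aligned speech/background stems.
// The gate threshold and window length are assumptions, not normative values.
function findDuckingViolations(speechStem, backgroundStem, sampleRate) {
  const WINDOW = Math.round(sampleRate * 0.2); // 200 ms analysis windows
  const SPEECH_GATE = 0.01; // RMS amplitude below this (~ -40 dBFS) counts as "no speech"
  const violations = [];

  for (let i = 0; i + WINDOW <= speechStem.length; i += WINDOW) {
    const speechRms = rms(speechStem.slice(i, i + WINDOW));
    if (speechRms < SPEECH_GATE) continue; // skip windows without speech

    const backgroundRms = rms(backgroundStem.slice(i, i + WINDOW));
    const separationDb = 20 * Math.log10(speechRms / backgroundRms);
    if (separationDb < 20) {
      violations.push({ startSeconds: i / sampleRate, separationDb });
    }
  }
  return violations; // empty array: every speech window meets the 20 dB rule
}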
For Developers:
- No Overrides: Ensure that your media players or web interfaces do not inadvertently override the carefully mixed audio levels.
- Player Controls: While not strictly a 1.4.7 requirement, consider providing user controls to adjust audio tracks separately (e.g., one slider for music and one for voice); this is usually relevant for more complex media, and a minimal sketch follows.
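For the player-control idea above, a minimal sketch using the Web Audio API might look like the following; the element IDs are hypothetical, and a production player would also need error handling and autoplay-policy handling:
// Sketch: independent volume sliders for a voice track and a music track.
// Element IDs ('voice-audio', 'voice-volume', etc.) are hypothetical.
const ctx = new AudioContext();

function connectTrack(audioElementId, sliderId) {
  const element = document.getElementById(audioElementId); // an <audio> element
  const slider = document.getElementById(sliderId);        // an <input type="range"> (0..1)
  const source = ctx.createMediaElementSource(element);
  const gain = ctx.createGain();

  source.connect(gain).connect(ctx.destination);
  slider.addEventListener('input', () => {
    gain.gain.value = Number(slider.value);
  });
  return gain;
}

connectTrack('voice-audio', 'voice-volume');
connectTrack('music-audio', 'music-volume');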
Examples of Correct and Incorrect Implementations
Correct Implementation:
A podcast episode features a host speaking over a light background music bed. When the host begins to speak, the music ducks smoothly to a level at least 20 dB below the host’s voice. When the host pauses, the music rises slightly, but never to a point where it would compete with the speech.
Audio Editing Logic (Conceptual):
// Pseudo-code for an audio editor's ducking logic
function processAudio(speechTrack, musicTrack, sampleRate, originalMusicGainDb) {
  const SPEECH_THRESHOLD_DB = -40; // windows louder than this are treated as speech
  const MUSIC_DUCKING_DB = 20;     // target reduction for music when speech is present
  const WINDOW = Math.round(sampleRate * 0.2); // 200 ms analysis window (also the fade time)

  for (let i = 0; i < speechTrack.length; i += WINDOW) {
    const speechDb = getRmsDb(speechTrack.slice(i, i + WINDOW));
    if (speechDb > SPEECH_THRESHOLD_DB) {
      // Speech is present: pin the music at least 20 dB below the speech level
      setVolumeDb(musicTrack, i, WINDOW, speechDb - MUSIC_DUCKING_DB);
    } else {
      // No speech: restore the music to its original level
      setVolumeDb(musicTrack, i, WINDOW, originalMusicGainDb);
    }
  }
  return { mixedAudio: combineTracks(speechTrack, musicTrack) };
}
In a real Digital Audio Workstation (DAW), this is achieved using features like volume automation curves or sidechain compression, where the presence of speech ‘ducks’ the volume of the music track.
Incorrect Implementation:
An explainer video features a narrator, but the background ambient music is nearly as loud as the narration. Listeners struggle to understand the spoken content, especially those using small device speakers or in moderately noisy environments.
Common Mixing Error:
// Incorrect mixing where music volume is too high relative to speech
function mixAudioPoorly(speechTrack, musicTrack) {
  // Both tracks are normalized to similar levels, so the RMS difference is small
  const speechVolume = 0.8; // example relative gain
  const musicVolume = 0.6;  // only 20 * log10(0.6 / 0.8) ≈ 2.5 dB below speech, far from 20 dB
  // The music will compete with the speech.
  return {
    mixedAudio: combineTracks(
      applyVolume(speechTrack, speechVolume),
      applyVolume(musicTrack, musicVolume)
    )
  };
}
This often happens when producers prioritize the mood set by the music over the clarity of the speech, or are unaware of the specific 20 dB requirement.
Best Practices and Common Pitfalls
Best Practices:
- Prioritize Speech: Always ensure the primary purpose of audio with speech is for the speech to be clearly understood.
- Consistent Application: Apply the 20 dB rule consistently throughout all relevant prerecorded audio content.
- User Testing: Conduct user testing with individuals who have hearing or cognitive impairments, or simply ask colleagues to listen in various environments (e.g., quiet room, headphones, open-plan office).
- Provide Alternatives: For complex audio content, consider offering an alternative version with no background audio, or a transcript (SC 1.2.1 Audio-only and Video-only (Prerecorded), SC 1.2.2 Captions (Prerecorded)).
- Education: Educate content creators and audio editors on the importance and specifics of this guideline.
Common Pitfalls:
- Ignoring the Rule for Artistic Reasons: While artistic expression is important, accessibility is a legal and ethical requirement.
- Misinterpreting Exceptions: Confusing “non-speech content” (like a music track) with content that *has* speech over music. If speech is present, the rule generally applies.
- Not Measuring: Relying solely on subjective listening instead of objective measurement tools (RMS meters) in audio software.
- Inconsistent Mixing: Background audio levels fluctuating too much, making it sometimes compliant and sometimes not.
How to Test for Compliance
Testing for SC 1.4.7 compliance involves both objective measurement and subjective listening:
- Audio Metering Software: Use a Digital Audio Workstation (DAW) or a dedicated audio analysis tool to measure the RMS levels of the speech track and the background audio track separately, then calculate the difference (a hypothetical scripted version of this check follows the list).
- Listening Tests: Listen to the audio with various types of headphones and speakers. Try listening in a quiet room, and then in a moderately noisy environment to simulate real-world conditions. Can you easily understand every word of the speech?
- Focus Group/User Testing: Involve individuals from the target user groups (e.g., those with hearing impairments) in testing to gather direct feedback.
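If separate stems are available, the windowed check sketched in the mixing section can be scripted into this workflow. A hypothetical usage, assuming decoded Float32Array stems at a 48 kHz sample rate:
// Hypothetical usage of findDuckingViolations from the mixing section.
const issues = findDuckingViolations(speechStem, backgroundStem, 48000);
if (issues.length === 0) {
  console.log('All speech windows keep background audio at least 20 dB down.');
} else {
  for (const { startSeconds, separationDb } of issues) {
    console.log(`At ${startSeconds.toFixed(1)} s, separation is only ${separationDb.toFixed(1)} dB.`);
  }
}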
Related WCAG Guidelines
SC 1.4.7 works in conjunction with several other WCAG criteria related to audio and media:
- 1.2.1 Audio-only and Video-only (Prerecorded) (Level A): Requires a text alternative (such as a transcript) for prerecorded audio-only content, and a text or audio alternative for prerecorded video-only content.
- 1.2.2 Captions (Prerecorded) (Level A): Requires captions for all prerecorded audio content in synchronized media.
- 1.2.3 Audio Description or Media Alternative (Prerecorded) (Level A): Requires an audio description or a full media alternative for prerecorded video content in synchronized media.
- 1.2.5 Audio Description (Prerecorded) (Level AA): Requires audio description for all prerecorded video content in synchronized media.
- 1.2.6 Sign Language (Prerecorded) (Level AAA): Requires sign language interpretation for prerecorded audio content in synchronized media.