WCAG 1.2.2: Captions (Prerecorded)

WCAG 1.2.2, titled "Captions (Prerecorded)", is a Level A success criterion that addresses the fundamental need for accessibility in time-based media. It mandates that captions must be provided for all prerecorded audio content within synchronized media, with a specific exception. This criterion is vital for ensuring that deaf and hard-of-hearing individuals, as well as those in various other situations, can access and understand information presented in videos.

Introduction to WCAG 1.2.2 Captions (Prerecorded)

This success criterion requires that if your website includes prerecorded synchronized media (which is media containing both audio and visual components, like a video), then comprehensive captions must be available for all the audio content. "Prerecorded" means the media is not live and can be edited before distribution.

What are captions? Captions are a text alternative for the audio information in synchronized media. They are displayed on-screen, synchronized with the video, and typically include not only spoken dialogue but also crucial non-speech audio information such as sound effects, musical cues, and speaker identification. Unlike subtitles, which assume the viewer can hear the audio and usually render only the dialogue (often translated into another language), captions aim to provide a complete textual representation of all audio for those who cannot hear it.

Why WCAG 1.2.2 Matters: Accessibility Impact

Providing captions for prerecorded media is not just a best practice; it’s a critical component of web accessibility that significantly broadens the reach and usability of your content. Adhering to WCAG 1.2.2 ensures:

  • Equal Access for Deaf and Hard-of-Hearing Individuals: This is the primary user group benefiting from captions. Without captions, prerecorded audio content is completely inaccessible to them.
  • Enhanced Comprehension for Users with Cognitive Disabilities: Captions can aid in understanding complex information by providing a visual text reinforcement of auditory content.
  • Accessibility in Noisy or Sound-Sensitive Environments: Users in loud public places (e.g., commutes, gyms) or quiet environments (e.g., libraries, offices, late-night viewing) can consume content without needing audio.
  • Language Learners and Non-Native Speakers: Captions can help individuals learning a new language or non-native speakers better understand spoken content and improve vocabulary.
  • Improved Literacy: Reading captions while listening can aid in literacy development and reading comprehension.
  • SEO Benefits: The text within captions (or a transcript generated from them) can be indexed by search engines, potentially improving the visibility and discoverability of your video content.
  • Legal Compliance: Many accessibility laws and regulations worldwide (such as the Americans with Disabilities Act in the U.S. or the European Accessibility Act) mandate the provision of accessible media, including captions, for public-facing content.

Success Criteria and Requirements (Level A)

The full text of WCAG 1.2.2 states: "Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text."

Key Components Explained:

  • "Captions are provided": This means visual text must be available. These are typically "closed captions," meaning users can turn them on or off, and customize their appearance. "Open captions" (burned into the video) are generally discouraged as they cannot be customized or disabled.
  • "for all prerecorded audio content": Every piece of audio information that contributes to the understanding of the content must be captioned. This includes:
    • Spoken dialogue.
    • Speaker identification (e.g., "[Sarah]" or "[Narrator]").
    • Significant non-speech sounds (e.g., "[door slams]", "[ominous music]", "[laughter]", "[phone rings]").
    • Emotion conveyed by sound (e.g., "[whispering]", "[shouting]").
  • "in synchronized media": Synchronized media refers to media that combines audio or video with another format for presenting information (like text or still images) and where the presentation is time-dependent. Examples include videos, animated presentations with sound, and audio-only podcasts with a visual component (e.g., a static image or transcript display). The captions must be accurately timed and synchronized with the corresponding audio and visual events.
  • "except when the media is a media alternative for text": This is a very specific exception. It applies when the synchronized media itself is merely an alternative presentation of text information that is already available on the page in text form. For example, if a page contains a full transcript of a speech, and there’s an embedded video of that exact speech, the video would be a media alternative for text. In such a rare case, captions for that specific video might not be strictly required by this criterion (though often still beneficial). However, if the video contains *any* unique visual or auditory information not fully conveyed by the text alternative, this exception does not apply. When in doubt, provide captions.

Practical Guidelines for Compliance

Achieving compliance with WCAG 1.2.2 involves several steps for content creators and developers:

  1. Generate Accurate Captions:

    • Manual Creation: The most accurate method, though time-consuming.
    • Professional Captioning Services: Companies specialize in creating high-quality, accurate captions.
    • Automated Generation (and review!): Platforms like YouTube offer automated captioning, but these often contain errors and must be thoroughly reviewed and edited for accuracy, speaker identification, and non-speech sounds.
  2. Choose a Suitable Caption Format:

    • WebVTT (.vtt): The recommended format for HTML5 video, providing robust styling and positioning capabilities.
    • SRT (.srt): A widely supported, simpler format.
  3. Embed Captions Correctly:

    • For HTML5 <video> elements, use the <track> element.
    • For embedded third-party video players (e.g., YouTube, Vimeo), ensure their captioning features are enabled and correctly configured. Upload your caption files directly to the platform.
  4. Ensure Caption Quality:

    • Accuracy: Captions must precisely reflect the audio content.
    • Completeness: Include all dialogue, speaker IDs, and relevant non-speech sounds.
    • Synchronization: Captions must appear and disappear in perfect sync with the audio.
    • Readability: Ensure captions are easy to read (sufficient contrast, appropriate font size), displayed for a reasonable duration, and don’t obscure important visual content. Limit lines of text to two or three.
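To illustrate step 2 above, here is the same cue expressed in both formats. Note the two practical differences: WebVTT requires the `WEBVTT` header and uses a period before milliseconds, while SRT uses a comma.

```
WEBVTT

1
00:00:03.500 --> 00:00:07.000
[Narrator] Welcome to our accessibility guide!
```

The equivalent SRT cue:

```
1
00:00:03,500 --> 00:00:07,000
[Narrator] Welcome to our accessibility guide!
```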

Examples

Correct Implementation: Using HTML5 <video> with <track>

This approach uses the native HTML5 video player capabilities to provide closed captions.

HTML Structure

<video controls width="640" height="360" preload="metadata">
  <source src="path/to/your/video.mp4" type="video/mp4">
  <source src="path/to/your/video.webm" type="video/webm">
  <track kind="captions" src="path/to/your/captions.vtt" srclang="en" label="English" default>
  <p>Your browser does not support HTML5 video. Here is a <a href="path/to/your/video.mp4">link to the video</a> instead.</p>
</video>

Example WebVTT File (captions.vtt)

WEBVTT

1
00:00:00.500 --> 00:00:03.000
[Music playing]

2
00:00:03.500 --> 00:00:07.000
[Narrator] Welcome to our accessibility guide!

3
00:00:07.500 --> 00:00:11.000
[Sound of typing]
Providing captions is crucial.

4
00:00:11.500 --> 00:00:14.000
[Voice 1] I completely agree.

5
00:00:14.500 --> 00:00:17.000
[Voice 2, chuckles] It helps everyone.

Explanation: The <track> element specifies the location of the WebVTT file. The kind="captions" attribute indicates its purpose, srclang="en" specifies the language, and label="English" provides a user-facing description. The default attribute makes these captions appear automatically, but users can still toggle them off.
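If you build a custom caption toggle rather than relying on the player's built-in one, the standard TextTrack API exposes a mode property ('showing', 'hidden', or 'disabled'). A minimal sketch, written as a pure function so it works with any track-like object:

```javascript
// Toggle a caption track between visible and hidden.
// Works with any object exposing a TextTrack-style `mode` property;
// a 'disabled' track is switched on ('showing') by the first toggle.
function toggleCaptions(track) {
  track.mode = track.mode === 'showing' ? 'hidden' : 'showing';
  return track.mode;
}

// In a browser you would grab the track from the <video> element, e.g.:
//   const track = document.querySelector('video').textTracks[0];
//   toggleCaptions(track);
```

Keeping the track 'hidden' rather than 'disabled' preserves cue timing events, which is useful if your UI also displays a transcript.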

Incorrect Implementations

Scenario 1: No Captions Provided

<video controls width="640" height="360">
  <source src="path/to/your/video.mp4" type="video/mp4">
  <p>Your browser does not support HTML5 video.</p>
</video>

Reason for Failure: This video has audio content but lacks any <track> element for captions, making it inaccessible to deaf or hard-of-hearing users.

Scenario 2: Incomplete Captions

Imagine a video where someone says "Hello!" and then a loud bell rings. The caption file only says:

WEBVTT

1
00:00:01.000 --> 00:00:02.500
Hello!

Reason for Failure: The caption file correctly captures the dialogue but completely omits the significant non-speech audio event ("[Loud bell rings]"), which could be crucial for understanding the video’s context or narrative.
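A version of the same caption file that would pass adds a cue for the missing sound (timing values are illustrative):

```
WEBVTT

1
00:00:01.000 --> 00:00:02.500
Hello!

2
00:00:02.600 --> 00:00:04.000
[Loud bell rings]
```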

Scenario 3: Poorly Synchronized Captions

A video shows a person speaking, but the text for their dialogue appears several seconds too early or too late, making it difficult or impossible to follow.

Reason for Failure: Even if the text is accurate, incorrect timing renders the captions practically unusable for synchronized media. This fails the "synchronized" aspect of the criterion.

Best Practices and Common Pitfalls

Best Practices

  • Use Closed Captions: Always provide captions that users can turn on and off, and ideally customize (e.g., font size, color, background). This is best achieved with the HTML5 <track> element or through accessible third-party video players.
  • Provide a Transcript: While not a direct requirement of 1.2.2, a full text transcript of the audio (including speaker names and descriptions of important visuals or sounds) can serve as a valuable supplementary alternative, especially for complex or lengthy media.
  • Allow Customization: Where possible, offer options for users to adjust the appearance of captions (e.g., text size, color, background contrast) to suit their individual needs.
  • Test Thoroughly: Always test your captioned videos with actual users, screen readers, and different browsers/devices to ensure functionality and readability.
  • Consider Multiple Languages: If your audience is international, providing captions in multiple languages significantly enhances global accessibility.
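For the customization point above: browsers and operating systems let users set their own caption preferences, but authors can also provide sensible default styling for WebVTT cues with the ::cue pseudo-element. A sketch (only a limited set of CSS properties applies to ::cue, and support varies by browser):

```css
/* Default styling for WebVTT cues rendered by the native player.
   Users' own caption preferences can still override these defaults. */
video::cue {
  background-color: rgba(0, 0, 0, 0.8); /* high-contrast backdrop */
  color: #fff;
  font-size: 1.1rem;
}
```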

Common Pitfalls

  • Relying Solely on Auto-Generated Captions: Automatic captioning services (e.g., YouTube’s auto-captions) are often inaccurate, especially with accents, technical jargon, or poor audio quality. They also frequently miss non-speech elements. Always review and edit.
  • Missing Non-Speech Audio Information: Forgetting to include descriptions of crucial sound effects, music, or environmental sounds.
  • Poor Synchronization: Captions appearing too early, too late, or remaining on screen for too long/short a duration.
  • Incorrect Speaker Identification: Not clearly indicating who is speaking, especially in multi-person dialogues.
  • Treating Burned-In (Open) Captions as Equivalent to Closed Captions: Captions embedded directly into the video stream technically satisfy this criterion, but users cannot toggle them off, reposition them, or customize their appearance, so they can obscure visual information and be difficult for some users to read. Prefer closed captions wherever possible.
  • Inadequate Contrast: Captions that are difficult to read against varying video backgrounds due to poor color contrast.
  • Overlapping Captions: Captions appearing at the same time as other important on-screen text or visuals.

Conclusion

WCAG 1.2.2 Captions (Prerecorded) is a foundational criterion for creating an inclusive web. By diligently providing accurate, complete, and well-synchronized captions for all prerecorded audio content, developers and content creators can significantly improve the accessibility of their media. This commitment not only meets critical accessibility standards but also enhances the user experience for a broad audience, ensuring everyone has equitable access to information.
