The Standards

Section 508 Standard:

  • Standard 1194.24(c): “All training and informational video and multimedia productions which support the agency’s mission, regardless of format, that contain speech or other audio information necessary for the comprehension of the content, shall be open or closed captioned.”

WCAG 2.0 Success Criteria:

  • Success Criterion 1.2.2 “Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)” (W3C)
  • Success Criterion 1.2.4 “Captions (Live): Captions are provided for all live audio content in synchronized media. (Level AA)” (W3C)

What do the standards mean?

The ultimate goal is to maximize the number of people who can fully acquire and appreciate the information conveyed in the resource. This is done by presenting information to more than one of the senses: for example, design a resource so that users can get the information through sound, sight, or both. If the only way to get the information is to hear the audio, the resource is not accessible. If the only way to get the information is to see the illustrations, text, and other visuals, the resource is not accessible either. Any information presented visually should also be available audibly, and vice versa.

Best Practices for Captions

When captioning a video, follow these guidelines.


Captions must be synchronized with the audio as closely as possible. Each caption should appear on the screen at the same time the corresponding audio plays, with no noticeable delay.
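
As an illustration of what synchronization looks like in a caption file, the sketch below writes cues in WebVTT, a W3C caption format that this guide does not itself name; choosing it here is an assumption. The function names `format_timestamp` and `to_webvtt` and the sample cue are hypothetical. Each cue carries a start and end time, which should match the moment the words are spoken.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        if end <= start:
            raise ValueError("a cue must end after it starts")
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# Hypothetical cue: the speaker says this line from 1.0 s to 3.5 s.
sample = to_webvtt([(1.0, 3.5, "Welcome to the course.")])
```

If the start time in a cue drifts from the spoken audio, viewers see the delay directly, so timings are worth spot-checking against the video.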


Make sure the captions are free of unintentional spelling and grammar errors and are properly punctuated. Automatic, machine-generated captions are usually not 100% accurate. Be aware of this when using websites like YouTube. The only exception to this guideline is when the error occurs in the audio itself and is important to the meaning being conveyed in the video.

On this same note, the captions should not phonetically reproduce a person’s accent because that could make the captions difficult to understand. If it is necessary to note the accent of the speaker, add that information in brackets or parentheses. Whether to include ‘ums’ and ‘ahs’ and other disfluencies depends on the type of video. Legal videos might require a strict transcription, while other types of educational videos might not. Research this prior to captioning.

Important non-speech audio

Include non-speech audio in brackets or parentheses when that information is needed to fully understand the video. For example, it may be necessary to include sound effects such as “fire alarm”, “baby crying”, “music” or “car horn honking.” Some captioning tools provide icons that symbolize certain non-speech audio; for example, there may be the option to add music note icons before and after music lyrics in captions.

Names and titles of the speakers

When a new person begins speaking in a video, add the name of the speaker and that speaker’s title (if available) in the caption of their first line of dialogue. Several formats are common:

  • >> Name of Speaker
  • Name of Speaker:
  • [Name of speaker]

As long as it is the same person speaking, it is unnecessary to put their name and title on every line. Also, it is best practice to avoid having dialogue from multiple speakers on the screen at the same time unless they are speaking simultaneously.
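
The labeling rule above can be sketched as a small helper that adds the speaker’s name only when the speaker changes. This is a minimal sketch that uses the “Name of Speaker:” format from the list; the function name `label_speakers` and the sample dialogue are hypothetical.

```python
def label_speakers(lines):
    """Prefix a caption with the speaker's name only when the speaker changes.

    `lines` is a list of (speaker, text) pairs in playback order.
    """
    labeled = []
    previous = None
    for speaker, text in lines:
        if speaker != previous:
            labeled.append(f"{speaker}: {text}")
        else:
            labeled.append(text)  # same speaker, so no repeated label
        previous = speaker
    return labeled


# Hypothetical dialogue with two speakers.
dialogue = [
    ("Dr. Lee", "Welcome."),
    ("Dr. Lee", "Today we cover captions."),
    ("Sam", "Thanks."),
]
captions = label_speakers(dialogue)
```

Here the second line carries no label because the speaker has not changed, while the third line is labeled again.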

Consistent style and format

Try to be consistent with the format and style of the captions throughout the video. Some examples of consistency are:

  • Use the same typeface, color, and font size for every line of captioning.
  • Have the captions in the same location on the screen throughout the video.
  • Use the same symbols for non-speech audio throughout the video.
  • Use the same format for indicating a new speaker throughout the video.

Location and appearance

Most websites and software place the captions at the bottom center of the video screen. The only time to put them elsewhere is if the captions conceal important visual information. In that case, adjust the location. Only include one to three lines on the screen at a time. Make sure the captions are on the screen long enough that people of varying reading speeds can read them.
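
The line-count and reading-time advice can be checked automatically. The sketch below is one possible check, not a standard: the limit of three lines comes from the paragraph above, but the reading-rate threshold `MAX_CHARS_PER_SECOND` is a hypothetical value chosen only for illustration, and `check_cue` is a hypothetical function name.

```python
MAX_LINES = 3                 # from the guideline: one to three lines at a time
MAX_CHARS_PER_SECOND = 20.0   # hypothetical threshold, adjust for your audience


def check_cue(text: str, start: float, end: float) -> list:
    """Return a list of problems found in one caption cue (empty if none)."""
    problems = []
    lines = text.splitlines()
    if not 1 <= len(lines) <= MAX_LINES:
        problems.append(f"expected 1 to {MAX_LINES} lines, found {len(lines)}")
    duration = end - start
    if duration <= 0:
        problems.append("cue must end after it starts")
        return problems
    # Count characters without line breaks to estimate reading load.
    rate = len(text.replace("\n", "")) / duration
    if rate > MAX_CHARS_PER_SECOND:
        problems.append(f"{rate:.1f} characters per second is too fast to read")
    return problems
```

A cue that fails either check can be split into two cues or given more time on screen.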

Make sure the text is as readable as possible by putting the text in a common sans-serif typeface and in a color that does not blend into the background of the video. The text color and the background color should have a high contrast.
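
“High contrast” can be measured. WCAG 2.0 defines a contrast ratio based on the relative luminance of two colors, and its Success Criterion 1.4.3 (Level AA) asks for at least 4.5:1 for normal-size text. The sketch below computes that ratio for 8-bit sRGB colors; the function names are hypothetical, and it assumes the captions sit on a solid background box, since a moving video background has no single color to compare against.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.0 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) color with channels in 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# White caption text on a black background box.
ratio = contrast_ratio((255, 255, 255), (0, 0, 0))
meets_aa = ratio >= 4.5
```

White text on a black box gives the maximum possible ratio, which is one reason that combination is so common for captions.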

Real-time Captioning

For most web content, captioning is done in post-production. In other words, the media is transcribed and captioned after it is recorded. However, it is becoming increasingly common to caption during a live video web event; for example, this is done with web conferencing and video streaming. This allows individuals who need captions to participate in live events. Usually, paid services are used for real-time captioning.