Optical Character Recognition (OCR)

The Standards

WCAG 2.0 Guidelines:

  • Success Criterion 1.4.5 (Images of Text): “If the technologies being used can achieve the visual presentation, text is used to convey information rather than images of text except for the following: (Level AA)
    • Customizable: The image of text can be visually customized to the user’s requirements;
    • Essential: A particular presentation of text is essential to the information being conveyed.” (W3C)

What do the Standards Mean?

OCR is the process of converting scanned images of printed text into text that a computer or other electronic device can recognize.

What is scanning?

Most offices, whether personal or professional, use all-in-one printers that can print, scan, and fax documents. Scanning converts a printed document to an electronic one. The scanned result, however, may or may not be accessible. An inaccessible scanned document is an image whose content cannot be recognized as text. What appears on the screen after the scan is visually “readable”, but assistive technology “sees” only an image with no text.

What is OCR?

OCR is the process of converting the scanned image to text that can be read by the computer or by assistive software.

A scanned image can be saved as a PDF (a file format compatible with Adobe Reader). The PDF format is commonly used to provide and disseminate documents in a learning environment. Saving a scanned image as a PDF does not necessarily make it accessible.

There are several ways to check whether your scanned PDF is accessible.

  • Using the mouse, click in the document. If the whole page is selected as a single block, like one big image, it probably is an image. This format is inaccessible.
  • Using your keyboard, run the Find command (Control + F on a PC, Command + F on a Mac). If you cannot find text that is visibly in the document, the format is inaccessible.
  • Within Adobe Reader (free software), go to the “View” menu, choose “Read Out Loud”, then “Activate Read Out Loud”, or press Control + Shift + Y. Then choose “Read This Page Only” or press Control + Shift + V. If the reader starts to read the text, the document is accessible. If not, it may report a blank page.
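The Find check above can also be approximated in a script when many files need review. The following is a minimal sketch in Python, not a definitive test: the function names `pages_without_text` and `extract_page_texts` are hypothetical, and the second one assumes the third-party `pypdf` library is installed. A page that yields no extractable text is probably an image-only scan. A page that does yield text is searchable, but that alone does not prove it meets every accessibility requirement.

```python
from typing import Iterable, List


def pages_without_text(page_texts: Iterable[str], min_chars: int = 1) -> List[int]:
    """Return the 1-based numbers of pages whose extracted text is effectively empty.

    A page counts as empty when it has fewer than `min_chars`
    non-whitespace characters (the threshold is an illustrative assumption).
    """
    empty_pages = []
    for number, text in enumerate(page_texts, start=1):
        # Count only visible characters; whitespace alone is not real text.
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        if visible < min_chars:
            empty_pages.append(number)
    return empty_pages


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text of each page (assumes the third-party pypdf package)."""
    # Imported here so pages_without_text works even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    # Stand-in for extract_page_texts("scan.pdf"): page 1 has text, pages 2-3 do not.
    sample = ["Course syllabus, week 1", "", "  \n"]
    print(pages_without_text(sample))
```

If every page number comes back in the list, the file most likely needs OCR before it is shared.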

Scanned, inaccessible PDF images can be converted to accessible PDFs by running OCR. Many off-the-shelf software packages can complete this process. Some operating systems also provide built-in OCR, and the all-in-one printer may come with software that performs OCR.