Stephen V. Rice, George Nagy, and Thomas A. Nartker’s work on OCR, though written in 1999, is still a remarkably valuable bedrock text for diving into the technology. Though OCR systems have evolved, and continue to evolve, with each passing day, the study presented in their book still highlights some of the major issues one faces when performing optical character recognition: text is in an unusual typeface or contains stray marks, or print is too heavy or too light. For those interested in learning the general problems that arise in OCR, this text is a great guide to what they and their patrons might encounter.
The book opens with a quote from C-3PO and a discussion of how our collective sci-fi imagination believes technology will have “cognitive and linguistic abilities” that match and perhaps even exceed our own (Rice et al., 1999, p. 1).
The human eye is the most powerful character identifier to exist. As the authors note, “A seven year old child can identify characters with far greater accuracy than the leading OCR systems” (Rice et al., 1999, p. 165). I found this simple explanation so helpful for when I get questions here in the Scholarly Commons from patrons who are confused as to why their document, even after being run through OCR software, is not perfectly recognized. It is very easy, with our human eyes, to discern when a mark on a page is nothing of importance and when it is a letter; for software, that distinction is much harder. And because a page contains so many characters, ninety-nine percent character accuracy doesn’t mean ninety-nine percent page accuracy.
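To put rough numbers on that last point, here is a minimal sketch in Python. The page length of 2,000 characters, the five-character word, and the assumption that each character is misread independently of the others are my own illustrative choices, not figures from the book.

```python
# Rough illustration of why high character accuracy still leaves many errors.
# All figures below are assumptions for illustration, not data from the book.

CHAR_ACCURACY = 0.99    # 99% of characters recognized correctly
CHARS_PER_PAGE = 2000   # assumed length of a typical printed page
CHARS_PER_WORD = 5      # assumed average word length


def expected_errors(char_accuracy: float, length: int) -> float:
    """Expected number of misread characters in a text of the given length."""
    return (1.0 - char_accuracy) * length


def error_free_probability(char_accuracy: float, length: int) -> float:
    """Chance that every character in a text of the given length is correct,
    assuming each character is misread independently of the others."""
    return char_accuracy ** length


if __name__ == "__main__":
    errors = expected_errors(CHAR_ACCURACY, CHARS_PER_PAGE)
    word_ok = error_free_probability(CHAR_ACCURACY, CHARS_PER_WORD)
    page_ok = error_free_probability(CHAR_ACCURACY, CHARS_PER_PAGE)
    print(f"Expected misread characters per page: {errors:.0f}")
    print(f"Chance a single word is fully correct: {word_ok:.1%}")
    print(f"Chance a whole page is fully correct: {page_ok:.2e}")
```

Under these assumptions, a page at 99 percent character accuracy still averages about 20 misread characters, so a completely clean page is extremely unlikely. Real OCR errors tend to cluster around smudges and odd typefaces rather than occurring independently, so treat this as a back-of-the-envelope estimate only.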
In summary, this work is a great starting point for those with an interest in understanding OCR technology, even though it is almost two decades old.
Give it, and the many other fabulous books in our reference collection, a read!