Project will help researchers explore big data in HathiTrust digitized library

Illinois professor Ted Underwood wants to know how the language describing male and female characters in works of fiction has changed since the late 18th century. He’s using data-mining tools to gather information from thousands of books to answer that question.

The problem, though, is that books published after 1922 are still under copyright protection and their content can’t be shared freely online.

Pictured, left to right: Ted Underwood, J. Stephen Downie and Timothy Cole.

“There are hundreds of thousands of books out there, and we don’t talk about them,” Underwood said. “That is a dark landscape after the wall of copyright comes down. We can read the books one by one, but we can’t make generalizing claims at all.”

The HathiTrust Research Center is leading the Mellon-funded project to provide greater access to the digitized HathiTrust library.

A project of the HathiTrust Research Center – a collaboration between the University of Illinois and Indiana University – aims to get around that problem and allow scholars to analyze large numbers of books while still respecting copyright laws. The project is being funded by a two-year, $1.17 million grant from the Andrew W. Mellon Foundation.

Read the full article at the University of Illinois News Bureau.