About the Project

Click here for full proposal in .pdf form.


This project has ended, as of December 31, 2017. To read the full report on project outcomes, as well as access links to code and other software outputs, along with white papers, please see the Outcomes page.

The final project report, in Mellon Foundation format, is available here.

What follows is a legacy introduction to the project.


The principal investigator of this project is Timothy Cole, Mathematics Librarian & Professor of Library and Information Science, Center for Informatics Research in Science and Scholarship (CIRSS), at the School of Information Sciences, University of Illinois at Urbana-Champaign. The co-PIs are Myung-Ja “MJ” Han, Metadata Librarian and Associate Professor, University Library, University of Illinois at Urbana-Champaign, and Caroline Szylowicz, Curator of Rare Books and Manuscripts, Kolb-Proust Librarian and Associate Professor, University Library, University of Illinois at Urbana-Champaign.


Tangible special collections of primary sources have long been central to humanities research. At times there is still no substitute for physical access to a primary source, but scholarly interest in digital resources is growing. Today digitized special collections play a major role in humanities scholarship and pedagogy. Digital collections facilitate the initial exploration, discovery and disambiguation of sources. Well-connected digital collections can help satisfy the need for contextual mass, enable complex connective research, and provide a powerful way to collate and contextualize physically dispersed primary sources. Given the core mission of libraries to facilitate the discovery and use of resources that support scholarship, high priority has been given in the last 20 years to the digitization of special collections. A question naturally follows: After digitization, what more needs to be done to maximize the usefulness of these digitized resources?

The relatively modest levels of use that many digitized special collections get, and the low share of this use attributable to faculty and students, suggest that more does need to be done post-digitization. There are multiple factors, of course, but in large part the full potential utility of digitized special collections has not yet been realized because digitized special collection resources, though accessible via the World Wide Web, are not woven into the fabric of the Web. In particular, they are barely integrated into the emerging and increasingly important data-centric subset of the Web known as the Semantic Web. Digitized special collections are on the Web, but not part of the Web, at least not to the degree that they could be.

Transforming legacy special collections item-level metadata into Linked Open Data (LOD) and integrating LOD into services and end-user interfaces will help address this problem. This is not a new or unique insight, but within the library community the paradigm shift to LOD is proving difficult, both technically and socially. Library experience with LOD, especially LOD for special collections, is limited. Best practices for transforming legacy metadata into LOD are still developing, and the hypothesized benefits of LOD for our users remain to be demonstrated. As a result, libraries are hesitant to take on this task without outside assistance. Incentivizing the transition to LOD for digitized special collections is especially challenging given the diversity of descriptive practices and sophisticated user requirements in this domain. Further experimentation and proofs-of-concept are needed to establish the costs of transforming legacy special collections metadata into LOD and to demonstrate the near-term benefits of doing so.

Research Questions

This 20-month project, conducted collaboratively by the University Library and the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, aims to further our understanding of four translational research questions:

  1. As compared to general collection catalog records, item-level metadata for digitized special collections are frequently more granular, richer in non-bibliographic entities, and expressed using custom vocabularies and schemas. What differences and additional challenges are encountered when transforming legacy special collections metadata records into LOD?
  2. Typically, the interfaces used to discover and view digitized special collections are disconnected from the online public access catalogs and ancillary services used to provide user access to general library collections. Can LOD reconnect library special and general collections?
  3. Digitized special collections are also disconnected from external, non-library information resources on the Web. How can LOD be leveraged to help identify and establish useful connections to these resources, and do non-library sources have the potential to enrich item descriptions and provide context for discovering and interpreting digitized special collections?
  4. Often, descriptions of special collection items include extensive references to people and relationships. Can emerging visualization and annotation technologies add a social network view of a special collection that usefully complements traditional bibliocentric perspectives?


We propose to investigate these four questions and demonstrate findings concretely by transforming legacy string-based item-level metadata and then experimenting with user services for three modestly sized digitized special collections hosted by the University of Illinois: the Motley Collection of Costume and Theatre Design, the Portraits of Actors, 1720-1920 Collection, and the Kolb-Proust Archive for Research. The first two collections are typical of theatre-themed image special collections hosted in CONTENTdm or similar content management systems. While loosely based on Dublin Core (DC), the metadata schemas used for these digitized collections have been customized and extended to express attributes and types germane to such image collections. The Proust Archive metadata, on the other hand, are expressed using a profile of the Text Encoding Initiative (TEI) schema and provide context for Proust’s letters, literary works and relationships. The metadata for all three collections are rich in person, place and event entities, but the contrasts in descriptive model and collection content will allow us to highlight findings that have applicability beyond a single metadata schema or collection type. Additionally, working with three collections will help us differentiate between collection-specific and generic remediation and transformation requirements. Finally, because the Proust Archive metadata are especially rich in information about Proust’s social relationships, they will provide good material for investigating question 4 above.
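To make the idea of transforming string-based item-level metadata more concrete, the following is a minimal Python sketch, not the project's actual workflow. It assumes a flat, Dublin Core-like record and maps it to a JSON-LD description using schema.org terms. The record fields, the `NAME_TO_URI` lookup table, the `to_jsonld` function, and all `example.org` URIs are hypothetical, introduced only for illustration.

```python
import json

# Hypothetical reconciliation table mapping name strings to entity URIs.
# In practice this step would consult an external authority; the URI below
# is a placeholder, not a real identifier.
NAME_TO_URI = {
    "Garrick, David": "https://example.org/person/garrick-david",
}


def to_jsonld(record, base="https://example.org/item/"):
    """Map a flat, string-valued record to a schema.org JSON-LD dict."""
    doc = {
        "@context": "https://schema.org",
        "@type": "VisualArtwork",
        "@id": base + record["identifier"],
        "name": record["title"],
    }
    about = []
    for name in record.get("subject", []):
        uri = NAME_TO_URI.get(name)
        if uri:
            # A reconciled string becomes a linked entity with its own URI.
            about.append({"@type": "Person", "@id": uri, "name": name})
        else:
            # An unreconciled string is carried over unchanged.
            about.append(name)
    if about:
        doc["about"] = about
    return doc


# Hypothetical legacy record with string-only values.
legacy = {
    "identifier": "demo-001",
    "title": "Portrait of an actor",
    "subject": ["Garrick, David", "Theatre"],
}

print(json.dumps(to_jsonld(legacy), indent=2))
```

The point of the sketch is the contrast between the two branches: a name that can be matched to a URI becomes a node that other datasets can link to, while an unmatched string remains a plain literal.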

Our goal is to provide evidence that helps answer these research questions, gain experience with these issues, demonstrate the potential benefits of LOD, and learn more about the resources required to transform legacy metadata into LOD and to use it, both as a way to inform transformation best practices and as a means to add to a collective assessment of the likely benefits of LOD for library users. In undertaking our work we will take advantage of related past and ongoing research into the use of LOD across all kinds of library collections. This includes our own experience with Emblematica Online and in transforming a MARC-based snapshot of our library catalog into LOD; work that has been done by OCLC Research; the efforts of the World Wide Web Consortium (W3C) schema.org Community Group; and the research being conducted by the Linked Data for Libraries (LD4L) and the proposed Linked Data for Production cataloging (LD4P) projects.