About the project – Data Mining with Limited Access Text: National Forum

Motivation

With the growth of digital scholarly publishing, online repositories of digitized texts, and increased interest in data sharing and re-use, text data mining and analysis has emerged as a viable research method for scholars in an increasing number of subject domains.

While open source text data mining tools such as Voyant and publicly available services such as the HathiTrust Research Center (HTRC) have brought the potential of new research discoveries through computational analytics within reach of scholars, the texts themselves are frequently protected by copyright or other IP rights, or subject to license agreements that limit access and use. These IP and licensing considerations can complicate a researcher’s efforts to access a dataset, incorporate it into analytical research, and communicate the output and related methods to a broader audience. Increasingly, academic libraries are engaging with content providers to facilitate access to text datasets for researchers.

This national forum will shed light on the multidimensional aspects of this issue and promote a sufficiently broad and deep perspective on a challenge whose stakeholders include libraries; content providers (e.g., commercial publishers, government agencies, and corporations); scholars across multiple disciplines; policy makers and legal experts; software developers; and directors of data repositories, registries, and data journals.

Project team

This effort is led by PI Bertram Ludäscher (University of Illinois) and Co-PIs Beth Sandore Namachchivaya (University of Waterloo) and Megan Senseney (University of Illinois), along with Investigator Eleanor Dickson (University of Illinois).