Harriett Green, the English and Digital Humanities Librarian, is starting a new project titled “Bandits and Browsing: Data Mining and Network Analysis for Library Collections,” a collaboration between the University of Illinois Urbana-Champaign Library and the Institute for Computing in Humanities, Arts and Social Science (I-CHASS). For the project, she was awarded a start-up allocation grant from XSEDE (Extreme Science and Engineering Discovery Environment).
Green, who is the principal investigator on the project, explains, “We anticipate that these analyses will enable the initial development of a recommender system for library catalogs and digital libraries that will present the fullest possible breadth of relevant items and content in users’ search results. And ultimately, we hope that this will lead to a platform that will enable librarians, information scientists, and researchers to launch in-depth studies of collection use statistics, cataloging schemas, and content access, and also share their methodologies and analytical tools inter-institutionally.”
The project will build a scalable system for library collection analysis and recommender system development. Based on the resulting data analyses, the team will begin developing an enhanced recommender system for library catalogs and digital libraries that retrieves richer results from a collection search by drawing on network analysis of subject relevancy, circulation data for items, and usage data for items that share interrelated subjects. To build this test bed for the recommender system’s algorithms and functionality, the project will use the advanced computing resources of XSEDE to develop self-optimizing search algorithms and network analyses that will run against the bibliographic and catalog data in library catalogs and digital library indexes.
The Library team created initial prototypes of search algorithms, topic analyses, and network analyses using a 40,000-item sample set from the English literature collection. A core algorithm was initially developed to identify items that are infrequently used, yet have a high degree of topical relevance to other heavily used works in a collection. Based on these and other analyses conducted on the sample set, the team will use advanced computing resources to scale the search algorithms and network analyses to a 22 million-item subset of the University of Illinois Library catalog data.
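The article does not describe how the core algorithm works internally, so the following is only a minimal sketch of the general idea: flag rarely circulated items whose subject headings overlap strongly with those of heavily used works. The `Item` type, the `hidden_gems` function, the Jaccard similarity measure, the thresholds, and the sample records are all illustrative assumptions, not the project’s actual method or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical catalog record: title, subject headings, checkout count."""
    title: str
    subjects: frozenset
    checkouts: int


def jaccard(a, b):
    """Share of subject headings two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hidden_gems(items, low_use_max=2, high_use_min=20, min_similarity=0.5):
    """Return (title, score) for low-use items that resemble popular ones.

    All thresholds are assumed values chosen for illustration only.
    """
    popular = [it for it in items if it.checkouts >= high_use_min]
    results = []
    for it in items:
        if it.checkouts > low_use_max:
            continue  # only infrequently used items are candidates
        # Score a candidate by its closest match among heavily used works.
        best = max((jaccard(it.subjects, p.subjects) for p in popular),
                   default=0.0)
        if best >= min_similarity:
            results.append((it.title, round(best, 2)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Invented sample records, not drawn from the Library's catalog.
catalog = [
    Item("Popular Novel Study",
         frozenset({"Victorian fiction", "Dickens", "Serial publication"}), 45),
    Item("Overlooked Monograph",
         frozenset({"Victorian fiction", "Dickens", "Illustration"}), 1),
    Item("Unrelated Pamphlet", frozenset({"Soil chemistry"}), 0),
]

print(hidden_gems(catalog))
```

In this toy catalog only the second record qualifies: it has a single checkout but shares two of four distinct subject headings with the heavily used title. A comparison of every low-use item against every popular item grows quickly with collection size, which is one plausible reason a catalog of millions of records would call for larger computing resources.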
In addition to the award of supercomputing time, the project was allocated 12 months of XSEDE technical support to maintain the necessary code on the supercomputing infrastructure. The project team will also be supported by XSEDE database experts and other consultants.
The advanced computing resources of XSEDE will be used to conduct a full-scale analysis of the Library catalog data: running search and indexing algorithms against the 22 million-record subset of the Library catalog, building network graphs for subject correlations, and performing full analyses of item relevancy.
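One simple way to picture a network graph of subject correlations is as a weighted edge list in which two subject headings are linked each time they appear on the same record. The sketch below assumes that reading; the `subject_cooccurrence` function and the sample records are hypothetical, and the project may define subject correlation differently.

```python
from collections import Counter
from itertools import combinations


def subject_cooccurrence(records):
    """Count how often each pair of subject headings shares a record.

    Returns a Counter keyed by alphabetically ordered (subject, subject)
    pairs, which can serve as the weighted edges of an undirected graph.
    """
    edges = Counter()
    for subjects in records:
        # Deduplicate and sort so each pair has one canonical form.
        for a, b in combinations(sorted(set(subjects)), 2):
            edges[(a, b)] += 1
    return edges


# Invented sample records: each inner list holds one item's subject headings.
records = [
    ["Dickens", "Victorian fiction"],
    ["Victorian fiction", "Dickens", "Illustration"],
    ["Illustration", "Printing"],
]

graph = subject_cooccurrence(records)
for (a, b), weight in graph.most_common():
    print(f"{a} -- {b}: {weight}")
```

Edge weights of this kind could then feed a relevancy analysis, with strongly connected subjects treated as more closely related.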
Michael Simeone, Associate Director, Interdisciplinary Studies at I-CHASS and project co-PI, said, “This will be an important project to show how high-end computation can help us understand the individual decisions that contribute to making broader-scale knowledge.”
Project Title: “Bandits and Browsing: Data Mining and Network Analysis for Library Collections”
Project lead: Harriett Green
Collaborators: Michael Simeone, Kirk Hess, Richard Hislop
Grant Award from: XSEDE (Extreme Science and Engineering Discovery Environment)
Award Amount: 30,000 SU of computing time
Length of grant: 12 months
Scientists, engineers, social scientists, and humanists around the world – many of them at colleges and universities – use advanced digital resources and services every day. Supercomputers, collections of data, and new tools are critical to the success of those researchers, who use them to make us all healthier, safer, and better informed. XSEDE integrates these resources and services, makes them easier to use, and helps more people use them. The five-year National Science Foundation-funded XSEDE project supports 16 supercomputers and high-end visualization and data analysis resources across the country through a collaborative partnership of 17 institutions. For more information on XSEDE, visit: https://xsede.org.