Topic Modeling and the Future of Ebooks

Ebook by Daniel Sancho CC BY 2.0

This semester I’ve had the pleasure of taking a course on Issues in Scholarly Communication with Dr. Maria Bonn at the University of Illinois iSchool. While we’ve touched on a number of fascinating issues in this course, I’ve been particularly interested in JSTOR Labs’ Reimagining the Monograph Project.

This project was inspired by the observation that, while scholarly journal articles have been available in digital form for some time, scholarly books are only now beginning to become available in this format. However, the nature of long-form arguments, that is, the kinds of arguments you find in books, differs in some important ways from the sorts of materials you’ll find in journal articles. Moreover, the ways that scholars and researchers engage with books are often different from the ways in which they interact with papers. In light of this, JSTOR Labs has spearheaded an effort to better understand the different ways that scholarly books are used, with an eye towards developing digital monographs that better suit these uses.

Topicgraph logo

In pursuit of this project, the JSTOR Labs team created Topicgraph, a tool that allows researchers to see, at a glance, what topics are covered within a monograph. Users can also navigate directly to pages that cover the topics in which they are interested. While Topicgraph is presented as a beta-level tool, it provides us with a clear example of the untapped potential of digital books.

A topic graph for Suburban Urbanites

Topicgraph uses topic modeling, a method from natural language processing. A topic model examines a text and infers the topics discussed in it from the terms being used. Terms that frequently appear in proximity to one another are taken as indicators of the topics being discussed.
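To make the idea of terms appearing "in proximity to one another" a little more concrete, here is a minimal Python sketch that counts how often pairs of words occur within a few words of each other. This is only an illustration of the intuition, not a full topic model and not Topicgraph's actual implementation; the function name, the window size, and the tiny stopword list are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption, not a standard one).
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is", "are"})


def cooccurrence_counts(text, window=4, stopwords=STOPWORDS):
    """Count pairs of distinct words that appear within `window` tokens of each other."""
    # Lowercase the text, keep alphabetic tokens, and drop stopwords.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # Sort the pair so (x, y) and (y, x) are counted together.
                counts[tuple(sorted((left, right)))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Suburban housing policy shaped the city. "
        "Housing policy in the suburbs changed how the city grew."
    )
    for pair, n in cooccurrence_counts(sample).most_common(5):
        print(pair, n)
```

Pairs with high counts are the sort of signal a topic model builds on, although real topic models use probabilistic methods rather than raw counts.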

Users can explore Topicgraph by using JSTOR Labs’ small collection of open access scholarly books that span a number of different disciplines, or by uploading their own PDFs for Topicgraph to analyze.

If you would like to learn how to incorporate topic modeling or other forms of text analysis into your research, contact the Scholarly Commons or visit us in the Main Library, room 306.

Spotlight: JSTOR Labs Text Analyzer

JSTOR Labs has recently rolled out a beta version of a JSTOR Text Analyzer. The purpose of the Text Analyzer is different from that of other text analyzers (such as Voyant). The JSTOR Text Analyzer mines the document you drop into its easy-to-use interface and breaks it down into topics and terms, which it then uses to search JSTOR. The result? A list of JSTOR articles that relate to your research topic and help fill your bibliography.

So, how does it work?

You simply drag and drop a file (their demo file is an article named “Retelling the American West in the Museum”), copy and paste text, or select a file from your computer and input it into the interface. What you drag and drop does not necessarily have to be an academic article. In fact, when I input a relatively benign image for this blog, the Text Analyzer gave me remarkably useful results relating to blogging and learning, the digital humanities, and libraries.

Results from the Commons Knowledge blog image.

After you drop your file into JSTOR, your analysis is broken down into terms. These terms are further broken down into topics, people, locations, and organizations. JSTOR determines which terms it believes are the most important, prioritizes them, and gives extra weight to the most important ones. However, you can customize all of these options by choosing words from the identified terms to become prioritized terms, adding or deleting prioritized terms, and changing the weight of prioritized terms. For example, here are the automatic terms and results from the demo article:

The automatic terms and results from the demo article.

However, I’m going to remove the article’s author from the prioritized terms, add Native Americans and Brazilian art to the prioritized terms, and change the weight of these terms so that the latter two are the most important. This is how my terms and results list will look:

The new terms and results list.

As you can see, the results completely changed!
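Why would changing a few weights reorder everything? A toy Python sketch can show the general idea. To be clear, this is not how JSTOR's Text Analyzer actually ranks results, which isn't described here; the scoring rule, the function names, the weights, and the three made-up "articles" below are all hypothetical and exist only to illustrate weighted-term ranking.

```python
def score(document_terms, weighted_terms):
    """Sum the weights of every prioritized term that the document contains."""
    return sum(w for term, w in weighted_terms.items() if term in document_terms)


def rank(documents, weighted_terms):
    """Return document titles ordered from highest to lowest score."""
    scored = [(score(terms, weighted_terms), title) for title, terms in documents.items()]
    # Sort by descending score, breaking ties alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Leave out documents that match none of the prioritized terms.
    return [title for s, title in scored if s > 0]


# Hypothetical articles, each described by a made-up set of terms.
ARTICLES = {
    "Article A": {"museums", "american west"},
    "Article B": {"native americans", "brazilian art"},
    "Article C": {"museums", "native americans"},
}

# Hypothetical "automatic" weights, and a reweighted set like the one described above.
automatic = {"museums": 5, "american west": 3, "native americans": 1}
customized = {"native americans": 5, "brazilian art": 5, "museums": 1}

if __name__ == "__main__":
    print(rank(ARTICLES, automatic))
    print(rank(ARTICLES, customized))
```

Under this simple scoring rule, the article that ranks first with the automatic weights falls to last once the two added terms are given the highest weights, which mirrors the kind of shift you see in the results list.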

While the JSTOR Text Analyzer doesn’t necessarily function in ways similar to other text analyzers, its ability to find key terms will help you not only find articles on JSTOR, but also use those terms in other databases. Further, it can help you think strategically about how you search JSTOR, and see which search terms yield (perhaps unexpectedly) the most useful results for you. So while the JSTOR Text Analyzer is still in beta, it has the potential to be an incredibly useful tool for researchers, and we’re excited to see where it goes from here!