Spotlight: JSTOR Labs Text Analyzer

JSTOR Labs has recently rolled out a beta version of the JSTOR Text Analyzer. Its purpose is different from that of other text analyzers (such as Voyant). The JSTOR Text Analyzer mines any document you drop into its easy-to-use interface, breaks it down into topics and terms, and then uses those terms to search JSTOR. The result? A list of JSTOR articles that relate to your research topic and can help fill your bibliography.

So, how does it work?

You simply drag and drop a file (the demo file is an article named “Retelling the American West in the Museum”), copy and paste text, or select a file from your computer to load it into the interface. What you drop in does not have to be an academic article. In fact, when I uploaded a relatively unremarkable image from this blog, the Text Analyzer gave me remarkably useful results relating to blogging and learning, the digital humanities, and libraries.

Results from the Commons Knowledge blog image.

After you drop your file into the Text Analyzer, your document is broken down into terms, which are further sorted into topics, people, locations, and organizations. The Text Analyzer decides which terms are most important, makes them prioritized terms, and assigns each one a weight. You can customize all of these choices: promote any of the identified terms to a prioritized term, add or delete prioritized terms, and change their weights. For example, here are the automatic terms and results from the demo article:

The automatic terms and results from the demo article.

Now I’m going to remove the article’s author from the prioritized terms, add Native Americans and Brazilian art as prioritized terms, and change the weights so that these two new terms are the most important. Here are my new terms and results list:

The new terms and results list.

As you can see, the results completely changed!
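The Text Analyzer's actual ranking method isn't described in this post, so the following is only a hypothetical Python sketch of why reweighting terms can reshuffle a results list: each candidate article is scored by summing the weights of the prioritized terms it mentions, and articles are sorted by that score. The function names, weights, and article snippets are invented for illustration and do not reflect JSTOR's algorithm or any JSTOR API.

```python
from typing import Dict, List, Tuple


def score_article(text: str, weights: Dict[str, float]) -> float:
    """Sum the weights of the prioritized terms found in the text.

    Hypothetical scoring rule: simple case-insensitive substring matching.
    """
    lowered = text.lower()
    return sum((w for term, w in weights.items() if term.lower() in lowered), 0.0)


def rank_articles(articles: Dict[str, str], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (title, score) pairs, highest score first."""
    scored = [(title, score_article(text, weights)) for title, text in articles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented article snippets standing in for search candidates
articles = {
    "Article A": "Exhibiting the American West in the modern museum",
    "Article B": "Native Americans and Brazilian art in comparative perspective",
    "Article C": "A survey of library cataloguing practices",
}

# Hypothetical automatic weights vs. hand-edited weights
automatic = {"museum": 1.0, "American West": 1.0}
edited = {"museum": 0.5, "Native Americans": 2.0, "Brazilian art": 2.0}

print(rank_articles(articles, automatic))
print(rank_articles(articles, edited))
```

With the automatic weights Article A ranks first; after lowering the weight on “museum” and adding two heavily weighted terms, Article B moves to the top, a miniature version of the shift described above.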

While the JSTOR Text Analyzer doesn’t work quite like other text analyzers, its ability to find key terms will help you not only find articles on JSTOR but also use those terms in other databases. It can also help you think strategically about how you search JSTOR and see which search terms yield (perhaps unexpectedly) the most useful results for you. So while the JSTOR Text Analyzer is still in beta, it has the potential to be an incredibly useful tool for researchers, and we’re excited to see where it goes from here!

Using Voyant Tools for Basic Text Analysis

Voyant Tools is an open-source, web-based application that allows users to work with their own texts or existing text collections to perform basic text-mining functions. These functions make it possible to quickly extract characteristics from a corpus and discover themes. Voyant Tools is available for free at http://voyant-tools.org/, where users can input the text to be analyzed in several ways. Follow the steps below to get started.

Loading Texts

For a basic single text: paste the text into the text box.

For text from webpages: enter the URLs of the webpages into the text box, listing each URL on a separate line.

For plain text, HTML, XML, PDF, RTF or MS Word: select the “Upload” button beneath the text box. Click “Add” for each new document and “Upload” when all documents have been added.

To use Voyant’s pre-existing text collections: select “Open” and choose from the drop-down list. Currently available are the Humanist Listserv Archives and Shakespeare’s Plays.

After the text is in place, select “Reveal.”

Basic Analysis Tools

After “revealing” the text, three tools will automatically appear: Cirrus, Summary, and Corpus Reader.

Cirrus displays a word cloud of the highest-frequency terms. Hovering over a word reveals its frequency, and clicking on a word reveals more information, including a word trends graph. To remove common function words (stop words) such as “the” and “and” from the word cloud, click the cog icon above the Cirrus panel, select the language of the text from the drop-down list, and click “Ok.” With the stop words removed, the word cloud gives a more meaningful picture of the text.

The Summary tool provides information about the text or group of texts, including the total number of documents and words, the length of each document, and the distinctive words in each text. It also highlights notable peaks in word frequency and reports vocabulary density.
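Vocabulary density is commonly defined as the number of unique words in a document divided by its total number of words, so values closer to 1 indicate less repetition. The sketch below assumes that definition and a simple regex tokenizer (the `summarize` helper is hypothetical), so its figures may not exactly match Voyant's Summary panel.

```python
import re


def summarize(docs):
    """Return word count, unique-word count and vocabulary density for each document."""
    summary = {}
    for name, text in docs.items():
        words = re.findall(r"[a-z']+", text.lower())
        unique = set(words)
        # Vocabulary density = unique words / total words (0.0 for an empty text)
        density = len(unique) / len(words) if words else 0.0
        summary[name] = {"words": len(words), "unique": len(unique), "density": round(density, 3)}
    return summary


docs = {
    "doc1": "The cat sat on the mat.",
    "doc2": "A dog, a cat, a bird.",
}
result = summarize(docs)
print("documents:", len(docs))
print("total words:", sum(stats["words"] for stats in result.values()))
for name, stats in result.items():
    print(name, stats)
```

Both sample documents have six words, but doc2 repeats “a” three times, so its density (0.667) is lower than doc1's (0.833).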

The Corpus Reader displays the texts in the corpus; hovering over a word within a text reveals its frequency and more information.

To see the additional tools Word Trends, Keywords in Context, and Words in Documents, click the double-arrow icon in the upper right corner. Click the single-arrow icons to open each window, and use the toolbars at the bottom of their panes to generate results.
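Keywords in Context (often called KWIC, or a concordance) lists every occurrence of a chosen word together with the words around it. A minimal Python sketch of the idea follows; the `kwic` function and its `window` parameter (the number of context words on each side) are illustrative assumptions, not part of Voyant.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    words = re.findall(r"[\w']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, word, right))
    return lines


text = "The museum opened in 1990. Today the museum retells the history of the American West."
for left, kw, right in kwic(text, "museum"):
    # Right-align the left context so the keyword lines up in a column
    print(f"{left:>25} [{kw}] {right}")
```

Aligning every hit in one column, as the sketch's output does, is what makes a concordance useful for comparing how a word is used across a text.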

More tools are available here.

Exporting Data

Above each tool there is a disk icon that can be selected to export data from that tool. Users can save the data as an image or as a URL that returns to that data. Exporting a URL means you don’t need to upload the same texts each time you want to work with them.

For more information on using Voyant Tools, see this guide and additional documentation.

To see Voyant Tools in use, explore these examples.