Text Analysis Basics – See Your Words in Voyant!

Interested in doing basic text analysis but have little or no programming experience? Do you feel intimidated by the command line? One way to get started with text analysis, visualization, and uncovering patterns in large amounts of text is with browser-based programs! And today we have a mega blockbuster blog post extravaganza about Voyant Tools!

Voyant is a solid browser-based tool for text analysis. It is part of the Text Analysis Portal for Research (TAPoR) at http://tapor.ca/home. The current project leads are Stéfan Sinclair at McGill University (one of the minds behind BonPatron!) and Geoffrey Rockwell at the University of Alberta.

Analyzing a corpus:

I wanted to know what I needed to know to get a job, so I gathered as many job ads as I could and ran them through very basic browser-based text analysis tools (to learn more about Word Clouds, check out this recent post for Commons Knowledge all about them!). My hope was that what I needed to study in library school would emerge, and that I could use that information to decide which courses to take. It was an interesting idea, but I mostly found that jobs prefer you have an ALA-accredited degree, which was consistent with what I had heard from talking to librarians. Now I have collected even more job ads (around December, mostly from the ALA job list with a few from i-Link and elsewhere) to see what I can find out (and hopefully figure out some more skills I should be developing while I’m still in school).

Number of job ads = 300. There may be a few duplicates, and this is not the cleanest data.

Uploading a corpus:

Voyant Tools is found at https://voyant-tools.org.

Voyant Home Page

For small amounts of text, copy and paste into the “Add Text” box. Otherwise, add files by clicking “Upload” and choosing the Word or Text files you want to analyze. Then click “Reveal”.

So I added in my corpus and here’s what comes up:

To choose a different view, click the small rectangle icon and choose from a variety of views. To save the visualization you created so you can later incorporate it into your research, click the arrow and rectangle “Export” icon and choose which aspect of the visualization you want to save.

Mode change option circled

“Stop words” are very common words, such as “the” or “and”, that are excluded because they don’t always tell us anything significant about the content of our corpus. If you are interested in adding stop words beyond the default settings, you can do that with the following steps:

Summary button on Voyant circled
1. Click on Summary
Home screen for Voyant with the edit settings circled
2. Click on the define options button
Clicking on edit list in Voyant
3. If you want to add more words to the default StopList click Edit List
Edit StopList window in Voyant
4. Type in new words, edit the ones already in the default StopList, and click Save.
Mouse click on New User Defined List
5. Or, to add your own list instead of editing the default one, click New User Defined List and paste your list into the Edit List window.
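If you're curious what a stop list is doing behind the scenes, here is a minimal Python sketch of the general idea. It is not Voyant's actual code: the tiny `DEFAULT_STOP_WORDS` set, the `word_frequencies` function, and the sample sentence are all made up for illustration, and Voyant's real default list is much longer.

```python
import re
from collections import Counter

# Hypothetical mini stop list for illustration only;
# Voyant's default StopList is far longer.
DEFAULT_STOP_WORDS = {"the", "and", "a", "of", "to", "in", "is"}

def word_frequencies(text, extra_stop_words=()):
    """Count the words in text, skipping default and user-added stop words."""
    stop_words = DEFAULT_STOP_WORDS | {w.lower() for w in extra_stop_words}
    # Lowercase everything and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample text standing in for a job ad.
sample = "The library is seeking a librarian. Experience in the library is preferred."

# "seeking" plays the role of a word added through Edit List.
counts = word_frequencies(sample, extra_stop_words=["seeking"])
print(counts.most_common(3))
```

Adding a word to `extra_stop_words` is roughly the same move as typing it into the Edit List window: the word is still in your text, it just no longer gets counted.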

Here are some of the cool different views you can choose from in Voyant:

Word Cloud:

Word cloud featuring most common words in library job ads
Mess around with the original at https://voyant-tools.org/?visible=115&corpus=3de9f7190e781ce7566e01454014a969&view=Cirrus

The Links mode shows connections between different words. The thickness of the line between two words shows how often they are paired.

words of corpus connected by lines
Feel free to play around with the original visualization at https://voyant-tools.org/?corpus=3de9f7190e781ce7566e01454014a969&mode=corpus&context=30&view=CollocatesGraph
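For a rough sense of what "how often they are paired" could mean, here is a minimal Python sketch that counts words appearing near each other. This is an assumption about the general technique, not Voyant's actual method: the `collocate_counts` function, its `window` parameter, and the sample text are all hypothetical.

```python
import re
from collections import Counter

def collocate_counts(text, window=2):
    """Count unordered word pairs that appear within `window` words of each other."""
    # Simplification: this ignores sentence boundaries and punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(words):
        # Look only at the next `window` words so each pair is counted once.
        for neighbor in words[i + 1 : i + 1 + window]:
            if neighbor != word:
                pairs[tuple(sorted((word, neighbor)))] += 1
    return pairs

# Made-up sample text standing in for job ads.
sample = "library experience required. library experience preferred."
pairs = collocate_counts(sample, window=1)

for pair, count in pairs.most_common():
    print(pair, count)
```

In a picture like the one above, a pair with a higher count would get a thicker line.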

My favorite mode is TextArc, based on the text analysis and visualization project of the same name created by W. Brad Paley in the early 2000s. More information about this project can be found at http://www.textarc.org/, where you can also find TextArc versions of classic literature.

Voyant is pretty basic. It will give you a bunch of stuff you probably already knew, such as that it helps to have library experience to get a library job. The advantage of the TextArc setting is that it puts everything out there and lets you see the connections between different words. And okay, it looks really cool too.

Check out the original animated version below! Warning: this may slow down or even crash your browser: https://voyant-tools.org/?corpus=3de9f7190e781ce7566e01454014a969&view=TextualArc

I also like the Bubbles feature (not to be confused with the Bubblelines feature), though none of the other GAs or staff here do, with one going so far as to refer to it as an “abomination”.

Circles with corpus words (also listed in side pane) on inside
Truly abominable

I have not included a link to this because the DEFAULT VERSION MAY NOT MEET W3C WEB DESIGN EPILEPSY GUIDELINES. DO NOT TRY IF YOU ARE PRONE TO PHOTOSENSITIVE SEIZURES. It is adapted from the much less flashy “Letter Pairs” project created by Martin Ignacio Bereciartua. This mode can also crash your browser.

To learn more about applying for jobs we have a Savvy Researcher workshop!

If you thought these tools were cool and want to learn more advanced text mining techniques, we have an upcoming Savvy Researcher workshop, also on March 6:

Happy text mining and job searching! Hope to see some of you here at Scholarly Commons on March 6!
