Text Analysis Basics – See Your Words in Voyant!

Interested in doing basic text analysis but have no or limited programming experience? Do you feel intimidated by the command line? One way to get started with text analysis, visualization, and uncovering patterns in large amounts of text is with browser-based programs! And today we have a mega blockbuster blog post extravaganza about Voyant Tools!

Voyant is a solid browser-based tool for text analysis. It is part of the Text Analysis Portal for Research (TAPoR): http://tapor.ca/home. The current project leads are Stéfan Sinclair at McGill University (one of the minds behind BonPatron!) and Geoffrey Rockwell at the University of Alberta.

Analyzing a corpus:

I wanted to know what I needed to know to get a job, so I gathered as many job ads as I could and ran them through very basic browser-based text analysis tools (to learn more about word clouds, check out this recent post for Commons Knowledge all about them!). My hope was that what I needed to study in library school would emerge, and that I could use that information to decide which courses to take. It was an interesting idea, but I mostly found that jobs prefer you have an ALA-accredited degree, which was consistent with what I had heard from talking to librarians. Now I have collected even more job ads (gathered around December, mostly from the ALA job list, with a few from i-Link and elsewhere) to see what I can find out, and hopefully to figure out some more skills I should be developing while I’m still in school.

Number of job ads = 300. There may be a few duplicates, and this is not the cleanest data.

Uploading a corpus:

Voyant Tools is found at https://voyant-tools.org.

Voyant Home Page

For small amounts of text, copy and paste into the “Add Text” box. Otherwise, add files by clicking “Upload” and choosing the Word or Text files you want to analyze. Then click “Reveal”.

So I added in my corpus and here’s what comes up:

To choose a different view, click the small rectangle icon and pick from a variety of views. To save a visualization so you can later incorporate it into your research, click the arrow-and-rectangle “Upload” icon and choose which aspect of the visualization you want to save.

Mode change option circled

“Stop words” are words excluded because they are very common words such as “the” or “and” that don’t always tell us anything significant about the content of our corpus. If you are interested in adding stop words beyond the default settings, you can do that with the following steps:

Summary button on Voyant circled

1. Click on Summary

Home screen for Voyant with the edit settings circled

2. Click on the define options button

Clicking on edit list in Voyant

3. If you want to add more words to the default StopList click Edit List

Edit StopList window in Voyant

4. Type in new words or edit the ones already in the default StopList, then click Save.

Mouse click on New User Defined List

5. Or, to use your own list instead of editing the default one, click New User Defined List and paste your list into the Edit List feature.
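If you are curious what a stop list is doing behind the scenes, here is a minimal Python sketch of the idea. It is not Voyant's actual code: the tiny `DEFAULT_STOP_WORDS` set, the `count_terms` function, and the sample job ad are all made up for illustration, and Voyant's real default list is much longer.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption; not Voyant's real default list).
DEFAULT_STOP_WORDS = {"the", "and", "a", "of", "to", "in", "for", "with", "is"}

def count_terms(text, extra_stop_words=()):
    """Count the words in text, skipping default and user-added stop words."""
    stop_words = DEFAULT_STOP_WORDS | {w.lower() for w in extra_stop_words}
    # Lowercase everything and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# A made-up sentence standing in for a job ad.
sample_ad = "The library seeks a librarian with experience in the library and archives."

# "seeks" plays the role of a word added through Edit List.
counts = count_terms(sample_ad, extra_stop_words=["seeks"])
print(counts.most_common(3))
```

Adding a word to `extra_stop_words` is roughly what editing the StopList does: the word is still in your corpus, it just no longer shows up in the counts.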

Here are some of the cool different views you can choose from in Voyant:

Word Cloud:

The Links mode shows connections between different words. The thickness of the line between two words indicates how often they are paired.
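To get a feel for what "how often they are paired" could mean, here is a rough Python sketch that counts how often two words appear next to each other. This is only my guess at the general idea, not how Voyant computes its links; the `cooccurrence_counts` function, its `window` parameter, and the sample text are hypothetical.

```python
import re
from collections import Counter

def cooccurrence_counts(text, window=2):
    """Count word pairs that fall inside the same run of `window` consecutive words.

    With window=2, only directly adjacent words are paired.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(words):
        # Pair each word with the (window - 1) words that follow it.
        for other in words[i + 1 : i + window]:
            if other != word:
                # Sort so ("a", "b") and ("b", "a") count as the same pair.
                pairs[tuple(sorted((word, other)))] += 1
    return pairs

# A made-up snippet of job-ad language.
text = "library experience required library experience preferred"
pairs = cooccurrence_counts(text, window=2)

for (first, second), count in pairs.most_common():
    print(first, second, count)
```

In a visualization, each pair's count could then drive the thickness of the line drawn between the two words.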

My favorite mode is TextArc, based on the text analysis and visualization project of the same name created by W. Brad Paley in the early 2000s. More information about this project can be found at http://www.textarc.org/, where you can also find TextArc versions of classic literature.

Voyant is pretty basic. It will give you a bunch of stuff you probably already knew, such as that to get a library job it helps to have library experience. The advantage of the TextArc setting is that it puts everything out there and lets you see the connections between different words. And okay, it looks really cool too.

Check out the original animated version below! Warning: this may slow down or even crash your browser: https://voyant-tools.org/?corpus=3de9f7190e781ce7566e01454014a969&view=TextualArc

I also like the Bubbles feature (not to be confused with the Bubblelines feature) though none of the other GAs or staff here do, one going so far as to refer to it as an “abomination”.

Circles with corpus words (also listed in side pane) on inside

Truly abominable

I have not included a link to this one because the DEFAULT VERSION MAY NOT MEET W3C WEB DESIGN EPILEPSY GUIDELINES. DO NOT TRY IT IF YOU ARE PRONE TO PHOTOSENSITIVE SEIZURES. It is adapted from the much less flashy “Letter Pairs” project created by Martin Ignacio Bereciartua. This mode can also crash your browser.

To learn more about applying for jobs we have a Savvy Researcher workshop!

If you thought these tools were cool and want to learn more advanced text mining techniques, we have an upcoming Savvy Researcher workshop, also on March 6.

Happy text mining and job searching! Hope to see some of you here at Scholarly Commons on March 6!

Introduction to Web-Based Word Cloud Generators

A word cloud created with Tagul using the words from this blog post!


If you’re in a pinch and need some kind of visualization to go along with a presentation or project, a word cloud can be an easy fix. Word clouds take a block of text and create a visual in which the most frequently occurring words appear larger and less frequent words appear smaller. There are thousands of ways to create a word cloud, but these are a few simple generators that can help you out when you need a word cloud in a hurry.
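For the curious, the "more frequent means bigger" idea can be sketched in a few lines of Python. This is only an illustration and not how any of the generators below actually work: the `font_sizes` function, the 12 to 48 point range, and the linear scaling are my own assumptions, and real generators also have to handle layout, colors, and fitting words into shapes.

```python
from collections import Counter

def font_sizes(counts, min_size=12, max_size=48):
    """Linearly scale word counts to font sizes between min_size and max_size."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, count in counts.items():
        # If every word has the same count, give them all the largest size.
        scale = (count - low) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes

# A made-up bag of words standing in for a block of text.
counts = Counter("library library library experience experience degree".split())
sizes = font_sizes(counts)

for word, size in sizes.items():
    print(word, size)
```

Here "library" appears most often and gets the largest size, "degree" appears once and gets the smallest, and "experience" lands in between.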

TagCrowd

TagCrowd is, perhaps, the simplest of all these generators to use, and one of the few generators that can create a word cloud from a URL. Simply paste the text or URL, or upload a file to TagCrowd and it will create a blue word cloud for you. There aren’t many options as far as styling goes — unlike some of the other generators we’ll be looking at — but it could not be simpler. The options that TagCrowd does give you are: language, maximum number of words, minimum frequency of words, show frequencies, group similar words, convert to lowercase, and exclusion of certain words.

That being said, be careful when you use a URL with TagCrowd. Below are two examples. For the first, I copy-pasted the text of David Sedaris’ essay “Stepping Out” from The New Yorker. For the second, I used the URL for the story rather than the text. The two clouds were entirely different, and the URL didn’t give me the actual words from the story.

The TagCrowd cloud from the copy-pasted text.


The TagCrowd cloud from the URL.


WordClouds.com

WordClouds.com provides more options than TagCrowd, and produces more aesthetically pleasing — though, perhaps, less simple to read and understand — word clouds. You can input text through copy-pasting, through a text or PDF file, as well as through a URL. Notably, the URL option works better at WordClouds.com. WordClouds.com also lets you customize your image, by fitting the word cloud into particular shapes, as well as offering different color schemes and fonts. It is also easier to get data about the frequency of word usage on WordClouds.com, and it allows you to save/share your word cloud in a variety of formats. Overall, WordClouds.com is a whimsical alternative for generating a word cloud. Below are two word clouds I created using the Sedaris essay from its URL. I chose a checkmark shape for the first cloud, and the second is an automatically-generated rainbow.


I chose to shape my word cloud as a check mark with WordClouds.com.


The rainbow option is fun and easy to use, though maybe not the most easily readable option on WordClouds.com.

Tagul

And finally, we have Tagul. Tagul is the most complicated of these three options, but it also allows you the most customization and options for your word cloud. Tagul lets you easily add words to or remove words from your word cloud, and gives you a number of shape, font, color, and animation options. You can make something as simple as a circle in one color, or an emoji smiley face where each word pops up when you hover over it. You will probably spend more time creating your word cloud on Tagul, but you can really make sure you’re getting what you want. Below are two word clouds (one simple, one more complicated) created with copy-pasted text from Sedaris’ essay.


Our more dramatic word cloud made with Tagul.


A simpler, easy-to-read word cloud created with Tagul.

There are many other options for creating word clouds, but these are three easy websites that you can use when you need a word cloud and you need one quick. How do you like to generate word clouds? What sort of projects have you used word clouds for? Let us know in the comments!