Text Analysis Basics – See Your Words in Voyant!

Interested in doing basic text analysis but have no or limited programming experience? Do you feel intimidated by the command line? One way to get started with text analysis, visualization, and uncovering patterns in large amounts of text is with browser-based programs! And today we have a mega blockbuster blog post extravaganza about Voyant Tools!

Voyant is a solid browser-based tool for text analysis. It is part of the Text Analysis Portal for Research (TAPoR), http://tapor.ca/home. The current project leads are Stéfan Sinclair at McGill University (one of the minds behind BonPatron!) and Geoffrey Rockwell at the University of Alberta.

Analyzing a corpus:

I wanted to know what I needed to know to get a job, so I gathered as many job ads as I could and ran them through very basic browser-based text analysis tools (to learn more about Word Clouds, check out this recent post for Commons Knowledge all about them!). I hoped that what I needed to study in library school would emerge, so I could use that information to decide which courses to take. It was an interesting idea, but I mostly found that jobs prefer you have an ALA-accredited degree, which was consistent with what I had heard from talking to librarians. Now I have collected even more job ads (gathered around December, mostly from the ALA job list with a few from i-Link and elsewhere) to see what I can find out (and hopefully figure out some more skills I should be developing while I’m still in school).

Number of job ads = 300. There may be a few duplicates, and this is not the cleanest data.
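If you want to weed out exact repeats before uploading, here is a minimal Python sketch. It is not something Voyant does for you, and it is not how this corpus was prepared. The function names and the sample ads are made up for illustration, and it only catches copies that match after lowercasing and collapsing whitespace.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def drop_duplicates(ads: list[str]) -> list[str]:
    """Keep the first copy of each ad, comparing the normalized text."""
    seen = set()
    unique = []
    for ad in ads:
        key = hashlib.sha256(normalize(ad).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(ad)
    return unique


# Hypothetical sample ads, invented for this example.
sample_ads = [
    "Reference Librarian.  ALA-accredited degree required.",
    "reference librarian. ALA-accredited degree required.",
    "Data Services Librarian. Experience with R preferred.",
]
unique_ads = drop_duplicates(sample_ads)
print(len(unique_ads), "unique ads out of", len(sample_ads))
```

Near-duplicates (the same ad reposted with a new closing date, say) would still slip through; catching those takes fuzzier matching than this sketch attempts.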

Uploading a corpus:

Voyant Tools is found at https://voyant-tools.org.

Voyant Home Page

For small amounts of text, copy and paste into the “Add Text” box. Otherwise, add files by clicking “Upload” and choosing the Word or Text files you want to analyze. Then click “Reveal”.

So I added in my corpus and here’s what comes up:

To choose a different view, click the small rectangle icon and choose from a variety of views. To save a visualization you created so you can later incorporate it into your research, click the arrow and rectangle “Upload” icon and choose which aspect of the visualization you want to save.

Mode change option circled

“Stop words” are words excluded because they are very common words such as “the” or “and” that don’t always tell us anything significant about the content of our corpus. If you are interested in adding stop words beyond the default settings, you can do that with the following steps:

Summary button on Voyant circled

1. Click on Summary

Home screen for Voyant with the edit settings circled

2. Click on the define options button

Clicking on edit list in Voyant

3. If you want to add more words to the default StopList click Edit List

Edit StopList window in Voyant

4. Type in new words or edit the ones already in the default StopList, then click Save.

Mouse click on New User Defined List

5. Or, to add your own list, click New User Defined List and paste your list into the Edit List window instead of editing the default list.
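To see what a stop list actually does to word counts, here is a small Python sketch. It is only an illustration of the idea, not Voyant’s own code: the stop list is a tiny made-up one (Voyant’s default is much longer), and `tokenize`, `top_terms`, and the sample sentence are hypothetical names and data.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); Voyant's default is much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "with", "for"}


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation and digits."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words that are not in the stop list."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return counts.most_common(n)


# Hypothetical sample text, invented for this example.
sample = (
    "The librarian will work with the library staff and the public. "
    "Experience in a library is required."
)
default_terms = top_terms(sample, n=3)

# Adding your own stop word is like editing the list in steps 3-5.
custom_terms = top_terms(sample, STOP_WORDS | {"library"}, n=10)
print(default_terms)
print(custom_terms)
```

Without the stop list, “the” would top the count; with it, “library” does, and adding “library” to the list removes it in turn. That is the same trade-off you make when editing the StopList in Voyant.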

Here are some of the cool different views you can choose from in Voyant:

Word Cloud:

The Links mode shows connections between different words, with the thickness of the line between two words indicating how often they are paired.
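A rough way to think about those line thicknesses is as pair counts. The Python sketch below counts how often two words appear near each other; it is an assumption about the general idea, not Voyant’s actual algorithm, and the function name, the `reach` parameter, and the sample text are all made up for illustration.

```python
import re
from collections import Counter


def cooccurrences(text, reach=1):
    """Count unordered word pairs where one word follows the other
    within `reach` tokens (reach=1 means directly adjacent)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + reach]:
            if other != word:
                # Sort so ("library", "experience") and the reverse are one pair.
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Hypothetical sample text, invented for this example.
sample = "library experience preferred. library experience required."
pairs = cooccurrences(sample, reach=1)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

A pair with a higher count would get a thicker line in a Links-style picture. Note that this sketch ignores sentence boundaries and stop words, which a real tool would likely handle.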

My favorite mode is TextArc, based on the text analysis and visualization project of the same name created by W. Brad Paley in the early 2000s. More information about this project can be found at http://www.textarc.org/, where you can also find TextArc versions of classic literature.

Voyant is pretty basic. It will give you a bunch of stuff you probably already knew, such as that to get a library job it helps to have library experience. The advantage of the TextArc setting is that it puts everything out there and lets you see the connections between different words. And okay, it looks really cool too.

Check out the original animated version below! Warning: this may slow down or even crash your browser: https://voyant-tools.org/?corpus=3de9f7190e781ce7566e01454014a969&view=TextualArc

I also like the Bubbles feature (not to be confused with the Bubblelines feature) though none of the other GAs or staff here do, one going so far as to refer to it as an “abomination”.

Circles with corpus words (also listed in side pane) on inside

Truly abominable

I have not included a link to this because the DEFAULT VERSION MAY NOT MEET W3C WEB DESIGN EPILEPSY GUIDELINES. DO NOT TRY IF YOU ARE PRONE TO PHOTOSENSITIVE SEIZURES. It is adapted from the much less flashy “Letter Pairs” project created by Martin Ignacio Bereciartua. This mode can also crash your browser.

To learn more about applying for jobs we have a Savvy Researcher workshop!

If you thought these tools were cool and want to learn more advanced text mining techniques, we have an upcoming Savvy Researcher workshop, also on March 6:

Happy text mining and job searching! Hope to see some of you here at Scholarly Commons on March 6!

Love and Big Data

Can big data help you find true love?

It’s Love Your Data Week, but did you know people have been using Big Data to optimize their ability to find their soul mate with the power of data science? Wired Magazine profiled mathematician and data scientist Chris McKinlay in “How to Hack OkCupid”. There’s even a book spin-off from this: “Optimal Cupid”, which unfortunately is not at any nearby libraries.

But really, we know you’re all wondering, where can I learn the data science techniques needed to find “The One”, especially if I’m not a math genius?

ETHICS NOTE: WE DO NOT ENDORSE OR RECOMMEND TRYING TO CREATE SPYWARE, ESPECIALLY NOT ON COMPUTERS IN THE SPACE. WE ALSO DON’T GUARANTEE USING BIG DATA WILL HELP YOU FIND LOVE.

What did Chris McKinlay do?

Methods used:

  • Automating tasks, such as writing a Python script to answer questions on OkCupid
  • Scraping data from dating websites
  • Surveying
  • Statistical analysis
  • Machine learning to figure out how to rank the importance of answers to questions
  • Bots to visit people’s pages
  • Actually talking to people in the real world!

Things we can help you with at Scholarly Commons:

Selected workshops and resources; come by the space to find more!

Whether you reach out to us by email, by phone, or in person, our experts are ready to help with all of your questions and to help you make the most of your data! You might not find “The One” with our software tools, but we can definitely help you have a better relationship with your data!

Register for Spring 2017 Workshops at CITL!

Exciting news for anyone interested in learning the basics of statistical and qualitative analysis software! Registration is open for workshops to be held throughout spring semester at the Center for Innovation in Teaching and Learning! There will be workshops on ATLAS.ti, R, SAS, Stata, SPSS, and Questionnaire Design on Tuesdays and Wednesdays in February and March from 5:30-7:30 pm. To learn more and to register, click here to go to the workshops offered by CITL page. And if you need a place to use these statistical and qualitative software packages, for example to practice the skills you gained at the workshops, stop by Scholarly Commons, Monday-Friday, 9 am-6 pm! And don’t forget, you can also schedule a consultation with our experts here for specific questions about using statistical and qualitative analysis software for your research!

Event: Illinois GIS Day

  • What: A celebration of GIS Day, an “annual salute to geospatial technology and its power to transform and improve our lives.” The event is free, and includes a keynote address, presentations, lightning talk sessions, a map/poster competition, and a career connection session.
  • Where: iHotel and Conference Center, 1900 S 1st St, Champaign, IL 61820
  • When: November 15, 2016 from 8:00 AM – 4:15 PM; registration open now
  • Why: To spend a day with other GIS enthusiasts, to make important connections, and to learn new and important information about what is going on in the field, including a keynote address by Keith A. Searles, Chief Executive Officer, Urban GIS, Inc.

Bowker discusses “The Data Citizen: New Ways of Being in the World”

On Tuesday, September 20th, Geoffrey C. Bowker, professor at the Donald Bren School of Information and Computer Sciences at the University of California, Irvine, delivered the second lecture in the Design Dialogues Speakers Series at the National Center for Supercomputing Applications. Bowker’s talk, titled “The Data Citizen: New Ways of Being in the World,” discussed the ways in which Big Data is not only affecting our lives but also reshaping what it means to be human.

Image Credit: KamiPhuc CC BY 2.0

Bowker discussed many examples of ways in which Big Data impacts modern life. These included:

Despite expressing some concerns about the ways in which Big Data is used, Bowker appeared by and large optimistic about the possibilities that Big Data and design education can bring into reality. Moreover, Bowker suggested that humanists and social scientists, as well as members of the STEM fields, have much to offer as we improve our understanding and use of data and design.

To learn more about design at Illinois, visit the webpage for the planned Illinois Design Center, a central component of a campus-wide multidisciplinary initiative. The page includes details about the center, information about related events, and opportunities to provide your own feedback.

You can also browse the reference collection in the Scholarly Commons, which includes books on design, Big Data, and many other topics.

-post co-authored with Jasmine Kirby

Undergraduate Research Opportunity: McNair Scholars Priority Deadline 9/30!

If you are an undergraduate planning on pursuing a doctorate degree, looking for more ways to get involved in research on campus, and a member of a group underrepresented in graduate education, the TRIO McNair Scholars Program is looking for students like you!
The priority deadline is September 30 at 5 pm.
For more information about the program and the application process please check out http://omsa.illinois.edu/programs/TRIO/mcnair/

Event: “The Data Citizen: New Ways of Being in the World” Lecture by Geoffrey C. Bowker

Ariel Waldman: The Hacker’s Guide to the Galaxy

Mark your calendars: Ariel Waldman will be visiting the University of Illinois campus on March 1 to give a lecture titled, “The Hacker’s Guide to the Galaxy.” The talk will take place in the Alice Campbell Alumni Center Ballroom at 4 PM, with a reception to follow. The event is free and open to the public.

Here’s an excerpt from her official bio:

Ariel Waldman makes “massively multiplayer science”, instigating unusual collaborations that spark clever creations for science and space exploration. She is the founder of Spacehack.org, a directory of ways to participate in space exploration, and the global director of Science Hack Day, a 20-countries-and-growing grassroots endeavor to make things with science. She is the author of What’s It Like in Space?: Stories from Astronauts Who’ve Been There (Chronicle Books, 2016). Ariel is also the co-author of a congressionally-requested National Academy of Sciences study on the future of human spaceflight. She sits on the council for NASA Innovative Advanced Concepts (NIAC), a program that nurtures radical, science fiction-like ideas that could transform future space missions. In 2013, Ariel received an honor from the White House for being a Champion of Change in citizen science.

Let your friends know you’re going with the Facebook event page! In the meantime, you can learn more about Ariel on her official website.

Submissions are Open for Image of Research 2016

In conjunction with the Graduate College, the Scholarly Commons is pleased to announce the opening of the Image of Research competition for the 2015-2016 academic year!

The Image of Research is a celebration of the diversity and breadth of graduate student research at the University of Illinois at Urbana-Champaign. Graduate and professional students from all disciplines are invited to submit entries consisting of an image that represents their research (either concretely or abstractly) and a brief written narrative.

Submissions will be accepted through January 15, 2016, after which judges will select a list of semi-finalists. From the semi-finalists, the judges will award four prizes:

  • First Prize: $500
  • Second Prize: $300
  • Third Prize: $200
  • Honorable Mention: $100

Awards will be presented at a reception on April 6, 2016 in conjunction with the Annual Graduate Student Appreciation Week. Attendees of the reception will have the opportunity to vote for a semi-finalist to receive the People’s Choice Award ($100).

For more information about this year’s competition, or to submit an entry, visit the Image of Research website. Past entries and winners can be viewed in the online gallery and in IDEALS.

Event: THATCamp — Indiana 2015

Interested in the intersection of humanities and technology and looking for an event close to campus? Curious about unconferences? Check out THATCamp Indiana 2015! THATCamp is an informal meeting where humanists and technologists of many professions work together in sessions that are decided on the spot. THATCamps, and unconferences in general, are driven by the participants, their interests and experiences, and their participation in the events. If you have ideas, you can propose a session in advance and check out some past examples. Learn more about the unconference on the About THATCamp page.

The event will take place on Friday, July 24, 2015 at the Indiana University School of Medicine, Ruth Lilly Medical Library, Room 317 (TBL Lab), from 9:30am to 4:00pm. Check out (or contribute to) the rideshare options and register for the event here!