Lightning Review: Data Visualization for Success

Data visualization is where the humanities and sciences meet: viewers are dazzled by the presentation yet informed by research. Lovingly referred to as “the poster child of interdisciplinarity” by Steven Braun, data visualization brings these two fields closer together than ever to help provide insights that may have been impossible without the other. In his book Data Visualization for Success, Braun sits down with forty designers with experience in the field to discuss their approaches to data visualization, common techniques in their work, and tips for beginners.

Braun’s collection of interviews provides an accessible introduction to data visualization. Not only is the book filled with rich images, but each interview is short and meant to offer an individual’s perspective on their own work and the field at large. Each interview begins with a general question about data visualization to contribute to the perpetual debate of what data visualization is and can be moving forward.

Picture of Braun's "Data Visualization for Success"

Antonio Farach, one of the designers interviewed in the book, calls data visualization “the future of storytelling.” And when you see his work – or really any of the work in this book – you can see why. Each new image has an immediate draw, but it is impossible to move past without exploring a rich narrative. Visualizations in this book cover topics ranging from soccer matches to classic literature, economic disparities, selfie culture, and beyond.

Each interview ends by asking the designer for their advice to beginners, which not only invites new scholars and designers to participate in the field but also dispels any doubt about the hard work put in by these designers or the science at the root of it all. Barbara Hahn and Christine Zimmermann of Hahn+Zimmermann may have put it best: “Data visualization is not making boring data look fancy and interesting. Data visualization is about communicating specific content and giving equal weight to information and aesthetics.”

A leisurely, stunning, yet informative read, Data Visualization for Success offers anyone interested in this explosive field an insider’s look from voices around the world. Drop by the Scholarly Commons during our regular hours to flip through this wonderful read.

And finally, if you have any further interest in data visualization, make sure you stay up to date on our Exploring Data Visualization series or take a look at what services the Scholarly Commons provides!

Creating Quick and Dirty Web Maps to Visualize Your Data – Part 1

Do you have a dataset that you want visualized on a map, but don’t have the time or resources to learn GIS or consult with a GIS Specialist? Don’t worry, because ArcGIS Online allows anybody to create simple web maps for free! In part one of this series you’ll learn how to prepare and import your data into a Web Map, and in part two you’ll learn how to geographically visualize that data in a few different ways. Let’s get started!

The Data

First things first, we need data to work with. Before we can start fiddling around with ArcGIS Online and web maps, we need to ensure that our data can be visualized on a map in the first place. Of course, the best candidates for geographic visualization are datasets that include location data (latitude/longitude, geographic coordinates, addresses, etc.), but in reality, most projects don’t record this information. In order to provide an example of how a dataset that doesn’t include location information can still be mapped, we’re going to work with this sample dataset that I downloaded from FigShare. It contains 1,000 rows of IP addresses, names, and emails. If you already have a dataset that contains location information, you can skip this section and go straight to “The Web Map.”

In order to turn this data into something that’s mappable, we need to read the IP addresses and output their corresponding location information. IP addresses only provide basic city-level information, but that’s not a concern for the sample map that we’ll be creating here. There are loads of free online tools that interpret latitude/longitude data from a list of IP addresses, so you can use any tool that you like – I’m using one called Bulk IP Location Lookup because it allows me to run 500 lines at a time, and I like the descriptiveness of the information it returns. I only converted 600 of the IP addresses in my dataset because the tool is pretty sluggish, and then I used the “Export to CSV” function to create a new spreadsheet. If you’re performing this exercise along with me, you’ll notice that the exported spreadsheet is missing quite a bit of information. I’m assuming that these are either fake IP addresses from our sample dataset, or the bulk lookup tool isn’t working 100% properly. Either way, we now have more than enough data to play around with in a web map.
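If your lookup tool also caps how many lines you can paste at once, it can help to split the IP column into batches first. Here is a minimal Python sketch of that idea. The column name "ip_address", the output file names, and the 500-line batch size are my assumptions based on the limit mentioned above; adjust them to match your own spreadsheet and tool.

```python
import csv
from pathlib import Path


def read_ip_addresses(csv_path, column="ip_address"):
    """Return the non-empty values of one column from a CSV file.

    The column name is an assumption; change it to match your header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [
            (row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()
        ]


def split_into_batches(items, batch_size=500):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def write_batches(ip_addresses, out_dir, batch_size=500):
    """Write each batch to its own text file, one IP address per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    batches = split_into_batches(ip_addresses, batch_size)
    for number, batch in enumerate(batches, start=1):
        # Hypothetical file naming scheme, e.g. ip_batch_1.txt
        path = out_dir / f"ip_batch_{number}.txt"
        path.write_text("\n".join(batch) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_batches(read_ip_addresses("sample.csv"), "batches") would leave you with one text file per batch, each ready to paste into the lookup tool ("sample.csv" is a placeholder file name).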

IP Address Lookup Screencap

Bulk IP Location Lookup Tool

The Web Map

Now that our data contains location information, we’re ready to import it into a web map. In order to do this, we first need to create a free ArcGIS Online account. After you’ve done that, log in and head over to your “Content” page and click “Create → Map” to build a blank web map. You are now brought to the Map Viewer, which is where you’ll be doing most of your work. The Map Viewer is a deceptively powerful tool that lets you perform many of the common functions that you would perform on ArcGIS for Desktop. Despite its name, the Map Viewer does much more than let you view maps.

Map Viewer (No Data)

The Map Viewer

Let’s begin by importing our CSV into the Web Map: select “Add → Add Layer From File.” The pop-up lets you know that you can upload Shapefile, CSV, TXT, or GPX files, and includes some useful information about each format. Note the 1,000 item limit on CSV and TXT files – if you’re trying to upload research data that contains more than 1,000 items, you’ll want to create a Tile Layer instead. After you’ve located your CSV file, click “Import Layer” and you should see the map populate. If you get a “Warning: This file contains invalid characters…” pop-up, that’s due to the missing rows in our sample dataset – these rows are automatically excluded. Now is a good time to note that your location data can come in a variety of formats, not just latitude and longitude data. For a full list of supported formats, read Esri’s help article on CSV, TXT, and GPX files. If you have a spreadsheet that contains any of the location information formats listed in that article, you can place your data on a map!
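If you would rather clean the spreadsheet yourself before uploading, a short script can drop the rows without usable coordinates and confirm you are under the 1,000 item limit. This is only a sketch under a few assumptions: the column names "latitude" and "longitude" are hypothetical (check what your lookup tool actually exported), and it only handles latitude/longitude data, not the other location formats the Map Viewer accepts.

```python
import csv

MAX_ITEMS = 1000  # CSV/TXT item limit noted in the upload dialog


def has_coordinates(row, lat_col="latitude", lon_col="longitude"):
    """True when both columns hold numbers in the valid lat/long range."""
    try:
        lat = float(row[lat_col])
        lon = float(row[lon_col])
    except (KeyError, TypeError, ValueError):
        # Missing column, empty cell, or non-numeric text
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def clean_rows(rows, lat_col="latitude", lon_col="longitude"):
    """Keep only the rows that can actually be placed on a map."""
    return [row for row in rows if has_coordinates(row, lat_col, lon_col)]


def prepare_csv(in_path, out_path, lat_col="latitude", lon_col="longitude"):
    """Write a cleaned copy of the CSV and return how many rows were kept."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        kept = clean_rows(reader, lat_col, lon_col)
    if len(kept) > MAX_ITEMS:
        raise ValueError(
            f"{len(kept)} rows exceeds the {MAX_ITEMS} item limit for CSV uploads"
        )
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

This step is optional, since the rows with missing data are excluded on import anyway, but it does tell you up front how many points to expect on the map.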

That’s it for part one! In part two we’re going to visualize our data in a few different ways and export our map for presentation.

DIY Data Science

Data science is a special blend of statistics and programming with a focus on making complex statistical analyses more understandable and usable to users, typically through visualization. In 2012, the Harvard Business Review published the article “Data Scientist: The Sexiest Job of the 21st Century” (Davenport & Patil, 2012), showing society’s perception of data science. While some of the excitement of 2012 has died down, data science continues on, with data scientists earning a median base salary over $100,000 (Noyes, 2016).

Here at the Scholarly Commons, we believe that a better understanding of statistics makes you less likely to be fooled when statistics are deployed improperly, and helps you understand the inner workings of data visualization and digital humanities software applications and techniques. We might not be able to make you a data scientist (though please let us know if this post inspires you to enroll in formal coursework), but we can share some resources that let you try before you buy and incorporate methods from this growing field into your own research.

As we have discussed again and again on this blog, whether you want to improve your coding, statistics, or data visualization skills, our collection has some great reads to get you started.

In particular, take a look at:

The Human Face of Big Data created by Rick Smolan and Jennifer Erwitt

  • This is a great coffee table book of data visualizations and a great flip through if you are here in the space. You will learn a little bit more about the world around you and will be inspired with creative ways to communicate your ideas in your next project.

Data Points: Visualization That Means Something by Nathan Yau

  • Nathan Yau is best known for being the man behind Flowing Data, an extensive blog of data visualizations that also offers tutorials on how to create visualizations. In this book he explains the basics of statistics and visualization.

Storytelling with Data by Cole Nussbaumer Knaflic

LibGuides to Get You Started:

And more!

There are also a lot of resources on the web to help you:

The Open Source Data Science Masters

  • This is not an accredited masters program but rather a curated collection of suggested free and low-cost print and online resources for learning the various skills needed to become a data scientist. This list was created and is maintained by Clare Corthell of Luminant Data Science Consulting.
  • This list does suggest many MOOCs from universities across the country, some even available for free.

Dataquest

  • This is a project-based data science course created by Vik Paruchuri, a former Foreign Service Officer turned data scientist
  • It mostly consists of a beginner Python tutorial, though it is only one of many that are out there
  • Twenty-two quests and portfolio projects are available for free, though the two premium versions offer unlimited quests, more feedback, a Slack community, and opportunities for one-on-one tutoring

David Venturi’s Data Science Masters

  • A DIY data science course, which includes a resource list and, perhaps most importantly, links to reviews of online data science courses with up-to-date information. If you are interested in taking an online course or participating in a MOOC, this is a great place to get started.

Mitch Crowe’s Learn Data Science the Hard Way

  • Another curated list of data science learning resources, this time based on Zed Shaw’s Learn Code the Hard Way series. This list comes from Mitch Crowe, a Canadian data scientist.

So, is data science still sexy? Let us know what you think and what resources you have used to learn data science skills in the comments!

Works Cited:

Davenport, T. H., & Patil, D. J. (2012, October 1). Data Scientist: The Sexiest Job of the 21st Century. Retrieved June 1, 2017, from https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Noyes, K. (2016, January 21). Why “data scientist” is this year’s hottest job. Retrieved June 1, 2017, from http://www.pcworld.com/article/3025502/why-data-scientist-is-this-years-hottest-job.html

Learning how to present with Michael Alley’s The Craft of Scientific Presentations

Slideshows are serious business, and bad slides can kill. Many books, including the one I will review today, discuss the role that Morton Thiokol’s poorly designed and overly complicated slides about the Challenger O-rings played in the decision to launch the shuttle despite its flaws. PowerPoint has become the default presentation style in a wide range of fields, regardless of whether or not that is a good idea (see the 2014 Slate article “PowerPointLess” by Rebecca Schuman). With all that being said, in order to learn a bit more about how to present, I read The Craft of Scientific Presentations by Michael Alley, an engineering communications professor at Penn State.

To start, what did Lise Meitner, Barbara McClintock, and Rosalind Franklin have in common? According to Michael Alley, their weak science communication skills meant they were not taken as seriously even though they had great ideas and did great research… Yes, the author discusses how Niels Bohr was a very weak speaker (which only somewhat had to do with English being his third language) but it’s mostly in the context of his Nobel Prize speech or trying to talk to Winston Churchill; in other words, the kinds of opportunities that many great women in science never got… Let’s just say the decontextualized history of science factoids weaken some of the author’s arguments…

This is not to say that science communication is not important but these are some important ideas to remember:

Things presentation skills can help you with:

  • Communicating your ideas with a variety of audiences more effectively
  • Marketing your research and yourself as a researcher more effectively
  • Creating engaging presentations that people pay attention to

Things presentation skills cannot help you with:

  • Overcoming systemic inequality in academia and society at large, though speaking out about your experiences and calling out injustice when you see it can help in a very long term way
  • Not feeling nervous especially if you have an underlying anxiety disorder, though practice can potentially reduce that feeling

For any presentation: know your topic well, be very prepared, and actually practice giving your talk more than you do anything else (such as making slides). Like any skill, the key is practice, practice, practice!

For the most part, this book is a great review of the common sense advice that’s easy to forget when you are standing in front of a large audience with everyone looking at you expectantly. The author also offers a lot of great critiques of the default presentations you can churn out with PowerPoint and of PowerPoint itself. PowerPoint has the advantage of being the most common type of slideshow presentation software, though alternatives exist and have been discussed in depth elsewhere on the blog and in university resources. Alley introduces the Assertion-Evidence approach, in which you reach people by presenting your research as images with a text statement overlay (memes, more or less). Specifically, you use one-sentence summaries and replace bullet points with visualizations. You also have to take Murphy’s Law into account: something as small as a slide color or a standard font not being supported can throw off a presentation. Since Murphy’s Law does not disappear when you create a presentation around visuals, especially custom-made images and video, you may need more preparation time for this style of presentation.

Creating visualizations and one-sentence summaries, as well as practicing your speech so you are prepared for these things not working, is a great strategy for preparing for a research talk. One interesting thing to think about: if Alley admits that less tested methods like TED (Technology-Entertainment-Design) and pecha kucha work for effective presentations, how much of the success of this method has to do with people caring about and putting time into their presentations rather than with a change in presentation style?

Overall this book was a good review of public speaking advice specifically targeted towards a science and engineering audience and hopefully will get people taking more time and thinking more about their presentations.

Presentation resources on campus:

  • For science-specific resources, definitely check out our new science communication certificate through the 21st Century Scientists Working Group and the Center for Innovation in Teaching and Learning. They offer a variety of workshops and opportunities for students to develop their skills as science communicators. There are also science communication workshops throughout the country over the summer.
  • If you have time, join a speech or debate team (Mock Trial or parliamentary style debate in particular). It’s the best way to learn how to speak extemporaneously, answer hostile questions on the fly, and get coaching and feedback on what you need to work on. If you’re feeling really bold, performing improv comedy can help with these skills as well.
  • If you don’t have time to be part of a debate team, or you can’t say “yes and…” to joining an improv comedy troupe, take advantage of opportunities to present when you can at various events around campus. For example, this year’s Pecha Kucha Night is going to be June 10th at Krannert Center, and applications are due by April 30! If this is still too much, find someone, whether in your unit, the Career Center, etc., who will listen to you talk about your research. Or, if you have motivation and don’t mind cringe, get one of your friends to record you presenting (if you don’t want to use your phone for this, check out the loanable tech at the UGL!).

And for further reading take a look at:

http://guides.library.illinois.edu/presentation/getting_started

Hope this helps, and good luck with your research presentations!

Text Analysis Basics – See Your Words in Voyant!

Interested in doing basic text analysis but have no or limited programming experience? Do you feel intimidated by the command line? One way to get started with text analysis, visualization, and uncovering patterns in large amounts of text is with browser-based programs! And today we have a mega blockbuster blog post extravaganza about Voyant Tools!

Voyant is a solid browser-based tool for text analysis. It is part of the Text Analysis Portal for Research (TAPoR), http://tapor.ca/home. The current project leads are Stéfan Sinclair at McGill University (one of the minds behind BonPatron!) and Geoffrey Rockwell at the University of Alberta.

Analyzing a corpus:

I wanted to know what I needed to know to get a job, so I gathered as many job ads as I could and ran them through very basic browser-based text analysis tools (to learn more about word clouds, check out this recent Commons Knowledge post all about them!). My hope was that what I needed to study in library school would emerge, and that I could use that information to decide which courses to take. It was an interesting experiment, though I mostly found that jobs prefer you have an ALA-accredited degree, which was consistent with what I had heard from talking to librarians. Now I have collected even more job ads (gathered around December, mostly from the ALA job list with a few from i-Link and elsewhere) to see what I can find out, and hopefully to figure out some more skills I should be developing while I’m still in school.

Number of job ads = 300. There may be a few duplicates, and this is not the cleanest data.

Uploading a corpus:

Voyant Tools is found at https://voyant-tools.org.

Voyant Home Page

For small amounts of text, copy and paste into the “Add Text” box. Otherwise, add files by clicking “Upload” and choosing the Word or Text files you want to analyze. Then click “Reveal”.

So I added in my corpus and here’s what comes up:

To choose a different view, click the small rectangle icon and choose from a variety of views. To save the visualization you created so that you can later incorporate it into your research, click the arrow and rectangle “Upload” icon and choose which aspect of the visualization you want to save.

Mode change option circled

“Stop words” are words excluded because they are very common words such as “the” or “and” that don’t always tell us anything significant about the content of our corpus. If you are interested in adding stop words beyond the default settings, you can do that with the following steps:

Summary button on Voyant circled

1. Click on Summary

Home screen for Voyant with the edit settings circled

2. Click on the define options button

Clicking on edit list in Voyant

3. If you want to add more words to the default StopList click Edit List

Edit StopList window in Voyant

4. Type in new words and edit the ones already there in the default StopList and click Save to save.

Mouse click on New User Defined List

5. Or to add your own list click New User Defined List and paste in your own list in the Edit list feature instead of editing the default list.
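If you are curious what a stop list is doing behind the scenes, the idea can be sketched in a few lines of Python. To be clear, this is not Voyant’s code: the tiny default stop list, the function names, and the tokenizing rule below are all my own assumptions, meant only to illustrate how excluded words drop out of a frequency count.

```python
import re
from collections import Counter

# A deliberately tiny, made-up default list; real stop lists are much longer.
DEFAULT_STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "for", "with"}


def tokenize(text):
    """Lowercase the text and pull out words (keeping inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(text, stop_words=DEFAULT_STOP_WORDS, extra_stop_words=()):
    """Count words, skipping the stop list plus any words you add yourself."""
    excluded = set(stop_words) | {word.lower() for word in extra_stop_words}
    return Counter(token for token in tokenize(text) if token not in excluded)
```

Passing something like extra_stop_words=["library"] plays the same role as typing new words into the Edit List window: those words simply never make it into the counts.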

Here are some of the cool different views you can choose from in Voyant:

Word Cloud:

The Links mode shows connections between different words, with the thickness of the line between them indicating how often they are paired.
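I don’t know exactly how Voyant decides which words count as “paired,” but one common way to get numbers like these is to count how often two words appear within a few words of each other. Here is a rough Python sketch of that approach; the window size and the function name are assumptions on my part, not anything taken from Voyant.

```python
import re
from collections import Counter


def cooccurrence_counts(text, window=3, stop_words=frozenset()):
    """Count how often each pair of words appears within `window` words.

    Pairs are stored alphabetically so ("a", "b") and ("b", "a") are one pair.
    """
    tokens = [
        token
        for token in re.findall(r"[a-z]+", text.lower())
        if token not in stop_words
    ]
    pairs = Counter()
    for index, word in enumerate(tokens):
        # Look only at the next `window` words to avoid counting a pair twice
        for other in tokens[index + 1:index + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs
```

In a picture like the Links view, the count for each pair would then set the thickness of the line drawn between the two words.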

My favorite mode is TextArc, based on the text analysis and visualization project of the same name created by W. Brad Paley in the early 2000s. More information about this project can be found at http://www.textarc.org/, where you can also find TextArc versions of classic literature.

Voyant is pretty basic: it will give you a bunch of stuff you probably already knew, such as that to get a library job it helps to have library experience. The advantage of the TextArc setting is that it puts everything out there and lets you see the connections between different words. And okay, it looks really cool too.

Check out the original animated version below! Warning: this may slow down or even crash your browser: https://voyant-tools.org/?corpus=3de9f7190e781ce7566e01454014a969&view=TextualArc

I also like the Bubbles feature (not to be confused with the Bubblelines feature) though none of the other GAs or staff here do, one going so far as to refer to it as an “abomination”.

Circles with corpus words (also listed in side pane) on inside

Truly abominable

The reason I have not included a link to this is DEFAULT VERSION MAY NOT MEET W3C WEB DESIGN EPILEPSY GUIDELINES. DO NOT TRY IF YOU ARE PRONE TO PHOTOSENSITIVE SEIZURES. It is adapted from the much less flashy “Letter Pairs” project created by Martin Ignacio Bereciartua. This mode can also crash your browser.

To learn more about applying for jobs we have a Savvy Researcher workshop!

If you thought these tools were cool and want to learn more advanced text mining techniques, we have an upcoming Savvy Researcher workshop, also on March 6:

Happy text mining and job searching! Hope to see some of you here at Scholarly Commons on March 6!

Review: The Infographic History of the World by Valentina D’Efilippo and James Ball

The Infographic History of the World, created by Valentina D’Efilippo and James Ball, consists of various infographics with accompanying commentaries. You can find this book and read it at Scholarly Commons, near our other infographic and visualization books! You can also check it out from a nearby library!

Overall, this book is a compelling read and an interesting idea as a project and some of the infographics were really well done. This book demonstrates the power of infographics to help us present and break down important topics to wider audiences. Yes, this isn’t supposed to be a serious read, but there was a lot I did not like about this book, specifically throughout I got a sense that:

Statue of a person with hands over face. Located by the Main Library entrance facing the UGL

Somewhere a political scientist is crying… Photo credit to E. Hardesty and the Main Library with the original image found at https://flic.kr/p/rw2Ldz

  • “The story of the last 4,000 years is one of nations being founded, breaking apart, going to war, and coming together” (D’Efilippo & Ball, 2013). For those confused why this is a problem, “nation” is a very modern term and concept so that’s a serious anachronism.
  • Why is the theocracy symbol notably non-Western and not used for the English Civil War, which was apparently about republicanism?
  • A history of the “Net” that doesn’t mention Minitel.
  • First flight goes to the Wright Brothers. No mention of Santos-Dumont or the controversy (for everyone who noticed that inexplicable early aircraft cameo at this year’s Olympic opening).
  • The book is very Anglo-centric.

Sloppy stats!

  • I’m suspicious anytime Luxembourg wins something. Are they really the biggest drinkers, or does their small population make this data less meaningful?
  • “Absolute number of cannabis users by region” Absolute? Really?
  • Overall, there is not enough information on where and how a lot of the statistics were generated and why we should trust those sources. Yes, there is an appendix in the back that explains this to some extent in tiny text, but that is not helpful for people who just glance at the infographic and assume it’s giving us useful information about the world.

Visualization issues!

  • Emphasizing form over function (much like the new MacBooks with so few ports they are practically landlocked), many of the infographics fail to present the information in a way that is appropriate for what they are trying to present. For example, the Mona Lisa paint by numbers probably would have been more effective as a timeline.
    • Maybe I’m just too attached to the idea of timelines being, well, on a line, or perhaps on the spiral depicted on the book’s cover art.
    • Some of the infographics have way too many things going on and are trying to make too many points at once.
  • The colors on the mental illness brain are too close (and I can’t imagine how that would look to someone who is colorblind), and there are other examples where the colors are so close that the infographic is pretty but hard to actually learn anything from.

Finally, the authors’ claim of “not trying to be political” / “this is just for fun” is no excuse for not being thorough, especially with information targeted to the public. Full disclosure or not, artists and journalists still need to be careful because what people see can influence the way they think about things. Infographics are not a neutral presentation of information: certain choices were made, and audiences need to think about who made these choices and why. This book is not as bad as some of the examples in this Visual Literacy and Infographics blog post, but it is still problematic. Please, do not be reckless when making infographics!

To learn more about how to create infographics of your own, check out our Savvy Researcher workshop: Introduction to Infographics Using Piktochart!

If you are an undergraduate interested in conducting research and becoming information and visually literate, there is an entire set of classes in the history department for this through SourceLab. Take a look at their schedule or talk to Professor Randolph to learn more!

Introduction to Web-Based Word Cloud Generators

A word cloud created with Tagul using the words from this blog post!

If you’re in a pinch and need some kind of visualization to go along with a presentation or project, a word cloud can be an easy fix. Word clouds take the most frequently used words in a block of text and create a visual where the most frequently occurring words appear larger and less frequent words appear smaller. There are thousands of ways to create a word cloud, but these are a few simple generators that can help you out when you need a word cloud in a hurry.
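For the curious, the “bigger word means more frequent word” idea boils down to counting words and scaling each count to a font size. The Python sketch below shows one simple way to do that with linear scaling. The size range, the word limit, and the function name are my own assumptions, and the generators in this post may well use a different formula.

```python
import re
from collections import Counter


def font_sizes(text, max_words=50, min_size=10, max_size=72):
    """Map the most frequent words to font sizes by linear scaling.

    The most frequent word gets max_size, the least frequent of the words
    kept gets min_size, and everything else falls in between.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower())).most_common(max_words)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in counts:
        if span == 0:
            # Every word is equally frequent, so draw them all the same size
            sizes[word] = max_size
        else:
            scaled = (count - lowest) / span * (max_size - min_size)
            sizes[word] = round(min_size + scaled)
    return sizes
```

A drawing library would then place each word on the canvas at its computed size, which is the part the generators below handle for you.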

TagCrowd

TagCrowd is, perhaps, the simplest of all these generators to use, and one of the few generators that can create a word cloud from a URL. Simply paste the text or URL, or upload a file to TagCrowd and it will create a blue word cloud for you. There aren’t many options as far as styling goes — unlike some of the other generators we’ll be looking at — but it could not be simpler. The options that TagCrowd does give you are: language, maximum number of words, minimum frequency of words, show frequencies, group similar words, convert to lowercase, and exclusion of certain words.

That being said, be careful when you use a URL with TagCrowd. Below are two examples: for the first, I copy-pasted the text of David Sedaris’ essay “Stepping Out” from The New Yorker; for the second, I used the URL for the story rather than the text. The two clouds were entirely different, and the URL didn’t give me the actual words from the story.

The TagCrowd cloud from the copy-pasted text.

The TagCrowd cloud from the URL.

WordClouds.com

WordClouds.com provides more options than TagCrowd, and produces more aesthetically pleasing — though, perhaps, less simple to read and understand — word clouds. You can input text through copy-pasting, through a text or PDF file, as well as through a URL. Notably, the URL option works better at WordClouds.com. WordClouds.com also lets you customize your image, by fitting the word cloud into particular shapes, as well as offering different color schemes and fonts. It is also easier to get data about the frequency of word usage on WordClouds.com, and it allows you to save/share your word cloud in a variety of formats. Overall, WordClouds.com is a whimsical alternative for generating a word cloud. Below are two word clouds I created using the Sedaris essay from its URL. I chose a checkmark shape for the first cloud, and the second is an automatically-generated rainbow.

I chose to shape my word cloud as a check mark with WordClouds.com.

The rainbow option is fun and easy to use, though maybe not the most easily readable option on WordClouds.com.

Tagul

And finally, we have Tagul. Tagul is the most complicated of these three options, but it also allows the most customization for your word cloud. Tagul lets you add and remove words easily, and gives you a number of shape, font, color, and animation options. You can make something as simple as a circle in one color, or an emoji smiley face where each word pops up when you hover over it. You will probably spend more time creating your word cloud on Tagul, but you can really make sure you’re getting what you want. Below are two word clouds (one simple, one more complicated) created with copy-pasted text from Sedaris’ essay.

Our more dramatic word cloud made with Tagul.

A simpler and easy to read word cloud created with Tagul.

There are many other options for creating word clouds, but these are three easy websites that you can use when you need a word cloud and you need one quick. How do you like to generate word clouds? What sort of projects have you used word clouds for? Let us know in the comments!