Exploring Data Visualization #7

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A collection of six different radar charts, each showing one student's test scores in multiple subjects

From “The Radar Chart and its Caveats” by Yan Holtz

1) Data analyst Yan Holtz and designer Conor Healy have helpfully compiled a list of visualization caveats at their site From Data to Viz. Among the common pitfalls they discuss is the use of radar charts, as in the image above.

Two elementary school floor plans generated by computer modeling, optimized to minimize traffic flow between classes and material usage. The floor plans look biological, with the hallways branching to smaller hallways and the rooms shaped as all sorts of polygons instead of rectangular.

From “Evolving Floorplans,” created by Joel Simon

2) Bioinformaticist Joel Simon “grew” an elementary school floor plan using advanced computer science methods. As he points out, “The results were biological in appearance, intriguing in character and wildly irrational in practice.” The project certainly demonstrates that computer models are only as good as the data that humans give them (in this case, there were no constraints based on architecture or engineering rules). On the other hand, imagine your school was laid out like this! Read all about the project at Simon’s website.

A demonstration of a chart makeover. The before chart shows two pie charts. Each slice of the pie chart is the percentage of U.S. population within an age group. The first pie chart is 2010, the second is 2013. The makeover, or "after" chart, is a slope graph that shows the change in millions of people within each age group, which are each represented by a line.

Chart makeover created by Patricia Manasan for Storytelling With Data

3) Want to feel inspired? Dozens of people submitted data visualization makeovers to Storytelling With Data. Take a look at what people changed for ideas about how to make your own visualizations better.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.

Lightning Review: the truthful art by Alberto Cairo

Image of the truthful art

Hailed by one of our librarians as a brilliant and seminal text for understanding data visualization, the truthful art is a text that can serve both novices and masters in the field of visualization.

The book is packed with detailed descriptions, explanations, and images of just how Cairo wants readers to understand and engage with knowledge and data. Nearly every page, in fact, offers examples of the methods Cairo is trying to connect his readers to.

Cairo’s work not only teaches readers how to best design their own visualizations, but also explains how to *read* data visualizations themselves. Portions of chapters are devoted to the necessity of ‘truthful’ visualizations, and not only because “if someone hides data from you, they probably have something to hide” (Cairo, 2016, p. 49): the exact same data, when presented in different ways, can completely change the audience’s perspective on what the ‘truth’ of the matter is.

The more I read through the truthful art, the harder time I had putting it down. Cairo shows how vastly presentations of data can differ depending upon the medium through which they are visualized. It was amazing how Cairo could instantly pick apart a bad visualization, replacing it with one that was simultaneously more truthful and more beautiful.

There is a specific portion of Chapter 2 where Cairo gives a very interesting visualization of “How Chicago Changed the Course of Its Rivers”. It’s detailed, informative, and very much a classic data visualization.

Then he compared it to a fountain.

The fountain was beautiful, and designed to tell the same story as the maps Cairo had created. It was fascinating to see data presented this way; I hadn’t fully considered that data could be represented in such a unique form.

the truthful art is here on our shelves in the Scholarly Commons, and we hope you’ll stop and give it a read! It’s certainly a worthwhile one!

Exploring Data Visualization #6

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

U.S. immigration represented by concentric rings like a tree, where outermost ring is the most recent, with colors denoting immigrants' origin primarily by continent

from National Geographic, “200 Years of U.S. Immigration Looks Like the Rings of a Tree”

1) Two Northeastern University professors visualized immigration data for National Geographic by creating a fascinating chart that looks a lot like the growth rings of a tree. They write, “Like countries, trees can be hundreds, even thousands, of years old. Cells grow slowly, and the pattern of growth influences the shape of the trunk. Just as these cells leave an informational mark in the tree, so too do incoming immigrants contribute to the country’s shape.”

two line graphs, one with a legend and one with direct line labeling, demonstrating the advantage of the latter

from StorytellingWithData, “Accessible data viz is better data viz”

2) Accessibility is important in all kinds of communication, and data visualization is no exception. But it’s not always obvious how to make visualizations more accessible. You can find several tips for improving your visualization in “Accessible data viz is better data viz.”

Polar histograms of the streets in major cities across the U.S.

by Geoff Boeing, “Comparing City Street Orientations”

3) Urban planning postdoc Geoff Boeing used open map data to create a series of polar histograms that demonstrate how the streets in various U.S. cities do or don’t follow a neat grid. It’s a great example of a visualization that looks intriguing and also packs a lot of information. Learn more about it in his blog post, Comparing City Street Orientations.
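If you are curious how a polar histogram like this is built, the counting step can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Boeing's code: the function name `bearing_histogram`, the choice of 36 bins, and the sample bearings are all hypothetical, and the drawing step is left out.

```python
def bearing_histogram(bearings_deg, n_bins=36):
    """Count street bearings per compass bin, treating every street as two-way."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for bearing in bearings_deg:
        # A street that runs north also runs south, so count both directions.
        for direction in (bearing, bearing + 180.0):
            # Shift by half a bin so that bin 0 is centered on north (0 degrees).
            idx = int(((direction + width / 2) % 360.0) // width) % n_bins
            counts[idx] += 1
    return counts


# Made-up bearings for a city that mostly follows a north-south / east-west grid.
grid_city = [0, 90, 0, 90, 1, 89]
counts = bearing_histogram(grid_city)
```

In a gridded city the counts pile up in four bins (north, east, south, west), which is what produces the neat cross shape in the chart; a city with winding streets spreads its counts across many bins instead.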

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.

Exploring Data Visualization #5 – R edition

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

This month, I wanted to share some resources specifically for learning to visualize data using R.

1) R is a free, open source programming language that is heavily used for statistical analysis, but has also expanded to encompass nearly any kind of data analysis you would want to do. In the Scholarly Commons, we have R and RStudio (a user-friendly R development environment) installed on all of our lab computers. RStudio’s website provides links to a lot of ways for you to get started with R.

2) R guru Hadley Wickham gave a public lecture at the University of Notre Dame last August. (Note that his talk starts about 37 minutes into the video.) In the lecture, he walks through a simple example of the iterative process of data visualization in R, and gives additional related advice for doing data science. You can learn from his lecture without knowing any R, but you will find it easier to understand if you have basic experience with programming in general.

3) If you want a book to help you learn more in depth, Wickham and a colleague wrote R for data science: Import, tidy, transform, visualize, and model data. You can read R for data science online, or you can come in to the Scholarly Commons to read the physical book while practicing on one of our lab computers.

4) You can also find a number of specific R courses at Lynda.com, such as “Data Visualization in R with ggplot2.” Just make sure to log in with your U of I credentials so you can access the courses for free.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.

Using an Art Museum’s Open Data

*Edited by Billy Tringali, based on an original idea and original piece by C. Berman

As a former art history student, I’m incredibly interested in how the study of art history can be aided by the digital humanities. More and more museums have started allowing the public to access a portion of their data. When it comes to open data, museums seem to be lagging a bit behind other cultural heritage institutions, but many are providing great open data for humanists.

For art museums, the amount of data provided varies. Some museums are going the extra mile to give a lot of their metadata to the public. Others are picking and choosing aspects of their collection, such as the Museum of Modern Art’s Exhibition and Staff Histories.

Many museums, especially those that collect modern and contemporary art, can have their hands tied by copyright laws when it comes to the data they present. A few of the data sets currently available from art museums are the Cooper Hewitt’s Collection Data, the Minneapolis Institute of Arts metadata, the Rijksmuseum API, the Tate Collection metadata, and the Getty Vocabularies.

The Metropolitan Museum of Art has recently released all images of the museum’s public domain works under a Creative Commons Zero license.

More museum data can be found here!

Exploring Data Visualization #4

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in May:

a cartoon image of a few buildings, above two cartoon characters, one who is pointing and saying "We missed people here," while the other character shrugs and says "We can't do anything about it"

from Alvin Chang at Vox, “How Republicans are undermining the 2020 census, explained with a cartoon”

1) Alvin Chang, Senior Graphics Reporter at Vox, “covers policy by making explainers with charts and cartoons.” This month he explained the precarious state of the upcoming 2020 U.S. Census.

a dual-axis line chart overlaid with a stick figure drawing of a confused person misreading the chart's data

from Lisa Charlotte Rost at Uncharted, “Why not to use two axes, and what to use instead”

2) Lisa Charlotte Rost, a designer for Datawrapper, explains why dual-axis charts are almost always terrible, and what you can use instead.

text saying "The Wisdom and/or Madness of Crowds," surrounded by a cartoon rendering of a network graph

“The Wisdom and/or Madness of Crowds,” a game created by Nicky Case

3) Play this cute game! Nicky Case combines the logic of network graphs with the science of crowds in an “explorable” that shows why some crowds generate wisdom, while others create madness.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.

Exploring Data Visualization #3

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in April:

a photograph of a knit pattern in a very strange shape, using green yarn

“Make Caows and Shapcho” pattern knit by MeganAnn (https://www.ravelry.com/projects/MeganAnn/skyknit-the-collection)

1) Janelle Shane, who has created a new kind of humor based on neural networks, trained a neural network to generate knitting patterns. Experienced knitters then attempted these patterns so we can see what the computer generated, ranging from reasonable to silly to downright creepy creations.

map showing that many areas of the United States get their first leaf earlier than in the past

from NASA Earth Observatory, “Spring is Arriving Earlier in National Parks”

2) Considering we had snowfall in April, you might not think spring began early this year (I know I don’t!). But broadly speaking, climate change has caused spring to begin earlier and earlier across the United States. The NASA Earth Observatory looked at data published in 2016 to create maps that visualize how climate change has changed the timing of spring.

3) If you want to learn a new tool but aren’t sure what to choose, have a look at Nathan Yau’s suggestions in his post What I Use to Visualize Data. He even divides his list into categories based on where he is in the process, such as initial data processing versus final visualizations.

Using Reddit’s API to Gather Text Data

The Reddit logo.

I initially started my research with an eye to using digital techniques to analyze an encyclopedia that collects a number of conspiracy theories, in order to determine what the typical features of conspiracy theories are. At this point, I realize there were two flaws in my original plan. First, as discussed in a previous blog post, the book I selected failed to provide the sort of evidence I required to establish typical features of conspiracy theories. Second, the length of the book, though sizable, was nowhere near large enough to provide a corpus that I could use a topic model on in order to derive interesting information.

My hope is that I can shift to online sources of text in order to solve both of these problems. Specifically, I will be collecting posts from Reddit. The first problem was that my original book merely stated the content of a number of conspiracy theories, without making any effort to convince the reader that they were true. As a result, there was little evidence of typical rhetorical and argumentative strategies that might characterize conspiracy theories. Reddit, on the other hand, will provide thousands of instances of people interacting in an effort to convince other Redditors of the truth or falsity of particular conspiracy theories. The sorts of strategies that were absent from the encyclopedia of conspiracy theories will, I hope, be present on Reddit.

The second problem was that the encyclopedia failed to provide a sufficient amount of text. Utilizing Reddit will certainly solve this problem; in less than twenty-four hours, there were over 1,300 comments on a recent post alone. If anything, the solution to this problem represents a whole new problem: how to deal with such a vast (and rapidly changing) body of information.

Before I worry too much about that, it is important that I be able to access the information in the first place. To do this, I’ll need to use Reddit’s API. API stands for Application Programming Interface, and it’s essentially a tool for letting a user interact with a system. In this case, the API allows a user to access information on the Reddit website. Of course, we can already do this with a web browser. The API, however, allows for more fine-grained control than a browser. When I navigate to a Reddit page with my web browser, my requests are interpreted in a very pre-scripted manner. This is convenient; when I’m browsing a website, I don’t want to have to specify what sort of information I want to see every time a new page loads. However, if I’m looking for very specific information, it can be useful to use an API to home in on just the relevant parts of the website.

For my purposes, I’m primarily interested in downloading massive numbers of Reddit posts, with just their text body, along with certain identifiers (e.g., the name of the poster, timestamp, and the relation of that post to other posts). The first obstacle to accessing the information I need is learning how to request just that particular set of information. In order to do this, I’ll need to learn how to write a request in Reddit’s API format. Reddit provides some help with this, but I’ve found these other resources a bit more helpful. The second obstacle is that I will need to write a program that automates my requests, to save myself from having to perform tens of thousands of individual requests. I will be attempting to do this in Python. While doing this, I’ll have to be sure that I abide by Reddit’s regulations for using its API. For example, a limited number of requests per minute are allowed so that the website is not overloaded. There seems to be a dearth of example code on the Internet for text acquisition of this sort, so I’ll be posting a link to any functional code I write in future posts.
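To make the two obstacles a little more concrete, here is a rough Python sketch of the pieces I expect to need: something that spaces out requests, and something that reduces a nested comment thread to just the fields I care about. Everything in it is an assumption for illustration. The names `RateLimiter` and `flatten_comments` are my own, the limit of 30 requests per minute is a placeholder rather than Reddit's actual rule, and the JSON shape (`kind`, `data`, `children`, `replies`) reflects my current understanding of Reddit's responses and should be checked against its API documentation. The sample thread is made up, and the actual download step is left out.

```python
import time
from typing import Callable, Iterator, Optional


class RateLimiter:
    """Space out calls so that at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Pause until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


def flatten_comments(listing: dict) -> Iterator[dict]:
    """Yield one flat record per comment in a nested comment listing."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # assumed: "t1" marks a comment
            continue
        data = child["data"]
        yield {
            "id": data["id"],
            "author": data["author"],
            "created_utc": data["created_utc"],
            "parent_id": data["parent_id"],  # relation to other posts
            "body": data["body"],
        }
        replies = data.get("replies")
        if isinstance(replies, dict):  # assumed: an empty string means no replies
            yield from flatten_comments(replies)


# A tiny made-up thread in the assumed shape: one comment with one reply.
sample_thread = {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {
        "id": "a", "author": "user_one", "created_utc": 1.0,
        "parent_id": "t3_post", "body": "Top-level comment",
        "replies": {"kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {
                "id": "b", "author": "user_two", "created_utc": 2.0,
                "parent_id": "t1_a", "body": "A reply", "replies": ""}},
        ]}}}},
]}}

rows = list(flatten_comments(sample_thread))
```

In a real script, I would call `limiter.wait()` before each request and pass each downloaded thread to `flatten_comments`, writing the resulting rows to a file. Keeping `parent_id` in every row is what lets me rebuild the reply structure later.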

Whimsical Data

Photograph of a Yorkshire terrier in a field of yellow flowers.

It’s finally springtime!

It’s April! After what felt like an eternity, it’s starting to warm up here at the University of Illinois at Urbana-Champaign. So today, in celebration of spring, we’re going to take a look at a few whimsical data sets that have made us laugh, smile, and think.

Dogs of NYC

Dogs of NYC was published by the NYC Department of Health and Mental Hygiene in 2013. The department collected data on 50,000 New York dogs, including their name, gender, breed, birth date, dominant, secondary and third color, and whether they are spayed/neutered or a guard dog, along with the borough they live in and their zip code. WNYC used this data to explore dog names and breeds by area, and Kaylin Pavlik used the data to show the relationship between dog names and dog breeds.

What made us laugh: How high the TF-IDF score for the name Pugsley was for Pugs as compared to other breeds.

What made us think: Does the perceived danger of a dog breed influence what people name them?
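For readers who haven't met TF-IDF before, the idea is that a name scores highly for a breed when it is common within that breed but rare across breeds. Below is a minimal Python sketch of the plain textbook version (term frequency times the log of inverse document frequency), treating each breed as one "document". We don't know which exact variant Kaylin Pavlik used, and the records here are made up for illustration; they are not the NYC data.

```python
import math
from collections import Counter

# Made-up (breed, name) records purely for illustration.
records = [
    ("Pug", "Pugsley"), ("Pug", "Pugsley"), ("Pug", "Max"),
    ("Labrador", "Max"), ("Labrador", "Bella"), ("Labrador", "Max"),
    ("Poodle", "Bella"), ("Poodle", "Max"), ("Poodle", "Coco"),
]


def tf_idf(records):
    """Return {(breed, name): score}, treating each breed as one document."""
    counts = {}
    for breed, name in records:
        counts.setdefault(breed, Counter())[name] += 1
    n_breeds = len(counts)
    # Number of breeds in which each name appears at least once.
    breeds_with_name = Counter(name for c in counts.values() for name in c)
    scores = {}
    for breed, name_counts in counts.items():
        total = sum(name_counts.values())
        for name, k in name_counts.items():
            tf = k / total                                      # share of this breed's dogs
            idf = math.log(n_breeds / breeds_with_name[name])   # rarity across breeds
            scores[(breed, name)] = tf * idf
    return scores


scores = tf_idf(records)
top_pug_name = max((key for key in scores if key[0] == "Pug"), key=scores.get)[1]
```

In this toy data, "Max" appears in every breed, so its inverse document frequency is log(1) = 0 and it scores zero everywhere, while "Pugsley" appears only among Pugs and comes out on top for that breed.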

UK Government Hospitality wine cellar annual statement

Each year, the UK publishes an annual statement on the Government Wine Cellar, which they describe as being “used to support the work of Government Hospitality in delivering business hospitality for all government ministers and departments”. The first report was published in July 2014, and the latest was published in September 2017.

What made us laugh: Government Hospitality has an advisory committee that meets four times a year and whose members are known as Masters of Wine. They are unpaid.

What made us think: With threats to government transparency across the globe, it is nice to see data published that some may brush off as inconsequential but that actually deals with large sums of money.

Most Popular Christmas Toys According to Search Data

Published by Reckless in November 2017, this data set uses search data based on the Toys R Us catalog (RIP) to show which toys, video games, and board games were most popular among different age groups. Favorite toys included the Barbie Dreamhouse, Furby Connect, Razor Crazy Cart, and R2D2 Interactive Robotic Droid.

What made us laugh: The Silly Sausage game was one of the most searched board games during this period.

What made us think: Toys play a pivotal role during childhood development. It’s a little astonishing to see that, despite all of her critics, Barbie still reigns supreme in the 2-4 year-old age group.

Do you have a favorite data set? Let us know in the comments!

Exploring Data Visualization #2

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in March:

Chart showing that the sons of black families from the top 1 percent had about the same chance of being incarcerated on a given day as the sons of white families earning $36,000

From The New York Times, “Extensive Data Shows Punishing Reach of Racism for Black Boys”

1) The New York Times took data from a recent study about income inequality and designed a variety of compelling data visualizations. The article text and the visualizations complement each other to convey the pervasive insidiousness of racism, especially for black boys.

A chart legend with the categories

From Elijah Meeks, “Color Advice for Data Visualization with D3.js”

2) D3.js is an open source JavaScript library that you can use to visualize data. Elijah Meeks, a data visualization engineer at Netflix (what an interesting job!), provides some great advice on picking colors in D3. More importantly, these tips are helpful no matter what visualization tool you use.

A demonstration of selecting bins for histograms, showing too few, too many, and just the right number

From Mikhail Popov, “Plotting the Course Through Charted Waters”

3) Want to learn some data visualization basics? Mikhail Popov from Wikimedia conducted a data visualization literacy workshop for Wikimedia Foundation’s All Hands 2018 staff conference, and he made the entire workshop available online.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.