Exploring Data Visualization #3

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in April:

a photograph of a knit pattern in a very strange shape, using green yarn

“Make Caows and Shapcho” pattern knit by MeganAnn (https://www.ravelry.com/projects/MeganAnn/skyknit-the-collection)

1) Janelle Shane, who has created a new kind of humor based on neural networks, trained a neural network to generate knitting patterns. Experienced knitters then attempted these patterns so we can see what the computer generated, ranging from reasonable to silly to downright creepy creations.

map showing that many areas of the United States get their first leaf earlier than in the past

from NASA Earth Observatory, “Spring is Arriving Earlier in National Parks”

2) Considering we had snowfall in April, you might not think spring began early this year (I know I don’t!). But broadly speaking, climate change has caused spring to begin earlier and earlier across the United States. The NASA Earth Observatory looked at data published in 2016 to create maps that visualize how climate change has changed the timing of spring.

3) If you want to learn a new tool but aren’t sure what to choose, have a look at Nathan Yau’s suggestions in his post What I Use to Visualize Data. He even divides his list into categories based on where he is in the process, such as initial data processing versus final visualizations.


Using Reddit’s API to Gather Text Data

The Reddit logo.

I started my research intending to use digital techniques to analyze an encyclopedia of conspiracy theories in order to determine what constitutes the typical features of a conspiracy theory. I now realize there were two flaws in my original plan. First, as discussed in a previous blog post, the book I selected failed to provide the sort of evidence I required to establish typical features of conspiracy theories. Second, the book, though sizable, was nowhere near long enough to provide a corpus on which I could run a topic model and derive interesting information.

My hope is that I can shift to online sources of text in order to solve both of these problems. Specifically, I will be collecting posts from Reddit. The first problem was that my original book merely stated the content of a number of conspiracy theories, without making any effort to convince the reader that they were true. As a result, there was little evidence of typical rhetorical and argumentative strategies that might characterize conspiracy theories. Reddit, on the other hand, will provide thousands of instances of people interacting in an effort to convince other Redditors of the truth or falsity of particular conspiracy theories. The sorts of strategies that were absent from the encyclopedia of conspiracy theories will, I hope, be present on Reddit.

The second problem was that the encyclopedia failed to provide a sufficient amount of text. Utilizing Reddit will certainly solve this problem; in less than twenty-four hours, there were over 1,300 comments on a recent post alone. If anything, the solution to this problem represents a whole new problem: how to deal with such a vast (and rapidly changing) body of information.

Before I worry too much about that, it is important that I be able to access the information in the first place. To do this, I’ll need to use Reddit’s API. API stands for Application Programming Interface, and it’s essentially a tool for letting a user interact with a system. In this case, the API allows a user to access information on the Reddit website. Of course, we can already do this with a web browser. The API, however, allows for more fine-grained control than a browser. When I navigate to a Reddit page with my web browser, my requests are interpreted in a very pre-scripted manner. This is convenient; when I’m browsing a website, I don’t want to have to specify what sort of information I want to see every time a new page loads. However, if I’m looking for very specific information, it can be useful to use an API to home in on just the relevant parts of the website.

For my purposes, I’m primarily interested in downloading massive numbers of Reddit posts, with just their text body, along with certain identifiers (e.g., the name of the poster, timestamp, and the relation of that post to other posts). The first obstacle to accessing the information I need is learning how to request just that particular set of information. In order to do this, I’ll need to learn how to write a request in Reddit’s API format. Reddit provides some help with this, but I’ve found these other resources a bit more helpful. The second obstacle is that I will need to write a program that automates my requests, to save myself from having to perform tens of thousands of individual requests. I will be attempting to do this in Python. While doing this, I’ll have to be sure that I abide by Reddit’s regulations for using its API. For example, a limited number of requests per minute are allowed so that the website is not overloaded. There seems to be a dearth of example code on the Internet for text acquisition of this sort, so I’ll be posting a link to any functional code I write in future posts.
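A minimal sketch of how the two obstacles above might be approached in Python, offered as an illustration rather than as the code this project ends up using. It rests on several assumptions that are not stated in this post: that adding `.json` to a Reddit thread URL returns the thread as JSON, that each comment appears as an object of kind `t1` with `author`, `body`, `created_utc`, `parent_id`, and `replies` fields, and that a descriptive `User-Agent` header is expected. The names `Throttle`, `fetch_json`, and `flatten_comments` are hypothetical, and the right `per_minute` value has to come from Reddit’s current API rules.

```python
import json
import time
import urllib.request

# Assumption: a descriptive User-Agent string; replace it with your own.
USER_AGENT = "text-corpus-sketch/0.1 (hypothetical research script)"


class Throttle:
    """Space out calls so that at most `per_minute` happen each minute."""

    def __init__(self, per_minute):
        self.min_interval = 60.0 / per_minute
        self.last_call = None

    def wait(self):
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()


def fetch_json(url, throttle):
    """Download one URL as parsed JSON, pausing first if needed."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten_comments(listing):
    """Walk a comment listing and return one flat record per comment."""
    records = []
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # assumed kind code for comments
            continue
        data = child["data"]
        records.append({
            "id": data["id"],
            "author": data["author"],
            "created_utc": data["created_utc"],
            "parent_id": data["parent_id"],  # links a reply to its parent
            "body": data["body"],
        })
        replies = data.get("replies")
        if replies:  # assumed to be an empty string when there are no replies
            records.extend(flatten_comments(replies))
    return records
```

Under the same assumptions, the JSON for a single thread would be a two-item list (the post itself, then its comments), so something like `flatten_comments(fetch_json(thread_url, throttle)[1])` would give one record per comment, with `parent_id` preserving each post’s relation to the others.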


Whimsical Data

Photograph of a Yorkshire terrier in a field of yellow flowers.

It’s finally springtime!

It’s April! After what felt like an eternity, it’s starting to warm up here at the University of Illinois at Urbana-Champaign. So today, in celebration of spring, we’re going to take a look at a few whimsical data sets that have made us laugh, smile, and think.

Dogs of NYC

Dogs of NYC was published by the NYC Department of Health and Mental Hygiene in 2013. The department collected data on 50,000 New York dogs, including their name, gender, breed, birth date, dominant, secondary and third color, and whether they are spayed/neutered or a guard dog, along with the borough they live in and their zip code. WNYC used this data to explore dog names and breeds by area, and Kaylin Pavlik used the data to show the relationship between dog names and dog breeds.

What made us laugh: How high the TF-IDF score for the name Pugsley was for Pugs as compared to other breeds.

What made us think: Does the perceived danger of a dog breed influence what people name them?

UK Government Hospitality wine cellar annual statement

Each year, the UK publishes an annual statement on the Government Wine Cellar, which they describe as being “used to support the work of Government Hospitality in delivering business hospitality for all government ministers and departments”. The first report was published in July 2014, and the latest was published in September 2017.

What made us laugh: Government Hospitality has an advisory committee that meets four times a year, and its members are known as Masters of Wine. They are unpaid.

What made us think: With threats to government transparency across the globe, it is nice to see the release of data that some may brush off as inconsequential but that actually deals with large sums of money.

Most Popular Christmas Toys According to Search Data

Published by Reckless in November 2017, this data set uses search data based on the Toys R Us catalog (RIP) to show which toys, video games, and board games were most popular among different age groups. Favorite toys included the Barbie Dreamhouse, Furby Connect, Razor Crazy Cart, and R2D2 Interactive Robotic Droid.

What made us laugh: The Silly Sausage game was one of the most searched board games during this period.

What made us think: Toys play a pivotal role during childhood development. It’s a little astonishing to see that, despite all of her critics, Barbie still reigns supreme in the 2-4 year-old age group.

Do you have a favorite data set? Let us know in the comments!


Exploring Data Visualization #2

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in March:

Chart showing that the sons of black families from the top 1 percent had about the same chance of being incarcerated on a given day as the sons of white families earning $36,000

From The New York Times, “Extensive Data Shows Punishing Reach of Racism for Black Boys”

1) The New York Times took data from a recent study about income inequality and designed a variety of compelling data visualizations. The article text and the visualizations complement each other to convey the pervasive insidiousness of racism, especially for black boys.

A chart legend with the categories

From Elijah Meeks, “Color Advice for Data Visualization with D3.js”

2) D3.js is an open-source JavaScript library that you can use to visualize data. Elijah Meeks, a data visualization engineer at Netflix (what an interesting job!), provides some great advice for picking your colors in D3. More importantly, these tips are helpful no matter what visualization tool you use.

A demonstration of selecting bins for histograms, showing too few, too many, and just the right number

From Mikhail Popov, “Plotting the Course Through Charted Waters”

3) Want to learn some data visualization basics? Mikhail Popov from Wikimedia conducted a data visualization literacy workshop for Wikimedia Foundation’s All Hands 2018 staff conference, and he made the entire workshop available online.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.


Meet Aaron King, Scholarly Commons GIS Consultant

picture of Aaron King, GIS Consultant

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Aaron King, GIS Consultant at the Scholarly Commons. Welcome, Aaron!

What is your background and work experience?

I am from Wisconsin originally, and studied Ecology and Evolutionary Biology at the University of Wisconsin-Whitewater. I focused on wolf and carnivore species populations in northern Wisconsin and in Yellowstone. Then my senior year, I stayed on to study Geography, which led to my career in GIS. I worked as a GIS analyst for one year while finishing up my geography degree. Afterwards, I worked at National Geographic in Washington D.C. Then, I worked as a GIS Analyst and Consultant for Intalytics in Ann Arbor, Michigan, while going to school for a Master’s in GIS and Bachelor of Science in Physics at Eastern Michigan University. I did a stint for the Department of Defense in Madison, Wisconsin. Afterwards, I took time off to become a kayak guide, and decided to finish my schooling here at the University of Illinois.

Currently I work with Remote Sensing of the environment and geostatistics.

What led you to your field?

My background in environmental and climate science, as well as my love for geography, led me into this field. I believe satellite data can be used as a tool to expand this research and hopefully contribute to science and to helping the world as a whole.

What is your research agenda?

I plan on doing research on phenology, using a variety of data science methods. Additionally, I want to explore wildfire risk, and possibly look into health characteristics of greenspaces. Currently I am pursuing my Master’s, and I hope to continue on to my PhD here as well.

Do you have any favorite work-related duties?

When you get into research or your field, your knowledge blinders become very focused on what you are doing. Being in a position like this allows me to think past what I know, and explore areas of GIS that I normally do not think about, reflecting the endless possibilities of GIS. Plus, I just find it fascinating what other people are working on, and I love being part of it.

What are some of your favorite underutilized resources that you would recommend?

Programs for GIS outside of ESRI. There is a wealth of programs, free and open-source, that work just as well but are different from the standard ESRI programs. ESRI is a great option, but the amount of data and programs out there to help you with your problem is staggering. The other resource I would recommend is taking some coding lessons through a service like DataCamp, Codecademy, SoloLearn, or Lynda, because having that underlying knowledge of how programs work helps you understand.

If you could recommend only one book to researchers starting out in the GIS field, what would it be?

There are many great books about GIS. But the book you need to read to get into geography, which is the foundation of GIS, is How to Lie with Maps by Mark Monmonier.

Honorable mention: The Nature of Maps by Arthur Robinson and Barbara Bartz Petchenik.

Note: both books are available through the University Library, here and here.

What fields can use GIS research methods?

I had a professor, in my first class, ask us this same question. His answer was, “There is not a science or business that can’t utilize GIS in some way. Your job is to find it.”

Are there any big names in your field that people should know about?

Dr. Mei-Po Kwan (she works here, tell her I say hi), Dr. Waldo Tobler, Dr. Matthew Zook, William Morris Davis, Immanuel Kant, Arthur Robinson, Michael Jordan (seriously, he studied geography, look it up!).

To schedule a consultation with Aaron, contact sc@library.illinois.edu.


Data Purchase Program is Accepting Applications!

The Library’s Ninth Data Purchase Program Round is Accepting Applications!

Through the Library’s Data Purchase Program, the University Library accepts applications from campus researchers to purchase data. All applications must meet the following minimum criteria, in addition to others listed in the full program announcement.

  • The dataset must cost less than $5,000;
  • The dataset must be used for research; and
  • The Library must be able to make the data available for use by everyone at UIUC.

For some examples of past data requests supported by the Data Purchase Program, please explore the list on this page: https://www.library.illinois.edu/sc/dppdatasets

The deadline for first consideration is May 28, 2018, but applications that come in later will be considered based on availability of funds and whether the purchase can be completed by June 30, 2019.

If you have questions about the program or need help identifying data for your research, please contact the Scholarly Commons at sc@library.illinois.edu. We look forward to connecting you with the data you need!


Edward Ayers: Twenty-Five Years in Digital History and Counting

Photograph of Edward Ayers.

We are so excited to be hosting a talk by Edward Ayers next week! We hope you’ll join us on March 29, 2018 from 4-6 PM in 220 Main Library.

Edward Ayers has been named National Professor of the Year, received the National Humanities Medal from President Obama at the White House, won the Bancroft Prize and Beveridge Prize in American history, and was a finalist for the National Book Award and the Pulitzer Prize. He has collaborated on major digital history projects including the Valley of the Shadow, American Panorama, and Bunk, and is one of the co-hosts for BackStory, a popular podcast about American history. He is Tucker-Boatwright Professor of the Humanities and president emeritus at the University of Richmond as well as former Dean of Arts and Sciences at the University of Virginia. His most recent book is The Thin Light of Freedom: The Civil War and Emancipation in the Heart of America, published in 2017 by W. W. Norton.

His talk will be on “Twenty-Five Years in Digital History and Counting”.

Edward Ayers began a digital project just before the World Wide Web emerged and has been pursuing one project or several projects ever since. His current work focuses on the two poles of possibility in the medium: advanced projects in visualizing processes of history at the Digital Scholarship Lab at the University of Richmond and a public-facing project in Bunk, curating representations of the American past for a popular audience.

See you there!


Project Forum: Meeting 1

A logo for the Scholarly Commons Project Forum.

On Monday, March 5, the Scholarly Commons Interns (Matt and Clay) hosted the first Project Forum Discussion. In order to address the variety of projects and scholarly backgrounds, we decided that our conversations should be organized around presentations of projects and related readings from other Digital Humanities scholars or related research.

We began by discussing some consistent topics or questions that are present in each of our Digital Humanities projects and how we conceptualize them. These questions will not only guide our reading discussion on this article, but also further conversations as we read work under the DH umbrella.

1. How does the article make its DH work legible to other scholars / fields?
2. How does the article display information?
3. What affordances or impact does the digital platform (artifact) have on the study?
4. How does the article conceptualize gaps in the data?

If you would like to participate in our next discussion, please join us Monday, March 26, at 2 pm in Library 220.


Random Facts: Copyright Edition

Source: Openclipart

This post was guest authored by Scholarly Communication and Publishing Graduate Assistant Paige Kuester.

Just in case “Copyright” is one of the categories when you finally make it on Jeopardy!

  1. Facts aren’t copyrightable

Generally, unless there is some creativity in the expression associated with them, facts aren’t copyrightable. Even if you were the first person ever to know that particular fact, unless you express it in a creative fixed way, there’s no way that copyright can attach to facts.

  2. Monkeys have yet to successfully go to court and claim copyright

While this fact seems like a statement of the obvious, if you are not familiar with the Monkey Selfie case, you’ll be surprised to learn that accomplishing this was recently the goal of PETA. It’s probably a good thing that the case settled (though unsuccessfully in the eyes of monkeys clamoring for copyright everywhere) with the owner of the camera agreeing to donate a percentage of proceeds gained from the picture to habitat protection, because how else would we have gotten access to some of these images? However, it is questionable whether images taken by animals are copyrightable at all.

  3. Just because you can’t find the © symbol does not mean that a work does not have copyright.

Since 1989, works no longer require a copyright symbol to have copyright attached to them. This makes having a copyright easier than in previous eras, but makes it less obvious that a work is copyrighted. Of course, there are benefits to including one.

  4. Plagiarism doesn’t just plague the lazy.

Apologies in advance.

  5. You own a copyright.

At least, if you have ever written anything creative down in a fixed medium that was your own idea, you own one. Probably more than one, including marker scribbles and grocery lists and papers that you wrote in high school. As long as you don’t transfer your rights, you will hold that copyright for your entire life plus seventy years.

Make sure you share your winnings with us.

For more information about copyright, check out this undergraduate journal library guide, this Author’s rights guide, or contact our copyright librarian, Sara Benson.


Bailey, Jonathan. (2010). 5 Things that Can’t Be Copyrighted. Plagiarism Today. Retrieved from https://www.plagiarismtoday.com/2010/01/08/5-things-that-cant-be-copyrighted/

Bailey, Jonathan. (2015). 5 Great People Who Plagiarized. Plagiarism Today. Retrieved from https://www.plagiarismtoday.com/2015/02/10/5-great-people-who-plagiarized/

New Media Rights. (2011). II. What Can and Can’t Be Copyrighted? New Media Rights. Retrieved from https://www.newmediarights.org/business_models/artist/ii_what_can_and_can’t_be_copyrighted

Post, David. (2017). No Monkey Business Here: The Monkey Copyright Case is Over–For Now. Washington Post. Retrieved from https://www.washingtonpost.com/news/volokh-conspiracy/wp/2017/09/17/no-monkey-business-here-the-monkey-selfie-copyright-case-is-over-for-now/?utm_term=.1624b07a5524


Open Access and… Animals?

Image of a blue and white bird flying over a lake with mountains in the background.

Source: Pixabay.

This post was guest authored by Scholarly Communication and Publishing Graduate Assistant Paige Kuester.

The modern research landscape is an asset for biologists, zoologists, conservationists, etc. They can track animals, check up on them, figure out what is helping or harming their environment, and report or adjust accordingly. They tag animals and create Twitter handles for them to tweet out their location (source). They can also create crowdsourcing research methods in order to utilize the interest of the public. And with open access, researchers can easily pass this information on to the public, so that they can create even more awareness and participation, too.

Great, right?

Maybe not. Think about who else has access to that information.

Poachers. Yes, we are still living in an age of poachers. These aren’t just your Tarzan poachers tromping through the jungle, though there is still some of that. This is much more threatening.

Poachers don’t have to track animals anymore, because scientists are doing that for them. Poachers can just gather data posted online through open access sources, and plan out their trip. Crowd-sourced research and tourist apps can also provide this information. If poachers are really nifty, they can tap into radio signals and the like that are sending out locations from the animal tags to the researchers.

One way that researchers can combat this is to not post such specific locations and data on animals that are likely to be poached, especially when publishing with an open access journal. Those in charge of apps can choose not to make information about endangered species publicly available. It is a little more difficult to deter signal hackers, but monitoring these signals and adding more security to them is one way to curb this unfortunate trend.

Open access is great, spreading information about awesome and endangered animals is great, but leaving them vulnerable to exploitation is not so much. It is a bit like Facebook. Sharing your location and your Friday night plans may be fine when you know it is just your friends seeing this information, but when making it public, maybe don’t advertise that you are going to be out of your apartment for weeks on end, leaving your valuables alone and unmonitored. While animal privacy rights are not yet a thing, a little courtesy can go a long way in protecting those who don’t have a say.


Hewitt, Sarah. (2017, June 5). Scientists Are Debating Whether Animals have a Right to Privacy. Motherboard. Retrieved from https://motherboard.vice.com/en_us/article/43ydkb/animals-privacy-tracking-data-science-journals-open-access-banff-national-park

Scheele, Benjamin, and David Lindenmayer. (2017, May 25). Scientists Are Accidentally Helping Poachers Drive Rare Species to Extinction. The Conversation. Retrieved from https://theconversation.com/scientists-are-accidentally-helping-poachers-drive-rare-species-to-extinction-78342

Welz, Adam. (2017, September 6). Unnatural Surveillance: How Online Data is Putting Species at Risk. Yale Environment 360. Retrieved from http://e360.yale.edu/features/unnatural-surveillance-how-online-data-is-putting-species-at-risk
