A Different Kind of Data Cleaning: Making Your Data Visualizations Accessible

Introduction: Why Does Accessibility Matter?

Data visualizations are a fast and effective way to communicate information, and they are becoming an increasingly popular way for researchers to share their data with a broad audience. Because of this rising importance, it is also necessary to ensure that data visualizations are accessible to everyone. Accessible data visualizations not only help audience members who rely on a screen reader or another assistive tool to read a document, but also help the creators of the visualization, since an accessible visualization brings their data to a much wider audience than a non-accessible one. This post will offer three tips on how you can make your visualization accessible!

TIP #1: Color Selection

One of the most important choices when making a data visualization is the set of colors used in the chart. One suggestion is to run the visualization through a color blindness simulator and experiment until you find the right amount of contrast between colors. Look at the example regarding the top ice cream flavors:

A data visualization about the top flavors of ice cream. Chocolate was the top flavor (40%) followed by Vanilla (30%), Strawberry (20%), and Other (10%).

At first glance, these colors may seem acceptable to use for this kind of data. But when run through a color blindness simulator, one of the results reveals an accessibility concern:

This is the same pie chart above, but placed under a tritanopia color blindness lens. The colors used for strawberry and vanilla now look the exact same and blend into one another because of this, making it harder to discern the amount of space they take in the pie chart.

Although the colors contrasted well enough in the normal view, the colors used for the strawberry and vanilla categories look the same to those with tritanopia color blindness. The result is that these sections blend into one another, making it more difficult to distinguish their values. Most color palettes built into current data visualization software are already designed to ensure the colors contrast sufficiently, but it is still good practice to check that the colors do not blend in with one another!
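As a complement to a simulator, contrast between neighboring colors can also be checked numerically. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas. It is not a color blindness simulation, and it will not catch every problem a simulator would. The palette, the helper names, and the 3:1 threshold (borrowed from the WCAG guidance on non-text contrast) are assumptions for illustration, not the colors used in the chart above.

```python
from itertools import combinations


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast_pairs(palette: dict[str, str], threshold: float = 3.0):
    """Return every pair of categories whose colors fall below the threshold."""
    return [
        (name_a, name_b)
        for (name_a, color_a), (name_b, color_b) in combinations(palette.items(), 2)
        if contrast_ratio(color_a, color_b) < threshold
    ]


# Hypothetical palette for the ice cream chart (assumed, not taken from the figure).
palette = {
    "Chocolate": "#5C3A21",
    "Vanilla": "#F3E5AB",
    "Strawberry": "#F7A8B8",
    "Other": "#9E9E9E",
}

for pair in low_contrast_pairs(palette):
    print("Low contrast:", pair)
```

A check like this only compares lightness-weighted contrast, so it is best treated as a quick first pass before running the chart through a simulator.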

TIP #2: Adding Alt Text

Since data visualizations most often appear as images in published work or reports, alt text is crucial for accessibility. Take the visualization below. If no alt text were provided, the visualization would be meaningless to those who rely on alt text to read a given document. Alt text should be short and summarize the key takeaways from the data (there is no need to describe each individual point, but it should provide enough information to describe the trends occurring in the data).

This is a chart showing the population size of each town in a given county. Towns are labeled A-E and continue to grow in population size as they go down the alphabet (town A has 1,000 people while town E has 100,000 people).
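To make the "short summary of the key takeaways" idea concrete, here is a minimal sketch that drafts alt text from the numbers behind a chart. It reuses the ice cream figures from Tip #1. The function name and sentence template are hypothetical, and a generated draft like this should still be reviewed and edited by a person.

```python
def summarize_shares(title: str, shares: dict[str, int]) -> str:
    """Draft a one-sentence alt text for a chart of percentage shares."""
    # Rank categories from largest to smallest share.
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top_label, top_value = ranked[0]
    rest = ", ".join(f"{label} ({value}%)" for label, value in ranked[1:])
    return f"{title}. {top_label} was the top category ({top_value}%), followed by {rest}."


# Figures from the ice cream example in Tip #1.
ice_cream = {"Chocolate": 40, "Vanilla": 30, "Strawberry": 20, "Other": 10}
alt_text = summarize_shares("Pie chart of the top ice cream flavors", ice_cream)
print(alt_text)
```

The resulting string can then be pasted into the alt text field of whichever tool is used to publish the image.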

TIP #3: Clearly Labeling Your Data

A simple but crucial component of any visualization is having clear labels on your data. Let’s look at two examples to see what makes having labels a vital aspect of any data visualization:

This is a chart of how much money was earned/spent at a lemonade stand by month. There are no y-axis labels to describe how much money is earned/spent and no key to distinguish the two lines that represent the money made and the money spent.

There is nothing in this graph that provides any useful information regarding the money earned or spent at the lemonade stand. How much money was earned or spent each month? What do these two lines represent? Now, look at a more clearly labeled version of the same data:

This is a cleaned version of the previous visualization regarding how much money was earned/spent at a lemonade stand. The addition of a Y-axis and key now shows that more money was spent than earned in January and February, that this reverses in March with earnings peaking in July, and that earnings then continue to fall until December, when more money is spent than earned again.

By adding a labeled Y-axis, we can now quantify the distance between the two lines at any point and have a better idea of the money earned/spent in any given month. Furthermore, the addition of a key at the bottom of the visualization distinguishes the lines, telling the audience what each represents. With clear labels, the data is now in a position where audience members can interpret and analyze it properly.

Conclusion: Can My Data Still be Visually Appealing?

While it may appear that some of these recommendations detract from the creative design of data visualizations, this is not the case at all. Visual appeal is another crucial aspect of a data visualization and should be carefully considered when creating one. Accessibility concerns, however, should have priority over visual appeal. That said, accessibility in many respects encourages creativity in the design, as it makes the creator carefully consider how to present their data in a way that is both accessible and visually appealing. Thus, accessibility makes for a more creative and communicative data visualization and will benefit everyone!

Halloween Data Visualizations!

It’s that time of year when everyone starts to enjoy all things spooky and scary – haunted houses, pumpkin picking, scary movies and…data visualizations! To celebrate Halloween, we have created a couple of data visualizations from a bunch of data sets. We hope you enjoy them!

Halloween Costumes

How do you decide what Halloween costume to wear? Halloween Costumes conducted a survey on this very topic. According to their data, the top way people choose their costume is based on what is easiest to make. Other inspirations include classic costumes, coordination with others, social media trends, and characters from either recent or classic movie or TV franchises.

Data on how people choose their Halloween Costumes. 39% of people base it on the easiest costume they can find, 21% on classic costumes (such as ghosts, witches, etc.), 14% on recent TV or movie characters, another 14% on couples/group/family coordination, 12% on older TV or movie characters, and 11% on social media trends.

The National Retail Federation also conducted a survey of the top costumes that adults were expected to wear in 2019 (there were no good data sets for 2020…). According to the survey, the most popular Halloween costume that year was a witch. Other classic costumes, such as vampires, zombies, and ghosts, ranked high too. Superheroes were also a popular costume choice, with many people dressing up as Spider-man or another Avengers character.

 

Data on the top 10 costumes of 2019. The top choice was dressing up as a witch, followed by a vampire, superhero, pirate, zombie, ghost, avengers character, princess, cat, and Spider-man.

 

Halloween Spending and Production

According to the National Retail Federation, Halloween spending has increased significantly between 2005 and this year, with expected spending this year surpassing 10 billion dollars! That is up from fifteen years ago, when estimated Halloween spending was around 5 billion dollars.

 

This is data on expected Halloween spending between 2005 and 2021. In 2005, the expected spending was 3.3 billion dollars. In 2006, it was 5 billion dollars. In 2007, it was 5.1 billion dollars. In 2008, it was 5.8 billion dollars. In 2009, it was 4.7 billion dollars. In 2010, it was 5.8 billion dollars again. In 2011, it was 6.9 billion dollars. In 2012, it was 8 billion dollars. In 2013, it was 7 billion dollars. In 2014, it was 7.4 billion dollars. In 2015, it was 6.9 billion dollars. In 2016, it was 8.4 billion dollars. In 2017, it was 9.1 billion dollars. In 2018, it was 9 billion dollars. In 2020, it was 8 billion dollars. Finally, in 2021, it is expected to be 10.1 billion dollars.

With so much spending invested in Halloween, it would make sense for the production of Halloween-related items to grow as well to meet this demand. Each year, the U.S. Department of Agriculture records the number of pumpkins produced in the United States. Apart from one dip in 2015, it appears that pumpkin production has almost doubled in the past twenty years on average.

 

This is data on the number of pumpkins produced in the United States every year. In 2001, 8,460,000 pumpkins were produced. In 2002, 8,509,000 pumpkins were produced. In 2003, 8,085,000 pumpkins were produced. In 2004, 10,135,000 pumpkins were produced. In 2005, 10,756,000 pumpkins were produced. In 2006, 10,484,000 pumpkins were produced. In 2007, 11,458,000 pumpkins were produced. In 2008, 10,663,000 pumpkins were produced. In 2009, 9,311,000 pumpkins were produced. In 2010, 10,748,000 pumpkins were produced. In 2011, 10,705,000 pumpkins were produced. In 2012, 12,036,000 pumpkins were produced. In 2013, 11,221,000 pumpkins were produced. In 2014, 13,143,000 pumpkins were produced. In 2015, 7,538,000 pumpkins were produced. In 2016, 17,096,500 pumpkins were produced. In 2017, 15,600,600 pumpkins were produced. In 2018, 15,406,900 pumpkins were produced. In 2019, 13,450,900 pumpkins were produced. Finally, in 2020, 13,751,500 pumpkins were produced.
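For readers who want to check the growth in pumpkin production themselves, here is a small sketch using the figures listed above. Comparing the average of the first five years with the average of the last five is just one assumed way to smooth out single-year swings such as 2015. The window size and variable names are illustrative choices, not part of the original analysis.

```python
from statistics import mean

# Annual U.S. pumpkin production, as listed in the chart description above.
production = {
    2001: 8_460_000, 2002: 8_509_000, 2003: 8_085_000, 2004: 10_135_000,
    2005: 10_756_000, 2006: 10_484_000, 2007: 11_458_000, 2008: 10_663_000,
    2009: 9_311_000, 2010: 10_748_000, 2011: 10_705_000, 2012: 12_036_000,
    2013: 11_221_000, 2014: 13_143_000, 2015: 7_538_000, 2016: 17_096_500,
    2017: 15_600_600, 2018: 15_406_900, 2019: 13_450_900, 2020: 13_751_500,
}


def window_average(first_year: int, last_year: int) -> float:
    """Average production over an inclusive range of years."""
    return mean(production[year] for year in range(first_year, last_year + 1))


early_avg = window_average(2001, 2005)   # assumed "start" window
recent_avg = window_average(2016, 2020)  # assumed "end" window
growth_ratio = recent_avg / early_avg

lowest_year = min(production, key=production.get)

print(f"2001-2005 average: {early_avg:,.0f}")
print(f"2016-2020 average: {recent_avg:,.0f}")
print(f"Growth ratio: {growth_ratio:.2f}")
print(f"Lowest year on record here: {lowest_year}")
```

A different window, or a comparison of single years, will give a different ratio, so the result depends on how "on average" is defined.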

Halloween Activities by Demographics

Finally, here are two statistics taken from the National Retail Federation again regarding how people celebrate activities based on age and region. As the data shows, younger people seem more likely to dress in costumes, visit haunted houses, or throw parties on Halloween. Meanwhile, older individuals are more likely to decorate their homes or hand out candy.

This is data about how people celebrate different Halloween activities by age. Those 65 and older are the least likely to carve a pumpkin (31%), as opposed to 43-50% of the other age groups. Those 55-64 are the most likely to decorate their homes/yard (58%) while those 18-24 are the least likely (47%). Those 18-24 years old, however, are the most likely to dress in costume (69%) while only 18% of those 65 and older will dress in costumes. Those 25-34 are the most likely to dress their pets up at 30% with only 8% of those 65 and older doing the same. Those 65 and older are 81% likely to hand out candy, however, while only 51% of people 18-24 years of age will pass out candy. Those at ages 35-44 are 38% likely to take their children trick-or-treating, while only 13% of those 65 and older do so. The 18-24 year old demographic is the most likely to throw or attend a party (43%), while 11% of those 65 and older do the same. Similarly, the 18-24 demographic is the most likely to attend a haunted house at 32% while only 3% of those in the 65 and older range do the same.

At the same time, there does not seem to be much of a difference in celebrating by region, apart from those living on the West Coast being more likely to dress up and those living in the Northeast being more likely to hand out candy. Other than those two differences, it seems that most regions celebrate the same Halloween activities in the same proportions.

This is data about how people celebrate different Halloween activities by region. 42-46% of people carve a pumpkin (with those in the Midwest on the higher end and the South on the lower end). 50-54% of people decorate their home or yard with the Midwest and Northeast on the higher end and the South on the lower end. 41-52% of people dress in costume with those living in the West on the higher end and the Midwest on the lower end. 19-22% of people dress their pets with those living in the West on the higher end and the Midwest on the lower end. 64-70% of people hand out candy with the Northeast on the higher end and the West and South tied on the lower end. 22-26% of people take their children trick-or-treating with those living in the Midwest and South on the higher end and the West on the lower end. 25% of people throw or attend a party equally across regions. 17-19% of people visit a haunted house with the Midwest and South on the higher end and the West on the lower end.

 

We hope these data visualizations got you in the mood for spooky, Halloween fun! From all of us at the Scholarly Commons, Happy Halloween!

Introductions: What is Digital Scholarship, anyways?

This is the beginning of a new series where we introduce you to the various topics that we cover in the Scholarly Commons. Maybe you’re new to the field, or maybe you’re at the point where you’re too afraid to ask… Fear not! We are here to take it back to the basics!

What is digital scholarship, anyways?

Digital scholarship is an all-encompassing term and it can be used very broadly. Digital scholarship refers to the use of digital tools, methods, evidence, or any other digital materials to complete a scholarly project. So, if you are using digital means to construct, analyze, or present your research, you’re doing digital scholarship!

It seems really basic to say that digital scholarship is any project that uses digital means because nowadays, isn’t that every project? Yes and no. We use the term digital quite liberally. If you used Microsoft Word just to write an essay about a lab you did during class, that is not digital scholarship. However, if you used specialized software to analyze the results of a survey you ran to gather data, and then wrote about it in an essay typed in Microsoft Word, that is digital scholarship! If you then wanted to get this essay published and hosted in an online repository so that other researchers can find it, that is digital scholarship too!

Many higher education institutions have digital scholarship centers on their campuses that focus on providing specialized support for these types of projects. The Scholarly Commons is a digital scholarship space in the University Main Library! Digital scholarship centers often push for new and innovative means of discovery. They have access to specialized software and hardware and provide a space for collaboration and consultations with subject experts who can help you achieve your project goals.

At the Scholarly Commons, we support a wide array of topics in digital and data-driven scholarship, which this series will cover in the future. We have established partners throughout the library and across the wider University campus to support students, staff, and faculty in their digital scholarship endeavors.

Here is a list of the digital scholarship service points we support:

You can find a list of all the software the Scholarly Commons has to support digital scholarship here and a list of the Scholarly Commons hardware here. If you’re interested in learning more about the foundations of digital scholarship, follow along with our Introductions series as we go back to the basics.

As always, if you’re interested in learning more about digital scholarship and how to support your own projects, you can fill out a consultation request form, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Simple NetInt: A New Data Visualization Tool from Illinois Assistant Professor, Juan Salamanca

Juan Salamanca, Ph.D., Assistant Professor in the School of Art and Design at the University of Illinois Urbana-Champaign, recently created a new data visualization tool called Simple NetInt. Though developed from a tool he created a few years ago, this tool brings entirely new opportunities to digital scholarship! This week we had the chance to talk to Juan about this new tool in data visualization. Here’s what he said…

Simple NetInt is a JavaScript version of NetInt, a Java-based node-link visualization prototype designed to support the visual discovery of patterns across large datasets by displaying disjoint clusters of vertices that can be filtered, zoomed in on, or drilled down into interactively. The visualization strategy used in Simple NetInt is to place clustered nodes in independent 3D spaces and draw links between nodes across multiple spaces. The result is a simple graphical user interface that enables visual depth as an intuitive dimension for data exploration.

Simple NetInt Interface

Check out the Simple NetInt tool here!

In collaboration with Professor Eric Benson, Salamanca tested a prototype of Simple NetInt with a dataset about academic publications, episodes, and story locations of the Sci-Fi TV series Firefly. The tool shows a network of research relationships between these three sets of entities, similar to a citation map but on a timeline following the episodes’ chronology.

What inspired you to create this new tool?

This tool is an extension of a prototype I built five years ago for the visualization of financial transactions between bank clients. It is software to visualize networks based on the representation of entities and their relationships as nodes and edges. This new version is used for the visualization of a totally different dataset: scholarly work published in papers, episodes of a TV series, and the narrative of the series itself. So, the network representation portrays relationships between journal articles, episode scripts, and fictional characters. I am also using it to design a large mural for the Siebel Center for Design.

What are your hopes for the future use of this project?

The final goal of this project is to develop an augmented reality visualization of networks to be used in the field of digital humanities. This proof of concept shows that scholars in the humanities come across datasets with different dimensional systems that might not be compatible with one another. For instance, a timeline of scholarly publications may encompass 10 or 15 years, but the content of what is being discussed in that body of work may encompass centuries of history. Therefore, these two different temporal dimensions need to be represented in such a way that helps scholars in their interpretations. I believe that an immersive visualization may drive new questions for researchers or convey new findings to the public.

What were the major challenges that came with creating this tool?

The major challenge was to find a way to represent three different systems of coordinates in the same space. The tool has a universal space that contains relative subspaces for each dataset loaded. So, the nodes instantiated from each dataset are positioned in their own coordinate system, which could be a timeline, a position relative to a map, or just clusters by proximities. But the edges that connect nodes jump from one coordinate system to the other. This creates the idea of a system of nested spaces that works well with few subspaces, but I am still figuring out what is the most intuitive way to navigate larger multidimensional spaces.
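The idea of nodes living in their own subspaces while edges cross between them can be sketched in a few lines. This is only an illustration of the concept described above, written in Python rather than the tool's JavaScript. It is not Simple NetInt's actual code: the class names, the sample nodes, and the assumption that each subspace is simply a translated copy of the universal space are all hypothetical.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Subspace:
    """A relative coordinate system placed inside the universal space."""
    name: str
    origin: Point  # where this subspace sits in universal coordinates (assumed translation only)

    def to_universal(self, local: Point) -> Point:
        """Convert a local position into universal coordinates."""
        return tuple(o + p for o, p in zip(self.origin, local))


@dataclass(frozen=True)
class Node:
    """A node positioned in the coordinate system of its own dataset."""
    label: str
    space: Subspace
    local: Point

    @property
    def position(self) -> Point:
        return self.space.to_universal(self.local)


def edge_endpoints(source: Node, target: Node) -> tuple[Point, Point]:
    """An edge is drawn in universal space, even when its nodes live in different subspaces."""
    return source.position, target.position


# Hypothetical example: two timelines stacked at different depths.
papers = Subspace("papers timeline", (0.0, 0.0, 0.0))
episodes = Subspace("episodes timeline", (0.0, 0.0, 10.0))

paper = Node("paper-1", papers, (3.0, 1.0, 0.0))
episode = Node("episode-1", episodes, (1.0, 2.0, 0.0))

print(edge_endpoints(paper, episode))
```

Keeping each node's position local and converting only when an edge is drawn means a whole subspace can be moved by changing a single origin.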

What are your own research interests and how does this project support those?

My research focuses on understanding how designed artifacts affect the viscosity of social action. What I do is investigate how the design of artifacts facilitates or hinders cooperation or collaboration between people. I use visual analytics methods to conduct my research, so the analysis of networks is an essential tool. I have built several custom-made tools for the observation of the interaction between people and things, and this is one of them.

If you would like to learn more about Simple NetInt you can find contact information for Professor Juan Salamanca here and more information on his research!

If you’re interested in learning more about data visualizations for your own projects, check out our guide on visualizing your data, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Holiday Data Visualizations

The fall 2020 semester is almost over, which means that it is the holiday season again! We would especially like to wish everyone in the Jewish community a happy first night of Hanukkah tonight.

To celebrate the end of this semester, here are some fun Christmas and Hanukkah-related data visualizations to explore.

Popular Christmas Songs

First up, in 2018 data journalist Jon Keegan analyzed a dataset of 122 hours of airtime from a New York radio station in early December. He was particularly interested in discovering if there was a particular “golden age” of Christmas music, since nowadays it seems that most artists who release Christmas albums simply cover the same popular songs instead of writing a new song. This is a graph of what he discovered:

Based on this dataset, 65% of popular Christmas songs were originally released in the 1940s, 50s, and 60s. Despite the notable exception of Mariah Carey’s “All I Want for Christmas is You” from the 90s, most of the beloved “Holiday Hits” come from the mid-20th century.

As for why this is the case, the popular webcomic XKCD claims that every year American culture tries to “carefully recreate the Christmases of Baby Boomers’ childhoods.” Regardless of whether Christmas music reflects the enduring impact of the postwar generation on America, Keegan’s dataset is available online to download for further exploration.

Christmas Trees

Last year, Washington Post reporters Tim Meko and Lauren Tierney wrote an article about where Americans get their live Christmas trees from. The article includes this map:

The green areas are forests primarily composed of evergreen Christmas trees, and purple dots represent choose-and-cut Christmas tree farms. 98% of Christmas trees in America are grown on farms, whether it’s a choose-and-cut farm where Americans come to select a tree themselves or a farm that ships trees to stores and lots.

This next map shows which counties produce the most Christmas trees:

As you can see, the biggest Christmas tree producing areas are New England, the Appalachians, the Upper Midwest, and the Pacific Northwest, though there are farms throughout the country.

The First Night of Hanukkah

This year, Hanukkah starts tonight, December 10, but its start date on the Gregorian calendar varies every year. This is not the case on the primarily lunar-based Hebrew calendar, on which Hanukkah always starts on the 25th night of the month of Kislev. As a result, the days of Hanukkah vary year-to-year on other calendars, particularly the solar-based Gregorian calendar. It can occur as early as November 28 and as late as December 26.

In 2016, Hanukkah began on December 24, Christmas Eve, so Vox author Zachary Crockett created this graphic to show the varying dates on which the first night of Hanukkah has taken place from 1900 to 2016:

The Spelling of Hanukkah

Hanukkah is a Hebrew word, so there is no definitive spelling of the word in the Latin alphabet I am using to write this blog post. In Hebrew it is written as חנוכה and pronounced hɑːnəkə in the phonetic alphabet.

According to Encyclopædia Britannica, when transliterating the pronounced word into English writing, the first letter ח, for example, is pronounced like the ch in loch. As a result, 17th-century transliterations spell the holiday as Chanukah. However, ח does not sound the way ch does when it’s at the start of an English word, such as in chew, so in the 18th century the spelling Hanukkah became common. However, the H on its own is not quite correct either. More than twenty other spelling variations have been recorded due to various other transliteration issues.

It’s become pretty common to use Google Trends to discover which spellings are most common, and various journalists have explored this in past years. Here is the most recent Google search data comparing the two most common spellings, Hanukkah and Chanukah, going back to 2004:

You can also click this link if you are reading this article after December 2020 and want even more recent data.

As you would expect, the terms are more common every December. It warrants further analysis, but it appears that Chanukah is becoming less common in favor of Hanukkah, possibly reflecting some standardization going on. At some point, the latter may be considered the standard term.

You can also use Google Trends to see what the data looks like for Google searches in Israel:

Again, here is a link to see the most recent version of this data.

In Israel, it appears that the Hanukkah spelling is also becoming increasingly common, though early on there were years in which Chanukah was the more popular spelling.


I hope you’ve enjoyed seeing these brief explorations into data analysis related to Christmas and Hanukkah and the quick discoveries we made with them. But more importantly, I hope you have a happy and relaxing holiday season!