A Different Kind of Data Cleaning: Making Your Data Visualizations Accessible

Introduction: Why Does Accessibility Matter?

Data visualizations are a fast and effective way to communicate information and are becoming an increasingly popular way for researchers to share their data with a broad audience. Because of this rising importance, it is also necessary to ensure that data visualizations are accessible to everyone. Accessible data visualizations not only help audience members who rely on a screen reader or other assistive tool to read a document, but also help the creators of the visualization, because they bring the data to a much wider audience than a non-accessible visualization would. This post will offer three tips on how you can make your visualization accessible!

TIP #1: Color Selection

One of the most important choices when making a data visualization is the colors used in the chart. One suggestion is to run the visualization through a color blindness simulator and experiment until you find the right amount of contrast between colors. Look at this example about the top ice cream flavors:

A data visualization about the top flavors of ice cream. Chocolate was the top flavor (40%) followed by Vanilla (30%), Strawberry (20%), and Other (10%).

At first glance, these colors may seem acceptable to use for this kind of data. But when run through a color blindness simulator, one of the results raises an accessibility concern:

This is the same pie chart above, but placed under a tritanopia color blindness lens. The colors used for strawberry and vanilla now look the exact same and blend into one another because of this, making it harder to discern the amount of space they take in the pie chart.

Although the colors contrasted well enough in the normal view, the colors used for the strawberry and vanilla categories look the same to those with tritanopia color blindness. The result is that these sections blend into one another, making it more difficult to distinguish their values. Most color palettes built into current data visualization software are already designed to keep colors distinguishable, but it is still good practice to check that the colors do not blend in with one another!
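If you would like a quick numeric check alongside the simulator, here is a minimal sketch that computes the WCAG 2.x contrast ratio between each pair of chart colors. Two caveats: this measures lightness contrast only and does not simulate tritanopia or any other form of color blindness, and the hex codes below are hypothetical stand-ins, not the colors from the chart above. The 3:1 threshold is borrowed from the WCAG guidance for graphical objects.

```python
from itertools import combinations


def _linearize(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical lightness) up to 21.0."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palette for illustration only (not the chart's real colors).
palette = {
    "Chocolate": "#5C3A21",
    "Vanilla": "#F3E5AB",
    "Strawberry": "#F7A8B8",
    "Other": "#9E9E9E",
}

# Flag every pair of slices whose lightness contrast falls below 3:1.
for (name_a, hex_a), (name_b, hex_b) in combinations(palette.items(), 2):
    ratio = contrast_ratio(hex_a, hex_b)
    flag = "  <- low contrast" if ratio < 3.0 else ""
    print(f"{name_a} vs {name_b}: {ratio:.2f}:1{flag}")
```

A pair that passes this check can still blend together under a color blindness lens, so treat it as a first pass and keep using the simulator for the final call.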

TIP #2: Adding Alt Text

Since most data visualizations appear as images in published work or reports, alt text is crucial for accessibility. Take the visualization below. Without alt text, the visualization would be meaningless to those who rely on alt text to read a document. Alt text should be short and summarize the key takeaways from the data (there is no need to describe each individual point, but it should provide enough information to describe the trends occurring in the data).

This is a chart showing the population size of each town in a given county. Towns are labeled A-E and continue to grow in population size as they go down the alphabet (town A has 1,000 people while town E has 100,000 people).
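If your chart is generated from data, one way to keep the alt text in sync with it is to draft the summary from the same values. The sketch below is only an illustration: `build_alt_text` is a hypothetical helper, not part of any library, and the populations for towns B through D are made up (the chart description above only gives the values for towns A and E). A human should still review and edit the result.

```python
def build_alt_text(chart_type: str, measure: str, category: str,
                   values: dict[str, int]) -> str:
    """Draft a short alt text: what the chart shows plus the overall trend."""
    labels = list(values)
    numbers = list(values.values())
    pairs = list(zip(numbers, numbers[1:]))

    # Describe the overall direction rather than every individual point.
    if all(a < b for a, b in pairs):
        trend = "increases steadily"
    elif all(a > b for a, b in pairs):
        trend = "decreases steadily"
    else:
        trend = "varies"

    first, last = labels[0], labels[-1]
    return (
        f"{chart_type} of {measure} by {category}. "
        f"{measure.capitalize()} {trend} from {first} ({values[first]:,}) "
        f"to {last} ({values[last]:,})."
    )


# Towns A and E match the chart description; B, C, and D are invented.
populations = {
    "Town A": 1_000,
    "Town B": 5_000,
    "Town C": 20_000,
    "Town D": 50_000,
    "Town E": 100_000,
}

alt_text = build_alt_text("Chart", "population", "town", populations)
print(alt_text)
```

The draft deliberately mentions only the endpoints and the trend, which follows the advice above to summarize rather than list every point.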

TIP #3: Clearly Labeling Your Data

A simple but crucial component of any visualization is having clear labels on your data. Let’s look at two examples to see what makes having labels a vital aspect of any data visualization:

This is a chart of how much money was earned/spent at a lemonade stand by month. There are no y-axis labels to describe how much money is earned/spent and no key to distinguish the two lines that represent the money made and the money spent.

There is nothing in this graph that provides any useful information regarding the money earned or spent at the lemonade stand. How much money was earned or spent each month? What do these two lines represent? Now, look at a more clearly labeled version of the same data:

This is a cleaned version of the previous visualization showing how much money was earned/spent at a lemonade stand. The addition of a labeled Y-axis and key now shows that more money was spent than earned in January and February; this changes in March, with earnings peaking in July and then falling until December, when more money is spent than earned again.

By adding a labeled Y-axis, we can now quantify the distance between the two lines at any point and have a better idea of the money earned/spent in any given month. Furthermore, the key at the bottom of the visualization distinguishes the lines, telling the audience what each represents. With the data clearly labeled, audience members can now interpret and analyze it properly.

Conclusion: Can My Data Still be Visually Appealing?

While it may appear that some of these recommendations detract from the creative design of data visualizations, this is not the case at all. Visual appeal is another crucial aspect of a data visualization and should be carefully considered when creating one. Accessibility concerns, however, should have priority over visual appeal. That said, accessibility in many respects encourages creativity in the design, as it makes creators carefully consider how to present their data in a way that is both accessible and visually appealing. Thus, accessibility makes for a more creative and communicative data visualization and will benefit everyone!

Meet Our Graduate Assistants: Ryan Yoakum

In this interview series, we ask our graduate assistants questions for our readers to get to know them better. Our first interview this year is with Ryan Yoakum!

This is a headshot of Ryan Yoakum.

What is your background education and work experience?

I came to graduate school directly after receiving my bachelor’s degree in History and Religion in May 2021 here at the University of Illinois. As an undergraduate, I took a role working for the University of Illinois Residence Hall Libraries (which was super convenient as I lived in the same building I worked in!) and absolutely loved helping patrons find resources they were interested in. I eventually took a second position with them as a processing assistant, which gave me a taste for working on the back end, as I primarily prepared purchased materials to be shelved at each of the libraries within the system. I really loved my work with the Residence Hall Libraries and wanted to shift my career to working in a library of some form, which has led me here today!

What are your favorite projects you’ve worked on?

I have really enjoyed projects where I have gotten to work with data (both for patrons as well as internal data). Such projects have allowed me to explore my growing interest in data science (which is the last thing I would have expected when I began the master’s program in August 2021). I have also really enjoyed teaching some of the Savvy Researcher workshops, which have included ones on optical character recognition (OCR) and Creative Commons licensing!

What are some of your favorite underutilized Scholarly Commons resources that you would recommend?

The two that come to mind are the software on our lab computers as well as our consultation services. If I were still in history, using ABBYY FineReader for OCR would have been a tremendous help as well as supplementing that with qualitative data analysis tools such as ATLAS.ti. I also appreciate the expertise of the many talented people who work here in the library. Carissa Phillips and Sandi Caldrone, for example, have been very influential in helping me explore my interests in data. Likewise, Wenjie Wang, JP Goguen, and Jess Hagman (all of whom now have drop-in consultation hours) have all guided me in working with software related to their specific interests, and I have benefitted greatly by bringing my questions to each of them.

When you graduate, what would your ideal job position look like?

I currently have two competing job interests in mind. The first is that I would love to work in a theological library, either in a seminary or in an academic library focusing on religious studies. Pursuing the MSLIS has also shifted my interests toward working with data, so I would also love to work a job where I can manage, analyze, and visualize data!

What is the one thing you would want people to know about your field?

Library and information science is not limited to the stereotypical picture society has of a librarian’s work (there was a good satirical article recently on this). It is also far from being a dead field (and one that will likely gain more relevance over time). As part of the program, I am gradually gaining skills that prepare me for working with data, which can apply in any field. There are so many job opportunities for MSLIS students that I strongly encourage people to join the field if they are interested in library and information science but have doubts about its career prospects!

Going Down the Jane Austen Rabbit Hole

This post is part of a series for Love Data Week, which takes place February 14-18, 2022.

Written by Heidi Imker, Director of the Library Research Data Service

When you think of data, your mind probably doesn’t jump right to Pride and Prejudice. That is, unless you’re Heidi Imker, Director of the Research Data Service and amateur Jane Austen internet sleuth. “In late 2020,” Heidi says, “I was in desperate need of a post-Outlander spiritual cleanse. Naturally, I turned to Pride and Prejudice. Over a year later, I’m still in the midst of a fantastic, out-of-control Jane Austen binge, and I’ve got oodles of related resources worthy of Love Data Week.”

Join Heidi on a virtual tour of some of her favorite data resources about Austen, her works, and historical England.

  1. janeaustenr: Jane Austen’s Complete Novels

In this fabulous R package, data scientist Julia Silge used text data for the Austen novels available from the also fabulous Project Gutenberg. The package offers cleaned data, documentation, and scripts to play with and analyze the novels.

  2. Word Frequencies in English-Language Literature, 1700-1922

Randomly, sifting through the janeaustenr dataset gave me a new level of appreciation for the word “ignore.” Austen didn’t use “ignore” once in any of her novels. It turns out that no one was really using it because it hadn’t caught on yet. In fact, according to Google’s ngram viewer, “ignore” didn’t start getting traction until circa 1845. And now you might be thinking word frequency data is fun, and it is! Like this word frequencies dataset available from the HathiTrust Research Center.
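For readers who want to try this kind of word check themselves, here is a minimal sketch in Python. It is not how the janeaustenr package works (that is an R package); it is simply a stand-in using Python's standard library, and the sample text is only the opening sentence of Pride and Prejudice rather than a full novel. To check a whole book, you could read a plain-text file from Project Gutenberg into `sample` instead.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase words, keeping straight apostrophes inside words."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))


# Opening sentence of Pride and Prejudice, used here as a tiny sample.
sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

counts = word_counts(sample)

# A Counter returns 0 for words that never appear, so no special handling
# is needed for a word like "ignore".
for word in ("ignore", "truth", "a"):
    print(f"{word!r} appears {counts[word]} time(s)")
```

Real e-texts often use curly apostrophes, so the pattern above may need adjusting before you run it on a full novel.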

  3. Napoleon Series

One of the things I learned during this binge was that dating the events in Pride and Prejudice has been a subject of debate for some time (as in, about a century). I found it downright fascinating that scholars could map parts of the book to the 1811 calendar year and others to the year 1794. I had never really thought about the characters existing in a specific year, but now I wondered what else was happening in those years? I discovered the Waterloo Association, a community of military historians behind the Napoleon Series. This immense archive contains articles on military history, biographies, and documentation of thousands of officers and soldiers (such as Challis’s Peninsula Roll Call).

  4. London Lives

London Lives provides searchable access to more than 240,000 digitized pages of archival documents, with a special focus on crime, poverty, and social policy. Not only is the source material available, but the people behind London Lives have made it a point to keep humanity at the forefront by constructing biographies of the individuals caught in the crime and poverty cycle in London between 1690 and 1800.

  5. Calendar of London Concerts 1750-1800

My favorite dataset of all time, it was thoughtfully and painstakingly created by Professor Simon McVeigh at Goldsmiths, University of London over many decades. It lists 4,001 concert events, as found through locating and documenting adverts in archival newspapers by hand. When Lady Catherine tells Elizabeth that “it will be in my power to take one of you as far as London, for I am going there early in June, for a week,” what could that self-professed music aficionado have heard in June 1794? Voilà! Perhaps it was Handel’s Messiah at St Margaret’s Church in Westminster on Thursday, June 5th.

I appreciate the Calendar of London Concerts dataset for my odd little hobby, but I love it as an information professional. The sheer dedication it took to assemble the data, especially with such strict attention to detail, is incredible. Let me explicitly gush about the documentation for a moment. Context! References! Abbreviations! All explained! What’s “HM”? His Majesty’s something or other? No, it’s the Half-Moon Tavern in Cheapside. Currency conversions! Syntax for nearly impossible to standardize programme content! It’s forty-four glorious pages! Swoon!

Related resources on London concerts

What started out as a casual, online-friendly hobby ended up introducing me to a wealth of enlightening open data resources, and I’m in love with every one of them. Since my Austen binge is apparently nowhere near over, you may well get another link-laden post for next year’s Love Data Week. <3

Headshot of Heidi

Heidi Imker is the Director of the Research Data Service (RDS) and an Associate Professor at the University of Illinois at Urbana-Champaign. The RDS helps researchers across the Urbana-Champaign campus manage and share research data, and in her role as Director, she ensures the RDS takes a collaborative, user-oriented, and practical approach to research support. Heidi holds a Ph.D. in Biochemistry from the University of Illinois and did her postdoctoral research at the Harvard Medical School.

OpenRefine: a Cinderella Story but for Data

This post is part of a series for Love Data Week, which takes place February 14-18, 2022.

Written by Dena Strong

Ever wish you could call on a fairy godmother who could wave a magic wand and make all your data problems disappear? Luckily for us at the University of Illinois, we can call on Senior Information Design Specialist Dena Strong. Dena can solve data problems so fast it seems downright magical. For Love Data Week 2022, check out Dena’s story about OpenRefine, the data tool she loves beyond all reason:

“I once had a consultation with a person who presented me with two Excel files and a data cleaning dilemma that he estimated was going to take him 200 hours of manual labor to repair. It took me 15 minutes of conversation to understand what he needed to do with the files to get them clean and integrated – and then it took me 5 minutes in OpenRefine to do the data cleaning and teach him how to do the same so he could do it again whenever he wanted. The other 199.6 hours of his time went to more productive uses. He and I have both been OpenRefine cheerleaders ever since. When I did a Caffeine Break session about it, an attendee said it was the most useful 45 minutes of training he’d ever had.”

As of the time of this writing, none of Dena’s datasets have turned back into pumpkins.

Headshot of Dena

Dena Strong (MLIS) is a member of the Web Hosting team at Technology Services; she also serves as a liaison with the Research Data Service at the Library. With 20 years of experience in usability, accessibility, information architecture, and workflows, Dena enjoys collaborating and consulting with people across campus. She’s also been spotted studying six languages, reproducing Heian-era Japanese dye techniques, and occasionally burning Kool-aid in search of new fabric colors.

When did you first fall in love with data?

This post is part of a series for Love Data Week, which takes place February 14-18, 2022.

Written by Lauren Phegley

Picture it – North Central College, Illinois, 2018. Twenty-one-year-old sociology major Lauren Phegley takes her seat in Professor Corsino’s class with no idea that she’s about to fall in love…with data. At the time, Dr. Corsino studied occupational attainment of Italian immigrants in Chicago Heights during the 1900s. Lauren and her classmates sifted through census data to piece together the career tracks of (mostly male) Italian Americans. These data weren’t just checkmarks on a form. They were glimpses into entire families, glimpses that when pieced together told a story about how the American dream operates on the basis of social class. “For me, tracking the individuals through the census was a large puzzle,” Lauren says. Since then, Lauren has focused on helping other researchers solve their data puzzles. “Social science students are often not taught about data management because they don’t see their research as relating to ‘data’. I make a concerted effort now in my work and teaching to target fields that are often forgot about in terms of data management. Research is a labor of love. It is well worth a few hours of time to make sure that your data stays useable and understandable!”

Headshot of Lauren

Lauren Phegley is a graduate assistant for the Library Research Data Service pursuing her Master of Science in Library and Information Science at the University of Illinois iSchool. Once she graduates in May 2022, she hopes to work as an academic librarian helping researchers manage their data and research.