Exploring Data Visualization #14

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A day in the life of Americans: a data comic

A comic demonstrating the amount of time Americans spend sleeping, at work, free, or doing other activities from 4 a.m. to 3 p.m.

By illustrating the activity most Americans are doing at a given hour, Hong highlights what the average day looks like for an American worker.

Happy May, researchers! With the semester winding down and summer plans on the horizon, a lot of us are reflecting on what we’ve done in the past year. Sometimes it can be hard to determine what your daily routine looks like when you are the one doing it every day. Matt Hong created a cute and informative data comic about how we spend our time during the day, based on data from the Census Bureau. Check out Hong’s Medium page for more data comics.

What Qualifies as Middle-Income in Each State

A bar chart that shows the range of incomes that qualify as "middle-income" for households made up of four people, organized by state.

The distribution of middle-income for households made up of four people.

Nathan Yau at Flowing Data created an interesting chart that shows the range of income that is considered “middle-income” in each state and the District of Columbia in the United States. The design of the chart itself is smooth and watching the transitions between income ranges based on number of people in the household is very enjoyable. It is also enlightening to see where states fall on the spectrum of what “middle-income” means, and this visualization could be a useful tool for researchers working on wage disparity.

When People Find a New Job

A frequency trail chart that shows peaks based on the age when people change jobs.

The bottom of the chart shows jobs that people transition into later in life.

The end of the semester also means a wave of new graduates entering the workforce. While we extend our congratulations to those people, we often inquire about what their upcoming plans are and where they will be working in the future. For some, that question is straightforward; for others, a change of pace may be on the horizon. Nathan Yau of Flowing Data also created a frequency trail chart that shows at what age many people change career paths. As Yau demonstrates in a bar chart that accompanies the frequency trail chart, the majority of job switches happen early and late in life, a phenomenon for which he offers some possible explanations.

A bar chart showing the distribution of the age at which people switch jobs. 15-19 is the highest percent (above 30%) and 55-64 is the lowest (around 10%)

The peak at the “older” end of the chart indicates some changes post-retirement, but also makes you wonder why people are still finding new jobs at age 85 to 89.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Exploring Data Visualization #13

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

One Way to Spot a Partisan Gerrymander

Even though it feels like it was 2016 yesterday, we are more than a quarter of the way through 2019 and the 2020 political cycle is starting to heat up. A common issue in the minds of voters and politicians is fraudulent and rigged elections—voters increasingly wonder if their votes really matter in the current political landscape. Last week, the Supreme Court heard two cases on partisan gerrymandering in North Carolina and Maryland. FiveThirtyEight made an elegant visualization about gerrymandering in North Carolina. The visualization demonstrates how actual election outcomes can be used to extrapolate what percentage of seats will go to each party.

A graph that shows the Average Democratic vote share in the U.S. House plotted against the actual outcome. A pink line represents the average outcome and since it does not pass through (0, 0), that indicates partisan bias in the House election being studied.

If there is no partisan bias in voting districts, the outcome should be 50/50.

As you scroll, the chart continues to develop and become more complicated. It adds results from past elections to contextualize the severity of the current problems with gerrymandering. It also provides an example of the outcomes of a redrawn district map in Pennsylvania.

Mistakes, we’ve drawn a few

Two different charts that both represent attitudes in the UK toward Britain voting to leave the EU. The chart on the left is a line chart which looks erratic while the chart on the right shows the averages of plotted lines and demonstrates clear trends.

The change from line chart to plotted points better demonstrates the trend of the attitudes toward Brexit.

Sarah Leo from The Economist re-creates past visualizations from the publication that were misleading or poorly designed. The blog post calls out the mistakes made very effectively and offers redesigns, when possible. They also make their data available after each visualization.

Seeing two visualizations of the same data next to one another really helps drive home how data can be represented differently, and how those differences change the impact on a reader.


The Financial Times has made an online version of their quick chart-making tool available for the public. Appropriately titled FastCharts, the site lets you upload your own data or play around with sample data they have provided. Because this tool is so simple, it seems like it would be useful for exploratory data, but maybe not for creating more complex explanations of your data.

The interface of FastCharts, showing a line chart of global temperature anomalies from 1850 to 2017.

FastCharts automatically selects which type of chart it thinks will work best for your data.

Play with the provided example data or use your own data to produce an interesting result! For a challenge, see if any of the data in our Numeric Data Library Guide can work for this tool.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Exploring Data Visualization #12

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

American segregation, mapped day and night

Is segregation in the United States improving? And if it is, which racial group encounters the most people of other races? And do the answers to these questions change based on the time of day? Vox sets out to answer some of these questions through a video essay and an interactive map about segregation in United States cities at work and at home.

A map of Champaign County showing data peaks where the highest population of Black people live.

This map shows the population density of Black people living in Champaign-Urbana, IL. The brighter the pink, the higher the percentage of Black people living only near Black people.

A map showing the areas in Champaign County populated by white people.

This map shows the population density of white people living in Champaign-Urbana, IL. The brighter the pink, the higher the percentage of white people living only near white people.

The map is interesting and effectively demonstrates the continued presence of segregation in communities across the United States. However, there is little detail on the map about the geographical features of the region being examined. This isn’t too much of a problem if you are familiar with the region you are looking at, but for more unfamiliar communities it leads to more questions than it answers.

NASA’s Opportunity Rover Dies on Mars


After 15 years on Mars, the Opportunity Rover Mission was officially declared finished on February 13th, 2019. The New York Times created a visualization that lets you follow Opportunity’s 28-mile path across the surface of Mars, which includes a bird’s-eye view of Oppy’s path as well as images sent by the rover back to NASA. Opportunity was responsible for discovering evidence of drinkable water on Mars.

A map of the surface of mars with a yellow line showing the path of NASA's Opportunity rover. There is a small image in the corner of Santa Maria Crater taken by the rover.

The map of Opportunity’s path is accompanied by images from the rover and artists’ renderings of the surface of Mars.

The periodic table is a scatterplot. (Among others.)


The periodic table: a data visualization familiar to anyone who has ever set foot in a grade school science classroom. As Lisa Rost points out, the periodic table is actually just a simple scatter plot, with group as the x-axis and period as the y-axis. Or at least, that’s true of the Mendeleev periodic table, the one we are most familiar with. See some other examples of how to break down the periodic table on Rost’s post, which links to the Wikipedia article on alternative periodic tables. If you find a favorite, be sure to tweet it to us @ScholCommons! We are always curious to see what visualizations get people excited.

A visualization of the periodic table of the elements with the elements represented by different colored dots. The dot colors correspond to when in time the elements were discovered, which is coded in a key at the top of the chart. Yellow is before Mendeleev, blue is after Mendeleev, orange is BC, and black is since 2000.

A periodic table color-coded by Lisa Rost to show when in time different elements were discovered.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Transformation in Digital Humanities

The opinions presented in this piece are solely those of the author and the referenced authors. This is meant to serve as a synthesis of arguments made in DH regarding transformation.

How do data and algorithms affect our lives? How does technology affect our humanity? Scholars and researchers in the digital humanities (DH) ask questions about how we can use DH to enact social change by making observations of the world around us. This kind of work is often called “transformative DH.”

The idea of transformative DH is an ongoing conversation. As Moya Bailey wrote in 2011, scholars’ experiences and identities affect and inform their theories and practices, which allows them to make worthwhile observations in diverse areas of humanities scholarship. Just as there is strong conflict about how DH itself is defined, there is also conflict regarding whether or not DH needs to be “transformed.” The theme of the 2011 Annual DH Conference held at Stanford was “Big Tent Digital Humanities,” a phrase symbolizing the welcoming nature of the DH field as a space for interdisciplinary scholarship. Still, those on the fringes found themselves unwelcome, or at least unacknowledged.

This conversation around what DH is and what it could be exploded at the Modern Language Association (MLA) Convention in 2011, which featured multiple digital humanities and digital pedagogy sessions aimed at defining the field and what “counts” as DH. During the convention Stephen Ramsay, in a talk boldly titled “Who’s In and Who’s Out,” stated that one must code in order to be considered a digital humanist (he later softened “code” to “build”). These comments resulted in ongoing conversations online about gatekeeping in DH, which refers to both what work counts as DH and who counts as a DHer or digital humanist. Moya Bailey also noted that certain scholars whose work focused on race, gender, or queerness and relationships with technology were “doing intersectional digital humanities work in all but name.” This work, however, was not acknowledged as digital humanities.


Website Banner from transformdh.org

To address gatekeeping in the DH community more fully, the group #transformDH was formed in 2011, during this intense period of conversation and attempts at defining. The group self-describes as an “academic guerrilla movement” aimed at re-defining DH as a tool for transformative, social justice scholarship. Their primary objective is to create space in the DH world for projects that push beyond traditional humanities research with digital tools. To achieve this, they encourage and create projects that have the ability to enact social change and bring conversations on race, gender, sexuality, and class into both the academy and the public consciousness. An excellent example of this ideology is the Torn Apart/Separados project, a rapid response DH project completed in response to the United States enacting a “Zero Tolerance Policy” for immigrants attempting to cross the US/Mexico border. In order to visualize the reach and resources of ICE (those enforcing this policy), a cohort of scholars, programmers, and data scientists banded together and published this project in a matter of weeks. Projects such as these demonstrate the potential of DH as a tool for transformative scholarship and to enact social change. The potential becomes dangerously disregarded when we set limits on who counts as a digital humanist and what counts as digital humanities work.

For further, in-depth reading on this topic, check out the articles below.

February Push!

Hello, researchers!

Congratulations! You made it through your first month back of the spring semester. From class work, to pouring rain, to enough snow and ice to make the university look like it’s auditioning for a role as Antarctica, you’re pushing forward!

A dual-monitor computer in the Scholarly Commons. The background of the image shows the Scholarly Commons space, which is filled with our dual-monitor computers and various desks.

Take a minute to look over all the awesome resources we have, right here in the Scholarly Commons, to help you keep chugging along with your research.

We are open 8:30 a.m. to 6 p.m., Monday through Friday. Our dual-monitor computers have software ranging from Adobe Photoshop to OCR tools, which can be paired with our scanners to make machine-readable PDFs!

The Scholarly Commons space. A desk with a computer and a sign reading "Scholarly Commons" is shown.

Researchers can book free consultations thanks to our partnerships with CITL Data Analytics and Technology Services! In these meetings, you can learn about R, SAS, and everything else you need to just get started or to get past that tricky problem in your statistical research.

Beyond that, users can make appointments with our GIS specialist, and learn even more through our GIS resources. We have a ton of great books in our non-circulating reference collection that can help you learn about Python, GIS, and more!

The Scholarly Commons reference collection. Six shelves filled with books.


And that’s not all: our Data Analytics & Visualization Librarian has put together a plethora of resources to help turn your data into art. Check out the four most common types of charts guide to get started!

The Scholarly Commons space. It contains several workstations with a carpeted floor.

And even this doesn’t cover all of our services!

If you need assistance finding numeric data, understanding your copyrights, cleaning up data in OpenRefine, or even starting up a project using text mining, we have the resources you need.

The Scholarly Commons has all the resources you need to succeed, so stop by anytime! We’re always happy to help.

Exploring Data Visualization #11

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Data Visualization Office Hours and Workshops

A headshot of Megan Ozeran with a border above her reading Data Viz Help and a banner below that reads The Librarian is In

Our amazing Data Visualization Librarian Megan Ozeran is holding open office hours every other Monday for the Spring 2019 semester! Drop by the Scholarly Commons from 2-4 on any of the dates listed below to ask any data viz questions you might have.

Office hours on: February 25, March 11, March 25, April 8, April 22, and May 6.

Additionally, Megan will teach a joint workshop as part of our Savvy Researcher series titled “Network Analysis in Digital Humanities” on Thursday, March 7th. Megan and SC GA Kayla Abner will cover the basics of how to use NodeXL, Palladio, and Cytoscape to show relationships between concepts in your research. Register online on our Savvy Researcher Calendar!

Lifespan of News Stories

A chart showing the search interest for different news stories in October 2018, represented as colored peaks with the apex labeled with a world event.

October was one of the busier times of the year, with eight overlapping news stories. Hurricane Michael tied with Hurricane Florence for the largest number of searches in 2018.

According to trends compiled by the news site Axios, “news cycles for some of the biggest moments of 2018 only lasted for a median of 7 days.” Axios put together a timeline of the year which shows the peaks and valleys of 49 of the top news stories from 2018. A simplified view of the year in the article “What captured America’s attention in 2018” shows the distribution of those 49 stories, while a full site, “The Lifespan of News Stories,” shows search interest by region and links to an article from Axios about the event (clever advertising on their part).

#SWDchallenge: visualize variance

A graph showing the average minimum temperature in Milwaukee, Wisconsin, for January 2000 through January 2019. The points on the chart are connected with light blue lines and filled in with blue to resemble icicles.

Knaflic’s icicle-style design for minimum temperature.

If there were to be a search interest visualization for the past few weeks in the Midwest, I have no doubt that the highest peak would be for the term “polar vortex.” The weather so far this year has been unusual, thanks to the extreme cold due to the polar vortex we had in the last week of January. Cole Nussbaumer Knaflic from Storytelling with Data used the cold snap as inspiration for the #SWDchallenge this month: visualize variance. Knaflic went through a series of visualizations in a blog post to show variation in average temperature in Milwaukee.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Google MyMaps Part II: The Problem with Projections

Back in October, we published a blog post introducing you to Google MyMaps, an easy way to display simple information in map form. Today we’re going to revisit that topic and explore some further ways in which MyMaps can help you visualize different kinds of data!

One of the most basic things that students of geography learn is the problem of projections: the earth is a sphere, and there is no perfect way to translate an image from the surface of a sphere to a flat plane. Nevertheless, cartographers over the years have come up with many projection systems which attempt to do just that, with varying degrees of success. Google Maps (and, by extension, Google MyMaps) uses perhaps the most common of these, the Mercator projection. Despite its ubiquity, the Mercator projection has been criticized for not keeping area uniform across the map. This means that shapes far away from the equator appear disproportionately large in comparison with shapes on the equator.
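To put a rough number on that distortion, here is a minimal sketch in Python. It assumes the standard spherical Mercator projection, in which areas at a given latitude are stretched by a factor of 1/cos²(latitude) relative to the equator. The function name and the latitudes are illustrative choices of mine (the Greenland value in particular is only an approximate central latitude), not anything taken from MyMaps.

```python
import math

def mercator_area_inflation(latitude_deg: float) -> float:
    """Area stretch factor of the spherical Mercator projection at a latitude,
    relative to the equator (where the factor is 1)."""
    phi = math.radians(latitude_deg)
    # Mercator stretches lengths by 1/cos(latitude) in both directions,
    # so areas are stretched by the square of that factor.
    return 1.0 / math.cos(phi) ** 2

# Approximate latitudes, chosen for illustration only.
places = [("Equator", 0.0), ("Chicago", 41.9), ("Central Greenland", 72.0)]

for name, latitude in places:
    factor = mercator_area_inflation(latitude)
    print(f"{name:>18}: areas drawn about {factor:.1f}x larger than at the equator")
```

The factor grows without bound toward the poles, which is why a shape traced over Greenland shrinks so noticeably when it is dragged down to the equator.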

Luckily, MyMaps provides a method of pulling back the curtain on Mercator’s distortion. The “Draw a line” tool, located just below the search bar at the top of the MyMaps screen, allows users to create a rough outline of any shape on the map, and then drag that outline around the world to compare its size. Here’s how it works: After clicking on “Draw a line,” select “Add line or shape” and begin adding points to the map by clicking. Don’t worry about where you’re adding your points just yet; once you’ve created a shape you can move it anywhere you’d like! Once you have three or four points, complete the polygon by clicking back on top of your first point, and you should have a shape that looks something like this:

A block drawn in MyMaps and placed over Illinois

Now it’s time to create a more detailed outline. Click and drag your shape over the area you want to outline, and get to work! You can change the size of your shape by dragging on the points at the corners, and you can add more points by clicking and dragging on the transparent circles located midway between each corner. For this example, I made a rough outline of Greenland, as you can see below.

Area of Greenland made in MyMaps

You can get as detailed as you want with the points on your shapes, depending on how much time you want to spend clicking and dragging points around on your computer screen. Obviously I did not perfectly trace the exact coastline of Greenland, but my finished product is at least recognizable enough. Now for the fun part! Click somewhere inside the boundary of your shape, drag it somewhere else on the map, and see Mercator’s distortion come to life before your eyes.

Area of Greenland placed over Africa

Here you can see the exact same shape as in the previous image, except instead of hovering over Greenland at the north end of the map, it is placed over Africa and the equator. The area of the shape is exactly the same, but the way it is displayed on the map has been adjusted for the relative distortion of the particular position it now occupies on the map. If that hasn’t sufficiently shaken your understanding of our planet, MyMaps has one more tool for illuminating the divide between the map and reality. The “Measure distances and areas” tool draws a “straight” line between any two (or more) points on the map. “Straight” is in quotes there because, as we’re about to see, a straight line on the globe (and therefore in reality) doesn’t typically align with straight lines on the map. For example, if I wanted to see the shortest distance between Chicago and Frankfurt, Germany, I could display that with the Measure tool like so:

Distance line, Chicago to Frankfurt, Germany

The curve in this line represents the curvature of the earth, and demonstrates how the actual shortest distance is not the same as a straight line drawn on the map. This principle is made even more clear through using the Measure tool a little farther north.

Distance line, Chicago to Frankfurt, Germany, set over Greenland

The beginning and ending points of this line are roughly directly north of Chicago and Frankfurt, respectively. However, we notice two differences between this and the previous measurement right away. First, this is showing a much shorter distance than Chicago to Frankfurt, and second, the curve in the line is much more distinct. Both of these differences arise, once again, from the difficulty of displaying a sphere on a flat surface. Actual distances get shorter the closer you get to the north (or south) ends of the map, which in turn causes all of the distortions we have seen in this post.
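If you would like to check these measurements without the Measure tool, the shortest distance over the globe can be approximated with the haversine formula. The sketch below is my own illustration, not how MyMaps computes anything: it treats the Earth as a perfect sphere with a mean radius of 6,371 km, and the city coordinates and the 75° north comparison latitude are rounded values I picked for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfectly spherical Earth

def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Shortest distance over the sphere between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Rounded (latitude, longitude) pairs in degrees, for illustration.
chicago = (41.88, -87.63)
frankfurt = (50.11, 8.68)

city_to_city = great_circle_km(*chicago, *frankfurt)
# Same two longitudes, but both endpoints moved north to 75 degrees latitude.
northern_line = great_circle_km(75.0, chicago[1], 75.0, frankfurt[1])

print(f"Chicago to Frankfurt: {city_to_city:,.0f} km")
print(f"Same longitudes at 75 degrees north: {northern_line:,.0f} km")
```

The second line spans the same range of longitude as the first, yet it comes out far shorter, which matches what the Measure tool shows over Greenland.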

How might a better understanding of projection systems improve your own research? What are some other ways in which the Mercator projection (or any other) has deceived us? Explore for yourself and let us know!

Exploring Data Visualization #10

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A collage of images of sticky notes in different configurations from the article "stickies!"

Sticky notes in all different shapes, sizes, and colors provide a perfect medium for project planning.

1. Sometimes when you want to visualize your thinking, digital tools just don’t cut it and you have to go back to cold, hard paper. At the beginning of November, Cole Nussbaumer Knaflic at Storytelling with Data made a #SWDchallenge for readers to use sticky notes to represent their thinking and plan out a data visualization the old fashioned way! The images that resulted from that challenge, seen in the post stickies!, are an office-supply lover’s dream. I’ve taken inspiration from these posts in my own project planning for the past month—here’s a sneak peek of my thoughts for a sign that will be displayed in a library study space:

A piece of paper that reads "Welcome to Room 220" at the top with sticky notes stuck to the page underneath.

2. In a feature from February of this year, the digital branch of German newspaper Die Zeit, ZEIT ONLINE, showed some interesting finds from their database of approximately 450,000 street names used across Germany. They call the project Streetscapes and use it to explore important parts of German history. These street names show the legacy of political division in Germany, and the project also notes what the most common street names are and how old different streets in Berlin are.

A map of Berlin with streets highlighted in different colors based on the age of the street name.

Older street names are clearly concentrated toward the center of Berlin.

3. Google Maps updated their display this year to zoom out to a globe instead of a flat Mercator projection, noting in a tweet on August 2nd that “With 3D Globe Mode…, Greenland’s projection is no longer the size of Africa.” Adapting the shape of countries from a globe to a flat map has always been a challenge and has resulted in some confusion as to how the Earth’s geography actually looks. In the third part of a series of Story Maps about “The World’s Troubled Lands & Geopolitical Curiosities,” John Nelson outlines some of those misconceptions. In a National Geographic write-up titled “Why your mental map of the world is (probably) wrong,” Betsy Mason goes deeper into why we hold these misconceptions and why they are so hard to let go of.

The title slide of a story map with text that reads "Misconceptions Some Common Geographic Mental Misplacements..."

The story map shows three regions that people often misplace in their minds.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Cool Text Data – Music, Law, and News!

Computational text analysis can be done in virtually any field, from biology to literature. You may use topic modeling to determine which areas are the most heavily researched in your field, or attempt to determine the author of an orphan work. Where can you find text to analyze? So many places! Read on for sources to find unique text content.

Woman with microphone

Genius – the song lyrics database

Genius started as Rap Genius, a site where rap fans could gather to annotate and analyze rap lyrics. It expanded to include other genres in 2014, and now manages a massive database covering Ariana Grande to Fleetwood Mac, and includes both lyrics and fan-submitted annotations. All of this text can be downloaded and analyzed using the Genius API. Using Genius and a text mining method, you could see how themes present in popular music changed over recent years, or understand a particular artist’s creative process.
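As a rough idea of what such an analysis could look like once you have lyrics in hand, here is a minimal Python sketch that measures how much of each year's lyrics is made up of words from a chosen theme. It does not call the Genius API at all: it assumes you have already collected the text as (year, lyrics) pairs. The function name, the theme word list, and the sample lines are all invented placeholders, not real lyrics or real results.

```python
import re
from collections import Counter

def theme_share_by_year(songs, theme_words):
    """Return {year: fraction of all words that year that belong to the theme}."""
    theme = {word.lower() for word in theme_words}
    total_words = Counter()
    theme_hits = Counter()
    for year, text in songs:
        # Very simple tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        total_words[year] += len(tokens)
        theme_hits[year] += sum(1 for token in tokens if token in theme)
    return {
        year: theme_hits[year] / total_words[year]
        for year in sorted(total_words)
        if total_words[year] > 0
    }

# Placeholder data: made-up lines standing in for downloaded lyrics.
songs = [
    (2015, "money on my mind money in my hand"),
    (2019, "love me now love me later"),
]

shares = theme_share_by_year(songs, theme_words=["love"])
for year, share in shares.items():
    print(year, f"{share:.0%} of words match the theme")
```

A real project would want a larger theme vocabulary and many songs per year, but the same counting idea scales up directly.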

homepage of case.law, with Ohio highlighted, 147,692 unique cases. 31 reporters. 713,568 pages scanned.

Homepage of case.law

Case.law – the case law database

The Caselaw Access Project (CAP) is a fairly recent project that is still ongoing, and publishes machine-readable text digitized from over 40,000 bound volumes of case law from the Harvard Law School Library. The earliest case is from 1658, with the most recent cases from June 2018. An API and bulk data downloads make it easy to get this text data. What can you do with huge amounts of case law? Well, for starters, you can generate a unique case law limerick:

Wheeler, and Martin McCoy.
Plaintiff moved to Illinois.
A drug represents.
Pretrial events.
Rocky was just the decoy.

Check out the rest of their gallery for more project ideas.

Newspapers and More

There are many places you can get text from digitized newspapers, both recent and historical. Some newspapers are hundreds of years old, so there can be problems with the OCR (Optical Character Recognition) that will make it difficult to get accurate results from your text analysis. Making newspaper text machine readable requires special attention, since they are printed on thin paper and have possibly been stacked up in a dusty closet for 60 years! See OCR considerations here, but the newspaper text described here is already machine-readable and ready for text mining. However, with any text mining project, you must pay close attention to the quality of your text.
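One quick, informal way to gauge text quality is to check what share of the words in a page appear in a list of known words. The Python sketch below illustrates that idea under some loud assumptions: the tiny vocabulary, the function name, and both sample sentences are invented for the example, and a real check would use a full dictionary for the newspaper's language and time period.

```python
import re

def recognized_word_ratio(text: str, vocabulary: set) -> float:
    """Fraction of alphabetic tokens in the text that appear in the vocabulary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in vocabulary)
    return known / len(tokens)

# Toy vocabulary and sample text, both made up for illustration.
vocabulary = {"the", "city", "council", "met", "on", "tuesday"}
clean_page = "The city council met on Tuesday"
noisy_page = "Tbe c1ty counci1 rnet on Tuesday"  # typical OCR confusions: l/1, m/rn

print(f"clean page: {recognized_word_ratio(clean_page, vocabulary):.0%} recognized")
print(f"noisy page: {recognized_word_ratio(noisy_page, vocabulary):.0%} recognized")
```

Pages that score far below the rest of a collection are good candidates for a closer look before they go into your analysis.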

The Chronicling America project sponsored by the Library of Congress contains digital copies of newspapers with machine-readable text from all over the United States and its territories, from 1690 to today. Using newspaper text data, you can analyze how topics discussed in newspapers change over time, among other things.

newspapers being printed quickly on a rolling press

Looking for newspapers from a different region? The library has contracts with several vendors to conduct text mining, including Gale and ProQuest. Both provide newspaper text suitable for text mining, from The Daily Mail of London (Gale), to the Chinese Newspapers Collection (ProQuest). The way you access the text data itself will differ between the two vendors, and the library will certainly help you navigate the collections. See the Finding Text Data library guide for more information.

The sources mentioned above are just highlights of our text data collection! The Illinois community has access to a huge amount of text, including newspapers and primary sources, but also research articles and books! Check out the Finding Text Data library guide for a more complete list of sources. And, when you’re ready to start your text mining project, contact the Scholarly Commons (sc@library.illinois.edu), and let us help you get started!

Wikidata and Wikidata Human Gender Indicators (WHGI)

Wikipedia is a central player in online knowledge production and sharing. Since its founding in 2001, Wikipedia has been committed to open access and open editing, which has made it the most popular reference work on the web. Though students are still warned away from using Wikipedia as a source in their scholarship, it presents well-researched information in an accessible and ostensibly democratic way.

Most people know Wikipedia from its high ranking in most internet searches and tend to use it for its encyclopedic value. The Wikimedia Foundation—which runs Wikipedia—has several other projects which seek to provide free access to knowledge. Among those are Wikimedia Commons, which offers free photos; Wikiversity, which offers free educational materials; and Wikidata, which provides structured data to support the other wikis.

The Wikidata logo

Wikidata provides structured data to support Wikipedia and other Wikimedia Foundation projects

Wikidata is a great tool to study how Wikipedia is structured and what information is available through the online encyclopedia. Since it is presented as structured data, it can be analyzed quantitatively more easily than Wikipedia articles. This has led to many projects that allow users to explore data through visualizations, queries, and other means. Wikidata offers a page of Tools that can be used to analyze Wikidata more quickly and efficiently, as well as Data Access instructions for how to use data from the site.

The webpage for the Wikidata Human Gender Indicators project

The home page for the Wikidata Human Gender Indicators project

An example of a project born out of Wikidata is the Wikidata Human Gender Indicators (WHGI) project. The project uses metadata from Wikidata entries about people to analyze trends in gender disparity over time and across cultures. The project presents the raw data for download, as well as charts and an article written about the discoveries the researchers made while compiling the data. Some of the visualizations they present are confusing (perhaps they could benefit from reading our Lightning Review of Data Visualization for Success), but they succeed in conveying important trends that reveal a bias toward articles about men, as well as an interesting phenomenon surrounding celebrities. Some regions will have a better ratio of women to men biographies due to many articles being written about actresses and female musicians, which reflects cultural differences surrounding fame and gender.
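For a sense of how this kind of gender metadata can be pulled from Wikidata yourself, here is a small Python sketch. It is not the WHGI project's own method. It builds a SPARQL query for the public Wikidata Query Service using real Wikidata identifiers (P31 "instance of", Q5 "human", P27 "country of citizenship", P21 "sex or gender", Q6581072 "female"), then computes the share of women from results in the standard SPARQL JSON format. The function names are mine, the counts in the sample results are made up, and a query this broad may time out on the live service for large countries.

```python
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"  # public query service
FEMALE_QID = "Q6581072"  # Wikidata item for "female"

def gender_count_query(country_qid: str) -> str:
    """Build a SPARQL query counting people by gender for one country of citizenship."""
    return f"""
    SELECT ?gender (COUNT(?person) AS ?count) WHERE {{
      ?person wdt:P31 wd:Q5 ;
              wdt:P27 wd:{country_qid} ;
              wdt:P21 ?gender .
    }}
    GROUP BY ?gender
    """

def female_share(results: dict) -> float:
    """Share of counted people whose gender value is 'female'."""
    counts = {}
    for row in results["results"]["bindings"]:
        qid = row["gender"]["value"].rsplit("/", 1)[-1]  # keep only the Q-number
        counts[qid] = int(row["count"]["value"])
    total = sum(counts.values())
    return counts.get(FEMALE_QID, 0) / total if total else 0.0

# Made-up counts in the shape the query service returns; not real data.
sample_results = {
    "results": {
        "bindings": [
            {"gender": {"value": "http://www.wikidata.org/entity/Q6581072"},
             "count": {"value": "20"}},
            {"gender": {"value": "http://www.wikidata.org/entity/Q6581097"},
             "count": {"value": "80"}},
        ]
    }
}

print(gender_count_query("Q183"))  # Q183 is Germany
print(f"Share of women in the sample: {female_share(sample_results):.0%}")
```

To run the query for real, you could paste the printed text into the Wikidata Query Service web interface, or send it to the endpoint with an HTTP library of your choice and request JSON results.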

Of course, like many data sources, Wikidata is not perfect. The creators of the WHGI project frequently discovered that articles did not have complete metadata related to gender or nationality, which greatly influenced their ability to analyze the trends present on Wikipedia related to those areas. Since Wikipedia and Wikidata are open to editing by anyone and are governed by practices that the community has agreed upon, it is important for Wikipedians to consider including more metadata in their articles so that researchers can use that data in new and exciting ways.

An animated gif of the Wikipedia logo bouncing like a ball