Exploring Data Visualization #12

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

American segregation, mapped day and night

Is segregation in the United States improving? And if it is, what race sees the most people of different races? And do the answers to these questions change based on the time of day? Vox sets out to answer some of these questions through a video essay and an interactive map about segregation in United States cities at work and at home.

A map of Champaign County showing data peaks where the highest population of Black people live.

This map shows the population density of Black people living in Champaign-Urbana, IL. The brighter the pink, the higher the percentage of Black people living only near Black people.

A map showing the areas in Champaign County populated by white people.

This map shows the population density of white people living in Champaign-Urbana, IL. The brighter the pink, the higher the percentage of white people living only near white people.

The map is interesting and effectively demonstrates the continued presence of segregation in communities across the United States. However, there is little detail on the map about the geographical features of the region being examined. This isn’t too much of a problem if you are familiar with the region you are looking at, but for more unfamiliar communities it leads to more questions than it answers.

NASA’s Opportunity Rover Dies on Mars


After 15 years on Mars, the Opportunity Rover Mission was officially declared finished on February 13th, 2019. The New York Times created a visualization that lets you follow Opportunity’s 28-mile path across the surface of Mars, which includes a bird’s-eye view of Oppy’s path as well as images sent by the rover back to NASA. Opportunity was responsible for discovering evidence of drinkable water on Mars.

A map of the surface of mars with a yellow line showing the path of NASA's Opportunity rover. There is a small image in the corner of Santa Maria Crater taken by the rover.

The map of Opportunity’s path is accompanied by images from the rover and artists’ renderings of the surface of Mars.

The periodic table is a scatterplot. (Among others.)


The periodic table: a data visualization familiar to anyone who has ever set foot in a grade school science classroom. As Lisa Rost points out, the periodic table is actually just a simple scatter plot, with group as the x-axis and period as the y-axis. Or at least, that’s true of the Mendeleev periodic table, the one we are most familiar with. See some other examples of how to break down the periodic table on Rost’s post, which links to the Wikipedia article on alternative periodic tables. If you find a favorite, be sure to tweet it to us @ScholCommons! We are always curious to see what visualizations get people excited.
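To see Rost’s point in miniature, here is a rough Python sketch that treats each element as a point with group as the x-coordinate and period as the y-coordinate. It only covers the first 18 elements, and the `ELEMENTS` dictionary and `render_table` function are names made up for this illustration, not anything from Rost’s post.

```python
# Group (x) and period (y) for the first 18 elements.
ELEMENTS = {
    "H": (1, 1), "He": (18, 1),
    "Li": (1, 2), "Be": (2, 2), "B": (13, 2), "C": (14, 2),
    "N": (15, 2), "O": (16, 2), "F": (17, 2), "Ne": (18, 2),
    "Na": (1, 3), "Mg": (2, 3), "Al": (13, 3), "Si": (14, 3),
    "P": (15, 3), "S": (16, 3), "Cl": (17, 3), "Ar": (18, 3),
}


def render_table(elements, groups=18):
    """Place each symbol at column=group, row=period, like points on a scatter plot."""
    periods = max(period for _, period in elements.values())
    # Start with an empty grid of two-character cells.
    grid = [["  " for _ in range(groups)] for _ in range(periods)]
    for symbol, (group, period) in elements.items():
        grid[period - 1][group - 1] = symbol.ljust(2)
    return "\n".join(" ".join(row).rstrip() for row in grid)


print(render_table(ELEMENTS))
```

Printing the grid gives the familiar outline, with hydrogen and helium at opposite ends of the first row. The only difference from a typical scatter plot is that the y-axis counts downward.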

A visualization of the periodic table of the elements with the elements represented by different colored dots. The dot colors correspond to when in time the elements were discovered, which is coded in a key at the top of the chart. Yellow is before Mendeleev, blue is after Mendeleev, orange is BC, and black is since 2000.

A periodic table color coded by Lisa Rost to show when in time different elements were discovered.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Transformation in Digital Humanities

The opinions presented in this piece are solely those of the author and the referenced authors. This is meant to serve as a synthesis of arguments made in DH regarding transformation.

How do data and algorithms affect our lives? How does technology affect our humanity? Scholars and researchers in the digital humanities (DH) ask questions about how we can use DH to enact social change by making observations of the world around us. This kind of work is often called “transformative DH.”

The idea of transformative DH is an ongoing conversation. As Moya Bailey wrote in 2011, scholars’ experiences and identities affect and inform their theories and practices, which allows them to make worthwhile observations in diverse areas of humanities scholarship. Just as there is strong conflict about how DH itself is defined, there is also conflict regarding whether or not DH needs to be “transformed.” The theme of the 2011 Annual DH Conference held at Stanford was “Big Tent Digital Humanities,” a phrase symbolizing the welcoming nature of the DH field as a space for interdisciplinary scholarship. Still, those on the fringes found themselves unwelcome, or at least unacknowledged.

This conversation around what DH is and what it could be exploded at the Modern Language Association (MLA) Convention in 2011, which featured multiple digital humanities and digital pedagogy sessions aimed at defining the field and what “counts” as DH. During the convention Stephen Ramsay, in a talk boldly titled “Who’s In and Who’s Out,” stated that scholars must code in order to be considered digital humanists (he later softened “code” to “build”). These comments resulted in ongoing conversations online about gatekeeping in DH, which refers both to what work counts as DH and to who counts as a DHer or digital humanist. Moya Bailey also noted that certain scholars whose work focused on race, gender, or queerness and relationships with technology were “doing intersectional digital humanities work in all but name.” This work, however, was not acknowledged as digital humanities.


Website Banner from transformdh.org

To address gatekeeping in the DH community more fully, the group #transformDH was formed in 2011, during this intense period of conversation and attempts at defining. The group self-describes as an “academic guerrilla movement” aimed at re-defining DH as a tool for transformative, social justice scholarship. Their primary objective is to create space in the DH world for projects that push beyond traditional humanities research with digital tools. To achieve this, they encourage and create projects that have the ability to enact social change and bring conversations on race, gender, sexuality, and class into both the academy and the public consciousness. An excellent example of this ideology is the Torn Apart/Separados project, a rapid response DH project completed in response to the United States enacting a “Zero Tolerance Policy” for immigrants attempting to cross the US/Mexico border. In order to visualize the reach and resources of ICE (those enforcing this policy), a cohort of scholars, programmers, and data scientists banded together and published this project in a matter of weeks. Projects such as these demonstrate the potential of DH as a tool for transformative scholarship and social change. That potential is dangerously disregarded when we set limits on who counts as a digital humanist and what counts as digital humanities work.

For further, in-depth reading on this topic, check out the articles below.

February Push!

Hello, researchers!

Congratulations! You made it through your first month back of the spring semester. From class work, to pouring rain, to enough snow and ice to make the university look like it’s auditioning for a role as Antarctica, you’re pushing forward!

A dual-monitor computer in the Scholarly Commons. The background of the image shows the Scholarly Commons space, which is filled with our dual-monitor computers and various desks.

Take a minute to look over all the awesome resources we have, right here in the Scholarly Commons, to help you keep chugging along with your research.

We are open 8:30 a.m. to 6 p.m., Monday through Friday. Our dual-monitor computers have software ranging from Adobe Photoshop to OCR tools, which can be paired with our scanners to make machine-readable PDFs!

The Scholarly Commons space. A desk with a computer and a sign reading "Scholarly Commons" is shown.

Researchers can book free consultations thanks to our partnerships with CITL Data Analytics and Technology Services! In these meetings, you can learn about R, SAS, and everything else you need to just get started or to get past that tricky problem in your statistical research.

Beyond that, users can make appointments with our GIS specialist and learn even more through our GIS resources. We have a ton of great books in our non-circulating reference collection that can help you learn about Python, GIS, and more!

The Scholarly Commons reference collection. Six shelves filled with books.


And that’s not all: our Data Analytics & Visualization Librarian has put together a plethora of resources to help turn your data into art. Check out the four most common types of charts guide to get started!

The Scholarly Commons space. It contains several workstations with a carpeted floor.

And even this doesn’t cover all of our services!

If you need assistance finding numeric data, understanding your copyrights, cleaning up data in OpenRefine, or even starting up a project using text mining, we have the resources you need.

The Scholarly Commons has all the resources you need to succeed, so stop by anytime! We’re always happy to help.

Exploring Data Visualization #11

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I'll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Data Visualization Office Hours and Workshops

A headshot of Megan Ozeran with a border above her reading Data Viz Help and a banner below that reads The Librarian is In

Our amazing Data Visualization Librarian Megan Ozeran is holding open office hours every other Monday for the Spring 2019 semester! Drop by the Scholarly Commons from 2-4 on any of the dates listed below to ask any data viz questions you might have.

Office hours on: February 25, March 11, March 25, April 8, April 22, and May 6.

Additionally, Megan will teach a joint workshop as part of our Savvy Researcher series titled “Network Analysis in Digital Humanities” on Thursday, March 7th. Megan and SC GA Kayla Abner will cover the basics of how to use NodeXL, Palladio, and Cytoscape to show relationships between concepts in your research. Register online on our Savvy Researcher Calendar!

Lifespan of News Stories

A chart showing the search interest for different news stories in October 2018, represented as colored peaks with the apex labeled with a world event.

October was one of the busier times of the year, with eight overlapping news stories. Hurricane Michael tied with Hurricane Florence for the largest number of searches in 2018.

According to trends compiled by the news site Axios, “news cycles for some of the biggest moments of 2018 only lasted for a median of 7 days.” Axios put together a timeline of the year which shows the peaks and valleys of 49 of the top news stories from 2018. A simplified view of the year in the article “What captured America’s attention in 2018” shows the distribution of those 49 stories, while a full site, “The Lifespan of News Stories,” shows search interest by region and links to an article from Axios about the event (clever advertising on their part).

#SWDchallenge: visualize variance

A graph showing the average minimum temperature in Milwaukee, Wisconsin, for January 2000 through January 2019. The points on the chart are connected with light blue lines and filled in with blue to resemble icicles.

Knaflic’s icicle-style design for minimum temperature.

If there were to be a search interest visualization for the past few weeks in the Midwest, I have no doubt that the highest peak would be for the term “polar vortex.” The weather so far this year has been unusual, thanks to the extreme cold due to the polar vortex we had in the last week of January. Cole Nussbaumer Knaflic from Storytelling with Data used the cold snap as inspiration for the #SWDchallenge this month: visualize variance. Knaflic went through a series of visualizations in a blog post to show variation in average temperature in Milwaukee.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Google MyMaps Part II: The Problem with Projections

Back in October, we published a blog post introducing you to Google MyMaps, an easy way to display simple information in map form. Today we’re going to revisit that topic and explore some further ways in which MyMaps can help you visualize different kinds of data!

One of the most basic things that students of geography learn is the problem of projections: the earth is a sphere, and there is no perfect way to translate an image from the surface of a sphere to a flat plane. Nevertheless, cartographers over the years have come up with many projection systems which attempt to do just that, with varying degrees of success. Google Maps (and, by extension, Google MyMaps) uses perhaps the most common of these, the Mercator projection. Despite its ubiquity, the Mercator projection has been criticized for not keeping area uniform across the map. This means that shapes far away from the equator appear to be disproportionately larger in comparison with shapes on the equator.
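If you would like a number to go with that distortion, here is a small Python sketch. It assumes the simple spherical form of the Mercator projection, where lengths are stretched by 1/cos(latitude) in both directions, so small areas are stretched by 1/cos²(latitude). The function name and the example latitudes are ours, chosen only for illustration.

```python
import math


def mercator_area_inflation(latitude_deg):
    """Return how many times larger a small area appears at this latitude than at the equator."""
    # On a spherical Mercator map, lengths are stretched by 1/cos(latitude)
    # in both directions, so areas are stretched by 1/cos(latitude)^2.
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Latitudes are rough values picked for illustration.
examples = [
    ("Equator", 0),
    ("Champaign, IL (about 40 N)", 40),
    ("Central Greenland (about 72 N)", 72),
]
for label, latitude in examples:
    print(f"{label}: {mercator_area_inflation(latitude):.1f}x")
```

Under these assumptions, a small shape drawn near the middle of Greenland is shown roughly ten times larger than the same shape would be at the equator.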

Luckily, MyMaps provides a method of pulling up the curtain on Mercator’s distortion. The “Draw a line” tool, located just below the search bar at the top of the MyMaps screen, allows users to create a rough outline of any shape on the map, and then drag that outline around the world to compare its size. Here’s how it works: After clicking on “Draw a line,” select “Add line or shape” and begin adding points to the map by clicking. Don’t worry about where you’re adding your points just yet; once you’ve created a shape you can move it anywhere you’d like! Once you have three or four points, complete the polygon by clicking back on top of your first point, and you should have a shape that looks something like this:

A block drawn in MyMaps and placed over Illinois

Now it’s time to create a more detailed outline. Click and drag your shape over the area you want to outline, and get to work! You can change the size of your shape by dragging on the points at the corners, and you can add more points by clicking and dragging on the transparent circles located midway between each corner. For this example, I made a rough outline of Greenland, as you can see below.

Area of Greenland made in MyMaps

You can get as detailed as you want with the points on your shapes, depending on how much time you want to spend clicking and dragging points around on your computer screen. Obviously I did not perfectly trace the exact coastline of Greenland, but my finished product is at least recognizable enough. Now for the fun part! Click somewhere inside the boundary of your shape, drag it somewhere else on the map, and see Mercator’s distortion come to life before your eyes.

Area of Greenland placed over Africa

Here you can see the exact same shape as in the previous image, except instead of hovering over Greenland at the north end of the map, it is placed over Africa and the equator. The area of the shape is exactly the same, but the way it is displayed on the map has been adjusted for the relative distortion of the particular position it now occupies on the map. If that hasn’t sufficiently shaken your understanding of our planet, MyMaps has one more tool for illuminating the divide between the map and reality. The “Measure distances and areas” tool draws a “straight” line between any two (or more) points on the map. “Straight” is in quotes there because, as we’re about to see, a straight line on the globe (and therefore in reality) doesn’t typically align with straight lines on the map. For example, if I wanted to see the shortest distance between Chicago and Frankfurt, Germany, I could display that with the Measure tool like so:

Distance line, Chicago to Frankfurt, Germany

The curve in this line represents the curvature of the earth, and demonstrates how the actual shortest distance is not the same as a straight line drawn on the map. This principle is made even more clear through using the Measure tool a little farther north.

Distance line, Chicago to Frankfurt, Germany, set over Greenland

The beginning and ending points of this line are roughly directly north of Chicago and Frankfurt, respectively; however, we notice two differences between this and the previous measurement right away. First, this line shows a much shorter distance than Chicago to Frankfurt, and second, the curve in the line is much more distinct. Both of these differences arise, once again, from the difficulty of displaying a sphere on a flat surface. Actual distances between the same two lines of longitude get shorter the closer you get to the north (or south) ends of the map, which in turn causes all of the distortions we have seen in this post.
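For readers who want to check the two measurements outside of MyMaps, here is a minimal Python sketch using the haversine formula for great-circle distance. It treats the earth as a perfect sphere with a mean radius of 6,371 km, and the city coordinates are approximate values we supplied for illustration. We do not know how MyMaps computes its own figures, so expect small differences.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine formula: shortest distance over the sphere's surface, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city coordinates (assumed for illustration).
CHICAGO = (41.88, -87.63)
FRANKFURT = (50.11, 8.68)

city_route = great_circle_km(*CHICAGO, *FRANKFURT)
# Same two longitudes, but with both endpoints moved north to 70 degrees latitude.
northern_route = great_circle_km(70.0, CHICAGO[1], 70.0, FRANKFURT[1])

print(f"Chicago to Frankfurt: {city_route:.0f} km")
print(f"Same longitudes at 70 N: {northern_route:.0f} km")
```

The northern route comes out at less than half the length of the Chicago to Frankfurt route, even though both lines span the same range of longitude on the map.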

How might a better understanding of projection systems improve your own research? What are some other ways in which the Mercator projection (or any other) has deceived us? Explore for yourself and let us know!

Google Scholar: Friend or Foe?

This is a guest blog by the amazing Zachary Maiorana, a GA in Scholarly Communication and Publishing.

Homepage for Google Scholar

Scholars and users have a vested interest in understanding the relative authority of publications they have either written or wish to cite to form the basis of their research. Although the literature search, a common topic in library instruction and research seminars, can take place on a huge variety of discovery tools, researchers often rely on Google Scholar as a supporting or central platform.

The massive popularity of Google Scholar is likely due to its simple interface, which bears the longtime prestige of Google’s search engine; its enormous breadth, with a simple search yielding millions of results; its compatibility and parallels with other Google products such as Chrome and Books; and its citation metrics mechanism.

This last aspect of Google Scholar, which collects and reports data on the number of citations a given publication receives, represents the platform’s apparent ability to precisely calculate the research community’s interest in that publication. But, in the University Library’s work on the Illinois Experts (experts.illinois.edu) research and scholarship portal, we have encountered a number of circumstances in which Google Scholar has misrepresented U of I faculty members’ research.

Recent studies reveal that Google Scholar, despite its popularity and its massive reach, is not only often inaccurate in its reporting of citation metrics and title attribution, but also susceptible to deliberate manipulation. In a 2010 study, Labbé discussed an experiment using Ike Antkare (AKA “I can’t care”), a fictitious researcher whose bibliography was manufactured with a mountain of self-referencing citations. After the purposely falsified publications went public, Google’s bots didn’t differentiate Antkare’s research from that of his real-life peers during their crawling of his 100 generated articles. As a result, Google Scholar reported Antkare as one of the most cited researchers in the world, with a higher H-index* than Einstein.

Ike Antkare “standing on the shoulders of giants” in Indiana University’s Scholarometer. Credit: Adapted from a screencap in Labbé (2010)

In 2014, Spanish researchers conducted an experiment in which they created a fake scholar with several papers making hundreds of references to works written by the experimenters. After the papers were made public on a personal site, Google Scholar scraped the data and the real-life researchers’ profiles increased by 774 citations in total. In the hands of more nefarious users seeking to aggrandize their own careers or alter scientific opinion, such practices could result in large-scale academic fraud.
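To see why manufactured citations move these metrics so easily, it helps to look at how an h-index is calculated: it is the largest number h such that h of a researcher’s papers have at least h citations each. The Python sketch below is only an illustration. The citation counts are hypothetical, and the assumption that each of 100 generated papers cites all of the others is ours, not a description of how either experiment was actually set up.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most cited to least cited.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one researcher's five papers.
real_record = [10, 8, 5, 4, 3]

# Hypothetical manipulation: 100 generated papers that all cite each other,
# so each one shows 99 citations.
generated_record = [99] * 100

print(h_index(real_record))       # 4
print(h_index(generated_record))  # 99
```

Because the metric only counts citations and does not ask where they come from, a closed loop of generated papers can outscore a genuine record many times over.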

For libraries, Google’s kitchen-sink-included data collection methods further result in confusing and inaccurate attributions. In our work to supplement the automated collection of publication data for faculty profiles on Illinois Experts using CVs, publishers’ sites, journal sites, databases, and Google Scholar, we frequently encounter researchers’ names and works mischaracterized by Google’s clumsy aggregation mechanisms. For example, Google Scholar’s bots often read a scholar’s name somewhere within a work that the scholar hasn’t written—perhaps they were mentioned in the acknowledgements or in a citation—and simply attribute the work to them as author.

When it comes to people’s careers and the sway of scientific opinion, such snowballing mistakes can be a recipe for large-scale misdirection. Though much research shows that, in general, Google Scholar currently represents highly cited research well, weaknesses persist. Blind distrust of any dominant proprietary platform is unwise, and using Google Scholar requires particularly careful judgment.

Read more on Google Scholar’s quality and reliability:

Brown, Christopher C. 2017. “Google Scholar.” The Charleston Advisor 19 (2): 31–34. https://doi.org/10.5260/chara.19.2.31.

Halevi, Gali, Henk Moed, and Judit Bar-Ilan. 2017. “Suitability of Google Scholar as a Source of Scientific Information and as a Source of Data for Scientific Evaluation—Review of the Literature.” Journal of Informetrics 11 (3): 823–34. https://doi.org/10.1016/j.joi.2017.06.005.

Labbé, Cyril. 2016. “L’histoire d’Ike Antkare et de Ses Amis Fouille de Textes et Systèmes d’information Scientifique.” Document Numérique 19 (1): 9–37. https://doi.org/10.3166/dn.19.1.9-37.

Lopez-Cozar, Emilio Delgado, Nicolas Robinson-Garcia, and Daniel Torres-Salinas. 2012. “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting.” ArXiv:1212.0638 [Cs], December. http://arxiv.org/abs/1212.0638.

Walker, Lizzy A., and Michelle Armstrong. 2014. “‘I Cannot Tell What the Dickens His Name Is’: Name Disambiguation in Institutional Repositories.” Journal of Librarianship and Scholarly Communication 2 (2). https://doi.org/10.7710/2162-3309.1095.

*Read the library’s LibGuide on bibliometrics for an explanation of the h-index and other standard research metrics: https://guides.library.illinois.edu/c.php?g=621441&p=4328607

How We’re Celebrating the Sweet Public Domain

This is a guest blog by the amazing Kaylen Dwyer, a GA in Scholarly Communication and Publishing.

Collage of the Honey Bunch series

As William Tringali mentioned last week, 2019 marks an exciting shift in copyright law with hundreds of thousands of works entering the public domain every January 1st for the next eighteen years. We are setting our clocks back to the year of 1923—to the birth of the Harlem Renaissance with magazines like The Crisis, to first-wave feminists like Edith Wharton, Virginia Woolf, and Dorothy L. Sayers, back to the inter-war period.

Copyright librarian Sara Benson has been laying the groundwork to bring in the New Year and celebrate the wealth of knowledge now publicly available for quite some time, leading up to a digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, and the Re-Mix It! Competition to be held this spring.

A collaborative effort between Benson, graduate assistants, and several scholarly contributors, The Sweet Public Domain celebrates creative reuse and copyright law. Last year, GA Paige Kuester spent time scouring the Rare Book and Manuscript Library in search of something that had never been digitized before, something at risk of being forgotten forever, not because it is unworthy of attention, but because it has been captive to copyright for so long.

We found just the thing—the beloved Honey Bunch series, a best-selling girls’ series by the Stratemeyer Syndicate. The syndicate became known for its publication of Nancy Drew, the Hardy Boys, the Bobbsey Twins, and many others, but in 1923 they kicked off the adventures of Honey Bunch with Just a Little Girl, Her First Visit to the City, and Her First Days on the Farm.

Through the digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, you can explore all three books, introduced by Deidre Johnson (Edward Stratemeyer and the Stratemeyer Syndicate, 1993) and LuElla D’Amico (Girls Series Fiction and American Popular Culture, 2017). To hear more about copyright and creative reuse, you can find essays by Sara Benson, our copyright librarian, and Kirby Ferguson, filmmaker and producer of Everything is a Remix.

If you are a student at the University of Illinois at Urbana-Champaign, you can engage with the public domain by making new and innovative work out of something old and win up to $500 for your creation. Check out the Re-Mix It! Competition page for contest details and be sure to check out our physical exhibit in the Marshall Gallery (Main Library, first floor east entrance) for ideas.

Logo for the Remix It competition

A Beautiful Year for Copyright!

Hello, researchers! And welcome to the bright, bold world of 2019! All around the United States, Copyright Librarians are rejoicing this amazing year! But why, might you ask?

Cover page of "Leaves From A Grass House" from Don Landing

Well, after 20 years, formally published works are entering the public domain. That’s right, the amazing, creative works of 1923 will belong to the public as a whole.

Though fascinating works like Virginia Woolf’s Jacob’s Room are just entering the public domain, some works entered the public domain years ago. The holiday classic “It’s a Wonderful Life” entered the public domain because, according to Duke Law School’s Center for the Study of the Public Domain (2019), its copyright was not renewed after its “first 28 year term” (Paragraph 13). Though, in a fascinating turn of events, the original copyright holder “reasserted copyright based on its ownership of the film’s musical score and the short story on which the film was based” after the film became such a success (Duke Law School’s Center for the Study of the Public Domain, 2019, Paragraph 13).

An image of a portion of Robert Frost's poem "New Hampshire"

But again, why all the fuss? Don’t items enter the public domain every year?

That answer is, shockingly, no! Though 1922 classics like Nosferatu entered the public domain in 1998, 1923’s crop of works is only entering this year, making this the first time in 20 years that a massive crop of works has become public, according to Verge writer Jon Porter (2018). That is because 1998 was the year lawmakers “extended the length of copyright from 75 years to 95, or from 50 to 70 years after the author’s death” (Porter, 2018, Paragraph 2).
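The 20-year gap falls straight out of the arithmetic. Here is a simplified Python sketch, assuming a fixed term counted from the publication year and works joining the public domain on the January 1st after that term runs out. The function name is made up for this example, and real copyright status depends on more than this one rule (renewals, for instance).

```python
def public_domain_year(publication_year, term_years):
    """First year in which the work is in the public domain (on January 1)."""
    # Protection runs through the end of the calendar year in which the term expires.
    return publication_year + term_years + 1


print(public_domain_year(1922, 75))  # 1998
print(public_domain_year(1923, 75))  # 1999, the date before the extension
print(public_domain_year(1923, 95))  # 2019
```

Works from 1923 would have followed the 1922 works a year later under the 75-year term. Adding 20 years to the term pushed that date to 2019.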

Table of contents for "Tarzan and the Golden Lion"

What’s most tragic about this long wait time for the release of these works is that, after almost 100 years, so many of them are lost. Film has decayed, text has vanished, and music has stopped being played. We cannot know the amount of creative works lost to time, but here are a few places that can help you find public domain works from 1923!

Duke Law School’s Center for the Study of the Public Domain has an awesome blog post with even more information about copyright law and the works now available to the public.

If you want to know what’s included, book-wise, in this mass public domain-ifying of so many amazing creative works, you can check out HathiTrust, which has released more than 53,000 works that are readable online, for free!

Screenshot of the HathiTrust search page for items published in the year 1923.

Finally, the Public Domain Review has a great list of links to works now available!


Duke Law School’s Center for the Study of the Public Domain. (2019, Jan. 1). Public Domain Day 2019. Retrieved from https://law.duke.edu/cspd/publicdomainday/2019/

Porter, Jon. (2018, December 31). After a 20 year delay, works from 1923 will finally enter the public domain tomorrow. The Verge. Retrieved from https://www.theverge.com/2018/12/31/18162933/public-domain-day-2019-the-pilgrim-jacobs-room-charleston-copyright-expiration

Exploring Data Visualization #10

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I'll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A collage of images of sticky notes in different configurations from the article "stickies!"

Sticky notes in all different shapes, sizes, and colors provide a perfect medium for project planning.

1. Sometimes when you want to visualize your thinking, digital tools just don’t cut it and you have to go back to cold, hard paper. At the beginning of November, Cole Nussbaumer Knaflic at Storytelling with Data made a #SWDchallenge for readers to use sticky notes to represent their thinking and plan out a data visualization the old-fashioned way! The images that resulted from that challenge, seen in the post stickies!, are an office-supply lover’s dream. I’ve taken inspiration from these posts in my own project planning for the past month. Here’s a sneak peek of my thoughts for a sign that will be displayed in a library study space:

A piece of paper that reads "Welcome to Room 220" at the top with sticky notes stuck to the page underneath.

2. In a feature from February of this year, ZEIT ONLINE, the digital branch of German newspaper Die Zeit, showed some interesting finds from their database of approximately 450,000 street names used across Germany. They call the project Streetscapes and use it to explore important parts of German history. These street names show the legacy of political division in Germany, and the project also notes what the most common street names are and how old different street names in Berlin are.

A map of Berlin with streets highlighted in different colors based on the age of the street name.

Older street names are clearly concentrated toward the center of Berlin.

3. Google Maps updated their display this year to zoom out to a globe instead of a flat Mercator projection, noting in a tweet on August 2nd that “With 3D Globe Mode…, Greenland’s projection is no longer the size of Africa.” Adapting the shape of countries from a globe to a flat map has always been a challenge and has resulted in some confusion as to how the Earth’s geography actually looks. In the third part of a series of Story Maps about “The World’s Troubled Lands & Geopolitical Curiosities,” John Nelson outlines some of those misconceptions. In a National Geographic write-up titled “Why your mental map of the world is (probably) wrong,” Betsy Mason goes deeper into why we hold these misconceptions and why they are so hard to let go of.

The title slide of a story map with text that reads "Misconceptions Some Common Geographic Mental Misplacements..."

The story map shows which three different regions people often misplace in their minds.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Cool Text Data – Music, Law, and News!

Computational text analysis can be done in virtually any field, from biology to literature. You may use topic modeling to determine which areas are the most heavily researched in your field, or attempt to determine the author of an orphan work. Where can you find text to analyze? So many places! Read on for sources to find unique text content.

Woman with microphone

Genius – the song lyrics database

Genius started as Rap Genius, a site where rap fans could gather to annotate and analyze rap lyrics. It expanded to include other genres in 2014, and now manages a massive database covering Ariana Grande to Fleetwood Mac, and includes both lyrics and fan-submitted annotations. All of this text can be downloaded and analyzed using the Genius API. Using Genius and a text mining method, you could see how themes present in popular music changed over recent years, or understand a particular artist’s creative process.

homepage of case.law, with Ohio highlighted, 147,692 unique cases. 31 reporters. 713,568 pages scanned.

Homepage of case.law

Case.law – the case law database

The Caselaw Access Project (CAP) is a fairly recent project that is still ongoing, and publishes machine-readable text digitized from over 40,000 bound volumes of case law from the Harvard Law School Library. The earliest case is from 1658, with the most recent cases from June 2018. An API and bulk data downloads make it easy to get this text data. What can you do with huge amounts of case law? Well, for starters, you can generate a unique case law limerick:

Wheeler, and Martin McCoy.
Plaintiff moved to Illinois.
A drug represents.
Pretrial events.
Rocky was just the decoy.

Check out the rest of their gallery for more project ideas.

Newspapers and More

There are many places you can get text from digitized newspapers, both recent and historical. Some newspapers are hundreds of years old, so there can be problems with the OCR (Optical Character Recognition) that will make it difficult to get accurate results from your text analysis. Making newspaper text machine readable requires special attention, since newspapers are printed on thin paper and have possibly been stacked up in a dusty closet for 60 years! See OCR considerations here; the newspaper text described below, however, is already machine-readable and ready for text mining. Still, as with any text mining project, you must pay close attention to the quality of your text.

The Chronicling America project sponsored by the Library of Congress contains digital copies of newspapers with machine-readable text from all over the United States and its territories, from 1690 to today. Using newspaper text data, you can analyze how topics discussed in newspapers change over time, among other things.
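As a rough idea of what “how topics change over time” can look like in practice, here is a minimal Python sketch that measures how large a share of each year’s words is a given term. It assumes you have already downloaded your text and paired each document with its year. The `term_share_by_year` function and the sample snippets are placeholders written for this post, not real newspaper text or part of any vendor’s tools.

```python
import re
from collections import Counter


def term_share_by_year(documents, term):
    """For each year, return the share of all words that match `term`."""
    matches = Counter()
    totals = Counter()
    for year, text in documents:
        # Very simple tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(words)
        matches[year] += words.count(term.lower())
    return {year: matches[year] / totals[year] for year in sorted(totals) if totals[year]}


# Placeholder snippets, not real newspaper text.
sample = [
    (1918, "influenza closes schools as influenza spreads"),
    (1918, "city council meets"),
    (1923, "new radio station opens"),
]

for year, share in term_share_by_year(sample, "influenza").items():
    print(year, round(share, 3))
```

A real project would need more careful tokenizing and some checking of OCR quality, but the basic shape (count, normalize by year, compare) stays the same.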

newspapers being printed quickly on a rolling press

Looking for newspapers from a different region? The library has contracts with several vendors to conduct text mining, including Gale and ProQuest. Both provide newspaper text suitable for text mining, from The Daily Mail of London (Gale), to the Chinese Newspapers Collection (ProQuest). The way you access the text data itself will differ between the two vendors, and the library will certainly help you navigate the collections. See the Finding Text Data library guide for more information.

The sources mentioned above are just highlights of our text data collection! The Illinois community has access to a huge amount of text, including newspapers and primary sources, but also research articles and books! Check out the Finding Text Data library guide for a more complete list of sources. And, when you’re ready to start your text mining project, contact the Scholarly Commons (sc@library.illinois.edu), and let us help you get started!

