Google Scholar: Friend or Foe?

Homepage for Google Scholar

Scholars and other users have a vested interest in understanding the relative authority of the publications they have written or wish to cite as the basis of their research. Although the literature search, a common topic in library instruction and research seminars, can take place across a huge variety of discovery tools, researchers often rely on Google Scholar as a supporting or central platform.

The massive popularity of Google Scholar is likely due to its simple interface, which bears the longtime prestige of Google’s search engine; its enormous breadth, with a simple search yielding millions of results; its compatibility and parallels with other Google products such as Chrome and Google Books; and its citation metrics mechanism.

This last aspect of Google Scholar, which collects and reports data on the number of citations a given publication receives, represents the platform’s apparent ability to precisely calculate the research community’s interest in that publication. But, in the University Library’s work on the Illinois Experts (experts.illinois.edu) research and scholarship portal, we have encountered a number of circumstances in which Google Scholar has misrepresented U of I faculty members’ research.

Recent studies reveal that Google Scholar, despite its popularity and its massive reach, is not only often inaccurate in its reporting of citation metrics and title attribution, but also susceptible to deliberate manipulation. In 2010, Labbé described an experiment using Ike Antkare (AKA “I can’t care”), a fictitious researcher whose bibliography was manufactured with a mountain of self-referencing citations. After the purposely falsified publications went public, Google’s bots crawled the 100 generated articles without differentiating Antkare’s research from that of his real-life peers. As a result, Google Scholar reported Antkare as one of the most cited researchers in the world, with a higher h-index* than Einstein.

Ike Antkare “standing on the shoulders of giants” in Indiana University’s Scholarometer. Credit: Adapted from a screencap in Labbé (2010)

In 2014, Spanish researchers conducted an experiment in which they created a fake scholar with several papers making hundreds of references to works written by the experimenters. After the papers were made public on a personal site, Google Scholar scraped the data and the real-life researchers’ profiles increased by 774 citations in total. In the hands of more nefarious users seeking to aggrandize their own careers or alter scientific opinion, such practices could result in large-scale academic fraud.

For libraries, Google’s kitchen-sink-included data collection methods further result in confusing and inaccurate attributions. In our work to supplement the automated collection of publication data for faculty profiles on Illinois Experts using CVs, publishers’ sites, journal sites, databases, and Google Scholar, we frequently encounter researchers’ names and works mischaracterized by Google’s clumsy aggregation mechanisms. For example, Google Scholar’s bots often read a scholar’s name somewhere within a work that the scholar hasn’t written—perhaps they were mentioned in the acknowledgements or in a citation—and simply attribute the work to them as author.

When it comes to people’s careers and the sway of scientific opinion, such snowballing mistakes can be a recipe for large-scale misdirection. Though research generally shows that Google Scholar currently represents highly cited work well, weaknesses persist. Blind trust in any dominant proprietary platform is unwise, and using Google Scholar requires particularly careful judgment.

Read more on Google Scholar’s quality and reliability:

Brown, Christopher C. 2017. “Google Scholar.” The Charleston Advisor 19 (2): 31–34. https://doi.org/10.5260/chara.19.2.31.

Halevi, Gali, Henk Moed, and Judit Bar-Ilan. 2017. “Suitability of Google Scholar as a Source of Scientific Information and as a Source of Data for Scientific Evaluation—Review of the Literature.” Journal of Informetrics 11 (3): 823–34. https://doi.org/10.1016/j.joi.2017.06.005.

Labbé, Cyril. 2016. “L’histoire d’Ike Antkare et de Ses Amis Fouille de Textes et Systèmes d’information Scientifique.” Document Numérique 19 (1): 9–37. https://doi.org/10.3166/dn.19.1.9-37.

Lopez-Cozar, Emilio Delgado, Nicolas Robinson-Garcia, and Daniel Torres-Salinas. 2012. “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting.” ArXiv:1212.0638 [Cs], December. http://arxiv.org/abs/1212.0638.

Walker, Lizzy A., and Michelle Armstrong. 2014. “‘I Cannot Tell What the Dickens His Name Is’: Name Disambiguation in Institutional Repositories.” Journal of Librarianship and Scholarly Communication 2 (2). https://doi.org/10.7710/2162-3309.1095.

*Read the library’s LibGuide on bibliometrics for an explanation of the h-index and other standard research metrics: https://guides.library.illinois.edu/c.php?g=621441&p=4328607
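For a quick illustration of how the h-index in particular is calculated, here is a minimal Python sketch; the citation counts below are invented for the example and are not drawn from any real profile.

```python
def h_index(citations):
    """Return the largest h such that the author has h papers
    with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for six papers
print(h_index([42, 18, 7, 5, 3, 1]))  # -> 4: four papers with at least 4 citations each
```

This is exactly why the Antkare experiment worked: pad enough papers with enough (fake) citations and the number climbs, no matter who wrote them.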


How We’re Celebrating the Sweet Public Domain

This is a guest blog by the amazing Kaylen Dwyer, a GA in Scholarly Communication and Publishing.

Collage of the Honey Bunch series

As William Tringali mentioned last week, 2019 marks an exciting shift in copyright law with hundreds of thousands of works entering the public domain every January 1st for the next eighteen years. We are setting our clocks back to the year of 1923—to the birth of the Harlem Renaissance with magazines like The Crisis, to first-wave feminists like Edith Wharton, Virginia Woolf, and Dorothy L. Sayers, back to the inter-war period.

Copyright librarian Sara Benson has spent quite some time laying the groundwork to bring in the New Year and celebrate the wealth of knowledge now publicly available, leading up to a digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, and the Re-Mix It! Competition to be held this spring.

A collaborative effort between Benson, graduate assistants, and several scholarly contributors, The Sweet Public Domain celebrates creative reuse and copyright law. Last year, GA Paige Kuester spent time scouring the Rare Book and Manuscript Library in search of something that had never been digitized before, something at risk of being forgotten forever, not because it is unworthy of attention, but because it has been captive to copyright for so long.

We found just the thing—the beloved Honey Bunch series, a best-selling girls’ series by the Stratemeyer Syndicate. The syndicate became known for its publication of Nancy Drew, the Hardy Boys, the Bobbsey Twins, and many others, but in 1923 it kicked off the adventures of Honey Bunch with Just a Little Girl, Her First Visit to the City, and Her First Days on the Farm.

Through the digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, you can explore all three books, introduced by Deidre Johnson (Edward Stratemeyer and the Stratemeyer Syndicate, 1993) and LuElla D’Amico (Girls Series Fiction and American Popular Culture, 2017). To hear more about copyright and creative reuse, you can find essays by Sara Benson, our copyright librarian, and Kirby Ferguson, filmmaker and producer of Everything is a Remix.

If you are a student at the University of Illinois at Urbana-Champaign, you can engage with the public domain by making new and innovative work out of something old and win up to $500 for your creation. Check out the Re-Mix It! Competition page for contest details and be sure to check out our physical exhibit in the Marshall Gallery (Main Library, first floor east entrance) for ideas.

Logo for the Remix It competition


A Beautiful Year for Copyright!

Hello, researchers! And welcome to the bright, bold world of 2019! All around the United States, Copyright Librarians are rejoicing in this amazing year! But why, you might ask?

Cover page of “Leaves From A Grass House” by Don Blanding

Well, for the first time in 20 years, formally published works are entering the public domain. That’s right: the amazing, creative works of 1923 will belong to the public as a whole.

Though fascinating works like Virginia Woolf’s Jacob’s Room are just now entering the public domain, some works entered it years ago. The holiday classic “It’s a Wonderful Life” entered the public domain because, according to Duke Law School’s Center for the Study of the Public Domain (2019), its copyright was not renewed after its “first 28 year term” (Paragraph 13). Then, in a fascinating turn of events, the original copyright holder “reasserted copyright based on its ownership of the film’s musical score and the short story on which the film was based” after the film became such a success (Duke Law School’s Center for the Study of the Public Domain, 2019, Paragraph 13).

An image of a portion of Robert Frost’s poem “New Hampshire”

But again, why all the fuss? Don’t items enter the public domain every year?

The answer is, shockingly, no! Though 1922 classics like Nosferatu entered the public domain in 1998, 1923’s crop of works is only entering this year, making this the first time in 20 years that a massive crop of works has become public, according to Verge writer Jon Porter (2018). 1998 was the year lawmakers “extended the length of copyright from 75 years to 95, or from 50 to 70 years after the author’s death” (Porter, 2018, Paragraph 2).

Table of contents for “Tarzan and the Golden Lion”

What’s most tragic about this long wait time for the release of these works is that, after almost 100 years, so many of them are lost. Film has decayed, text has vanished, and music has stopped being played. We cannot know the amount of creative works lost to time, but here are a few places that can help you find public domain works from 1923!

Duke Law School’s Center for the Study of the Public Domain has an awesome blog post with even more information about copyright law and the works now available to the public.

If you want to know which books are included in this mass entry into the public domain, check out HathiTrust, which has released more than 53,000 titles from 1923 to read online, for free!

Screenshot of the HathiTrust search page for items published in the year 1923.

Finally, the Public Domain Review has a great list of links to works now available!

Sources:

Duke Law School’s Center for the Study of the Public Domain. (2019, Jan. 1). Public Domain Day 2019. Retrieved from https://law.duke.edu/cspd/publicdomainday/2019/

Porter, Jon. (2018, December 31). After a 20 year delay, works from 1923 will finally enter the public domain tomorrow. The Verge. Retrieved from https://www.theverge.com/2018/12/31/18162933/public-domain-day-2019-the-pilgrim-jacobs-room-charleston-copyright-expiration


Exploring Data Visualization #10

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A collage of images of sticky notes in different configurations from the article "stickies!"

Sticky notes in all different shapes, sizes, and colors provide a perfect medium for project planning.

1. Sometimes when you want to visualize your thinking, digital tools just don’t cut it and you have to go back to cold, hard paper. At the beginning of November, Cole Nussbaumer Knaflic at Storytelling with Data issued a #SWDchallenge asking readers to use sticky notes to represent their thinking and plan out a data visualization the old-fashioned way! The images that resulted from that challenge, seen in the post stickies!, are an office-supply lover’s dream. I’ve taken inspiration from these posts in my own project planning for the past month—here’s a sneak peek of my thoughts for a sign that will be displayed in a library study space:

A piece of paper that reads "Welcome to Room 220" at the top with sticky notes stuck to the page underneath.

2. In a feature from February of this year, ZEIT ONLINE, the digital branch of the German newspaper Die Zeit, showed some interesting finds from their database of approximately 450,000 street names used across Germany. They call the project Streetscapes and use it to explore important parts of German history. The street names show the legacy of political division in Germany, as well as which names are most common and how old different streets in Berlin are.

A map of Berlin with streets highlighted in different colors based on the age of the street name.

Older street names are clearly concentrated toward the center of Berlin.

3. Google Maps updated their display this year to zoom out to a globe instead of a flat Mercator projection, noting in a tweet on August 2nd that “With 3D Globe Mode…, Greenland’s projection is no longer the size of Africa.” Adapting the shape of countries from a globe to a flat map has always been a challenge and has resulted in some confusion as to how the Earth’s geography actually looks. In the third part of a series of Story Maps about “The World’s Troubled Lands & Geopolitical Curiosities,” John Nelson outlines some of those misconceptions. In a National Geographic write-up titled “Why your mental map of the world is (probably) wrong,” Betsy Mason goes deeper into why we hold these misconceptions and why they are so hard to let go of.

The title slide of a story map with text that reads "Misconceptions Some Common Geographic Mental Misplacements..."

The story map highlights three regions that people often misplace in their mental maps.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.


Cool Text Data – Music, Law, and News!

Computational text analysis can be done in virtually any field, from biology to literature. You may use topic modeling to determine which areas are the most heavily researched in your field, or attempt to determine the author of an orphan work. Where can you find text to analyze? So many places! Read on for sources to find unique text content.
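To give a taste of what topic modeling looks like in code, here is a minimal sketch using scikit-learn’s LDA implementation; the four toy documents are invented for illustration, and a real project would substitute one of the text sources described below.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; in practice you would load song lyrics, case law, or newspaper text
docs = [
    "the court ruled that the plaintiff had standing",
    "the appellate court reversed the lower court decision",
    "the new album features upbeat pop songs about love",
    "her latest single is a slow love ballad",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)  # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Print the top words for each discovered topic
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```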

Woman with microphone

Genius – the song lyrics database

Genius started as Rap Genius, a site where rap fans could gather to annotate and analyze rap lyrics. It expanded to include other genres in 2014, and now manages a massive database covering Ariana Grande to Fleetwood Mac, and includes both lyrics and fan-submitted annotations. All of this text can be downloaded and analyzed using the Genius API. Using Genius and a text mining method, you could see how themes present in popular music changed over recent years, or understand a particular artist’s creative process.
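As a rough sketch of what getting started might look like, the snippet below queries the Genius search endpoint with Python’s requests library; the access token is a placeholder you would generate at genius.com/api-clients, and the search query is an arbitrary example.

```python
import requests

# Replace with your own Genius API access token (from genius.com/api-clients)
TOKEN = "YOUR_GENIUS_ACCESS_TOKEN"

response = requests.get(
    "https://api.genius.com/search",
    params={"q": "Dreams Fleetwood Mac"},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()

# Print the title and Genius URL for each hit returned by the search
for hit in response.json()["response"]["hits"]:
    song = hit["result"]
    print(song["full_title"], "-", song["url"])
```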

Homepage of case.law, with Ohio highlighted: 147,692 unique cases, 31 reporters, 713,568 pages scanned.

Case.law – the case law database

The Caselaw Access Project (CAP) is a fairly recent project that is still ongoing, and publishes machine-readable text digitized from over 40,000 bound volumes of case law from the Harvard Law School Library. The earliest case is from 1658, with the most recent cases from June 2018. An API and bulk data downloads make it easy to get this text data. What can you do with huge amounts of case law? Well, for starters, you can generate a unique case law limerick:

Wheeler, and Martin McCoy.
Plaintiff moved to Illinois.
A drug represents.
Pretrial events.
Rocky was just the decoy.

Check out the rest of their gallery for more project ideas.
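If you would like to pull cases yourself, the API mentioned above can be queried with a few lines of Python. This is only a sketch: the endpoint, parameters, and field names follow the v1 API documented at case.law and may need adjusting, the search term and jurisdiction are arbitrary examples, and full-text access or bulk downloads require registering for a free API key.

```python
import requests

# Query the Caselaw Access Project API (v1) for Illinois cases mentioning "copyright";
# endpoint and parameters as documented at case.law/api -- adjust as needed
response = requests.get(
    "https://api.case.law/v1/cases/",
    params={"search": "copyright", "jurisdiction": "ill", "page_size": 5},
)
response.raise_for_status()

# Print basic metadata for each matching case
for case in response.json()["results"]:
    print(case["decision_date"], "-", case["name_abbreviation"])
```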

Newspapers and More

There are many places you can get text from digitized newspapers, both recent and historical. Some newspapers are hundreds of years old, so there can be problems with the OCR (Optical Character Recognition) that will make it difficult to get accurate results from your text analysis. Making newspaper text machine readable requires special attention, since newspapers are printed on thin paper and have possibly been stacked up in a dusty closet for 60 years! See OCR considerations here; the newspaper text described below, however, is already machine-readable and ready for text mining. Even so, with any text mining project you must pay close attention to the quality of your text.

The Chronicling America project sponsored by the Library of Congress contains digital copies of newspapers with machine-readable text from all over the United States and its territories, from 1690 to today. Using newspaper text data, you can analyze how topics discussed in newspapers change over time, among other things.
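To show how easy it is to get started, here is a minimal sketch that searches Chronicling America’s public JSON API for pages containing a keyword; the search term is just an example, and larger harvests should use the project’s bulk data options.

```python
import requests

# Search Chronicling America's page-level OCR text for a keyword;
# the query term and row count are arbitrary examples
response = requests.get(
    "https://chroniclingamerica.loc.gov/search/pages/results/",
    params={"andtext": "public domain", "format": "json", "rows": 5},
)
response.raise_for_status()

# Print the date, newspaper title, and a snippet of OCR text for each matching page
for page in response.json()["items"]:
    print(page["date"], page["title"])
    print(page["ocr_eng"][:200], "...")
```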

newspapers being printed quickly on a rolling press

Looking for newspapers from a different region? The library has contracts with several vendors to conduct text mining, including Gale and ProQuest. Both provide newspaper text suitable for text mining, from The Daily Mail of London (Gale) to the Chinese Newspapers Collection (ProQuest). The way you access the text data itself will differ between the two vendors, and the library will certainly help you navigate the collections. See the Finding Text Data library guide for more information.

The sources mentioned above are just highlights of our text data collection! The Illinois community has access to a huge amount of text, including newspapers and primary sources, but also research articles and books! Check out the Finding Text Data library guide for a more complete list of sources. And, when you’re ready to start your text mining project, contact the Scholarly Commons (sc@library.illinois.edu), and let us help you get started!


Wikidata and Wikidata Human Gender Indicators (WHGI)

Wikipedia is a central player in online knowledge production and sharing. Since its founding in 2001, Wikipedia has been committed to open access and open editing, which has made it the most popular reference work on the web. Though students are still warned away from using Wikipedia as a source in their scholarship, it presents well-researched information in an accessible and ostensibly democratic way.

Most people know Wikipedia from its high ranking in most internet searches and tend to use it for its encyclopedic value. The Wikimedia Foundation—which runs Wikipedia—has several other projects which seek to provide free access to knowledge. Among those are Wikimedia Commons, which offers free photos; Wikiversity, which offers free educational materials; and Wikidata, which provides structured data to support the other wikis.

The Wikidata logo

Wikidata provides structured data to support Wikipedia and other Wikimedia Foundation projects

Wikidata is a great tool to study how Wikipedia is structured and what information is available through the online encyclopedia. Since it is presented as structured data, it can be analyzed quantitatively more easily than Wikipedia articles. This has led to many projects that allow users to explore data through visualizations, queries, and other means. Wikidata offers a page of Tools that can be used to analyze Wikidata more quickly and efficiently, as well as Data Access instructions for how to use data from the site.
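If you want to try querying Wikidata yourself, the Wikidata Query Service accepts SPARQL over a plain HTTP request. Here is a minimal Python sketch that counts people by the gender recorded in their Wikidata items, the same property the WHGI project described below relies on; the restriction to astronauts is just an arbitrary filter to keep the example query small and fast.

```python
import requests

# Count Wikidata items that are humans (Q5) with occupation astronaut (Q11631),
# grouped by the value of their "sex or gender" property (P21).
query = """
SELECT ?genderLabel (COUNT(?person) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P106 wd:Q11631 ;
          wdt:P21 ?gender .
  ?gender rdfs:label ?genderLabel .
  FILTER(LANG(?genderLabel) = "en")
}
GROUP BY ?genderLabel
ORDER BY DESC(?count)
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "text-data-demo/0.1 (educational example)"},
)
response.raise_for_status()

# Print each gender label alongside its count
for row in response.json()["results"]["bindings"]:
    print(row["genderLabel"]["value"], row["count"]["value"])
```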

The home page for the Wikidata Human Gender Indicators project

An example of a project born out of Wikidata is the Wikidata Human Gender Indicators (WHGI) project. The project uses metadata from Wikidata entries about people to analyze trends in gender disparity over time and across cultures. The project presents the raw data for download, as well as charts and an article written about the discoveries the researchers made while compiling the data. Some of the visualizations they present are confusing (perhaps they could benefit from reading our Lightning Review of Data Visualization for Success), but they succeed in conveying important trends that reveal a bias toward articles about men, as well as an interesting phenomenon surrounding celebrities. Some regions have a better ratio of biographies about women to biographies about men because so many articles are written about actresses and female musicians, which reflects cultural differences surrounding fame and gender.

Of course, like many data sources, Wikidata is not perfect. The creators of the WHGI project frequently discovered that articles did not have complete metadata related to gender or nationality, which greatly influenced their ability to analyze the trends present on Wikipedia related to those areas. Since Wikipedia and Wikidata are open to editing by anyone and are governed by practices that the community has agreed upon, it is important for Wikipedians to consider including more metadata in their articles so that researchers can use that data in new and exciting ways.

An animated gif of the Wikipedia logo bouncing like a ball


New Uses for Old Technology at the Arctic World Archive

In this era of rapid technological change, it is easy to fall into the mindset that the “big new thing” is always an improvement on the technology that came before it. Certainly this is often true, and here in the Scholarly Commons we are always seeking innovative new tools to help you out with your research. However, every now and then it’s nice to just slow down and take the time to appreciate the strengths and benefits of older technology that has largely fallen out of use.

A photo of the arctic

There is perhaps no better example of this than the Arctic World Archive, a facility on the Norwegian archipelago of Svalbard. Opened in 2017, the Arctic World Archive seeks to preserve the world’s most important cultural, political, and literary works in a way that will ensure that no manner of catastrophe, man-made or otherwise, could destroy them.

If this is all sounding familiar to you, that’s because you’ve probably heard of the Arctic World Archive’s older sibling, the Svalbard Global Seed Vault. The Global Seed Vault, which is much better known and older than the Arctic World Archive, is an archive of seeds from around the world, meant to ensure that humanity would be able to continue growing crops and making food in the event of a catastrophe that wipes out plant life.

Indeed, the two archives have a lot in common. The World Archive is housed deep within a mountain in an abandoned coal mine that once served as the location of the seed vault, and was founded to be for cultural heritage what the seed vault is for crops. But the Arctic World Archive has made such innovative use of old technology that it is an impressive site in its own right.

A photo of the arctic

Perhaps the coolest (pun intended) aspect of the Arctic World Archive is the fact that it does not require electricity to operate. Its extreme northern location (it is near the northernmost town of at least 1,000 people in the world) means that the temperature inside the facility is naturally very cold year-round. As any archivist or rare book librarian who brings a fleece jacket to work in the summer will happily tell you, colder temperatures are ideal for preserving documents, and the ability to store items in a very cold climate without the use of electricity makes the World Archive perfect for sustainable, long-term storage.

But that’s not all: in a real blast from the past, all information stored in this facility is kept on microfilm. Now, I know what you’re thinking: “it’s the 21st century, grandpa! No one uses microfilm anymore!”

It’s true that microfilm is used by a very small minority of people nowadays, but nevertheless it offers distinct advantages that newer digital media just can’t compete with. For example, microfilm is rated to last for at least 500 years without corruption, whereas digital files may not last anywhere near that long. Beyond that, the film format means that the archive is totally independent from the internet, and will outlast any major catastrophe that disrupts part or all of our society’s web capabilities.

A photo of a seal

The Archive is still growing, but it is already home to film versions of Edvard Munch’s The Scream, Dante’s The Divine Comedy, and an assortment of government documents from many countries including Norway, Brazil, and the United States.

As it continues to grow, its importance as a place of safekeeping for the world’s cultural heritage will hopefully serve as a reminder that sometimes, older technology has upsides that new tech just can’t compete with.


Exploring Data Visualization #9

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Map of election districts colored red or blue based on predicted 2018 midterm election outcome

This map breaks down likely outcomes of the 2018 Midterm elections by district.

 

Seniors at Montgomery Blair High School in Silver Spring, Maryland created the ORACLE of Blair 2018 House Election Forecast, a website that hosts visualizations predicting outcomes for the 2018 midterm elections. In addition to breakdowns of voting outcomes by state and district, the students compiled descriptions of how each district has voted historically and which stances matter most for the current candidates. How well do these predictions match up with the results from Tuesday?

A chart showing price changes for 15 items from 1998 to 2018

This chart shows price changes over the last 20 years. It gives the impression that these price changes are always steady, but that isn’t the case for all products.

Lisa Rost at Datawrapper created a chart—building on the work of Olivier Ballou—that shows the change in the price of goods using the Consumer Price Index. She provides detailed coverage of how her chart is put together, as well as making clear what is missing from both her chart and Ballou’s based on which products are chosen for the graph. This behind-the-scenes information provides useful advice for reading and designing charts that are clear and informative.

An image showing a scale of scientific visualizations from figurative on the left to abstract on the right.

There are a lot of ways to make scientific research accessible through data visualization.

Visualization isn’t just charts and graphs—it’s all manner of visual objects that contribute information to a piece. Jen Christiansen, the Senior Graphics Editor at Scientific American, knows this well, and her blog post “Visualizing Science: Illustration and Beyond” on Scientific American covers some key elements of what it takes to make engaging and clear scientific graphics and visualizations. She shares lessons learned at all levels of creating visualizations, as well as covering a few ways to visualize uncertainty and the unknown.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.


Election Forecasts and the Importance of Good Data Visualization

In the wake of the 2016 presidential election, many people, on the left and right alike, came together on the internet to express a united sentiment: that the media had called the election wrong. In particular, one man may have received the brunt of this negative attention. Nate Silver and his website FiveThirtyEight have taken nearly endless flak from disgruntled Twitter users over the past two years for their forecast which gave Hillary Clinton a 71.4% chance of winning.

However, as Nate Silver has argued in many articles and tweets, he did not call the race “wrong” at all; everyone else just misinterpreted his forecast. So what really happened? How could Nate Silver say that he wasn’t wrong when so many believe to this day that he was? As believers in good data visualization practice, we here in the Scholarly Commons can tell you that if everyone interprets your data to mean one thing when you really meant it to convey something else entirely, your visualization may be the problem.

Today is Election Day, and once again, FiveThirtyEight has new models out forecasting the various House, Senate, and Governors races on the ballot. However, these models look quite a bit different from 2016’s, and in those differences lie some important data viz lessons. Let’s dive in and see what we can see!

The image above is a screenshot taken from the very top of the page for FiveThirtyEight’s 2016 Presidential Election Forecast, which was last updated on the morning of Election Day 2016. The image shows a bar across the top, filled in blue 71.4% of the way, to represent Clinton’s chance of winning, and red the rest of the 28.6% to represent Trump’s chance of winning. Below this bar is a map of the fifty states, colored from dark red to light red to light blue to dark blue, representative of the percentage chance that each state goes for one of the two candidates.

The model also allows you to get a sense of where exactly each state stands by hovering your cursor over a particular state. In the above example, we can see a bar similar to the one at the top of the national forecast, which shows Clinton’s 55.1% chance of winning Florida.

The top line of FiveThirtyEight’s 2018 predictions looks quite a bit different. When you open the House or Senate forecasts, the first thing you see is a bell curve, not a map, as exemplified by the image of the House forecast below.

At first glance, this image may be more difficult to take in than a simple map, but it actually contains a lot of information that is essential to anyone hoping to get a sense of where the election stands. First, the top-line likelihood of each party taking control is expressed as a fraction, rather than as a percent. The reasoning behind this is that some feel that the percent bar from the 2016 model improperly gave the sense that Clinton’s win was a sure thing. The editors at FiveThirtyEight hope that fractions will do a better job than percentages at conveying that the forecasted outcome is not a sure thing.

Beyond this, the bell curve shows forecasted percentage chances for every possible outcome (for example, at the time of this writing, there is a 2.8% chance that Democrats gain 37 seats, a 1.6% chance that Democrats gain 20 seats, a <0.1% chance that Democrats gain 97 seats, and a <0.1% chance that Republicans gain 12 seats). This visualization shows the inner workings of how the model makes its prediction. Importantly, it strikes home the idea that any result could happen, even if one end result is considered more likely. What’s more, the model features a gray rectangle centered around the average result that highlights the middle 80% of the forecast: there is an 80% chance that the result will be between a Democratic gain of 20 seats (meaning Republicans would hold the House) and a Democratic gain of 54 (a so-called “blue wave”).
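To make the idea of a “middle 80%” concrete, here is a toy sketch, emphatically not FiveThirtyEight’s actual model, showing how one might summarize thousands of simulated seat outcomes; the distribution and its parameters are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are 10,000 simulated Democratic seat gains from some forecast model;
# the normal distribution and its parameters are made up for this illustration only
simulated_gains = rng.normal(loc=37, scale=13, size=10_000).round()

dem_majority_prob = (simulated_gains >= 23).mean()    # 23 net seats were needed to flip the House in 2018
low, high = np.percentile(simulated_gains, [10, 90])  # middle 80% of simulated outcomes

print(f"Chance Democrats gain the 23+ seats needed: {dem_majority_prob:.0%}")
print(f"80% of simulations fall between {low:.0f} and {high:.0f} seats gained")
```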

The 2018 models do feature maps as well, such as the above map for the Governors forecast. But some distinct changes have been made. First, you have to scroll down to get to the map, hopefully absorbing some important information from the graphs at the top in the meantime. Most prominently, FiveThirtyEight has re-thought the color palette they are using. Whereas the 2016 forecast only featured shades of red and blue, this year the models use gray (House) and white (Senate and Governors) to represent toss-ups and races that only slightly lean one way or the other. If this color scheme had been used in 2016, North Carolina and Florida, both states that ended up going for Trump but were colored blue on the map, would have been much more accurately depicted not as “blue states” but as toss-ups.

Once again, hovering over a state or district gives you a detail of the forecast for that place in particular, but FiveThirtyEight has improved that as well.

Here we can see much more information than was provided in the hover-over function for the 2016 map. Perhaps most importantly, this screen shows us the forecasted vote share for each candidate, including the average, high, and low ends of the prediction. So for example, from the above screenshot for Illinois’ 13th Congressional District (home to the University of Illinois!) we can see that Rodney Davis is projected to win, but there is a very real scenario in which Betsy Dirksen Londrigan ends up beating him.

FiveThirtyEight did not significantly change how their models make predictions between 2016 and this year. The data itself is treated in roughly the same way. But as we can see from these comparisons, the way that this data is presented can make a big difference in terms of how we interpret it. 

Will these efforts at better data visualization be enough to deter angry reactions to how the model correlates with actual election results? We’ll just have to tune in to the replies on Nate Silver’s Twitter account tomorrow morning to find out… In the meantime, check out their House, Senate, and Governors forecasts for yourself!

 

All screenshots taken from fivethirtyeight.com. Images of the 2016 models reflect the “Polls-only” forecast. Images of the 2018 models reflect the “Classic” forecasts as of the end of the day on November 5th 2018.


Lightning Review: Data Visualization for Success

Data visualization is where the humanities and sciences meet: viewers are dazzled by the presentation yet informed by research. Lovingly referred to as “the poster child of interdisciplinarity” by Steven Braun, data visualization brings these two fields closer together than ever to help provide insights that may have been impossible without the other. In his book Data Visualization for Success, Braun sits down with forty designers with experience in the field to discuss their approaches to data visualization, common techniques in their work, and tips for beginners.

Braun’s collection of interviews provides an accessible introduction into data visualization. Not only is the book filled with rich images, but each interview is short and meant to offer an individual’s perspective on their own work and the field at large. Each interview begins with a general question about data visualization to contribute to the perpetual debate of what data visualization is and can be moving forward.

Picture of Braun's "Data Visualization for Success"

Antonio Farach, one of the designers interviewed in the book, calls data visualization “the future of storytelling.” And when you see his work – or really any of the work in this book – you can see why. Each new image has an immediate draw, but it is impossible to move past without exploring a rich narrative. Visualizations in this book cover topics ranging from soccer matches to classic literature, economic disparities, selfie culture, and beyond.

Each interview ends by asking the designer for their advice to beginners, which not only invites new scholars and designers to participate in the field but also dispels any doubt about the hard work put in by these designers or the science at the root of it all. However, Barbara Hahn and Christine Zimmermann of Hahn+Zimmermann may have put it best: “Data visualization is not making boring data look fancy and interesting. Data visualization is about communicating specific content and giving equal weight to information and aesthetics.”

A leisurely, stunning, yet informative read, Data Visualization for Success offers anyone interested in this explosive field an insider’s look from voices around the world. Drop by the Scholarly Commons during our regular hours to flip through this wonderful read.

And finally, if you have any further interest in data visualization make sure you stay up to date on our Exploring Data Visualization series or take a look at what services the Scholarly Commons provides!
