Stata vs. R vs. SPSS for Data Analysis

As you do research with larger amounts of data, it becomes necessary to graduate from doing your data analysis in Excel and find more powerful software. This can seem like a daunting task, especially if you have never attempted to analyze big data before. There are a number of data analysis software systems out there, and it is not always clear which one will work best for your research. The nature of your data, your technological expertise, and your personal preferences all play a role in which software will serve you best. In this post I will explain the pros and cons of Stata, R, and SPSS for quantitative data analysis and provide links to additional resources. All of the software discussed in this post is available to University of Illinois students, faculty, and staff on the Scholarly Commons computers, and you can schedule a consultation with CITL if you have specific questions.

Short video loop of a kid sitting at a computer and putting on sunglasses

Rock your research with the right tools!


STATA

Stata logo. Blue block lettering spelling out Stata.

Among researchers, Stata is often credited as the most user-friendly data analysis software. Stata is popular in the social sciences, particularly economics and political science. It is a complete, integrated statistical software package, meaning it can accomplish pretty much any statistical task you need, including visualizations. It has both a point-and-click user interface and a command line with easy-to-learn syntax. Furthermore, it has a system for version control: you can save the syntax from a given job in a "do-file" and rerun or refer to it later. On the downside, Stata is not free for your personal computer. Unlike an open-source program, Stata does not let you program your own functions, so you are limited to the functions it already supports. Finally, it is limited to numeric and categorical data; it cannot analyze spatial data and certain other types.
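To give a flavor of the syntax, here is a minimal sketch of a do-file; it uses auto.dta, the example dataset that ships with Stata, so the variable names below come from that file rather than your own data:

    * example.do -- a minimal sketch using Stata's bundled example data
    sysuse auto, clear           // load the auto dataset that ships with Stata
    summarize price mpg          // descriptive statistics for two variables
    regress price mpg            // simple linear regression of price on mileage
    scatter price mpg            // a basic visualization

Running "do example.do" replays every step exactly, which is what makes do-files so handy for version control.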

 

Pros

  • User friendly and easy to learn
  • Version control
  • Many free online resources for learning

Cons

  • An individual license can cost between $125 and $425 annually
  • Limited to certain types of data
  • You cannot program new functions into Stata

Additional Resources:


R

R logo. Blue capital letter R wrapped with a gray oval.

R and its graphical user interface companion RStudio are incredibly popular software for a number of reasons. The first, and probably most important, is that R is free, open-source software that is compatible with any operating system. As such, there is a strong and loyal community of users who share their work and advice online. R has many of the same features as Stata, such as a point-and-click interface (through RStudio), a command line, savable script files, and strong data analysis and visualization capabilities. It also has some capabilities Stata does not, because users with more technical expertise can program new functions in R to handle different types of data and projects. The problem a lot of people run into with R is that it is not easy to learn: its programming language is not intuitive, and code is prone to errors. Despite this steep learning curve, there is an abundance of free online resources for learning R.
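As a taste of what working in R looks like, here is a minimal sketch; the file name and column names (data.csv, x, y) are hypothetical stand-ins for your own data:

    # read a CSV, summarize it, and fit a simple linear regression
    data <- read.csv("data.csv")    # hypothetical file with numeric columns x and y
    summary(data)                   # descriptive statistics for every column
    model <- lm(y ~ x, data = data) # linear model of y on x
    summary(model)                  # coefficients, R-squared, p-values
    plot(data$x, data$y)            # base-R scatterplot
    abline(model)                   # overlay the fitted regression line

The same few lines, saved as a script, can be rerun on updated data at any time.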

Pros

  • Free open-source software
  • Strong online user community
  • Programmable with more functions for data analysis

Cons

  • Steep learning curve
  • Can be slow

Additional Resources:

  • Introduction to R Library Guide: Find valuable overviews and tutorials in this guide published by the University of Illinois Library.
  • Quick-R by DataCamp: This website offers tutorials and examples of syntax for a whole host of data analysis functions in R, covering everything from installing packages to advanced data visualizations.
  • Learn R on Codecademy: A free self-paced online class for learning to use R for data science and beyond.
  • Nabble forum: A forum where individuals can ask specific questions about using R and get answers from the user community.

SPSS

SPSS logo. Red background with white block lettering spelling SPSS.

SPSS is an IBM product used for quantitative data analysis. It does not have a command line feature; instead, its user interface is entirely point-and-click and somewhat resembles Microsoft Excel. Although it looks a lot like Excel, it can handle larger data sets faster and with more ease. One of the main complaints about SPSS is that it is prohibitively expensive, with individual packages ranging from $1,290 to $8,540 a year. What it offers in exchange is that it is incredibly easy to learn. As a non-technical person, I learned how to use it in under an hour by following an online tutorial from the University of Illinois Library. However, my take on this software is that unless you really need a more powerful tool, you should just stick to Excel. The two are too similar to justify seeking out this specialized software.

Pros

  • Quick and easy to learn
  • Can handle large amounts of data
  • Great user interface

Cons

  • By far the most expensive
  • Limited functionality
  • Very similar to Excel

Additional Resources:

Gif of Kermit the frog dancing and flailing his arms with the words "Yay Statistics" in block letters above

Thanks for reading! Let us know in the comments if you have any thoughts or questions about any of these data analysis software programs. We love hearing from our readers!

 

Featured Resource: QGIS, a Free, Open Source Mapping Platform

This week, geographers around the globe took some time to celebrate the software that allows them to analyze, well, that very same globe. November 13th marked the 20th annual GIS Day, an "international celebration of geographic information systems," as the official GIS Day website puts it.

the words "GIS day" in a stylized font appear below a graphic of a globe with features including buildings, trees, and water

But while GIS technology has revolutionized the way we analyze and visualize maps over the past two decades, the high cost of ArcGIS products, long recognized as the gold standard for cartographic analysis tools, is enough to deter many people from using them. At the University of Illinois and other colleges and universities, access to ArcGIS can be taken for granted, but many of us will not remain in the academic world forever. Luckily, there's a high-quality alternative to ArcGIS for those who want the benefits of mapping software without the price tag!

the QGIS logo

QGIS is a free, open source mapping software that has most of the same functionality as ArcGIS. While some more advanced features included in ArcGIS do not have analogues in QGIS, developers are continually updating the software and new features are always being added. As it stands now, though, QGIS includes everything that the casual GIS practitioner could want, along with almost everything more advanced users need.

As is often the case with open source software alternatives, QGIS has a large, vibrant community of supporters, and its developers have put together tons of documentation on how to use the program, such as this user guide. Generally speaking, if you have any experience with ArcGIS it’s very easy to learn QGIS—for a picture of the learning curve, think somewhere along the lines of switching from Microsoft Word to Google Docs. And if you don’t have experience, the community is there to help! There are many guides to getting started, including the one listed in the above link, and more forum posts of users working through questions together than anyone could read in a lifetime. 

For more help, stop by to take a look at one of the QGIS guidebooks in our reference collection, or send us an email at sc@library.illinois.edu!

Have you made an interesting map in QGIS? Send us pictures of your creations on Twitter @ScholCommons!

 

A Brief Explanation of GitHub for Non-Software-Developers

GitHub is a platform mostly used by software developers for collaborative work. You might be thinking “I’m not a software developer, what does this have to do with me?” Don’t go anywhere! In this post I explain what GitHub is and how it can be applied to collaborative writing for non-programmers. Who knows, GitHub might become your new best friend.

Gif of a cat typing

You don't need to be a computer whiz to get Git.

Picture this: you and some colleagues have similar research interests and want to collaborate on a paper. You have divided the writing work so that each of you works on a different element of the paper. Using a cloud platform like Google Docs or Microsoft Word Online you compile your work, but things start to get messy. Edits are made to the document and you are unsure who made them or why. Elements get deleted and you do not know how to retrieve your previous work. You have multiple files saved on your computer with names like "researchpaper1.docx", "researchpaper1 with edits.docx", and "researchpaper1 with new edits.docx". Managing your own work is hard enough, but when collaborators are added to the mix it just becomes unmanageable. After a never-ending reply-all email chain and what felt like the longest meeting of all time, you and your colleagues are finally on the same page about the writing and editing of your paper. It just makes you think: there has got to be a better way to do this. Issues with collaboration are not exclusive to writing; they happen all the time in programming, which is why software developers came up with version control systems like Git and platforms built on them like GitHub.

Gif of Spongebob running around an office on fire with paper and filing cabinets on the floor

Managing versions of your work can be stressful. Don’t panic because GitHub can help.

GitHub allows developers to work together through branching and merging. Branching is the process by which the original file or source code is duplicated into clones. These clones contain all the elements already in the original file and can be worked on independently. Developers use these clones to write and test code before combining it with the original code. Once their version of the code is ready, they integrate or "push" it into the source code in a process called merging. Other members of the team are then alerted to these changes and can "pull" the merged code from the source code into their respective clones. Every version of the project is saved along with a description of what changed in that version; these saved versions are called commits, and they let you consult any previous state of your work. Now, this is a simplified explanation of what GitHub does, but my hope is that you now understand GitHub's applications, because what I am about to say next might blow your mind: GitHub is not just for programmers! You do not need to know any coding to work with GitHub. After all, code and written language are very similar.
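For the curious, here is roughly what that cycle looks like at the Git command line; the repository URL, branch name, and file name are all hypothetical:

    # get your own copy (a clone) of the shared project
    git clone https://github.com/your-team/paper.git
    cd paper
    git checkout -b intro-draft             # create and switch to your own branch
    # ...edit introduction.md in any text editor...
    git add introduction.md
    git commit -m "Draft the introduction"  # a commit: a saved version with a description
    git push origin intro-draft             # share your branch with collaborators
    git pull origin main                    # later, pull in changes merged by others

GitHub's Hello-World Guide, linked below, walks through the same ideas entirely in the web interface, no command line required.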

Even if you cannot write a single line of code, GitHub can be incredibly useful for a variety of reasons:
1. It allows you to electronically backup your work for free.
2. All the different versions of your work are saved separately, allowing you to look back at previous edits.
3. It alerts all collaborators when a change is made and they can merge that change into their own versions of the text.
4. It allows you to write using plain text, something commonly requested by publishers.

Hopefully, if you've made it this far into the article you're thinking, "This sounds great, let's get started!" For more information on using GitHub you can consult the Library's guide on GitHub or follow the step-by-step instructions on GitHub's Hello-World Guide.

Gif of man saying "check it out" and pointing to the right.

There are many resources on getting started with GitHub. Check them out!

Here are some links to what others have said about using GitHub for non-programmers:

What’s In A Name?: From Lynda.com to LinkedIn Learning

LinkedIn Learning Logo

Lynda.com had a long history with libraries. The online learning platform offered video courses to help people "learn business, software, technology and creative skills to achieve personal and professional goals." Lynda.com paired well with other library services and collections, offering library users the chance to learn new skills at their own pace in an accessible and varied medium.

However, in 2015—twenty years after its initial launch—Lynda.com was purchased by LinkedIn. A year later, Microsoft purchased LinkedIn for $26.2 billion. And now, in 2019, Lynda.com content is available through the newly-formed LinkedIn Learning.

Charmander evolving into Charmeleon

Sometimes, evolution is simple (like when it gets you one step closer to an Elite-Four-wrecking Charizard). Sometimes, it's a little more complicated (like when Microsoft buys LinkedIn, which had just bought Lynda.com).

The good news is that this change from Lynda.com to LinkedIn Learning includes access to all of the same content previously available. This means that, through the University Library's subscription, you still have access to courses on software like R, SQL, Tableau, Python, InDesign, Photoshop, and more (many of which are available to use on campus at the Scholarly Commons). There are also courses on broader, related topics like data science, database management, and user experience.

Setting up your own personal account to access LinkedIn Learning is where things get just a little trickier. As a result of the transition from Lynda.com to LinkedIn Learning, users are now strongly encouraged to link their personal LinkedIn accounts with their LinkedIn Learning accounts. Completing courses in LinkedIn Learning will earn you badges that are automatically carried over to your LinkedIn account. However, this additional step—using a personal LinkedIn account to access these courses—also makes the information about your LinkedIn Learning activity as public as your LinkedIn profile. Because Lynda.com only required a library card and PIN, this change in privacy has received pushback from libraries and library organizations across the country.

Obi-Wan Kenobi looking confused with caption reading [visible confusion]

This new policy change doesn't mean you should avoid LinkedIn Learning; it just means you should use it with care and make an informed decision about your privacy settings. Maybe you want potential employers to see what you're proactively learning about on the platform; maybe you want to keep that information private. Either way, you can get details on setting up accounts and your privacy settings by consulting this guide created by Technology Services.

LinkedIn Learning can be accessed through the University Library here.

February Push!

Hello, researchers!

Congratulations! You made it through your first month back for the spring semester. From class work, to pouring rain, to enough snow and ice to make the university look like it's auditioning for a role as Antarctica, you're pushing forward!

A dual-monitor computer in the Scholarly Commons. The background of the image shows the Scholarly Commons space, which is filled with our dual-monitor computers and various desks.

Take a minute to look over all the awesome resources we have, right here in the Scholarly Commons, to help you keep chugging along with your research.

We are open 8:30 a.m. to 6 p.m., Monday through Friday. Our dual-monitor computers have software ranging from Adobe Photoshop to OCR tools, which can be paired with our various scanners to make machine-readable PDFs!

The Scholarly Commons space. A desk with a computer and a sign reading "Scholarly Commons" is shown.

Researchers can book free consultations thanks to our partnerships with CITL Data Analytics and Technology Services! In these meetings, you can learn about R, SAS, and everything else you need to just get started or to get past that tricky problem in your statistical research.

Beyond that, users can make appointments with our GIS specialist and learn even more through our GIS resources. We have a ton of great books in our non-circulating reference collection that can help you learn about Python, GIS, and more!

The Scholarly Commons reference collection. Six shelves filled with books.

 

And that’s not all: our Data Analytics & Visualization Librarian has put together a plethora of resources to help turn your data into art. Check out the four most common types of charts guide to get started!

The Scholarly Commons space. It contains several workstations with a carpeted floor.

And even this doesn’t cover all of our services!

If you need assistance finding numeric data, understanding your copyrights, cleaning up data in OpenRefine, or even starting up a project using text mining, we have the resources you need.

The Scholarly Commons has all the resources you need to succeed, so stop by anytime! We’re always happy to help.

Google MyMaps Part II: The Problem with Projections

Back in October, we published a blog post introducing you to Google MyMaps, an easy way to display simple information in map form. Today we’re going to revisit that topic and explore some further ways in which MyMaps can help you visualize different kinds of data!

One of the most basic things that students of geography learn is the problem of projections: the earth is a sphere, and there is no perfect way to translate an image from the surface of a sphere to a flat plane. Nevertheless, cartographers over the years have come up with many projection systems which attempt to do just that, with varying degrees of success. Google Maps (and, by extension, Google MyMaps) uses perhaps the most common of these, the Mercator projection. Despite its ubiquity, the Mercator projection has been criticized for not keeping area uniform across the map. This means that shapes far away from the equator appear disproportionately large in comparison with shapes on the equator.
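The size of this distortion can be quantified: a standard result in cartography is that the Mercator projection stretches lengths at latitude φ by the factor

    $k(\varphi) = \sec\varphi = \frac{1}{\cos\varphi}$

so areas are inflated by roughly $k^2$. At the equator k = 1, but at 70°N, roughly the middle of Greenland, k ≈ 2.92, inflating areas about 8.5-fold. That is why Greenland can look comparable to Africa on the map despite having about one-fourteenth of its area.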

Luckily, MyMaps provides a method of pulling up the curtain on Mercator's distortion. The "Draw a line" tool, located just below the search bar at the top of the MyMaps screen, allows users to create a rough outline of any shape on the map and then drag that outline around the world to compare its size. Here's how it works: after clicking on "Draw a line," select "Add line or shape" and begin adding points to the map by clicking. Don't worry about where you're adding your points just yet; once you've created a shape you can move it anywhere you'd like! Once you have three or four points, complete the polygon by clicking back on top of your first point, and you should have a shape that looks something like this:

A block drawn in MyMaps and placed over Illinois

Now it’s time to create a more detailed outline. Click and drag your shape over the area you want to outline, and get to work! You can change the size of your shape by dragging on the points at the corners, and you can add more points by clicking and dragging on the transparent circles located midway between each corner. For this example, I made a rough outline of Greenland, as you can see below.

Area of Greenland made in MyMaps

You can get as detailed as you want with the points on your shapes, depending on how much time you want to spend clicking and dragging points around on your computer screen. Obviously I did not perfectly trace the exact coastline of Greenland, but my finished product is at least recognizable enough. Now for the fun part! Click somewhere inside the boundary of your shape, drag it somewhere else on the map, and see Mercator’s distortion come to life before your eyes.

Area of Greenland placed over Africa

Here you can see the exact same shape as in the previous image, except instead of hovering over Greenland at the north end of the map, it is placed over Africa and the equator. The area of the shape is exactly the same, but the way it is displayed on the map has been adjusted for the relative distortion of the particular position it now occupies on the map. If that hasn't sufficiently shaken your understanding of our planet, MyMaps has one more tool for illuminating the divide between the map and reality. The "Measure distances and areas" tool draws a "straight" line between any two (or more) points on the map. "Straight" is in quotes there because, as we're about to see, a straight line on the globe (and therefore in reality) doesn't typically align with straight lines on the map. For example, if I wanted to see the shortest distance between Chicago and Frankfurt, Germany, I could display that with the Measure tool like so:

Distance line, Chicago to Frankfurt, Germany

The curve in this line represents the curvature of the earth, and demonstrates how the actual shortest distance is not the same as a straight line drawn on the map. This principle is made even clearer by using the Measure tool a little farther north.

Distance line, Chicago to Frankfurt, Germany, set over Greenland

The beginning and ending points of this line are roughly directly north of Chicago and Frankfurt, respectively; however, we notice two differences between this and the previous measurement right away. First, this is a much shorter distance than Chicago to Frankfurt, and second, the curve in the line is much more distinct. Both of these differences arise, once again, from the difficulty of displaying a sphere on a flat surface. Actual distances get shorter the closer you get to the north (or south) ends of the map, which in turn causes all of the distortions we have seen in this post.
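If you would like to verify the Measure tool's numbers by hand, the shortest path between two points on a sphere follows a great circle, and its length is given by the standard haversine formula:

    $d = 2R \arcsin \sqrt{ \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) }$

where φ and λ are the two points' latitudes and longitudes and R ≈ 6,371 km is the Earth's mean radius. Plugging in Chicago (41.9°N, 87.6°W) and Frankfurt (50.1°N, 8.7°E) gives roughly 6,970 km, assuming a perfectly spherical Earth.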

How might a better understanding of projection systems improve your own research? What are some other ways in which the Mercator projection (or any other) have deceived us? Explore for yourself and let us know!

Google Scholar: Friend or Foe?

This is a guest blog by the amazing Zachary Maiorana, a GA in Scholarly Communication and Publishing

Homepage for Google Scholar

Scholars and users have a vested interest in understanding the relative authority of publications they have either written or wish to cite to form the basis of their research. Although the literature search, a common topic in library instruction and research seminars, can take place on a huge variety of discovery tools, researchers often rely on Google Scholar as a supporting or central platform.

The massive popularity of Google Scholar is likely due to its simple interface, which bears the longtime prestige of Google's search engine; its enormous breadth, with a simple search yielding millions of results; its compatibility and parallels with other Google products like Chrome and Books; and its citation metrics mechanism.

This last aspect of Google Scholar, which collects and reports data on the number of citations a given publication receives, represents the platform’s apparent ability to precisely calculate the research community’s interest in that publication. But, in the University Library’s work on the Illinois Experts (experts.illinois.edu) research and scholarship portal, we have encountered a number of circumstances in which Google Scholar has misrepresented U of I faculty members’ research.

Recent studies reveal that Google Scholar, despite its popularity and its massive reach, is not only often inaccurate in its reporting of citation metrics and title attribution, but also susceptible to deliberate manipulation. In 2010, Labbé described an experiment using Ike Antkare (AKA "I can't care"), a fictitious researcher whose bibliography was manufactured with a mountain of self-referencing citations. After the purposely falsified publications went public, Google's bots didn't differentiate Antkare's research from that of his real-life peers while crawling his 100 generated articles. As a result, Google Scholar reported Antkare as one of the most cited researchers in the world, with a higher h-index* than Einstein.

Ike Antkare "standing on the shoulders of giants" in Indiana University's Scholarometer. Credit: Adapted from a screencap in Labbé (2010)
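For reference, the h-index mentioned above has a compact definition: with a researcher's citation counts sorted in descending order, $c_1 \ge c_2 \ge \dots$, the h-index is

    $h = \max\{\, i : c_i \ge i \,\}$

For example, citation counts (10, 7, 4, 2, 1) give h = 3, since three papers have at least 3 citations each but four papers do not have at least 4. This is exactly why citation stuffing works: every fake citation Google Scholar counts pushes more of a fake author's papers over the threshold.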

In 2014, Spanish researchers conducted an experiment in which they created a fake scholar with several papers making hundreds of references to works written by the experimenters. After the papers were made public on a personal site, Google Scholar scraped the data and the real-life researchers’ profiles increased by 774 citations in total. In the hands of more nefarious users seeking to aggrandize their own careers or alter scientific opinion, such practices could result in large-scale academic fraud.

For libraries, Google's kitchen-sink approach to data collection further results in confusing and inaccurate attributions. In our work to supplement the automated collection of publication data for faculty profiles on Illinois Experts using CVs, publishers' sites, journal sites, databases, and Google Scholar, we frequently encounter researchers' names and works mischaracterized by Google's clumsy aggregation mechanisms. For example, Google Scholar's bots often read a scholar's name somewhere within a work that the scholar hasn't written—perhaps they were mentioned in the acknowledgements or in a citation—and simply attribute the work to them as author.

When it comes to people's careers and the sway of scientific opinion, such snowballing mistakes can be a recipe for large-scale misdirection. Though much research shows that, in general, Google Scholar currently represents highly cited research well, weaknesses persist. Blind trust in any dominant proprietary platform is unwise, and using Google Scholar requires particularly careful judgment.

Read more on Google Scholar’s quality and reliability:

Brown, Christopher C. 2017. “Google Scholar.” The Charleston Advisor 19 (2): 31–34. https://doi.org/10.5260/chara.19.2.31.

Halevi, Gali, Henk Moed, and Judit Bar-Ilan. 2017. “Suitability of Google Scholar as a Source of Scientific Information and as a Source of Data for Scientific Evaluation—Review of the Literature.” Journal of Informetrics 11 (3): 823–34. https://doi.org/10.1016/j.joi.2017.06.005.

Labbé, Cyril. 2016. “L’histoire d’Ike Antkare et de Ses Amis Fouille de Textes et Systèmes d’information Scientifique.” Document Numérique 19 (1): 9–37. https://doi.org/10.3166/dn.19.1.9-37.

Lopez-Cozar, Emilio Delgado, Nicolas Robinson-Garcia, and Daniel Torres-Salinas. 2012. “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting.” ArXiv:1212.0638 [Cs], December. http://arxiv.org/abs/1212.0638.

Walker, Lizzy A., and Michelle Armstrong. 2014. “‘I Cannot Tell What the Dickens His Name Is’: Name Disambiguation in Institutional Repositories.” Journal of Librarianship and Scholarly Communication 2 (2). https://doi.org/10.7710/2162-3309.1095.

*Read the library’s LibGuide on bibliometrics for an explanation of the h-index and other standard research metrics: https://guides.library.illinois.edu/c.php?g=621441&p=4328607

Wikidata and Wikidata Human Gender Indicators (WHGI)

Wikipedia is a central player in online knowledge production and sharing. Since its founding in 2001, Wikipedia has been committed to open access and open editing, which has made it the most popular reference work on the web. Though students are still warned away from using Wikipedia as a source in their scholarship, it presents well-researched information in an accessible and ostensibly democratic way.

Most people know Wikipedia from its high ranking in most internet searches and tend to use it for its encyclopedic value. The Wikimedia Foundation—which runs Wikipedia—has several other projects which seek to provide free access to knowledge. Among those are Wikimedia Commons, which offers free photos; Wikiversity, which offers free educational materials; and Wikidata, which provides structured data to support the other wikis.

The Wikidata logo

Wikidata provides structured data to support Wikipedia and other Wikimedia Foundation projects

Wikidata is a great tool for studying how Wikipedia is structured and what information is available through the online encyclopedia. Since it is presented as structured data, it can be analyzed quantitatively more easily than Wikipedia articles. This has led to many projects that allow users to explore the data through visualizations, queries, and other means. Wikidata offers a page of Tools that can be used to analyze Wikidata more quickly and efficiently, as well as Data Access instructions for how to use data from the site.
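For example, the Wikidata Query Service (query.wikidata.org) accepts SPARQL queries against this structured data. Here is a sketch of a WHGI-flavored query counting humans by gender; note that aggregating over every human in Wikidata can hit the service's timeout, so in practice you may need a LIMIT or a narrower subset:

    SELECT ?genderLabel (COUNT(?person) AS ?count) WHERE {
      ?person wdt:P31 wd:Q5 ;          # instance of: human
              wdt:P21 ?gender .        # sex or gender
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    GROUP BY ?genderLabel
    ORDER BY DESC(?count)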

The home page for the Wikidata Human Gender Indicators project

An example of a project born out of Wikidata is the Wikidata Human Gender Indicators (WHGI) project. The project uses metadata from Wikidata entries about people to analyze trends in gender disparity over time and across cultures. The project presents the raw data for download, as well as charts and an article written about the discoveries the researchers made while compiling the data. Some of the visualizations they present are confusing (perhaps they could benefit from reading our Lightning Review of Data Visualization for Success), but they succeed in conveying important trends that reveal a bias toward articles about men, as well as an interesting phenomenon surrounding celebrities: some regions have a higher ratio of biographies about women to biographies about men because many articles are written about actresses and female musicians, which reflects cultural differences surrounding fame and gender.

Of course, like many data sources, Wikidata is not perfect. The creators of the WHGI project frequently discovered that articles did not have complete metadata related to gender or nationality, which greatly influenced their ability to analyze the trends present on Wikipedia related to those areas. Since Wikipedia and Wikidata are open to editing by anyone and are governed by practices that the community has agreed upon, it is important for Wikipedians to consider including more metadata in their articles so that researchers can use that data in new and exciting ways.

An animated gif of the Wikipedia logo bouncing like a ball

New Uses for Old Technology at the Arctic World Archive

In this era of rapid technological change, it is easy to fall into the mindset that the “big new thing” is always an improvement on the technology that came before it. Certainly this is often true, and here in the Scholarly Commons we are always seeking innovative new tools to help you out with your research. However, every now and then it’s nice to just slow down and take the time to appreciate the strengths and benefits of older technology that has largely fallen out of use.

A photo of the arctic

There is perhaps no better example of this than the Arctic World Archive, a facility on the Norwegian archipelago of Svalbard. Opened in 2017, the Arctic World Archive seeks to preserve the world’s most important cultural, political, and literary works in a way that will ensure that no manner of catastrophe, man-made or otherwise, could destroy them.

If this is all sounding familiar to you, that's because you've probably heard of the Arctic World Archive's older sibling, the Svalbard Global Seed Vault. The Global Seed Vault, which is much better known and older than the Arctic World Archive, is an archive of seeds from around the world, meant to ensure that humanity would be able to continue growing crops and making food in the event of a catastrophe that wipes out plant life.

Indeed, the two archives have a lot in common. The World Archive is housed deep within a mountain in an abandoned coal mine that once served as the location of the seed vault, and it was founded to be for cultural heritage what the seed vault is for crops. But the Arctic World Archive has made such innovative use of old technology that it is a truly impressive site in its own right.

A photo of the arctic

Perhaps the coolest (pun intended) aspect of the Arctic World Archive is the fact that it does not require electricity to operate. Its extreme northern location (it is near the northernmost town of at least 1,000 people in the world) means that the temperature inside the facility is naturally very cold year-round. As any archivist or rare book librarian who brings a fleece jacket to work in the summer will happily tell you, colder temperatures are ideal for preserving documents, and the ability to store items in a very cold climate without the use of electricity makes the World Archive perfect for sustainable, long-term storage.

But that’s not all: in a real blast from the past, all information stored in this facility is kept on microfilm. Now, I know what you’re thinking: “it’s the 21st century, grandpa! No one uses microfilm anymore!”

It’s true that microfilm is used by a very small minority of people nowadays, but nevertheless it offers distinct advantages that newer digital media just can’t compete with. For example, microfilm is rated to last for at least 500 years without corruption, whereas digital files may not last anywhere near that long. Beyond that, the film format means that the archive is totally independent from the internet, and will outlast any major catastrophe that disrupts part or all of our society’s web capabilities.

A photo of a seal

The Archive is still growing, but it is already home to film versions of Edvard Munch’s The Scream, Dante’s The Divine Comedy, and an assortment of government documents from many countries including Norway, Brazil, and the United States.

As it continues to grow, its importance as a place of safekeeping for the world’s cultural heritage will hopefully serve as a reminder that sometimes, older technology has upsides that new tech just can’t compete with.

Exploring Data Visualization #9

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Map of election districts colored red or blue based on predicted 2018 midterm election outcome

This map breaks down likely outcomes of the 2018 midterm elections by district.

 

Seniors at Montgomery Blair High School in Silver Spring, Maryland created the ORACLE of Blair 2018 House Election Forecast, a website that hosts visualizations predicting outcomes for the 2018 midterm elections. In addition to breakdowns of voting outcomes by state and district, the students compiled descriptions of how each district has voted historically and which stances are important for current candidates. How well do these predictions match up with the results from Tuesday?

A chart showing price changes for 15 items from 1998 to 2018

This chart shows price changes over the last 20 years. It gives the impression that these price changes are always steady, but that isn’t the case for all products.

Lisa Rost at Datawrapper created a chart—building on the work of Olivier Ballou—that shows the change in the price of goods using the Consumer Price Index. She provides detailed coverage of how her chart is put together, as well as making clear what is missing from both her chart and Ballou's based on which products are chosen for the graph. This behind-the-scenes information provides useful advice for how to read and design charts that are clear and informative.

An image showing a scale of scientific visualizations from figurative on the left to abstract on the right.

There are a lot of ways to make scientific research accessible through data visualization.

Visualization isn’t just charts and graphs—it’s all manner of visual objects that contribute information to a piece. Jen Christiansen, the Senior Graphics Editor at Scientific American, knows this well, and her blog post “Visualizing Science: Illustration and Beyond” on Scientific American covers some key elements of what it takes to make engaging and clear scientific graphics and visualizations. She shares lessons learned at all levels of creating visualizations, as well as covering a few ways to visualize uncertainty and the unknown.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.