Introductions: What is Data Analysis, anyway?

This post is part of a series where we introduce you to the various topics that we cover in the Scholarly Commons. Maybe you’re new to the field, or maybe you’ve reached the point where you’re too afraid to ask… Fear not! We are here to take it back to the basics!

So, what is Data Analysis, anyway?

Data analysis is the process of examining, cleaning, transforming, and modeling data in order to make discoveries and, in many cases, support decision making. One key part of the data analysis process is separating the signal (meaningful information you are trying to discover) from the noise (random, meaningless variation) in the data.
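To make the signal-versus-noise idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the data are a hypothetical upward trend with random scatter added, and a simple moving average stands in for the many smoothing and modeling methods an analyst might actually choose.

```python
import random
import statistics

def moving_average(values, window=5):
    """Centered moving average; returns one value per full window."""
    half = window // 2
    return [
        statistics.fmean(values[i - half : i + half + 1])
        for i in range(half, len(values) - half)
    ]

def mean_abs_error(estimate, target):
    """Average distance between an estimate and the true signal."""
    return statistics.fmean(abs(e - t) for e, t in zip(estimate, target))

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: a steady upward trend (signal) plus random scatter (noise).
signal = [0.5 * t for t in range(100)]
observed = [s + random.gauss(0, 2.0) for s in signal]

window = 5
half = window // 2
smoothed = moving_average(observed, window)

# Trim the ends so every series lines up with the smoothed one.
truth = signal[half : len(signal) - half]
raw = observed[half : len(observed) - half]

raw_error = mean_abs_error(raw, truth)
smoothed_error = mean_abs_error(smoothed, truth)
print(f"average error before smoothing: {raw_error:.2f}")
print(f"average error after smoothing:  {smoothed_error:.2f}")
```

Averaging neighboring points cancels out much of the random scatter while leaving the underlying trend in place, so the smoothed values should sit closer to the true signal than the raw observations do.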

The form and methods of data analysis can vary widely, and some form of data analysis is present in nearly every academic field. Here are some examples of data analysis projects:

  • Taylor Arnold, Lauren Tilton, and Annie Berke in “Visual Style in Two Network Era Sitcoms” (2019) used large-scale facial recognition and image analysis to examine the centrality of characters in the 1960s sitcoms Bewitched and I Dream of Jeannie. They found that Samantha is the distinctive lead character of Bewitched, while Jeannie is positioned under the domination of Tony in I Dream of Jeannie.
  • Allen Kim, Charuta Pethe, and Steven Skiena in “What time is it? Temporal Analysis of Novels” (2020) used the full text of 52,183 fiction books from Project Gutenberg and the HathiTrust to examine the time of day at which events in the books took place. They found that events from 11pm to 1am became more common after 1880, which the authors attribute to the invention of electric lighting.
  • Wouter Haverals and Lindsey Geybels in “A digital inquiry into the age of the implied readership of the Harry Potter series” (2021) used various statistical methods to examine whether the Harry Potter books did in fact progressively become more mature and adult with successive books, as often believed by literature scholars and reviewers. While they did find that the text of the books implied a more advanced reader with later books, the change was perhaps not as large as would be expected.

How can Scholarly Commons help?

If all of this is new to you, don’t worry! The Scholarly Commons can help you get started.

Here are various aspects of our data services in the Scholarly Commons:

As always, if you’re interested in learning more about data analysis and how to support your own projects, you can fill out a consultation request form, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

February Push!

Hello, researchers!

Congratulations! You made it through your first month back of the spring semester. From class work, to pouring rain, to enough snow and ice to make the university look like it’s auditioning for a role as Antarctica, you’re pushing forward!

A dual-monitor computer in the Scholarly Commons. The background of the image shows the Scholarly Commons space, which is filled with our dual-monitor computers and various desks.

Take a minute to look over all the awesome resources we have, right here in the Scholarly Commons, to help you keep chugging along with your research.

We are open 8:30 a.m. to 6 p.m., Monday through Friday. Our dual-monitor computers have software ranging from Adobe Photoshop to OCR, which can be paired with our various scanners to make machine-readable PDFs!

The Scholarly Commons space. A desk with a computer and a sign reading "Scholarly Commons" is shown.

Researchers can book free consultations thanks to our partnerships with CITL Data Analytics and Technology Services! In these meetings, you can learn about R, SAS, and everything else you need to just get started or to get past that tricky problem in your statistical research.

Beyond that, users can make appointments with our GIS specialist and learn even more through our GIS resources. We have a ton of great books in our non-circulating reference collection that can help you learn about Python, GIS, and more!

The Scholarly Commons reference collection. Six shelves filled with books.


And that’s not all: our Data Analytics & Visualization Librarian has put together a plethora of resources to help turn your data into art. Check out the four most common types of charts guide to get started!

The Scholarly Commons space. It contains several workstations with a carpeted floor.

And even this doesn’t cover all of our services!

If you need assistance finding numeric data, understanding your copyrights, cleaning up data in OpenRefine, or even starting up a project using text mining, we have the resources you need.

The Scholarly Commons has all the resources you need to succeed, so stop by anytime! We’re always happy to help.

Scholarly Smackdown: StoryMap JS vs. Story Maps

In today’s very spatial Scholarly Smackdown post we are covering two popular mapping visualization products, Story Maps and StoryMap JS. Yes, they both have “story” and “map” in the name, and they both let you create interactive multimedia maps without needing a server. However, they are different products!

StoryMap JS

StoryMap JS, from the Knight Lab at Northwestern, is a simple tool for creating interactive maps and timelines for journalists and historians with limited technical experience.

One example of a project on StoryMap JS is “Hockey, hip-hop, and other Green Line highlights” by Andy Sturdevant for the Minneapolis Post, which connects the stops of the Green Line train to historical and cultural sites of St. Paul and Minneapolis, Minnesota.

StoryMap JS uses Google products and map software from OpenStreetMap.

Using the StoryMap JS editor, you create slides with uploaded or linked media within their template. You then search the map and select a location and the slide will connect with the selected point. You can embed your finished map into your website, but Google-based links can deteriorate over time! So save copies of all your files!

More advanced users will enjoy the Gigapixel mode which allows users to create exhibits around an uploaded image or a historic map.

Story Maps

Story Maps is a custom map-based exhibit tool based on ArcGIS Online.

My favorite example of a project on Story Maps is The Great New Zealand Road Trip by Andrew Douglas-Clifford, which makes me want to drop everything and go to New Zealand (and learn to drive). But honestly, I can spend all day looking at the different examples in the Story Maps Gallery.

Story Maps offers a greater number of ways to display stories than StoryMap JS, especially in the paid version. The paid version even includes a crowdsourced Story Map where you can incorporate content from respondents, such as their 2016 GIS Day Events map.

With a free non-commercial public ArcGIS Online account you can create a variety of types of maps. Although there does not appear to be a way to overlay a historical map, there is a comparison tool which could be used to show changes over time. In the free edition of this software you have to use images hosted elsewhere, such as in Google Photos. Story Maps are created through a wizard where you add links to photos or videos, followed by information about these objects, and then search for and add the location. It is very easy to use, almost as easy as StoryMap JS. However, since this is proprietary software, there are limits to what you can do with the free account, and perhaps worries about pricing and accessing materials at a later date.

Overall, I can’t really say there’s a clear winner. If you need to tell a story with a map, both tools do a fine job. StoryMap JS is, in my totally unscientific opinion, slightly easier to use, but we have workshops for Story Maps here at Scholarly Commons! Either way, you will be fine even with limited technical or map-making experience.

If you are interested in learning more about data visualization, ArcGIS Story Maps, or geospatial data in general, check out these upcoming workshops here at Scholarly Commons, or contact our GIS expert, James Whitacre!

Book Review: Statistics Done Wrong by Alex Reinhart

One book you can read (but not check out, sorry!) at Scholarly Commons is Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart, an expansion of the popular website.

Reinhart studied physics as an undergraduate but did a master’s in statistics after realizing the problems that misunderstandings of statistics were causing in physics and science as a whole. He is now working on a PhD in statistics at Carnegie Mellon.

I don’t know if this would be the best book for someone who has no background knowledge whatsoever. While the author does a good job at explaining a lot of the concepts, his target audience is people who’ve encountered bad statistics in advanced level research, such as medical studies. This book is a good start for those who want insight into the mind of a statistician, even if their math skills aren’t quite there. Although the book isn’t numbers heavy, I still definitely got lost a few times and had to re-read some of the passages. However, I really like the writing style of the author. All I want is to be able to write and articulate difficult concepts in a way as clear, concise, and even funny as the author.

One of the main takeaways of the book was:

“Scientists may be superhumanly caffeinated, but they’re still human, and the constant pressure to publish means that thorough documentation and replication are ignored” (Reinhart, 2015).

I remember learning about the difficulty that psychologists have faced in replicating their results (and I am glad to hear that they, as a discipline, are improving their efforts to make sure studies and results can be replicated), and this book explained a lot of the factors at play. For example, Reinhart discusses statistical power, and how not all journals check whether researchers had enough data to detect a statistically significant result in the first place. The book also goes in depth into the various issues that can make statistical significance a poor measure of whether a phenomenon is occurring or not. Furthermore, while I found the debate about publication bias in studies about publication bias, as well as “False-Positive Psychology” by Joseph P. Simmons et al. (doi:10.1177/0956797611417632), hilarious, I too am now worried about the state of research.
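For readers who have not met statistical power before, here is a rough sketch of the idea in Python. It is not taken from the book: it uses a textbook normal approximation for comparing the means of two equally sized groups, and the function name, effect size, and sample sizes are all chosen here purely for illustration.

```python
import math
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; effect_size is the standardized difference
    between the group means (often called Cohen's d).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Chance the test statistic lands beyond either critical value
    # when the effect really is effect_size.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Illustrative only: a "medium" effect of 0.5 at several sample sizes.
for n in (10, 20, 64, 100):
    print(f"n per group = {n:3d}: power ~ {approx_power(0.5, n):.2f}")
```

Under this approximation, a study hoping to detect a medium-sized effect needs roughly 64 participants per group to reach about 80 percent power. A study with only 10 per group would miss the same effect most of the time, which is the kind of underpowered design the book warns about.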

One suggestion from Statistics Done Wrong is to create more options for making scientific data accessible, so that people don’t try the same failed methods again and again and can learn from others’ experiences. In other words, researchers need ways to share the raw data and software code used, even in studies that did not get published in a journal, before the format they are in becomes obsolete, even though that’s often a major pain to do. The Research Data Service is the best place to go to learn more about the efforts on campus to solve this problem and about ways to store scientific data and make it more available.

Another problem mentioned in the book is the overall lack of statistical knowledge and education. Here at Illinois, Scholarly Commons is just one of many resources available. For more specific technical questions, we recommend asking a statistician through the consulting services of the statistics department. However, this service costs money unless you get the STAT 427 students during spring semester. There are also some free resources and workshops through ATLAS-CITL and online tutorials through Lynda.

Overall, Statistics Done Wrong is an interesting read and a good starting point for those interested in having a better understanding of what to look out for when using statistics in research and ways to improve the way research is done as a whole.

ICPSR 2014 Summer Program in Quantitative Methods of Social Research

Still making a list of summer plans? As you gear up for summer, keep in mind that the Institute for Social Research at the University of Michigan is offering a wide range of classes on quantitative data-analysis. Whether you are a beginner or you are ready to study more advanced techniques, the program has something unique to offer each individual. Course instruction is centered around interactive, participatory data-analysis within a broader context of substantive social research.

Courses for the summer 2014 program are offered in two four-week sessions, May through August. These sessions include lecture, seminar, and workshop formats with participants from a diverse range of departments, universities, and organizations.

The following are a few examples of courses that will be offered:

Basic Foundation
Introduction to Statistics and Data Analysis
Introduction to Regression
Introduction to Computing

Linear Models and Beyond
Regression Analysis
Hierarchical Linear and Multilevel Models
Categorical Data Analysis

Substantive Topics
Race and Ethnicity
Curating Data & Providing Data Services
Designing, Conducting, and Analyzing Field Experiments

Advanced Techniques
Applied Bayesian Modeling
Advanced Time Series
The R Statistical Computing Environment

Multivariate Techniques
Multivariate Statistical Methods
Scaling and Dimensional Analysis
Intro & Advanced Network Analysis

Formal Modeling
Game Theory
Rational Choice
Empirical Modeling for Theory Evaluation

Registration is now open. There are also a few free workshops that will be offered over the summer, but registration for those sessions ends May 15, 2014 and seats are limited!

For a full list of courses, fee and discount information, and to fill out an application, visit the website.

Call: (734) 763-7400