As you do research with larger amounts of data, it becomes necessary to graduate from doing your data analysis in Excel and find more powerful software. This can seem like a daunting task, especially if you have never attempted to analyze big data before. There are a number of data analysis software systems out there, but it is not always clear which one will work best for your research. The nature of your research data, your technological expertise, and your own personal preferences will all play a role in which software works best for you. In this post I will explain the pros and cons of Stata, R, and SPSS with regard to quantitative data analysis and provide links to additional resources. Every data analysis program I talk about in this post is available to University of Illinois students, faculty, and staff through the Scholarly Commons computers, and you can schedule a consultation with CITL if you have specific questions.
Rock your research with the right tools!
Among researchers, Stata is often credited as the most user-friendly data analysis software. Stata is popular in the social sciences, particularly economics and political science. It is a complete, integrated statistical software package, meaning it can accomplish pretty much any statistical task you need it to, including visualizations. It has both a point-and-click user interface and a command line function with easy-to-learn command syntax. Furthermore, it has a system for version control in place, so you can save syntax from certain jobs into a “do-file” to refer to later. Stata is not free to have on your personal computer. Unlike an open-source program, you cannot program your own functions into Stata, so you are limited to the functions it already supports. Finally, its functions are limited to numeric or categorical data; it cannot analyze spatial data and certain other types.
Pros | Cons |
---|---|
User friendly and easy to learn | An individual license can cost between $125 and $425 annually |
Version control | Limited to certain types of data |
Many free online resources for learning | You cannot program new functions into Stata |
R and its graphical user interface companion RStudio are incredibly popular for a number of reasons. The first and probably most important is that R is free, open-source software that is compatible with any operating system. As such, there is a strong and loyal community of users who share their work and advice online. It shares many features with Stata, such as a point-and-click user interface, a command line, savable files, and strong data analysis and visualization capabilities. It also has some capabilities Stata does not, because users with more technical expertise can program new functions in R to use it for different types of data and projects. The problem a lot of people run into with R is that it is not easy to learn. The programming language it operates on is not intuitive and it is prone to errors. Despite this steep learning curve, there is an abundance of free online resources for learning R.
Pros | Cons |
---|---|
Free open-source software | Steep learning curve |
Strong online user community | Can be slow |
Programmable with more functions for data analysis | |
SPSS is an IBM product that is used for quantitative data analysis. It does not have a command line feature but rather has a user interface that is entirely point-and-click and somewhat resembles Microsoft Excel. Although it looks a lot like Excel, it can handle larger data sets faster and with more ease. One of the main complaints about SPSS is that it is prohibitively expensive to use, with individual packages ranging from $1,290 to $8,540 a year. To make up for how expensive it is, it is incredibly easy to learn. As a non-technical person I learned how to use it in under an hour by following an online tutorial from the University of Illinois Library. However, my take on this software is that unless you really need a more powerful tool just stick to Excel. They are too similar to justify seeking out this specialized software.
Pros | Cons |
---|---|
Quick and easy to learn | By far the most expensive |
Can handle large amounts of data | Limited functionality |
Great user interface | Very similar to Excel |
Additional Resources:
Thanks for reading! Let us know in the comments if you have any thoughts or questions about any of these data analysis software programs. We love hearing from our readers!
This week, geographers around the globe took some time to celebrate the software that allows them to analyze, well, that very same globe. November 13th marked the 20th annual GIS Day, an “international celebration of geographic information systems,” as the official GIS Day website puts it.
But while GIS technology has revolutionized the way we analyze and visualize maps over the past two decades, the high cost of ArcGIS products, long recognized as the gold standard for cartographic analysis tools, is enough to deter many people from using them. At the University of Illinois and other colleges and universities, access to ArcGIS can be taken for granted, but many of us will not remain in the academic world forever. Luckily, there’s a high-quality alternative to ArcGIS for those who want the benefits of mapping software without the price tag!
QGIS is a free, open source mapping software that has most of the same functionality as ArcGIS. While some more advanced features included in ArcGIS do not have analogues in QGIS, developers are continually updating the software and new features are always being added. As it stands now, though, QGIS includes everything that the casual GIS practitioner could want, along with almost everything more advanced users need.
As is often the case with open source software alternatives, QGIS has a large, vibrant community of supporters, and its developers have put together tons of documentation on how to use the program, such as this user guide. Generally speaking, if you have any experience with ArcGIS it’s very easy to learn QGIS—for a picture of the learning curve, think somewhere along the lines of switching from Microsoft Word to Google Docs. And if you don’t have experience, the community is there to help! There are many guides to getting started, including the one listed in the above link, and more forum posts of users working through questions together than anyone could read in a lifetime.
For more help, stop by to take a look at one of the QGIS guidebooks in our reference collection, or send us an email at sc@library.illinois.edu!
Have you made an interesting map in QGIS? Send us pictures of your creations on Twitter @ScholCommons!
Daylight Saving Time Gripe Assistant Tool
Clocks fell back this weekend, which means the internet returns once again to the debate of whether or not we still need Daylight Saving Time. Andy Woodruff, a cartographer for Axis Maps, created a handy tool for determining how much you can complain about the time change. You input your ideal sunset and sunrise times, select whether the sunset or sunrise time you chose is more important, and the tool generates a map that shows whether DST should be gotten rid of, used year-round, or if no changes need to be made based on where you live. The difference a half hour makes is surprising for some of the maps, making this a fun data viz to play around with and examine your own gripes with DST.
This shows an ideal sunrise of 7:00 am and an ideal sunset of 6:00 pm.
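To get a feel for the kind of comparison a tool like this makes, here is a minimal Python sketch for a single location. It is not Woodruff's actual method: the `Day` class, the `penalty` scoring rule (the more important event counts double), and the sample sunrise and sunset times are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Day:
    sunrise_std: int      # sunrise in minutes after midnight, standard time
    sunset_std: int       # sunset in minutes after midnight, standard time
    dst_in_effect: bool   # True if the current rules put this day on DST


def to_minutes(clock: str) -> int:
    """Convert a 24-hour 'HH:MM' string to minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def penalty(day, shift, ideal_sunrise, ideal_sunset, prefer):
    """Weighted distance (in minutes) from the ideal times after shifting the clock."""
    # Assumption: the event the user cares about more counts double.
    rise_weight, set_weight = (2, 1) if prefer == "sunrise" else (1, 2)
    return (rise_weight * abs(day.sunrise_std + shift - ideal_sunrise)
            + set_weight * abs(day.sunset_std + shift - ideal_sunset))


def recommend(days, ideal_sunrise, ideal_sunset, prefer="sunset"):
    """Return the clock policy with the smallest total penalty over the sample days."""
    rise, sset = to_minutes(ideal_sunrise), to_minutes(ideal_sunset)
    totals = {"keep the current system": 0, "abolish DST": 0, "DST year-round": 0}
    for day in days:
        current_shift = 60 if day.dst_in_effect else 0
        totals["keep the current system"] += penalty(day, current_shift, rise, sset, prefer)
        totals["abolish DST"] += penalty(day, 0, rise, sset, prefer)
        totals["DST year-round"] += penalty(day, 60, rise, sset, prefer)
    return min(totals, key=totals.get)


# Hypothetical winter day: sunrise 7:15, sunset 16:45 (standard time).
winter = Day(to_minutes("07:15"), to_minutes("16:45"), dst_in_effect=False)
print(recommend([winter], "07:00", "18:00", prefer="sunset"))
```

Running the same function over a full year of days for many locations, then coloring each location by the result, would give a rough version of the map described above.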
Conveying tone through text can be stressful: finding the right balance of friendly and assertive in a text is a delicate operation that involves word choice and punctuation equally. Often, we make our text more friendly through exclamation points! Or by adding a quick laugh, haha. The Pudding took note of how varied our use of text-based laughs can be and put together a visual essay on how often we use different laughs and whether all of them actually mean we are “laughing out loud.” The most common laugh on Reddit is “lol,” while “hehe,” “jaja,” and “i’m laughing” are much less popular expressions of mirth.
“ha” is the expression most likely to be used to indicate fake laughter or hostility
how to do it in Excel: a shaded range
Here’s a quick tip for making more complex graphs using Excel! Storytelling with Data’s Elizabeth Ricks put together a great how-to article on making Excel show a shaded range on a graph. This method involves some “brute force” to make Excel’s functions work in your favor, but results in a clean chart that shows a shaded range rather than a cluster of multiple lines.
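One common way to get a shaded band out of a spreadsheet is to stack two area series: the lower bound (formatted with no fill) and the gap between the upper and lower bounds (formatted with a light fill). The Python sketch below only prepares those two helper columns. It assumes the stacked-area approach and is not necessarily the exact sequence of steps in Ricks's article; the function name and sample numbers are made up for illustration.

```python
def stacked_band_columns(lower, upper):
    """Return (base, band) pairs for a stacked area chart.

    base is the lower bound (to be drawn with no fill) and band is
    upper minus lower (to be drawn with the shaded fill on top of it).
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    rows = []
    for low, high in zip(lower, upper):
        if high < low:
            raise ValueError("each upper bound must be at least its lower bound")
        rows.append((low, high - low))
    return rows


# Hypothetical monthly minimum and maximum values.
for base, band in stacked_band_columns([10, 12, 11], [15, 20, 18]):
    print(base, band)
```

Pasting the two resulting columns next to your data gives the chart the series it needs to stack.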
Pixelation to represent endangered species counts
On Imgur, user JJSmooth44 created a photo series to demonstrate the current status of endangered species using pixelation. The number of squares represents the approximate number of individuals of that species remaining in the world. The more pixelated the image, the fewer there are left.
The African Wild Dog is one of the images in which the animal is still mostly recognizable.
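The idea of one square per remaining animal comes down to a little arithmetic: pick a square size so that the number of squares covering the photo roughly equals the population. Here is a minimal Python sketch of that idea. It is not JJSmooth44's actual process; the function names, the image dimensions, and the population figure are hypothetical.

```python
import math


def block_size_for_population(width, height, population):
    """Side length (in pixels) of squares so that about `population` squares cover the image."""
    if population <= 0:
        raise ValueError("population must be positive")
    return max(1, round(math.sqrt(width * height / population)))


def block_count(width, height, side):
    """Number of squares (including partial ones at the edges) needed to tile the image."""
    return math.ceil(width / side) * math.ceil(height / side)


def pixelate(pixels, side):
    """Replace every side-by-side block of a grayscale grid with its average value."""
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for top in range(0, height, side):
        for left in range(0, width, side):
            ys = range(top, min(top + side, height))
            xs = range(left, min(left + side, width))
            block = [pixels[y][x] for y in ys for x in xs]
            average = sum(block) // len(block)
            for y in ys:
                for x in xs:
                    out[y][x] = average
    return out


# Hypothetical example: a 600 x 400 photo and 2,400 animals remaining.
side = block_size_for_population(600, 400, 2400)
print(side, block_count(600, 400, side))
```

A smaller population gives a larger square size, which is why the rarest species end up as the least recognizable images.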
If you’re anything like us here in the Scholarly Commons, the day you’ve been waiting for is finally here. It’s time to put on a costume, eat too much candy, and celebrate all things spooky. That’s right, folks. It’s Halloween and we couldn’t be happier!
If you’ve been keeping up with our Twitter (@ScholCommons) this month, you’ve noticed we’ve been sharing some ghoulish graphs and other scary scholarship. To keep the holiday spirit(s) high, I wanted to use this week’s blog post to gather up all our favorites.
First up, check out the most haunted cities in the US on The Next Web, which includes some graphs but also a heat map of the most haunted areas in the country. Which region do you think has the most ghosts?
Halloween is just over two weeks away! Get in the holiday spirit with this #dataviz of the most haunted places in the U.S.! https://t.co/AGxKOUlOZq
— Scholarly Commons (@ScholCommons) October 15, 2019
If you’re more interested in what’s happening across the pond, we’ve got you covered. Click on this project to see just how scary ArcGIS story maps can be.
https://twitter.com/ScholCommons/status/1187058855282462721
And while ghosts may be cool, we all know the best Halloween characters are all witches. Check out this fascinating project from The University of Edinburgh that explores real, historic witch hunts in Scotland.
Looking to brush up on some haunted history this week? Check out this fantastic map of the witchy history of Scotland, courtesy of the University of Edinburgh! https://t.co/GnxRr8YL2F
— Scholarly Commons (@ScholCommons) October 28, 2019
The next project we want to show you might be one of the scariest. I was absolutely horrified to find out that Illinois’ most popular Halloween candy is Jolly Ranchers. If you’re expecting trick-or-treaters tonight, please think of the children and reconsider your candy offerings.
Is your favorite candy on the map? https://t.co/Lqc6ggwqyo
— Scholarly Commons (@ScholCommons) October 28, 2019
Now that we’ve shared the most macabre maps around, let’s shift our focus to the future. Nathan Yau uses data to predict when your death will occur. And if this isn’t enough to terrify you, try his tool to predict how you’ll die.
There's nothing spookier than actuarial science… Get in the Halloween spirit by taking a look into this #dataviz crystal ball and seeing how many years YOU have left to live! https://t.co/dkUTW9Aq1F
— Scholarly Commons (@ScholCommons) October 29, 2019
Finally, if you’re looking for some cooking help from an AI or a Great Old One, check out this neural network dubbed “Cooking with Cthulhu.”
Need some recipe ideas for your Halloween party? This AI project is here to help (well… sort of.) https://t.co/0YSpejCsfF
— Scholarly Commons (@ScholCommons) October 31, 2019
Do you have any favorite Halloween-themed research projects? If so, please share them with us here or on Twitter. And if you’re interested in doing your own deadly digital scholarship, feel free to reach out to the Scholarly Commons to learn how to get started or get help on your current work. Remember, in the words of everyone’s favorite two-faced mayor…
We at the University of Illinois are lucky to have a library that offers access to more journals and databases than any one person could ever hope to make their way through. The downside of this much access, however, is that it can be easy for resources to get lost in the weeds. For the typical student, once you are familiar with a few databases or methods of searching for information, you tend to not seek out more unless you absolutely need to.
This week, we wanted to fight back against that tendency just a little bit by introducing you to a database that many readers may not have heard of but that contains a veritable treasure trove of useful geographical information: the Big Ten Academic Alliance Geoportal.
This resource is a compilation of geospatial content from the 12 universities that make up the BTAA. Types of content available include maps (many of which are historic), aerial imagery, and geospatial data. Researchers with a specific need for one of those can easily navigate from the Geoportal homepage to a more specific resource page by selecting the type of information they are looking for here:
Alternatively, if you don’t particularly care about the type of data you find but rather are looking for data in a particular region, you can use the map on the left side of the display to easily zoom in to a particular part of the world and see what maps and other resources are available.
The numbers on the map represent the number of maps or other data in the Geoportal localized in each rough region of the world. For example, there are 310 maps for Europe and 14 maps for the Atlantic Ocean. As you zoom in on the map, your options get more specific, and the numbers break down to smaller geographic regions:
When the map is zoomed in close enough that there is only one piece of data for a particular area, the circled numbers are replaced with a blue location icon, such as the ones displayed over Iceland, Sweden, and the Russia-Finland border above. Clicking on one of these icons will take you to a page with the specific image or data source represented on the map. For example, the icon over Iceland takes us to the following page:
Information is provided about what type of resource you’re looking at, who created it, what time period it is from, as well as which BTAA member institution uploaded the map (in this case, the University of Minnesota).
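If you are curious how a map can show one big number per region and then split it into smaller numbers as you zoom, a simple grid is one way to picture it. The Python sketch below counts items per grid cell, with cells that shrink by half at each zoom level. This is an illustration only, not how the Geoportal itself is implemented; the function name, the cell sizing, and the sample coordinates are assumptions.

```python
import math
from collections import Counter


def cluster_counts(points, zoom):
    """Count (lat, lon) points per grid cell; cells halve in size with each zoom level."""
    cell = 180.0 / (2 ** zoom)  # assumed cell size in degrees at this zoom level
    counts = Counter()
    for lat, lon in points:
        key = (math.floor(lat / cell), math.floor(lon / cell))
        counts[key] += 1
    return counts


# Hypothetical item locations (latitude, longitude).
items = [(64.1, -21.9), (64.9, -18.5), (59.3, 18.1), (48.9, 2.3)]

for zoom in (0, 6):
    print(zoom, sorted(cluster_counts(items, zoom).values(), reverse=True))
```

At a low zoom level several items share a cell and would be drawn as a circled number; once every cell holds a single item, each one could get its own location icon.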
Other tools on the home page, including a search bar and lists of places and subjects represented in the Geoportal, mean that no matter what point you’re starting from you should have no problem finding the data you need!
The Geoportal also maintains a blog with news, featured items and more, so be sure to check it out and keep up-to-date on all things geospatial!
Do you have questions about using the Geoportal, or finding other geospatial data? Stop by the Scholarly Commons or shoot us an email at sc@library.illinois.edu. We’ll be happy to help you!
In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.
Which milk has the smallest impact on the planet?
Climate change impacts each of us in direct and indirect ways. Mitigating your personal carbon footprint is an important way to address climate change for many people, but often people are unsure how to make choices that benefit the climate. Daniela Haake from Datawrapper took a close look at how her choice to have milk in her coffee was damaging or benefiting the planet and it turns out things aren’t looking great for café con leche. The chart Haake published, created using data from Dr. Joseph Poore, compares the carbon emissions, land use, and water use of milk and the top 4 milk alternatives.
Here’s Who Owns the Most Land in America
“The 100 largest owners of private property in the U.S., newcomers and old-timers together, have 40 million acres, or approximately 2% of the country’s land mass,” Bloomberg News reports. The people who own this land are the richest people in the country, and their wealth has grown significantly over the last 10 years. Bloomberg created a map that demonstrates where the land these people own is located. Compared to the rest of the country, the amount of land owned by these people looks relatively small—could Bloomberg have presented more information about why it is significant that these people own land in these areas? And about why they own so much land?
How to Get Better at Embracing Unknowns
Representing our uncertainty in data can be difficult to do clearly and well. In Scientific American this month, Jessica Hullman analyzed different methods of representing uncertainty for their clarity and effectiveness. While there may be no perfect way to represent uncertainty in your data viz, Hullman argues that “the least effective way to present uncertainty is to not show it at all.” Take a look at Hullman’s different ways to represent uncertainty and see if any might work for your next project!
I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.
Hello students, faculty, and everyone else who makes up the amazing community of the University of Illinois at Urbana-Champaign! We hope the beginning of this new academic year has been an exciting and only-mildly-hectic time. The Scholarly Commons, your central hub for qualitative and quantitative research assistance, has officially resumed our extended hours.
That’s right, for the entirety of this beautiful fall semester we will be open Monday-Friday, 8:30am-6:00pm!
In addition to our expansive software and numerous scanners, the Scholarly Commons is here to provide you with access to both brand new and continued services.
New additions to the Scholarly Commons this semester include two new high-powered computers featuring 6-core processors, NVIDIA 1080 video cards, 32GB RAM, and solid-state drives.
For the first time, we’ll also be offering REDCap (Research Electronic Data Capture) consultations to help you with data collection and database needs. Drop-in hours are available this fall on Tuesdays, 9:00-11:00am in the Scholarly Commons.
CITL Statistical Consulting is back to help you with all your research involving R, Stata, SPSS, SAS, and more. Consultations can be requested through this form.
Drop-in hours are available with CITL Consultants:
Monday: 10:00am-4:00pm
Tuesday: 10:00am-4:00pm
Wednesday: 10:00am-1:00pm, 2:00-5:00pm
Thursday: 10:00am-4:00pm
Friday: 10:00am-4:00pm
Once again our wonderful Data Analytics and Visualization Librarian, Megan Ozeran, is offering office hours every other Monday, 2:00-4:00pm (next Office Hours will be held 9/9). Feel free to stop by with your questions about data visualization!
And speaking of data visualization, the Scholarly Commons will be hosting the Data Viz Competition this fall. Undergraduate and graduate student submissions will be judged separately, and there will be first and second place awards for each. All awards will be announced at the finale event on Tuesday, October 22nd. Check out last year’s entries.
Do you like to transform data into knowledge? Do you make graphs, infographics, or interactive dashboards? Show off your work! Enter your best data visualization for a chance to win $400. Visit https://t.co/lTQNxDyquE for more information and to submit. pic.twitter.com/Edvemdjp4c
— Scholarly Commons (@ScholCommons) August 26, 2019
As always, please reach out to the Scholarly Commons with any questions at sc@library.illinois.edu and best of luck in all your research this upcoming year!
A day in the life of Americans: a data comic
By illustrating the activity most Americans are doing at a given hour, Hong highlights what the average day looks like for an American worker.
Happy May, researchers! With the semester winding down and summer plans on the horizon, a lot of us are reflecting on what we’ve done in the past year. Sometimes it can be hard to determine what your daily routine looks like when you are the one doing it every day. Matt Hong created a cute and informative data comic about how we spend our time during the day, based on data from the Census Bureau. Check out Hong’s Medium page for more data comics.
What Qualifies as Middle-Income in Each State
The distribution of middle-income for households made up of four people.
Nathan Yau at Flowing Data created an interesting chart that shows the range of income that is considered “middle-income” in each state and the District of Columbia in the United States. The design of the chart itself is smooth and watching the transitions between income ranges based on number of people in the household is very enjoyable. It is also enlightening to see where states fall on the spectrum of what “middle-income” means, and this visualization could be a useful tool for researchers working on wage disparity.
The bottom of the chart shows jobs that people transition into later in life.
The end of the semester also means a wave of new graduates entering the workforce. While we extend our congratulations to those people, we often inquire about what their upcoming plans are and where they will be working in the future. For some, that question is straightforward; for others, a change of pace may be on the horizon. Nathan Yau of Flowing Data also created a frequency trail chart that shows at what age many people change career paths. As Yau demonstrates in a bar chart that accompanies the frequency trail chart, the majority of job switches happen early and late in life, a phenomenon which he offers some suggestions for.
The peak at the “older” end of the chart indicates some changes post-retirement, but also makes you wonder why people are still finding new jobs at age 85 to 89.
One Way to Spot a Partisan Gerrymander
Even though it feels like it was 2016 yesterday, we are more than a quarter of the way through 2019 and the 2020 political cycle is starting to heat up. A common issue in the minds of voters and politicians is fraudulent and rigged elections—voters increasingly wonder if their votes really matter in the current political landscape. Last week, the Supreme Court heard two cases on partisan gerrymandering in North Carolina and Maryland. FiveThirtyEight made an elegant visualization about gerrymandering in North Carolina. The visualization demonstrates how actual election outcomes can be used to extrapolate what percentage of seats will go to each party.
As you scroll, the chart continues to develop and become more complicated. It adds results from past elections to contextualize the severity of the current problems with gerrymandering. It also provides an example of the outcomes of a redrawn district map in Pennsylvania.
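One common way to turn district-level results into a curve of statewide vote share versus seat share is a "uniform swing": shift every district by the same amount and count who wins. The Python sketch below shows that technique. It is not necessarily the method FiveThirtyEight used; the district vote shares are hypothetical, and the baseline assumes every district has equal turnout.

```python
def seats_votes_curve(district_shares, statewide_targets):
    """Estimate Party A's seat share at each target statewide vote share.

    district_shares: Party A's two-party vote share in each district (0 to 1).
    Assumption: equal turnout everywhere, so the statewide share is the plain mean.
    """
    baseline = sum(district_shares) / len(district_shares)
    curve = []
    for target in statewide_targets:
        swing = target - baseline  # the same shift is applied to every district
        seats_won = sum(1 for share in district_shares if share + swing > 0.5)
        curve.append((target, seats_won / len(district_shares)))
    return curve


# Hypothetical map: Party A's voters are packed into three lopsided districts.
districts = [0.30, 0.32, 0.35, 0.38, 0.40, 0.42, 0.45, 0.70, 0.72, 0.76]

for votes, seats in seats_votes_curve(districts, [0.45, 0.50, 0.55]):
    print(f"{votes:.0%} of votes -> {seats:.0%} of seats")
```

When a party needs well over half the votes to win half the seats, as in this made-up map, that gap is one signal analysts look at when asking whether a map is gerrymandered.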
The change from line chart to plotted points better demonstrates the trend of the attitudes toward Brexit.
Sarah Leo from The Economist re-creates past visualizations from the publication that were misleading or poorly designed. The blog post calls out the mistakes very effectively and offers redesigns when possible. Leo also makes the data available after each visualization.
Seeing two visualizations of the same data next to one another really helps drive home how data can be represented differently–and how that causes different impacts upon a reader.
The Financial Times has made an online version of their quick chart-making tool available to the public. Appropriately titled FastCharts, the site lets you upload your own data or play around with sample data they have provided. Because this tool is so simple, it seems like it would be useful for exploratory data analysis, but maybe not for creating more complex explanations of your data.
Play with the provided example data or use your own data to produce an interesting result! For a challenge, see if any of the data in our Numeric Data Library Guide can work for this tool.