Stata vs. R vs. SPSS for Data Analysis

As you do research with larger amounts of data, it becomes necessary to graduate from doing your data analysis in Excel and find a more powerful software. It can seem like a really daunting task, especially if you have never attempted to analyze big data before. There are a number of data analysis software systems out there, but it is not always clear which one will work best for your research. The nature of your research data, your technological expertise, and your own personal preferences are all going to play a role in which software will work best for you. In this post I will explain the pros and cons of Stata, R, and SPSS with regards to quantitative data analysis and provide links to additional resources. Every data analysis software I talk about in this post is available for University of Illinois students, faculty, and staff through the Scholarly Commons computers and you can schedule a consultation with CITL if you have specific questions.

Short video loop of a kid sitting at a computer and putting on sun glasses

Rock your research with the right tools!


STATA

Stata logo. Blue block lettering spelling out Stata.

Among researchers, Stata is often credited as the most user-friendly data analysis software. Stata is popular in the social sciences, particularly economics and political science. It is a complete, integrated statistical software package, meaning it can accomplish pretty much any statistical task you need it to, including visualizations. It has both a point-and-click user interface and a command line function with easy-to-learn command syntax. Furthermore, it has a system for version-control in place, so you can save syntax from certain jobs into a “do-file” to refer to later. Stata is not free to have on your personal computer. Unlike an open-source program, you cannot program your own functions into Stata, so you are limited to the functions it already supports. Finally, its functions are limited to numeric or categorical data, it cannot analyze spatial data and certain other types.

 

Pros

Cons

User friendly and easy to learn An individual license can cost
between $125 and $425 annually
Version control Limited to certain types of data
Many free online resources for learning You cannot program new
functions into Stata

Additional resources:


R 

R logo. Blue capital letter R wrapped with a gray oval.

R and its graphical user interface companion R Studio are incredibly popular software for a number of reasons. The first and probably most important is that it is a free open-source software that is compatible with any operating system. As such, there is a strong and loyal community of users who share their work and advice online. It has the same features as Stata such as a point-and-click user interface, a command line, savable files, and strong data analysis and visualization capabilities. It also has some capabilities Stata does not because users with more technical expertise can program new functions with R to use it for different types of data and projects. The problem a lot of people run into with R is that it is not easy to learn. The programming language it operates on is not intuitive and it is prone to errors. Despite this steep learning curve, there is an abundance of free online resources for learning R.

Pros

Cons

Free open-source software Steep learning curve
Strong online user community Can be slow
Programmable with more functions
for data analysis

Additional Resources:

  • Introduction to R Library Guide: Find valuable overviews and tutorials on this guide published by the University of Illinois Library.
  • Quick-R by DataCamp: This website offers tutorials and examples of syntax for a whole host of data analysis functions in R. Everything from installing the package to advanced data visualizations.
  • Learn R on Code Academy: A free self-paced online class for learning to use R for data science and beyond.
  • Nabble forum: A forum where individuals can ask specific questions about using R and get answers from the user community.

SPSS

SPSS logo. Red background with white block lettering spelling SPSS.

SPSS is an IBM product that is used for quantitative data analysis. It does not have a command line feature but rather has a user interface that is entirely point-and-click and somewhat resembles Microsoft Excel. Although it looks a lot like Excel, it can handle larger data sets faster and with more ease. One of the main complaints about SPSS is that it is prohibitively expensive to use, with individual packages ranging from $1,290 to $8,540 a year. To make up for how expensive it is, it is incredibly easy to learn. As a non-technical person I learned how to use it in under an hour by following an online tutorial from the University of Illinois Library. However, my take on this software is that unless you really need a more powerful tool just stick to Excel. They are too similar to justify seeking out this specialized software.

Pros

Cons

Quick and easy to learn By far the most expensive
Can handle large amounts of data Limited functionality
Great user interface Very similar to Excel

Additional Resources:

Gif of Kermit the frog dancing and flailing his arms with the words "Yay Statistics" in block letters above

Thanks for reading! Let us know in the comments if you have any thoughts or questions about any of these data analysis software programs. We love hearing from our readers!