This summer, CommonsKnowledge will be taking a different approach by exploring data tools, services, and concepts in depth. It’s true that “data” has become an inescapable buzzword, subject to the optimistic flights of fancy of politicians, technologists, and media personalities. “Big data,” “data mining,” and “data warehousing” have become the bywords of a technology-obsessed consumer society, but even as the word “data” becomes semantically bloated to the point of confusion, the forces behind our media’s obsession with data are transforming culture, industry, and, significantly for us, the academy. Becoming data literate is increasingly important across disciplines and professions, so we want to spend the summer highlighting the data tools and services that we provide.

We’ve mentioned SPSS and other statistical packages, like SAS, Stata, PSPP, and R, on this blog before, but only to say that they are available in the Scholarly Commons. These are all specialized tools for data processing, but you’ve probably already used data processing software in the form of a spreadsheet application, like Microsoft Excel or OpenOffice Calc. In this post, we’ll explore how SPSS, one of the most common and venerable statistical software packages (it was first released in 1968), differs from these applications and why you would choose to use one over the other.

On its surface, SPSS looks a lot like a typical spreadsheet application. When you open it, you see the familiar tabular grid and can enter values in cells. Spreadsheets, for their part, are capable of a lot of the things that SPSS is good at, like generating graphs and statistics on a data set. The difference can be summed up by saying that spreadsheets are designed to be very flexible and broadly applicable to many different tasks, while SPSS was designed specifically for statistical processing of large amounts of data at an enterprise level. For example, unlike a spreadsheet, SPSS has the concepts of “case” and “variable” built in. The rows in SPSS always represent cases, such as survey responses or experimental subjects, and the columns always represent variables observed from those cases, such as the specific values given by a survey respondent or measurements taken from an experimental subject. Because of this case/variable arrangement, when a calculation is performed over a set of data, the result does not get inserted into another cell on the table, as it would in a typical spreadsheet, but appears in a separate window. This is particularly advantageous when dealing with large sets of data, since it keeps calculated statistics and graphs separate from the raw data but still easily accessible.
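To make the case/variable arrangement concrete, here is a minimal Python sketch. It is not SPSS itself: the data, the variable names `age` and `score`, and the helper `describe` are all made up for illustration. Each row is a case, each key is a variable, and the computed statistics land in a separate structure rather than in the data table, which loosely mirrors how SPSS sends results to a separate window.

```python
from statistics import mean

# Hypothetical data: each dict is one case (e.g. a survey respondent),
# and each key is a variable observed for that case.
cases = [
    {"age": 21, "score": 3.5},
    {"age": 34, "score": 4.0},
    {"age": 29, "score": 2.5},
]

def describe(cases, variable):
    """Return summary statistics for one variable across all cases."""
    values = [case[variable] for case in cases]
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }

# Results are kept in their own structure; the raw cases are left untouched.
output = {variable: describe(cases, variable) for variable in cases[0]}

for variable, stats in output.items():
    print(variable, stats)
```

In a spreadsheet, the equivalent averages would typically be typed into spare cells next to or below the data, where they can be mistaken for data themselves.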

It is also much more convenient to perform statistical tests in SPSS, even though many are possible using typical spreadsheets. For example, to perform a one-sample t-test with Excel, you’ll have to calculate the t value for the sample yourself and use the “T.DIST” function to return the significance, while also selecting a cell for the results and labeling it in another cell. To perform the same test in SPSS, you select a variable and supply the value to compare with your sample. When you click “OK,” SPSS generates a table with t, the degrees of freedom, the significance, and a confidence interval neatly calculated.
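For readers curious about what the manual route involves, the following Python sketch works through a one-sample t-test by hand using only the standard library. It is an illustration under stated assumptions, not how SPSS or Excel computes the test: the sample values and the test value of 5.0 are invented, the function names are hypothetical, and the significance is approximated by numerically integrating the t distribution with Simpson's rule as a stand-in for a function like “T.DIST.” The confidence interval that SPSS also reports is left out.

```python
import math
from statistics import mean, stdev

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value by integrating the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area

def one_sample_t_test(sample, test_value):
    """Return (t, degrees of freedom, two-sided p) for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - test_value) / standard_error
    df = n - 1
    return t, df, two_sided_p(t, df)

# Hypothetical measurements compared against a test value of 5.0.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t, df, p = one_sample_t_test(sample, 5.0)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

Every step here (the standard error, the t value, the degrees of freedom, the lookup of the significance) is something a spreadsheet user has to set up and label by hand, which is the bookkeeping SPSS takes care of in one dialog.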

Probably the most significant advantage to using SPSS is that it was designed with modern data collection methods in mind. A lot of data that’s collected, especially survey data, is numerically coded before it’s electronically stored. For example, a response of “strongly agree” might become a 6, and a level of education such as “completed high school” or “some college” might become a 10 or 11. SPSS makes it possible to define a variable so that its coded values are automatically keyed to their original meanings. For this reason, many surveys and polls (including many that U of I students and faculty can access through Roper iPoll, ICPSR, and other sets provided through the U of I library) make their raw data available in SPSS’s native “.sav” format.
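The idea of keying coded values to their meanings can be sketched in a few lines of Python. This is only an analogy for what SPSS value labels do, not a way of reading “.sav” files: the codes 10 and 11 come from the example above, while the list of responses and the variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical value labels: numeric codes keyed to their original meanings.
education_labels = {
    10: "completed high school",
    11: "some college",
}

# Hypothetical coded responses as they might be stored in a data file.
coded_responses = [10, 11, 11, 10, 11]

# Count the raw codes, then report the counts under their labels.
counts = Counter(coded_responses)
labeled_counts = {education_labels[code]: n for code, n in counts.items()}

for label, n in labeled_counts.items():
    print(f"{label}: {n}")
```

The stored data stays compact and numeric, but any table or chart built from it can show the readable labels instead of bare codes.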

For assistance and information on data or SPSS, email datagis@library.illinois.edu or stop by the Scholarly Commons, Main Library 306, on weekdays from 1-5.