Lightning Review: How to Use SPSS

“A nice step-by-step explanation!”

“Easy, not too advanced!”

“A great start!”

Real, live reviews of Brian C. Cronk’s How to Use SPSS: A Step-By-Step Guide to Analysis and Interpretation by some of our patrons! Now in its tenth edition, this nine-chapter text published by Taylor and Francis is rife with walkthroughs, images, and simple explanations that demystify the process of learning this statistical software. It also contains six appendixes, and our patrons sang its praises after a two-hour research session here in the Scholarly Commons!

SPSS is described on IBM’s webpage as “the world’s leading statistical software used to solve business and research problems by means of ad-hoc analysis, hypothesis testing, geospatial analysis and predictive analytics. Organizations use IBM SPSS Statistics to understand data, analyze trends, forecast and plan to validate assumptions and drive accurate conclusions.” It is one of many tools CITL Statistical Consulting uses on a day-to-day basis in assisting Scholarly Commons patrons. Schedule a consultation with them from 10 am to 2 pm, Monday through Thursday, for the rest of the summer!

We’re thrilled to hear this 2018 title is a hit with the researchers we serve! Cronk’s book, along with many other works on software, digital publishing, data analysis, and more, makes up our reference collection – free for use by anyone and everyone in the Scholarly Commons!


European Union Parliament Rejects Copyright Law

The controversial bill, the Directive on Copyright in the Digital Single Market, was protested around the world, with websites raising the alarm over one portion of the proposed law: Article 13.

Article 13 would require users to gain the permission of copyright holders, likely through licensing, before uploading anything copyrighted to the internet. If they did not have permission, the website would have to block the content. This might seem like a good thing: Paul McCartney and 1,300 other musicians argued that it would protect people from having their work stolen and uploaded illegally. Critics countered that the law would be so strict it would prevent anyone on sites like YouTube from playing cover songs – which is how the Beatles got their start.

People argued that the article would also stifle fan creations – like fanart and fanfiction – because the law applies not only to music, but to all audio, video, and text uploaded to the internet, including memes.

While the idea of protecting copyright is noble, having a human being review everything uploaded to the internet is impossible. The BBC notes that 400 hours of content are uploaded to YouTube every 60 seconds. Because of this, YouTube has an automatic system that flags and demonetizes videos thought to be in violation of copyright. Things as innocuous as birds chirping in the background of a video have triggered copyright claims, so critics argue that not only strengthening such a policy but spreading it across the entire internet would be detrimental.

In voting this bill down, EU policy-makers have given themselves more time to review and rework these proposed laws, as another vote will happen in September.


Exploring Data Visualization #5 – R edition

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

This month, I wanted to share some resources specifically for learning to visualize data using R.

1) R is a free, open source programming language that is heavily used for statistical analysis, but has also expanded to encompass nearly any kind of data analysis you would want to do. In the Scholarly Commons, we have R and RStudio (a user-friendly R development environment) installed on all of our lab computers. RStudio’s website provides links to a lot of ways for you to get started with R.

2) R guru Hadley Wickham gave a public lecture at the University of Notre Dame last August. (Note that his talk starts about 37 minutes into the video.) In the lecture, he walks through a simple example of the iterative process of data visualization in R, and gives additional related advice for doing data science. You can learn from his lecture without knowing any R, but you will find it easier to understand if you have basic experience with programming in general.

3) If you want a book that goes into more depth, Wickham and a colleague wrote R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. You can read R for Data Science online, or you can come in to the Scholarly Commons to read the physical book while practicing on one of our lab computers.

4) You can also find a number of specific R courses at Lynda.com, such as “Data Visualization in R with ggplot2.” Just make sure to log in with your U of I credentials so you can access the courses for free.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.


Understanding Creative Commons Licenses

It doesn’t matter if you’re a student, a scholar, or just someone with a blog: we all run into trouble finding images we’re allowed to use on our websites, in our research, or in advertisements. While copyright law has avenues for use, there’s no guarantee that you can use the image you want, and the process of getting access to that image may be slow. That’s why images with a Creative Commons license are a great alternative to traditionally copyrighted images.

A Creative Commons license is a more flexible option than traditional all-rights-reserved copyright and can be applied to images, or basically any other kind of shareable work. When a creator chooses a Creative Commons license, others do not need to ask the creator’s explicit permission to use the work. However, that doesn’t mean the creator gives up control of the image; rather, they choose one of six current options for their Creative Commons license:

  • Attribution: The most lenient license. The attribution license lets others do what they please with your work, so long as they credit the original creator.
  • Attribution-ShareAlike: Similar to the attribution license, though all derivatives of the original work must be licensed under identical terms to that original.
  • Attribution-NoDerivs: This allows others to use the work as they please, so long as they do not change or manipulate it, and credit the creator.
  • Attribution-NonCommercial: This license allows people to use and tweak the work freely, except for commercial enterprises. The derivative works do not have to be licensed under identical terms.
  • Attribution-NonCommercial-ShareAlike: Same as above except derivative works must be licensed under identical terms.
  • Attribution-NonCommercial-NoDerivs: The most restrictive license. Others may download the work, but they cannot change it or use it commercially.

All in all, most Creative Commons works have “some rights reserved.” As a consumer, you have the responsibility to look up the license of any Creative Commons work you hope to use (which isn’t very hard – most of the time any limitations are listed).

Here are some examples of images with differing Creative Commons licenses:

The only stipulation on this image is that I must provide proper attribution. “Albert Cavalier King Charles Spaniel” was taken by Glen Bowman on July 21, 2013, and is hosted on Flickr.com.

This image of a Cavalier King Charles Spaniel only requires creator attribution. It can be used commercially so long as I acknowledge Glen Bowman, the photo’s creator. So if I so chose, I could hypothetically edit this photo to use as a welcome banner on my Cavalier King Charles Spaniel appreciation blog, include it in a PowerPoint I use for my veterinary school class, or copy it in an advertisement for my dog-walking business.

This Creative Commons licensed image requires proper attribution. “Cavalier King Charles Spaniel” was taken by James Watson (kingjimmy81) on August 17, 2013, and is hosted on Flickr.com.

This image of a Cavalier King Charles Spaniel has a more restrictive license than the above image. You can share the image in any medium or format, but you must give appropriate credit to James Watson, the creator. You cannot use it commercially, and you cannot distribute derivatives of the photo. So I could include this on my Cavalier King Charles appreciation blog with proper attribution, but could not edit it to make it into a banner on the homepage. And while using it in my veterinary school PowerPoint is still okay, I could not use it in an advertisement for my dog-walking business.

If you’re interested in finding Creative Commons works, you can use the Creative Commons Search function, which links up to various search engines, including Google, Google Images, Wikimedia Commons, and Flickr. If you’re interested in learning more about Creative Commons licenses, check out the Scholarly Commons’ Creative Commons basics page, as well as our use/creation of Creative Commons licenses page. If you’re interested in learning more about intellectual property in general, visit the Main Library’s Intellectual Property LibGuide, or get in touch with the library’s copyright specialist, Sara Benson (srbenson@illinois.edu).


New Digital Humanities Books in the Scholarly Commons!

Is there anything quite as satisfying as a new book? We just got a new shipment of books here in the Scholarly Commons that complement all our services, including digital humanities. Our books are non-circulating, so you cannot check them out, but these DH books are always available for your perusal in our space.

Two brand new and two mostly new DH books, stacked in the Scholarly Commons

Digital Humanities: Knowledge and Critique in a Digital Age by David M. Berry and Anders Fagerjord

Two media studies scholars examine the history and future of digital humanities. DH is a relatively new field, and one that is still not clearly defined. Berry and Fagerjord take a deep dive into the methods digital humanists gravitate towards and critique their use in relation to the broader cultural context. They are more critical of the “digital” than the “humanities,” meaning they consider how the use of digital tools affects society as a whole (there’s that media studies!) more than how scholars use digital methods in humanities work. They caution against using digital tools just because they are “better,” and instead encourage the reader to examine their role in the DH field to contribute to its ongoing growth. Berry previously edited Understanding Digital Humanities (eBook available through the Illinois library), which discusses similar issues. For a theoretical understanding of digital humanities, and to examine the issues in the field, read Digital Humanities.

Text Mining with R: A Tidy Approach by Julia Silge and David Robinson

Working with data can be messy, and text is even messier. It never behaves how you expect it to, so approaching text analysis in a “tidy” manner is crucial. In Text Mining with R, Silge and Robinson present their tidytext framework for R, and instruct the reader in applying the package to natural language processing (NLP). NLP can derive meaning from unstructured text by way of unsupervised machine learning (wherein you train the computer to organize or otherwise analyze your text, then go get coffee while it does all the work). The book is most helpful for those with programming experience, but no knowledge of text mining or natural language processing is required. With practical examples and easy-to-follow, step-by-step guides, Text Mining with R serves as an excellent introduction to tidying text for use in sentiment analysis, topic modeling, and classification.
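The book’s own examples use R and the tidytext package, but the “go get coffee” idea of unsupervised text analysis carries across languages. As a loose analogy only (not the book’s code), here is a minimal Python sketch that fits a two-topic model to a toy corpus with scikit-learn:

# A loose, language-agnostic analogy to the book's R examples: fit an
# unsupervised two-topic model to a toy corpus with scikit-learn.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the moon landing was filmed on a sound stage",
    "astronauts left retroreflectors on the lunar surface",
    "the stock market fell sharply on inflation news",
    "central banks raised interest rates again",
]

# Turn raw text into word counts, then fit the topic model, unsupervised.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the words the model associates most strongly with each topic.
words = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    print(f"topic {topic_id}:", [words[i] for i in weights.argsort()[-4:]])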

No programming or R experience? Try some of our other books, like R Cookbook for an in-depth introduction, or Text Analysis with R for Students of Literature for a step-by-step learning experience focused on humanities people.

Visit us in the Scholarly Commons, 306 Main Library, to read some of our new books. Summer hours are Monday through Friday, 10 AM-5 PM. Hope to see you soon!


Summer II – Services are back!


That’s right! It’s Summer II, and most of our services are re-opening for all you researchers out there.


CITL Statistical Consulting is back for the rest of the summer here in the Scholarly Commons from 10 am to 2 pm, Monday through Thursday! Schedule a consultation with them here.

The Survey Research Lab is here every Thursday from 2 to 5 pm until Summer II ends!


Our GIS specialist is in the office and accepting consultations as well! Schedule an appointment with him by sending us an email.


And don’t forget our Data Discovery team is here to help you find and format digital numeric and spatial data!


Summer is the perfect time to get ahead on your projects, so let us help!


Using an Art Museum’s Open Data

*Edited by Billy Tringali from an original idea and piece by C. Berman

As a former art history student, I’m incredibly interested in how the study of art history can be aided by the digital humanities. More and more museums have started allowing the public to access a portion of their data. When it comes to open data, museums seem to be lagging a bit behind other cultural heritage institutions, but many are providing great open data for humanists.

The amount of data art museums provide varies widely. Some museums go the extra mile and release much of their metadata to the public. Others pick and choose aspects of their collection, such as the Museum of Modern Art’s Exhibition and Staff Histories.

Many museums, especially those that collect modern and contemporary art, can have their hands tied by copyright laws when it comes to the data they present. A few of the data sets currently available from art museums are the Cooper Hewitt’s Collection Data, the Minneapolis Institute of Arts metadata, the Rijksmuseum API, the Tate Collection metadata, and the Getty Vocabularies.
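To get a feel for what working with one of these data sets is like, here is a minimal Python sketch using pandas. The raw-file URL and the column name are assumptions based on the Tate’s public GitHub repository, so check the repository’s documentation before relying on them:

# A minimal sketch of exploring a museum's open collection data with pandas.
# The URL and the "medium" column are assumptions about the Tate's public
# GitHub repository; verify both against the repository's README.
import pandas as pd

TATE_CSV = ("https://raw.githubusercontent.com/tate/collection/"
            "master/artwork_data.csv")

artworks = pd.read_csv(TATE_CSV, low_memory=False)

print(artworks.shape)                              # rows x columns
print(artworks["medium"].value_counts().head(10))  # most common media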

The Metropolitan Museum of Art has recently released all images of the museum’s public domain works under a Creative Commons Zero license.

More museum data can be found here!


Exploring Data Visualization #4

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in May:

from Alvin Chang at Vox, “How Republicans are undermining the 2020 census, explained with a cartoon”

1) Alvin Chang, Senior Graphics Reporter at Vox, “covers policy by making explainers with charts and cartoons.” This month he explained the precarious state of the upcoming 2020 U.S. Census.

from Lisa Charlotte Rost at Uncharted, “Why not to use two axes, and what to use instead”

2) Lisa Charlotte Rost, a designer for Datawrapper, explains why dual-axis charts are almost always terrible, and what you can use instead.
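One alternative along the lines Rost discusses is an indexed chart: rescale each series so it starts at the same value, letting both share a single axis. Here is a minimal matplotlib sketch of that idea, with numbers invented purely for illustration:

# A toy indexed chart: divide each series by its first value so two
# differently scaled measures can share one axis (data invented here).
import matplotlib.pyplot as plt

years = list(range(2010, 2018))
revenue = [120, 135, 150, 170, 160, 185, 200, 220]
visitors = [3000, 3100, 3400, 3900, 3800, 4200, 4600, 5100]

def index_to_first(series):
    """Rescale a series so its first value equals 100."""
    return [100 * value / series[0] for value in series]

plt.plot(years, index_to_first(revenue), label="Revenue (indexed)")
plt.plot(years, index_to_first(visitors), label="Visitors (indexed)")
plt.xlabel("Year")
plt.ylabel("Index (first year = 100)")
plt.legend()
plt.show()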

“The Wisdom and/or Madness of Crowds,” a game created by Nicky Case

3) Play this cute game! Nicky Case combines the logic of network graphs with the science of crowds in an “explorable” that shows why some crowds generate wisdom, while others create madness.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.


Exploring Data Visualization #3

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Welcome back to this blog series! Here are some of the things I read in April:

“Make Caows and Shapcho” pattern knit by MeganAnn (https://www.ravelry.com/projects/MeganAnn/skyknit-the-collection)

1) Janelle Shane, who has created a new kind of humor based on neural networks, trained a neural network to generate knitting patterns. Experienced knitters then attempted these patterns so we could see what the computer generated, with results ranging from reasonable to silly to downright creepy.

from NASA Earth Observatory, “Spring is Arriving Earlier in National Parks”

2) Considering we had snowfall in April, you might not think spring began early this year (I know I don’t!). But broadly speaking, climate change has caused spring to begin earlier and earlier across the United States. The NASA Earth Observatory looked at data published in 2016 to create maps that visualize how climate change has shifted the timing of spring.

3) If you want to learn a new tool but aren’t sure what to choose, have a look at Nathan Yau’s suggestions in his post What I Use to Visualize Data. He even divides his list into categories based on where he is in the process, such as initial data processing versus final visualizations.


Using Reddit’s API to Gather Text Data


I initially started my research with an eye to using digital techniques to analyze an encyclopedia that collects a number of conspiracy theories, in order to determine what constitute typical features of conspiracy theories. At this point, I realize there were two flaws in my original plan. First, as discussed in a previous blog post, the book I selected failed to provide the sort of evidence I required to establish typical features of conspiracy theories. Second, the length of the book, though sizable, was nowhere near large enough to provide a corpus on which I could run a topic model to derive interesting information.

My hope is that I can shift to online sources of text in order to solve both of these problems. Specifically, I will be collecting posts from Reddit. The first problem was that my original book merely stated the content of a number of conspiracy theories, without making any effort to convince the reader that they were true. As a result, there was little evidence of typical rhetorical and argumentative strategies that might characterize conspiracy theories. Reddit, on the other hand, will provide thousands of instances of people interacting in an effort to convince other Redditors of the truth or falsity of particular conspiracy theories. The sorts of strategies that were absent from the encyclopedia of conspiracy theories will, I hope, be present on Reddit.

The second problem was that the encyclopedia failed to provide a sufficient amount of text. Utilizing Reddit will certainly solve this problem; in less than twenty-four hours, there were over 1,300 comments on a recent post alone. If anything, the solution to this problem represents a whole new problem: how to deal with such a vast (and rapidly changing) body of information.

Before I worry too much about that, it is important that I be able to access the information in the first place. To do this, I’ll need to use Reddit’s API. API stands for Application Programming Interface, and it’s essentially a tool for letting a user interact with a system. In this case, the API allows a user to access information on the Reddit website. Of course, we can already do this with a web browser. The API, however, allows for more fine-grained control than a browser. When I navigate to a Reddit page with my web browser, my requests are interpreted in a very pre-scripted manner. This is convenient; when I’m browsing a website, I don’t want to have to specify what sort of information I want to see every time a new page loads. However, if I’m looking for very specific information, it can be useful to use an API to home in on just the relevant parts of the website.
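As a concrete illustration, here is a minimal Python sketch of one such fine-grained request against Reddit’s public JSON listings; the subreddit and the fields I pull out are my own illustrative choices:

# A minimal sketch of one fine-grained API request: fetch a subreddit
# listing as JSON and keep only the fields relevant to a text corpus.
import requests

headers = {"User-Agent": "conspiracy-corpus-research/0.1"}  # identify your script
url = "https://www.reddit.com/r/conspiracy/new.json"

response = requests.get(url, params={"limit": 25}, headers=headers)
response.raise_for_status()

for child in response.json()["data"]["children"]:
    post = child["data"]
    print(post["author"], post["created_utc"], post["title"])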

For my purposes, I’m primarily interested in downloading massive numbers of Reddit posts, with just their text bodies, along with certain identifiers (e.g., the name of the poster, the timestamp, and the relation of that post to other posts). The first obstacle to accessing the information I need is learning how to request just that particular set of information. In order to do this, I’ll need to learn how to write a request in Reddit’s API format. Reddit provides some help with this, but I’ve found these other resources a bit more helpful. The second obstacle is that I will need to write a program that automates my requests, to save myself from performing tens of thousands of individual requests by hand. I will be attempting to do this in Python. While doing this, I’ll have to be sure that I abide by Reddit’s regulations for using its API. For example, only a limited number of requests per minute are allowed, so that the website is not overloaded. There seems to be a dearth of example code on the internet for text acquisition of this sort, so I’ll be posting a link to any functional code I write in future posts.
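In the meantime, here is a rough sketch of the kind of automated collection loop I have in mind. The two-second pause is a conservative guess at an acceptable request rate; the real limits are spelled out in Reddit’s API rules:

# A sketch of automating paginated requests while respecting rate limits.
# The delay is an assumption; consult Reddit's API rules for the actual
# allowed request rate.
import time
import requests

HEADERS = {"User-Agent": "conspiracy-corpus-research/0.1"}

def collect_posts(subreddit, pages=5, delay=2.0):
    """Walk a subreddit's 'new' listing page by page, keeping text bodies."""
    posts, after = [], None
    for _ in range(pages):
        params = {"limit": 100, "after": after}
        response = requests.get(
            f"https://www.reddit.com/r/{subreddit}/new.json",
            params=params, headers=HEADERS)
        response.raise_for_status()
        data = response.json()["data"]
        for child in data["children"]:
            p = child["data"]
            posts.append({"id": p["id"],
                          "author": p["author"],
                          "timestamp": p["created_utc"],
                          "text": p.get("selftext", "")})
        after = data.get("after")  # cursor for the next page of results
        if after is None:          # no more pages to fetch
            break
        time.sleep(delay)          # stay under the request-rate limit
    return posts

corpus = collect_posts("conspiracy", pages=3)
print(len(corpus), "posts collected")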
