As part of my internship at the Scholarly Commons, I’m going to do a series of posts describing the tools and methodologies I’ve used to work on my dissertation project. This write-up serves as an introduction to my project, its larger goals, and the tools I use to start working with my data.
The Dissertation Project
In general, my dissertation draws on computational methodologies to account for the digital circulation and fragmentation of political movement texts in new media environments. In particular, I will examine the rhetorical responses on Twitter to three terrorist attacks in the U.S.: the 2013 Boston Marathon bombing, the 2015 San Bernardino shooting, and the 2016 Orlando nightclub shooting. I begin with the idea that terrorism is a kind of message directed at an audience, and I am interested in how digital audiences in the U.S. come to understand, make meaning of, and navigate uncertainty following a terrorist attack. I am interested in the patterns of narratives, community construction, and expressions of affect that characterize terrorism as a social media phenomenon.
I am interested in the following questions: What methods might rhetorical scholars use to better understand the vast numbers of texts, posts, and “tweets” that make up our social media? How do digital audiences construct meanings in light of terrorist attacks? How does the interwoven agency and materiality of digital spaces influence forms of rhetorical action, such as invention and style? In order to better address such questions, I turn to the tools and techniques of the Digital Humanities as computational modes of analysis for examining the digitally circulated rhetoric surrounding terror events. Investigation of this rhetoric using topic models will help scholars not only to understand particular aspects of terrorism as a social media phenomenon, but also to see more clearly the ways that community and identity are themselves formed amid digitally circulated texts.
At the beginning of this project, I had no experience working with textual data, so the following posts represent a cleaned and edited version of the learning process I went through. There was a lot of mess and exploration involved, but that mess is also how I came to understand the tools much better.
Gathering The Tools
I use a Mac, so accessing the command line is as simple as firing up Terminal.app. Windows users have to do a bit more work to get all of these tools, but plenty of tutorials can be found with a quick search.
The first big choice was whether to learn R or Python. I’d heard that Python was better for text and R was better for statistical work, but it seems to mostly come down to personal preference, since you can find people doing both kinds of work in either language. Both R and Python have a bit of a learning curve, but a quick search for topic modeling in Python gave me a ton of useful results, so I chose to start there.
Anaconda is a package management system for the Python language. What’s great about Anaconda is not only that it has a robust package manager (so I can easily download the tools and libraries I need without worrying about dependencies or other errors), but also that it encourages the creation of “environments” to work in. This means I can make mistakes, or install and uninstall packages, without worrying about messing up my overall system or my other environments.
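As a rough sketch, the environment workflow looks something like the following at the command line (the environment name and the packages installed here are just examples, not necessarily the ones from my project):

```shell
# Create an isolated environment with its own Python install
conda create --name topic-models python=3

# Switch into the environment before installing anything
conda activate topic-models

# Packages installed now stay inside this environment only
conda install nltk matplotlib

# Leave the environment; the rest of the system is untouched
conda deactivate
```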
Instructions for downloading Anaconda can be found here, and I found this cheat-sheet very useful in setting up my initial environments. Python has a ton of documentation, so these pages are useful, and there are plenty of tutorials online. Each environment comes with a few default packages, and I quickly added some toolkits for processing text and plotting graphs.
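Even before reaching for specialized toolkits, Python’s standard library can handle simple text processing. As a small illustration (the sample text and the stopword list here are made up for the example), counting word frequencies takes only a few lines:

```python
import re
from collections import Counter

text = "Thoughts and prayers. Thoughts, prayers, and action."

# A tiny example stopword list; real projects use much longer ones
stopwords = {"and", "the", "a"}

# Lowercase the text, split it into words, and drop stopwords
words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]

# Count how often each remaining word appears
counts = Counter(words)

print(counts.most_common(2))  # → [('thoughts', 2), ('prayers', 2)]
```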
Lots of people working with Python have the same problems or issues that I did. Whenever my code encountered an error, or when I didn’t know how to do something like write to a .txt file, searching StackOverflow usually got me on the right track. Most answers link to the Python documentation that relates to the question, so not only did I fix what was wrong but I also learned why.
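For example, writing results out to a .txt file, one of the things I had to look up, turns out to take only a few lines (the filename and contents here are arbitrary):

```python
lines = ["tweet one", "tweet two", "tweet three"]

# The 'with' block closes the file automatically, even if an error occurs
with open("results.txt", "w", encoding="utf-8") as f:
    for line in lines:
        f.write(line + "\n")

# Read the file back to confirm the write worked
with open("results.txt", encoding="utf-8") as f:
    contents = f.read()
```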
Sometimes scholars put their code on GitHub for sharing, advancing research, and confirming their findings. I found code there for topic modeling in Python, and I also set up repositories for my own work. GitHub is a useful version control system as well, which meant that I never “lost” old code and could track my changes over time.
This is a site for scholars interested in learning how to use tools for Digital Humanities work. There are some great tutorials here on a range of topics, including how to set up and use Python. It’s approachable and does a good job of covering everything you need to know.
These tools, taken together, form the basis of my workspace for dealing with my data. Upcoming topics will cover Data Collection, Cleaning the Data, Topic Models, and Graphing the Results.