Lightning Review: Text Analysis with R for Students of Literature

Cover of Text Analysis with R book

My undergraduate degree is in Classical Humanities and French, and like many humanities and liberal arts students, I mostly used computers to access Oxford Reference Online and to double-check that “bonjour” meant “hello” before turning in term papers. Actual critical analysis of literature came from my mind and my research, and nothing else. Recently, however, scholars in the humanities have begun to see the potential of computational methods for their study, and have grouped these methods under the name “digital humanities.” Computational text analysis provides insights that, in many cases, a human mind could not reach unaided. When was the last time you read 100 books to count occurrences of a certain word, or looked through thousands of documents to group their contents by topic? In Text Analysis with R for Students of Literature, Matthew Jockers presents programming concepts specifically as they relate to the study of literature, with plenty of help to turn even the most technophobic English student into a digital humanist.

Jockers’ book caters to the beginning coder. You download practice texts from his website that are already formatted for the tutorials he presents, and he doesn’t dwell too much on pounding programming concepts into your head. I came to this book having already taken a course on Python, where we edited text and completed exercises similar to the ones in this book, but even a complete beginner would find Jockers’ explanations perfect for diving into computational text analysis. Some advanced statistical concepts are presented that may put off less mathematically inclined readers, but these are mentioned only to further your understanding of what R does in the background, and can be left to the computer scientists. Practice-based and easy to get through, Text Analysis with R for Students of Literature serves its primary purpose of bringing the possibilities of programming to those used to traditional methods of literary research.

Ready to start using a computer to study literature? Visit the Scholarly Commons to view the physical book, or download the eBook through the Illinois library.

New Digital Humanities Books in the Scholarly Commons!

Is there anything quite as satisfying as a new book? We just got a new shipment of books here in the Scholarly Commons that complement all our services, including digital humanities. Our books are non-circulating, so you cannot check them out, but these DH books are always available for your perusal in our space.

Stack of books in the Scholarly Commons

Two brand new and two mostly new DH books

Digital Humanities: Knowledge and Critique in a Digital Age by David M. Berry and Anders Fagerjord

Two media studies scholars examine the history and future of digital humanities. DH is a relatively new field, and one that is still not clearly defined. Berry and Fagerjord take a deep dive into the methods that digital humanists gravitate towards, and critique their use in relation to the broader cultural context. They are more critical of the “digital” than the “humanities,” meaning they focus more on how the use of digital tools affects society as a whole (there’s that media studies!) than on how scholars use digital methods in humanities work. They caution against using digital tools just because they are “better,” and instead encourage readers to examine their own role in the DH field in order to contribute to its ongoing growth. Berry previously edited Understanding Digital Humanities (eBook available through the Illinois library), which discusses similar issues. For a theoretical understanding of digital humanities, and to examine the issues in the field, read Digital Humanities.

Text Mining with R: A Tidy Approach by Julia Silge and David Robinson

Working with data can be messy, and text even messier. It never behaves the way you expect it to, so approaching text analysis in a “tidy” manner is crucial. In Text Mining with R, Silge and Robinson present their tidytext package for R and show the reader how to apply it to natural language processing (NLP). NLP can be used to derive meaning from unstructured text, for example through unsupervised machine learning (in which the computer finds structure in your text on its own while you go get coffee). This book is most helpful for those with programming experience, but no knowledge of text mining or natural language processing is required. With practical examples and easy-to-follow, step-by-step guides, Text Mining with R serves as an excellent introduction to tidying text for use in sentiment analysis, topic modeling, and classification.
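The core of the “tidy” approach is one token (here, one word) per row. Once text is in that shape, counting and filtering become one-line operations. In tidytext this reshaping step is done with `unnest_tokens()`. The sketch below is not that package’s API. It is a rough Python analogue that assumes simple lowercase word tokenization. `unnest_words`, `word_counts`, and the deliberately tiny `STOP_WORDS` list are hypothetical names made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was"}

def unnest_words(lines):
    """Split text lines into one-row-per-word records: {"line": n, "word": w}."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        # Lowercase, then keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", line.lower()):
            rows.append({"line": line_no, "word": word})
    return rows

def word_counts(rows, stop_words=STOP_WORDS):
    """Count words across all rows, skipping stop words."""
    return Counter(row["word"] for row in rows if row["word"] not in stop_words)

lines = ["It was the best of times,", "it was the worst of times."]
rows = unnest_words(lines)
print(rows[:3])
print(word_counts(rows).most_common(1))
```

Keeping the line number on every row preserves where each word came from. That is what lets later steps, such as sentiment by chapter, group the words back together.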

No programming or R experience? Try some of our other books, like R Cookbook for an in-depth introduction, or Text Analysis with R for Students of Literature for a step-by-step learning experience geared toward humanities scholars.

Visit us in the Scholarly Commons, 306 Main Library, to read some of our new books. Summer hours are Monday through Friday, 10 AM-5 PM. Hope to see you soon!

An Obstacle and (Hopefully) a Solution in Digital Research

This post is part of an ongoing series about my research on conspiracy theories and the tools I use to pursue it. You can read Part I: What is a Conspiracy Theory and Part II: Why Are Conspiracy Theories So Compelling? on Commons Knowledge.


Part of my research project, in which I am attempting to give an empirically informed account of what constitutes a conspiracy theory, involves reading through a text that compiles a few hundred different conspiracy theories and gives brief accounts of them. By reading through them and coding for the presence of various features, I hoped to learn which features were most typical of conspiracy theories. My own suspicion is that an important part of the appeal of conspiracy theories is that, in general, we tend to find appeals to coincidence unconvincing. For example, if a student is repeatedly absent from class on test days and gives as an excuse a series of illnesses, we are inclined to find this unconvincing. It seems very coincidental that their illnesses always occur on test days. Of course, it’s possible that it really is a coincidence, but we strongly discount the explanatory weight of such an appeal. If we cast about for another theory to explain their absences, we quickly hit upon one: the student didn’t prepare for the tests and so wanted to avoid coming to class. Again, it’s possible this theory is incorrect, but it is much more satisfying than the theory that the student just coincidentally gets sick on test days.

I suspect that conspiracy theories derive much of their appeal from the unsatisfactory character of appeals to coincidence. To pick just one example: the plane crash that killed Senator Paul Wellstone in 2002 has been the focal point of a number of conspiracy theories. The standard account is that the crash was due to pilot error. One way that suspicion has been raised about this account is by noting a number of seeming coincidences. One is that Wellstone was one of the most outspoken voices against the Bush administration at the time. It raised some conspiracy theorists’ eyebrows that such a prominent liberal voice “just happened” to die unexpectedly in a plane crash. Alternative theories propose that members of the Bush administration arranged Wellstone’s assassination. Another purported coincidence involved accounts of electronic malfunction: cell phones and automatic garage door openers in the vicinity supposedly malfunctioned at roughly the same time the plane crashed. Some conspiracy theories account for this by appealing to an electromagnetic frequency weapon that disabled the plane’s controls while also causing nearby electronic equipment to malfunction. My hypothesis is that this style of explanation and theory development is typical of conspiracy theories in general.

Federal investigators sift through debris in this Oct. 27, 2002 file photo, from the twin engine plane that crashed two days earlier near Eveleth, Minn., killing Sen. Paul Wellstone, his wife Sheila, daughter Marcia and several others. The National Transportation Safety Board is ready to vote on the likely cause of the 2002 accident. (AP Photo/Jim Mone, File)

In my study, I have noted whether each conspiracy theory in my chosen compilation points to an appeal to coincidence in the rival (usually “standard”) account and, if it does, whether it then appeals to a conspiracy in order to provide a “better” theory to replace the one that appeals to coincidence. Unfortunately, this strategy has hit an obstacle. While there is a strong correlation between pointing to an appeal to coincidence as a problem with a theory and substituting an appeal to conspiracy in its place, relatively few theories in the text appeared to do this. Even in cases where I knew that criticism of appeals to coincidence frequently played a large role in the justification of a particular conspiracy theory, I often found no evidence of this in the brief account of the theory given in the book. It could be, of course, that my hypothesis is simply mistaken; given that this result didn’t match the conclusions I’d drawn from other sources, however, I am inclined to think the problem is the source text. Thinking it over, it was clear that the text aims to present the accounts “objectively,” stating the content of the views, normally without any effort to convince the reader one way or the other. Occasionally, talk of coincidences finds its way into the entries, but even then it is only rarely explicit: for example, there are zero appearances of the word ‘coincidence’ or the phrase ‘just happened’, only three appearances of ‘coincidental’, and every appearance of ‘happened to’ turns out, on checking the context, to be unrelated to appeals to coincidence in explanation. No other typically “coincidental” language makes a significant appearance. My concern is that these low counts reveal only that my chosen text doesn’t address whether the presented theories are explanatorily superior to their rivals, or explore how they developed in the first place.
Since those are the areas in which coincidence would play a larger role, I’ve concluded that my chosen text is misleading as a source of data about conspiracy theories (at least with regard to the role of coincidence; in other areas, such as whether the theory is an official or unofficial account, it is much more reliable).
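Counts like these are easy to automate. The real work is reading each hit in context. Below is a minimal Python sketch, assuming the anthology is available as plain text. It counts each phrase and pulls out keyword-in-context snippets for the kind of manual check described above. The function names and the sample sentence are hypothetical. Whole-word, case-insensitive matching is an assumption too, and it means ‘coincidental’ is not counted inside ‘coincidentally’.

```python
import re

# Phrases mentioned in the post; whole-word, case-insensitive matching is an
# assumption, so "coincidental" does not match inside "coincidentally".
PHRASES = ["coincidence", "coincidental", "just happened", "happened to"]

def _pattern(phrase):
    # \b anchors the phrase at word boundaries; re.escape guards special characters.
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)

def count_phrases(text, phrases=PHRASES):
    """Return {phrase: number of matches} for each phrase."""
    return {p: len(_pattern(p).findall(text)) for p in phrases}

def keyword_in_context(text, phrase, width=40):
    """Return a snippet around every match so each hit can be read in context."""
    snippets = []
    for m in _pattern(phrase).finditer(text):
        start, end = max(0, m.start() - width), min(len(text), m.end() + width)
        # Collapse line breaks and repeated spaces inside the snippet.
        snippets.append(" ".join(text[start:end].split()))
    return snippets

sample = "It was coincidental that the lights failed. The pilot happened to be new."
print(count_phrases(sample))
for snippet in keyword_in_context(sample, "happened to", width=15):
    print(snippet)
```

A count alone cannot tell whether ‘happened to’ is doing explanatory work. That is why the snippets are printed for a human to read rather than tallied automatically.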

In order to resolve this obstacle, I have settled on using primary sources that are more likely to try to persuade the reader that the theory they contain is correct and superior to its rivals. These include books and websites that present a particular conspiracy theory, and online fora where proponents of various conspiracy theories argue and collaborate in developing them. This body of material is vastly larger than the single anthology I initially intended to use. My current focus is finding a way to carve out a manageable chunk of this gigantic data set, most likely from online message boards like Reddit, and to use text from these fora to find evidence for my hypothesis about appeals to coincidence. This will require at least two kinds of digital techniques: web scraping, to extract usable text from a large number of individual websites, and topic modeling, to find meaningful relationships within an otherwise unmanageably large corpus. In my next post, I will talk about my initial forays into these techniques.
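The “extract usable text” half of web scraping mostly means separating post text from navigation, scripts, and other page furniture. Below is a minimal sketch using only Python’s standard-library `html.parser`. It assumes the post text lives in `<p>` elements, which varies from site to site. `ParagraphExtractor`, `extract_paragraphs`, and the sample HTML are hypothetical illustrations, not the structure of any real forum.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._depth = 0      # greater than 0 while inside a <p> element
        self._buffer = []    # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace and keep non-empty paragraphs only.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        # Text outside <p> (menus, scripts, footers) is dropped.
        if self._depth:
            self._buffer.append(data)

def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs

page = ("<html><body><nav>Home</nav><p>Post one says it <b>just happened</b>.</p>"
        "<script>var x = 1;</script><p>Reply two.</p></body></html>")
print(extract_paragraphs(page))
```

Fetching the pages (for example with `urllib.request`) is left out here, as is respecting each site’s terms of service and rate limits. The output is a list of plain-text paragraphs ready for a topic model.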

Why Are Conspiracy Theories So Compelling?

In my last post, I described the first phase of my research, in which I am attempting to develop an empirically informed definition of ‘conspiracy theory’. In this post, I want to discuss the second focus of my research: why it is that conspiracy theories are so compelling for so many people.

Newspaper article with headline "Kennedy Slain by CIA, Mafia, Castro, LBJ, Teamsters, Freemasons"

Although the specifics can be debated, it is clear that conspiracy theories are very popular. In a recent survey, 61% of participants claimed belief in some form of a conspiracy theory about the assassination of John F. Kennedy. This could possibly be attributed to increased publicity about the event due to its impending fiftieth anniversary and coverage of the release of some previously classified documents regarding it. But in an even more wide-ranging study four years ago, the number was 51%. At the very least, it looks plausible that more than half of Americans believe in this particular conspiracy theory, and there are plenty of other theories out there. For example, approximately 40% of respondents endorsed the conspiracy theory that the FDA is withholding a natural cancer cure.

Conspiracy theories are often treated dismissively as the ravings of deranged paranoiacs. Yet we have good reason to think that a majority of Americans believe in at least one conspiracy theory, and we can’t dismiss all of those people in this way. Why, then, are conspiracy theories so compelling? There are a number of predictors for belief in conspiracy theories. The best is belief in other conspiracy theories: if someone believes one conspiracy theory, the likelihood that they believe another goes up. Other predictors are useful for predicting whether a subject believes in a particular conspiracy theory, but not for the likelihood that they believe in conspiracy theories generally. Belief in conspiracy theories is common regardless of race, but white Americans are more likely than African-Americans to believe in Sandy Hook conspiracy theories (in which the government supposedly faked the Sandy Hook shooting in order to initiate more stringent gun control laws), while African-Americans are more likely than white Americans to believe that the CIA developed AIDS in order to kill African-American populations. Similarly, political liberals are more likely to endorse GMO conspiracy theories, while political conservatives are more likely to endorse climate change conspiracy theories. Evidence does suggest that people are less inclined to believe in conspiracy theories the more educated they are, but exactly why is still unclear. Higher education is correlated with a complex of many other factors, and it remains to be seen whether education itself is the cause of decreased belief.

My own suspicion is that an important part of the appeal of conspiracy theories is that we tend to find appeals to coincidence unconvincing. This is often perfectly reasonable. If a recently elected politician installs close friends and family in all the important posts, insisting that, by coincidence, their friends and family were the most qualified individuals for the posts, we will be rightly suspicious. It can be a problem, however, when this suspicion carries over to extraordinarily complex events. For example, there is a long-standing conspiracy theory that Bill Clinton arranged the assassinations of dozens of people with whom he had varying levels of contact. An enormous part of its appeal stems from the seeming unlikelihood of so many deaths that can be linked to Clinton. Of course, a president comes into contact with a staggering number of people, and some small fraction of these are bound to die in a variety of ways. It is not surprising that a number of people who met Clinton died; it is merely coincidental, and what would really be surprising is if no one whom he met died. When a case is sufficiently complex (such as the network of everyone a United States president meets), coincidence will often be the explanation for events.
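The point about large networks is essentially arithmetic, and a quick calculation makes it concrete. The sketch below uses entirely hypothetical numbers: 10,000 contacts, a 0.5% chance that any one of them dies in a given year, and a 20-year window. It also assumes deaths are independent and occur at a constant rate. None of these figures are real estimates. The sketch only shows how many deaths we should expect by chance alone, and how unlikely it would be for nobody to die.

```python
import math

# Hypothetical inputs chosen only to illustrate the argument; they are not
# estimates of any real president's contacts or of real mortality rates.
contacts = 10_000            # people met over a career
annual_death_rate = 0.005    # chance any one of them dies in a given year
years = 20                   # length of the period we look back over

# Probability that one particular contact is still alive after `years` years,
# assuming independent deaths at a constant yearly rate.
p_survive = (1 - annual_death_rate) ** years

# Expected number of contacts who die during the period, by chance alone.
expected_deaths = contacts * (1 - p_survive)

# Probability that nobody dies, computed in log10 space to avoid underflow.
log10_p_nobody_dies = contacts * math.log10(p_survive)

print(f"expected deaths: {expected_deaths:.0f}")
print(f"P(no deaths) is about 10^{log10_p_nobody_dies:.0f}")
```

With these made-up inputs it prints about 954 expected deaths and a probability of no deaths on the order of 10^-435. The exact figures don’t matter. For any realistically large network of contacts, some deaths are all but guaranteed, and a list of them is no evidence of agency.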

An image titled "The Clinton Body Bags" that lists people Bill Clinton came in contact with who are now dead.

There are other cases where “conspiratorial thinking,” in which we are inclined to suspect that agency rather than coincidence is the cause of an event, seems appropriate. It seems appropriate that homicide detectives presume agency was involved rather than coincidence when investigating an unexpected death, and that they ask questions like “Who would benefit from this?” in determining what agency was at work. On the other hand, it seems inappropriate for a voter to presume that agency rather than coincidence explains why a former member of the president’s staff died in a plane crash, or to ask questions like “Who would benefit from this?” in order to discover who might have arranged the disaster.

Conspiratorial thinking, used in the appropriate circumstances, is a powerful tool that allows us to appropriately discount explanations that would, in other circumstances, be much more plausible. When applied in inappropriate circumstances, on the other hand, conspiratorial thinking can metastasize and overwhelm our rational thinking. For instance, someone nearly always benefits from any event, so asking “Who would benefit from this?” will nearly always yield a suspect. Without a compelling reason to suspect agency in the first place, it is important to refrain from asking the question. My hope is to run a series of psychological studies to see whether people who believe in conspiracy theories are also more suspicious of coincidence as an explanation in general.

In my next post, I’ll talk about some difficulties I’ve had running the initial portion of this study, as well as a bit about the digital tools I’m using.

What is a Conspiracy Theory?

Part of my internship at the Scholarly Commons will be a series of blog posts to describe my research and the different tools that I’ll be using to pursue it. In this first post, I’ll begin to give an account of my overall research project. Future posts will deal with other parts of the research project, what sorts of tools I will be using, the ways I’m gaining facility with those tools, and the progress of the research itself.

What Is a Conspiracy Theory?

The first phase of my research involves developing an empirically informed definition of ‘conspiracy theory’. A naive definition might be “a theory that involves a conspiracy.” This leads to many things being called conspiracy theories that would not ordinarily be understood as such. For example, the official account of 9/11 would be a conspiracy theory: Al-Qaeda, working in secret (i.e., as a conspiracy), planned and carried out the attack. While such a capacious definition of ‘conspiracy theory’ might be appealing, it runs counter to many people’s sense of what the term means.

In the philosophical literature on conspiracy theories, several definitions have been floated, but there is no agreed-upon way of understanding the term. As a result, it can be difficult to know whether there is a connection between what the philosopher in question is discussing and what is commonly taken to be a conspiracy theory. In the psychological and sociological literature on conspiracy theories, much less attention is paid to questions of definition, with certain “paradigmatic” theories normally being presented as conspiracy theories. In these cases, absent evidence that the theories presented as “paradigmatic” really are typical, it is reasonable to wonder whether they are actually atypical in some respects. In both the philosophical and psychological/sociological cases, I am concerned that the choice of particular conspiracy theories might be the result of unintentional “cherry-picking” of examples, which would threaten to skew accounts.

To solve this problem, I am inspired by Paul Thagard’s study presented in “Creative Combination of Representations: Scientific Discovery and Technological Innovation,” in his collection “The Cognitive Science of Science: Explanation, Discovery, and Conceptual Change.” In that study, Thagard investigates two texts, one an anthology of important scientific discoveries, the other an anthology of important inventions. For each text, he goes through each entry, coding for the presence of certain features. This allows him to give an empirically informed account of typical features of both scientific discovery and technological innovation (specifically with regard to their use of representational combination). While there are still reasons to be wary of treating these features as characteristic (e.g., it might be that the most important scientific discoveries are actually atypical cases of scientific discovery), this is at least a good effort at moving away from cherry-picking examples.

In my own study, I have selected an anthology of various conspiracy theories. The text is “Conspiracies and Secret Societies: The Complete Dossier” by Brad and Sherry Steiger.

I have selected several features to look for in the entries. In particular, my own hypothesis is that conspiracy theories typically criticize appeals to coincidence in order to motivate their own acceptance. A theory does this when it faults an alternative theory for offering an explanation that relies on coincidence. For example, some 9/11 conspiracy theories observe that unusual stock market activity involving the airlines involved occurred in the days leading up to the attack, and that this activity earned the investors a great deal of profit. One way to explain this would be to say it was a coincidence. The conspiracy theorists insist instead that it is evidence of insider trading among people who had knowledge of the planned attack. This substitution of conspiracy for coincidence is, I predict, typical of conspiracy theories in general.

Two lab assistants and I are working through the book and coding for the presence of the chosen features. The hope is that we will be able to make some empirically informed judgments about what features are typical of conspiracy theories. In addition to this strategy, I will use some text mining strategies both to check our own conclusions and to look for other typical features we may have missed. Although the amount of text in the book is fairly small, the hope is that a meaningful topic model can be developed to see whether the groupings we notice ourselves emerge in the model as well. This would give us some additional evidence to be satisfied with our own coding. The model might also reveal other groupings based on features we had not coded for, which we could then check independently. In the end, the hope is that we will be able to give examples of paradigmatic conspiracy theories and have some empirical backing for our choices.
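Once entries are coded, judging which features are “typical” comes down to simple tallies. Below is a minimal sketch, assuming each entry is recorded as a set of yes/no features. It computes how often each feature appears. It also computes the phi coefficient, a standard correlation measure for two binary variables, between two features. The feature names and the four entries are made up for illustration and are not results from this study.

```python
import math

def feature_frequencies(entries):
    """Share of coded entries in which each feature was marked present."""
    features = sorted({name for entry in entries for name in entry})
    return {f: sum(bool(e.get(f)) for e in entries) / len(entries) for f in features}

def phi_coefficient(entries, a, b):
    """Phi coefficient: the correlation between two yes/no features a and b."""
    n11 = sum(1 for e in entries if e.get(a) and e.get(b))
    n10 = sum(1 for e in entries if e.get(a) and not e.get(b))
    n01 = sum(1 for e in entries if not e.get(a) and e.get(b))
    n00 = sum(1 for e in entries if not e.get(a) and not e.get(b))
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # undefined if a feature is always or never present
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical coding sheet: one dict per anthology entry (made-up values).
coded = [
    {"criticizes_coincidence": True,  "substitutes_conspiracy": True},
    {"criticizes_coincidence": True,  "substitutes_conspiracy": True},
    {"criticizes_coincidence": False, "substitutes_conspiracy": True},
    {"criticizes_coincidence": False, "substitutes_conspiracy": False},
]

print(feature_frequencies(coded))
print(round(phi_coefficient(coded, "criticizes_coincidence", "substitutes_conspiracy"), 3))
```

With only four entries a phi value is fragile. It becomes informative only across a sheet covering the few hundred entries in an anthology like this one. Agreement between coders is a separate question that these tallies do not address.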

In my next post, I will discuss the second component of my research project: an investigation into why conspiracy theories are so appealing to people.