Using Reddit’s API to Gather Text Data

The Reddit logo.

I started my research intending to use digital techniques to analyze an encyclopedia of conspiracy theories in order to determine their typical features. At this point, I realize that there were two flaws in my original plan. First, as discussed in a previous blog post, the book I selected failed to provide the sort of evidence I required to establish typical features of conspiracy theories. Second, the book, though sizable, was nowhere near long enough to provide a corpus on which I could run a topic model and derive interesting information.

My hope is that I can shift to online sources of text in order to solve both of these problems. Specifically, I will be collecting posts from Reddit. The first problem was that my original book merely stated the content of a number of conspiracy theories, without making any effort to convince the reader that they were true. As a result, there was little evidence of typical rhetorical and argumentative strategies that might characterize conspiracy theories. Reddit, on the other hand, will provide thousands of instances of people interacting in an effort to convince other Redditors of the truth or falsity of particular conspiracy theories. The sorts of strategies that were absent from the encyclopedia of conspiracy theories will, I hope, be present on Reddit.

The second problem was that the encyclopedia failed to provide a sufficient amount of text. Using Reddit will certainly solve this problem: in less than twenty-four hours, one recent post alone drew over 1,300 comments. If anything, the solution creates a whole new problem: how to deal with such a vast (and rapidly changing) body of information.

Before I worry too much about that, it is important that I be able to access the information in the first place. To do this, I’ll need to use Reddit’s API. API stands for Application Programming Interface, and it’s essentially a tool for letting a user (or a program) interact with a system. In this case, the API allows a user to access information on the Reddit website. Of course, we can already do this with a web browser. The API, however, allows for more fine-grained control than a browser. When I navigate to a Reddit page with my web browser, my requests are interpreted in a very pre-scripted manner. This is convenient; when I’m browsing a website, I don’t want to have to specify what sort of information I want to see every time a new page loads. However, if I’m looking for very specific information, it can be useful to use an API to home in on just the relevant parts of the website.
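To make the contrast with a browser concrete, here is a minimal sketch of how a request for one post's comments might be assembled in Python. It assumes Reddit's convention of serving a JSON version of a page when ".json" is appended to its address, and that the "limit" and "sort" query parameters are honored for comment pages. The function name, the subreddit, and the post ID are placeholders chosen for illustration, and Reddit's current documentation should be checked before relying on any of this.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.reddit.com"  # assumed public endpoint


def build_comments_url(subreddit, post_id, limit=100, sort="new"):
    """Assemble the address of the JSON version of one post's comment page."""
    path = f"/r/{subreddit}/comments/{post_id}.json"
    # urlencode turns the dictionary into "limit=...&sort=..."
    query = urlencode({"limit": limit, "sort": sort})
    return f"{BASE_URL}{path}?{query}"


# "abc123" is a made-up post ID used only for illustration.
print(build_comments_url("conspiracy", "abc123", limit=50))
# prints https://www.reddit.com/r/conspiracy/comments/abc123.json?limit=50&sort=new
```

The point of the sketch is only that the request itself spells out what should come back (which post, how many comments, in what order), which is exactly the control a browser hides.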

For my purposes, I’m primarily interested in downloading massive numbers of Reddit posts, with just their text body, along with certain identifiers (e.g., the name of the poster, timestamp, and the relation of that post to other posts). The first obstacle to accessing the information I need is learning how to request just that particular set of information. In order to do this, I’ll need to learn how to write a request in Reddit’s API format. Reddit provides some help with this, but I’ve found these other resources a bit more helpful. The second obstacle is that I will need to write a program that automates my requests, to save myself from having to perform tens of thousands of individual requests. I will be attempting to do this in Python. While doing this, I’ll have to be sure that I abide by Reddit’s regulations for using its API. For example, a limited number of requests per minute are allowed so that the website is not overloaded. There seems to be a dearth of example code on the Internet for text acquisition of this sort, so I’ll be posting a link to any functional code I write in future posts.
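The two obstacles can be sketched together. The first function below pulls just the identifiers mentioned above (poster, timestamp, parent post, text body) out of a nested comment listing; the second piece spaces requests out so that a per-minute limit is respected. This is a sketch under assumptions, not working research code: the field names ("author", "created_utc", "parent_id", "body", "replies") and the "t1" marker for comments reflect my understanding of Reddit's JSON layout, the "fetch" argument is a hypothetical function that downloads one URL and returns the comment listing as a Python dictionary, and the requests-per-minute figure must come from Reddit's own rules.

```python
import time


def flatten_comments(listing, rows=None):
    """Walk a Reddit-style comment listing and collect one row per comment."""
    if rows is None:
        rows = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # assumed: "t1" marks a comment
            continue
        data = child["data"]
        rows.append({
            "id": data.get("id"),
            "author": data.get("author"),
            "created_utc": data.get("created_utc"),
            "parent_id": data.get("parent_id"),  # links a reply to its parent
            "body": data.get("body"),
        })
        replies = data.get("replies")
        if isinstance(replies, dict):  # assumed: an empty string means no replies
            flatten_comments(replies, rows)
    return rows


class Throttle:
    """Enforce a minimum gap of 60 / per_minute seconds between calls."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        now = self.clock()
        if self.last_call is not None:
            remaining = self.interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def collect(urls, fetch, throttle):
    """Fetch each URL in turn, pausing between requests, and gather all rows."""
    rows = []
    for url in urls:
        throttle.wait()
        rows.extend(flatten_comments(fetch(url)))
    return rows
```

Passing the clock and sleep functions in as arguments is a design choice that lets the pacing logic be checked without actually waiting or touching the network. In real use, "fetch" would also need to identify the program to Reddit and handle errors, neither of which is shown here.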

An Obstacle and (Hopefully) a Solution in Digital Research

This post is part of an ongoing series about my research on conspiracy theories and the tools I use to pursue it. You can read Part I: What is a Conspiracy Theory and Part II: Why Are Conspiracy Theories So Compelling? on Commons Knowledge.


Part of my research project, in which I am attempting to give an empirically informed account of what constitutes a conspiracy theory, involves reading through a text that compiles a few hundred different conspiracy theories and gives brief accounts of them. By reading through them and coding for the presence of various features, I hoped to get some information on what features were most typical of conspiracy theories. My own suspicion is that an important part of the appeal of conspiracy theories is that, in general, we tend to find appeals to coincidence unconvincing. For example, if a student is repeatedly absent from class on test days and gives as an excuse a series of illnesses, we are inclined to find this unconvincing. It seems very coincidental that their illnesses always occur on test days. Of course, it’s possible that it really is a coincidence, but we strongly discount the explanatory weight of such an appeal. If we cast about for another theory to explain their absences, we quickly happen upon one: The student didn’t prepare for the tests and so wanted to avoid coming to class. Again, it’s possible this theory is incorrect, but it is much more satisfying than the theory that the student just coincidentally gets sick on test days.

I suspect that conspiracy theories derive much of their appeal from the unsatisfactory character of appeals to coincidence. To pick just one example: The plane crash that killed Senator Paul Wellstone in 2002 has been the focal point of a number of conspiracy theories. The standard account is that the crash was due to pilot error. One way that suspicion has been raised about this account is by noting a number of seeming coincidences. One coincidence is that Wellstone was one of the most outspoken voices against the Bush administration at the time. It raised some conspiracy theorists’ eyebrows that such a prominent liberal voice “just happened” to die unexpectedly in a plane crash. Alternative theories propose that members of the Bush administration arranged for the assassination of Wellstone. Another purported coincidence involved accounts of electronic malfunction: cell phones and automatic garage door openers in the vicinity supposedly malfunctioned at roughly the same time the plane crashed. Some conspiracy theories account for this by appealing to the use of an electromagnetic frequency weapon that disabled the controls of the plane while also causing nearby electronic equipment to malfunction. My hypothesis is that this style of explanation and theory development is typical of conspiracy theories in general.

Federal investigators sift through debris in this Oct. 27, 2002 file photo, from the twin engine plane that crashed two days earlier near Eveleth, Minn., killing Sen. Paul Wellstone, his wife Sheila, daughter Marcia and several others. The National Transportation Safety Board is ready to vote on the likely cause of the 2002 accident. (AP Photo/Jim Mone, File)

In my study, I have noted whether each conspiracy theory in my chosen compilation points to an appeal to coincidence in the rival (usually “standard”) account and, if it does, whether it then appeals to a conspiracy in order to provide a “better” theory to replace the one that appeals to coincidence. Unfortunately, this strategy has hit an obstacle. While there is a strong correlation between pointing to an appeal to coincidence as a problem with a theory and substituting an appeal to conspiracy in its place, relatively few theories appeared to do this, based on the text. Even in cases where I knew the criticism of appeals to coincidence frequently played a large role in the justification of particular conspiracy theories, I often found no evidence of this in the brief accounts of the theories given in the book. It could be, of course, that my hypothesis is just mistaken; given that it didn’t match up with other conclusions I’d drawn based on other sources, however, I am inclined to think the problem is the source text. Thinking it over, it was clear that the nature of the text was to present the accounts “objectively,” stating the content of the views, normally without any effort to convince the reader one way or the other. Occasionally, talk of coincidences finds its way into the entries. Even then, it is only rarely explicit: for example, there are zero appearances of the word ‘coincidence’ or the phrase ‘just happened’, only three appearances of ‘coincidental’, and all appearances of ‘happened to’ are, upon checking the context, not related to appeals to coincidence in explanation. No other typically “coincidental” language makes a significant appearance. My concern is that this reveals only that my chosen text doesn’t address whether the presented theories are explanatorily superior to their rivals or explore how they developed in the first place.

Since those are the areas in which coincidence would play a larger role, I’ve concluded that my chosen text is misleading as a source of data about conspiracy theories (at least with regard to the role of coincidence; in other areas, such as whether the theory is an official or unofficial account, it is much more reliable).
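A count of this kind is easy to reproduce. The following is a minimal sketch of one way to tally whole-word appearances of a handful of phrases in a text; the function name and the sample sentence are invented for illustration and are not taken from the book. Note that whole-word matching means ‘coincidence’ does not count ‘coincidences’, so plural forms would need to be listed separately, and each hit for a phrase like ‘happened to’ still has to be read in context.

```python
import re


def count_phrases(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase in text."""
    counts = {}
    for phrase in phrases:
        # \b stops "coincidence" from matching inside "coincidences";
        # \s+ lets a phrase span any run of spaces or line breaks.
        words = [re.escape(word) for word in phrase.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Invented sample text, used only to show the call.
sample = "It was coincidental that he just\nhappened to be there."
print(count_phrases(sample, ["coincidence", "coincidental", "just happened"]))
```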

In order to resolve this obstacle, I have settled on using primary sources that are more likely to involve attempts to persuade the reader that the contained theory is correct and superior to its rivals. These include books and websites that present a particular conspiracy theory and online fora where proponents of various conspiracy theories argue and collaborate in the development of conspiracy theories. This body of material is obviously vastly larger than the single anthology I initially intended to use. My focus currently is finding a way to carve out a manageable chunk of this gigantic data set, most likely from online message boards, like Reddit, and use text from these fora to find evidence for my hypothesis about appeals to coincidence. This will necessitate the use of at least two kinds of digital techniques: web scraping, in order to extract usable text from a large number of individual websites, and topic modeling, in order to find meaningful relationships within an otherwise unmanageably large corpus. In my next post, I will talk about my initial forays into these techniques.
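As a rough illustration of the web scraping half, the sketch below strips the markup from a page and keeps only the readable text, using Python's built-in HTML parser. It is a simplification under stated assumptions: the class and function names are my own, the list of tags treated as paragraph-like is an arbitrary choice, and real message-board pages would need more care (navigation menus, quoted replies, and so on). Downloading the pages is a separate step not shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3"}  # arbitrary choice

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append(" ")  # keep neighboring blocks from fusing

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the readable text of an HTML string with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Text cleaned this way, one document per post or comment, is the kind of input a topic model would then be run on.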

Why Are Conspiracy Theories So Compelling?

In my last post, I described the first phase of my research, in which I am attempting to develop an empirically informed definition of ‘conspiracy theory’. In this post, I want to discuss the second focus of my research: why it is that conspiracy theories are so compelling for so many people.

Newspaper article with headline "Kennedy Slain by CIA, Mafia, Castro, LBJ, Teamsters, Freemasons"

Although the specifics can be debated, it is clear that conspiracy theories are very popular. In a recent survey, 61% of participants claimed belief in some form of a conspiracy theory about the assassination of John F. Kennedy. This could possibly be attributed to increased publicity about the event due to its impending fiftieth anniversary and coverage of the release of some previously classified documents regarding it. But in an even more wide-ranging study four years ago, the number was 51%. At the very least, it looks plausible that more than half of Americans believe in this particular conspiracy theory, and there are plenty of other theories out there. For example, approximately 40% of respondents endorsed the conspiracy theory that the FDA is withholding a natural cancer cure.

Conspiracy theories are often treated dismissively as the ravings of deranged paranoiacs. Yet, we have good reason to believe that a majority of Americans believe in at least one conspiracy theory, and we can’t dismiss all of them in this way. Why, then, are conspiracy theories so compelling? There are a number of predictors for belief in conspiracy theories. The best is belief in other conspiracy theories: if someone believes one conspiracy theory, the likelihood that they believe another goes up. Other predictors are useful for predicting if a subject believes in a particular conspiracy theory, but not for the likelihood that they believe in conspiracy theories generally. Belief in conspiracy theories is common regardless of race, but white Americans are more likely than African-Americans to believe in Sandy Hook conspiracy theories (in which the government supposedly faked the Sandy Hook shooting in order to initiate more stringent gun control laws), while African-Americans are more likely than white Americans to believe that the CIA developed AIDS in order to kill African-American populations. Similarly, political liberals are more likely to endorse GMO conspiracy theories, while political conservatives are more likely to endorse climate change conspiracy theories. Evidence does suggest that people are less inclined to believe in conspiracy theories the more educated they are, but exactly why this is the case is still unclear. Higher education is correlated with a complex of many other facts and it remains to be seen whether the education itself is the cause of decreased belief.

My own suspicion is that an important part of the appeal of conspiracy theories is that we tend to find appeals to coincidence unconvincing. This is often perfectly reasonable. If a recently elected politician installs close friends and family in all important posts, insisting that, by coincidence, their friends and family were the most qualified individuals for the posts, we will be rightly suspicious. It can be a problem, however, when this suspicion transfers over to extraordinarily complex events. For example, there is a long-standing conspiracy theory that Bill Clinton arranged for the assassination of dozens of people with whom he had varying levels of contact. An enormous part of the appeal stems from the seeming unlikelihood of so many deaths that can be linked to Clinton. Of course, a president comes into contact with a staggering number of people, and some small number of these are bound to die in a variety of ways. It is not surprising that a number of people who met Clinton died; it is merely coincidental, and what would really be surprising is if no one he met died. When a case is sufficiently complex (such as the network of everyone a United States president meets), coincidence will often be the explanation for events.

An image titled "The Clinton Body Bags" that lists people Bill Clinton came in contact with who are now dead.

There are other cases where “conspiratorial thinking,” in which we are inclined to suspect that agency rather than coincidence is the cause of an event, seems appropriate. It seems appropriate that homicide detectives presume agency was involved rather than coincidence when investigating an unexpected death, and that they ask questions like “Who would benefit from this?” in determining what agency was at work. On the other hand, it seems inappropriate for a voter to presume that agency rather than coincidence explains why a former member of the president’s staff died in a plane crash; such a voter should not ask questions like “Who would benefit from this?” in order to discover who might have arranged the disaster.

Conspiratorial thinking, used in the appropriate circumstances, is a powerful tool that allows us to appropriately discount explanations that are, in other circumstances, much more plausible. When applied in inappropriate circumstances, on the other hand, conspiratorial thinking can metastasize and overwhelm our rational thinking. For instance, someone nearly always benefits from any event, so that asking “Who would benefit from this?” will nearly always yield a suspect. Without compelling reason to suspect agency in the first place, it is important to refrain from asking the question. My hope is to run a series of psychological studies to see whether people who believe in conspiracy theories are also more suspicious of coincidence as an explanation in general.

In my next post, I’ll talk about some difficulties I’ve had running the initial portion of this study, as well as the digital tools I’m using.

What is a Conspiracy Theory?

Part of my internship at the Scholarly Commons will be a series of blog posts to describe my research and the different tools that I’ll be using to pursue it. In this first post, I’ll begin to give an account of my overall research project. Future posts will deal with other parts of the research project, what sorts of tools I will be using, the ways I’m gaining facility with those tools, and the progress of the research itself.

What Is a Conspiracy Theory?

The first phase of my research involves developing an empirically informed definition of ‘conspiracy theory’. A naive definition might be “a theory that involves a conspiracy.” This leads to many things being called conspiracy theories that would not ordinarily be understood as such. For example, the official account of 9/11 would be a conspiracy theory: Al-Qaeda, working in secret (i.e., as a conspiracy), planned and carried out the attack. While such a capacious definition of ‘conspiracy theory’ might be appealing, it runs counter to many people’s sense of what the term means.

In the philosophical literature on conspiracy theories, several definitions have been floated, but there is no agreed-upon way of understanding the term. As a result, it can be difficult to know whether there is a connection between what the philosopher in question is discussing and what is commonly taken to be a conspiracy theory. In the psychological and sociological literature on conspiracy theories, much less attention is paid to questions of definition, with certain “paradigmatic” theories normally being presented as conspiracy theories. In these cases, it is reasonable to wonder if the theories presented as “paradigmatic” are actually atypical in some respects, barring some evidence that they actually are typical. In both the philosophical and psychological/sociological cases, I am concerned that choices of particular conspiracy theories might be the result of unintentional “cherry-picking” of examples, which would threaten to skew accounts.

To solve this problem, I am drawing on Paul Thagard’s study presented in “Creative Combination of Representations: Scientific Discovery and Technological Innovation,” in his collection “The Cognitive Science of Science: Explanation, Discovery, and Conceptual Change.” In that study, Thagard investigates two texts, one an anthology of important scientific discoveries, the other an anthology of important inventions. For each text, he goes through each entry, coding for the presence of certain features. This allows him to give an empirically informed account of typical features of both scientific discovery and technological innovation (specifically with regard to their use of representational combination). While there are still reasons to be wary of treating these features as characteristic (e.g., it might be that the most important scientific discoveries are actually atypical cases of scientific discovery), this is at least a good effort at moving away from cherry-picking examples.

In my own study, I have selected an anthology of various conspiracy theories. The text is “Conspiracies and Secret Societies: The Complete Dossier” by Brad and Sherry Steiger.

I have selected several features to look for in the entries. In particular, my own hypothesis is that conspiracy theories typically utilize appeals to coincidence in order to motivate their own acceptance. An appeal to coincidence occurs when a theory criticizes an alternative theory for containing an explanation that involves coincidence. For example, some 9/11 conspiracy theories observe that a number of unusual stock market behaviors with regard to the airlines involved were exhibited in the days leading up to the attack, and that this led to a great deal of profit on the part of the investors. One way to explain this would be to say it was a coincidence. The conspiracy theorists insist instead that it is evidence of insider trading among people who had knowledge of the planned attack. This substitution of conspiracy for coincidence is, I predict, typical of conspiracy theories in general.

Two lab assistants and I are working through the book and coding for the presence of the chosen features. The hope is that we will be able to make some empirically informed judgments about what features are typical of conspiracy theories. In addition to this strategy, I will utilize some text mining strategies in order to both check our own conclusions and look for other typical features we may have missed. Although the amount of text in the book is fairly small, the hope is that a meaningful topic model might be developed in order to see if the groupings that we notice ourselves emerge in the model as well. This would give us some additional evidence to be satisfied with our own coding. It could also be the case that the model could reveal certain other groupings based around features we had not coded for that we could then independently check. In the end, the hope is that we will be able to give examples of paradigmatic conspiracy theories and have some empirical backing for our choices.
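Once the entries are coded, judging which features are typical comes down to a simple tally. The sketch below shows one way that tally might be computed: for each feature, the share of entries coded as having it. The entry names and feature labels in the example are invented placeholders, not results from the book, and the function name is my own.

```python
from collections import Counter


def feature_shares(coded_entries):
    """Return, for each feature, the share of entries coded as having it."""
    if not coded_entries:
        return {}
    totals = Counter()
    for features in coded_entries.values():
        totals.update(set(features))  # count each feature once per entry
    n = len(coded_entries)
    return {feature: count / n for feature, count in totals.items()}


# Invented placeholder codings, used only to show the call.
example = {
    "entry_a": ["appeal_to_coincidence", "unofficial_account"],
    "entry_b": ["unofficial_account"],
    "entry_c": [],
}
print(feature_shares(example))
```

The same table of codings could later be set beside the groupings a topic model produces, to see whether the two line up.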

In my next post, I will discuss the second component of my research project: an investigation into why conspiracy theories are so appealing to people.