Meet Dan Tracy, Information Sciences and Digital Humanities Librarian

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Dan Tracy, Information Sciences and Digital Humanities Librarian.


What is your background and work experience?

I originally come from a humanities background and completed a PhD in literature specializing in 20th century American literature, followed by teaching as a lecturer for two years. I had worked a lot with librarians during that time with my research and teaching. When you’re a PhD student in English, you teach a lot of rhetoric, and I also taught some literature classes. As a rhetoric instructor I worked closely with the Undergraduate Library’s instruction services, which exposed me to the work librarians do with instruction.

Then I did a Master’s in Library and Information Science here, knowing that I was interested in being an academic librarian, probably something in the area of being a subject librarian in the humanities. And then I began this job about five years ago. So I’ve been here about five years now in this role. And just began doing Digital Humanities over the summer. I had previously done some liaison work related to digital humanities, especially related to digital publishing, and I had been doing some research related to user experience and digital publishing as related to DH publishing tools.

What led you to this field?

A number of things. One was having known quite a number of people who went into librarianship who really liked it and talked about their work. Another was my experience working with librarians in terms of their instruction capacity. I was interested in working in an academic environment and I was interested in academic librarianship and teaching. And also, especially as things evolved, after I went back for the degree in library and information science, I also found a lot of other things to be interested in as well, including things like digital humanities and data issues.

What is your research agenda?

My research looks at user experience in digital publishing, primarily in the context of both ebook formats and newer experimental forms of publication, such as web and multi-modal publishing with tools like Scalar. I look especially at the reader side of these platforms, but also at the creator side.

Do you have any favorite work-related duties?

As I mentioned before, instruction was an initial draw to librarianship. I like anytime I can teach and work with students, or faculty for that matter, and help them learn new things. That would probably be a top thing. And I think increasingly the chances I get to work with digital collections issues as well. I think there’s a lot of exciting work to do there in terms of delivering our digital collections to scholars to complete both traditional and new forms of research projects.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think there’s a lot. I think researchers are already aware of digital primary sources in general, but I do think there’s a lot more for people to explore in terms of collections we’ve digitized and things we can do with those through our digital library, and through other digital library platforms, like DPLA (Digital Public Library of America).

I think that a lot of our digital image collections are especially underutilized. I think people are more aware that we have digitized text sources, but less aware of our digitized primary sources that are images, which have value as research objects, including for computational analysis. We also have more and more access to the text data behind our various vendor platforms, a resource various researchers on campus increasingly need but don’t always know is available.

If you could recommend one book to beginning researchers in your field, what would you recommend?

If you’re just getting started, I think a good place to look is at the Debates in the Digital Humanities books, which are collections of essays that touch on a variety of critical issues in digital humanities research and teaching. This is a good place to start if you want to get a taste of the ongoing debates and issues. There are open access copies of them available online, so they are easy to get to.

Dan Tracy can be reached at dtracy@illinois.edu.

Why Are Conspiracy Theories So Compelling?

In my last post, I described the first phase of my research, in which I am attempting to develop an empirically informed definition of ‘conspiracy theory’. In this post, I want to discuss the second focus of my research: why it is that conspiracy theories are so compelling for so many people.

Newspaper article with headline "Kennedy Slain by CIA, Mafia, Castro, LBJ, Teamsters, Freemasons"

Although the specifics can be debated, it is clear that conspiracy theories are very popular. In a recent survey, 61% of participants claimed belief in some form of a conspiracy theory about the assassination of John F. Kennedy. This could possibly be attributed to increased publicity about the event due to its impending fiftieth anniversary and coverage of the release of some previously classified documents regarding it. But in an even more wide-ranging study four years ago, the number was 51%. At the very least, it looks plausible that more than half of Americans believe in this particular conspiracy theory, and there are plenty of other theories out there. For example, approximately 40% of respondents endorsed the conspiracy theory that the FDA is withholding a natural cancer cure.

Conspiracy theories are often treated dismissively as the ravings of deranged paranoiacs. Yet we have good reason to believe that a majority of Americans believe in at least one conspiracy theory, and we can’t dismiss all of them in this way. Why, then, are conspiracy theories so compelling? There are a number of predictors for belief in conspiracy theories. The best is belief in other conspiracy theories: if someone believes one conspiracy theory, the likelihood that they believe another goes up. Other predictors are useful for predicting whether a subject believes in a particular conspiracy theory, but not for the likelihood that they believe in conspiracy theories generally. Belief in conspiracy theories is common regardless of race, but white Americans are more likely than African-Americans to believe in Sandy Hook conspiracy theories (in which the government supposedly faked the Sandy Hook shooting in order to initiate more stringent gun control laws), while African-Americans are more likely than white Americans to believe that the CIA developed AIDS in order to kill African-American populations. Similarly, political liberals are more likely to endorse GMO conspiracy theories, while political conservatives are more likely to endorse climate change conspiracy theories. Evidence does suggest that people are less inclined to believe in conspiracy theories the more educated they are, but exactly why this is the case is still unclear. Higher education is correlated with a complex of many other factors, and it remains to be seen whether the education itself is the cause of decreased belief.

My own suspicion is that an important part of the appeal of conspiracy theories is that we tend to find appeals to coincidence unconvincing. This is often perfectly reasonable. If a recently elected politician appoints close friends and family to all important posts, insisting that, by coincidence, their friends and family were the most qualified individuals for the posts, we will be rightly suspicious. It can be a problem, however, when this suspicion transfers over to extraordinarily complex events. For example, there is a long-standing conspiracy theory that Bill Clinton arranged for the assassination of dozens of people with whom he had varying levels of contact. An enormous part of the appeal stems from the seeming unlikelihood of so many deaths that can be linked to Clinton. Of course, a president comes into contact with a staggering number of people, and some small number of these are bound to die in a variety of ways. It is not surprising that a number of people who met Clinton died; it is merely coincidental, and what would really be surprising is if no one he met died. When a case is sufficiently complex (such as the network of everyone a United States president meets), coincidence will often be the explanation for events.
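
A quick back-of-envelope calculation shows how unsurprising such a "body count" really is. The numbers here are illustrative assumptions (10,000 contacts, and a crude death rate of roughly 0.9% per person per year, in the vicinity of the overall US figure), not data from any study:

    # Back-of-envelope sketch with illustrative assumptions:
    # 10,000 contacts, crude death rate of ~0.9% per person per year.
    contacts = 10000
    annual_death_rate = 0.009

    # Deaths expected among the contacts in a single year, by chance alone:
    print(contacts * annual_death_rate)          # ~90

    # Probability that *none* of the contacts dies in a given year:
    print((1 - annual_death_rate) ** contacts)   # ~5e-40, vanishingly small

On these assumptions, dozens of deaths per year are expected by chance alone, and a "body count" of zero would be the truly astonishing outcome.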

An image titled "The Clinton Body Bags" that lists people Bill Clinton came in contact with who are now dead.

There are other cases where "conspiratorial thinking," in which we are inclined to suspect agency as the cause of an event rather than coincidence, seems appropriate. It seems appropriate that homicide detectives presume agency rather than coincidence was involved when investigating an unexpected death, and that they ask questions like "Who would benefit from this?" in determining what agency was at work. On the other hand, it seems inappropriate for a voter to presume agency rather than coincidence was involved in explaining why a former member of the president’s staff died in a plane crash, and to ask questions like "Who would benefit from this?" in order to discover who might have arranged the disaster.

Conspiratorial thinking, utilized in the appropriate circumstances, is a powerful tool that allows us to appropriately discount explanations that are, in other circumstances, much more plausible. When applied in inappropriate circumstances, on the other hand, conspiratorial thinking can metastasize and overwhelm our rational thinking. For instance, someone nearly always benefits from any event, so asking "Who would benefit from this?" will nearly always yield a suspect. Without compelling reason to suspect agency in the first place, it is important to refrain from asking the question. My hope is to run a series of psychological studies to see whether people who believe in conspiracy theories are also more suspicious of coincidence as an explanation in general.

In my next post, I’ll talk about some difficulties I’ve had running the initial portion of this study, as well as a bit about the digital tools I’m using.

Announcing Topic Modeling – Theory & Practice Workshops

An example of text from a topic modeling project.

We’re happy to announce that Scholarly Commons intern Matt Pitchford is teaching a series of two Savvy Researcher Workshops on Topic Modeling. You may be following Matt’s posts on Studying Rhetorical Responses to Terrorism on Twitter or Preparing Your Data for Topic Modeling on Commons Knowledge, and now is your chance to learn the basics from the master! The workshops will be held on Wednesday, December 6th and Friday, December 8th. See below for more details!

Topic Modeling, Part 1: Theory

  • Wednesday, December 6th, 11am-12pm
  • 314 Main Library
  • Topic models are a computational method of identifying and grouping interrelated words in any set of texts. In this workshop we will focus on how topic models work, what kinds of academic questions topic models can help answer, what they allow researchers to see, and what they can obfuscate. This will be a conversation about topic models as a tool and method for digital humanities research. In part 2, we will actually construct some topic models using MALLET.
  • To sign up for the class, see the Savvy Researcher calendar

Topic Modeling, Part 2: Practice

  • Friday, December 8th, 11am-12pm
  • 314 Main Library
  • In this workshop, we will use MALLET, a Java-based package, to construct and analyze a topic model. Topic models are a computational method of identifying and grouping interrelated words in any set of texts. This workshop will focus on how to set up the code correctly, how to understand the output of the model, and how to refine the code for best results. No experience necessary. You do not need to have attended Part 1 in order to attend this workshop.
  • To sign up for this class, see the Savvy Researcher calendar

Save the Date: Edward Ayers Talk

We are so excited to be hosting a talk by Edward Ayers this coming March! Save the date on your calendars:

March 29, 2018 | 220 Main Library | 4-6 pm

Edward Ayers has been named National Professor of the Year, received the National Humanities Medal from President Obama at the White House, won the Bancroft Prize and Beveridge Prize in American history, and was a finalist for the National Book Award and the Pulitzer Prize. He has collaborated on major digital history projects including the Valley of the Shadow, American Panorama, and Bunk, and is one of the co-hosts for BackStory, a popular podcast about American history. He is Tucker-Boatwright Professor of the Humanities and president emeritus at the University of Richmond as well as former Dean of Arts and Sciences at the University of Virginia. His most recent book is The Thin Light of Freedom: The Civil War and Emancipation in the Heart of America, published in 2017 by W. W. Norton.

His talk will be on “Twenty-Five Years in Digital History and Counting”.

Edward Ayers began a digital project just before the World Wide Web emerged and has been pursuing one or several projects ever since. His current work focuses on the two poles of possibility in the medium: advanced projects in visualizing processes of history at the Digital Scholarship Lab at the University of Richmond, and a public-facing project in Bunk, curating representations of the American past for a popular audience.

We hope you’ll be able to join us at his public talk in March!

Open Source Tools for Social Media Analysis

Photograph of a person holding an iPhone with various social media icons.

This post was guest authored by Kayla Abner.


Interested in social media analytics, but don’t want to shell out the bucks to get started? There are a few open source tools you can use to dabble in this field, and some even integrate data visualization. Recently, we at the Scholarly Commons tested a few of these tools, and as expected, each one has strengths and weaknesses. For our exploration, we exclusively analyzed Twitter data.

NodeXL

NodeXL’s graph for #halloween (2,000 tweets)

tl;dr: Light system footprint and provides some interesting data visualization options. Useful if you don’t have a pre-existing data set, but the one generated here is fairly small.

NodeXL is essentially a complex Excel template (it’s classified as a Microsoft Office customization), which means it doesn’t take up a lot of space on your hard drive. It does have advantages; it’s easy to use, requiring only a simple search to retrieve tweets for you to analyze. However, its capabilities for large-scale analysis are limited; the user is restricted to retrieving the most recent 2,000 tweets. For example, searching Twitter for #halloween imported 2,000 tweets, every single one from the date of this writing. It is worth mentioning that there is a fancy, paid version that will expand your limit to 18,000 (the maximum allowed by Twitter’s API) or 7 to 8 days back, whichever comes first. Even then, you cannot restrict your data retrieval by date. NodeXL is thus best suited to pulling recent social media data. In addition, if you want to study something besides Twitter, you will have to pay to get any other type of dataset, e.g., Facebook, YouTube, or Flickr.

Strengths: Good for a beginner, differentiates between Mentions/Retweets and original Tweets, provides a dataset, some light data visualization tools, offers Help hints on hover

Weaknesses: 2,000 Tweet limit, free version restricted to Twitter Search Network

TAGS

TAGSExplorer’s data graph (2,902 tweets). It must mean something…

tl;dr: Add-on for Google Sheets, giving it a light system footprint as well. Higher tweet-retrieval limit than NodeXL. TAGS has the added benefit of automated data retrieval, so you can track trends over time. Its data visualization tool is in beta and needs more development.

TAGS is another complex spreadsheet template, this time created for use with Google Sheets. TAGS does not have a paid version with more social media options; it can only be used for Twitter analysis. However, it does not have the same tweet retrieval limit as NodeXL. The only limit is 18,000 tweets or seven days back, which is dictated by Twitter’s Terms of Service, not the creators of this tool. My same search for #halloween with a limit set at 10,000 retrieved 9,902 tweets from the past seven days.

TAGS also offers a data visualization tool, TAGSExplorer, that is promising but still needs work to realize its potential. As it stands now in beta, even a dataset of 2,000 records puts so much strain on the program that it cannot keep up with the user. It can be used with smaller datasets, but it still needs work. It does offer a few interesting analysis parameters that NodeXL lacks, such as the ability to see Top Tweeters and Top Hashtags, which work better than the graph.

These graphs have meaning!

Strengths: More data fields, such as the user’s follower and friend count, location, and language (if available), better advanced search (Boolean capabilities, restrict by date or follower count), automated data retrieval

Weaknesses: data visualization tool needs work

Hydrator

Simple interface for Documenting the Now’s Hydrator

tl;dr: A tool used for “re-hydrating” tweet IDs into full tweets, to comply with Twitter’s Terms of Service. Not used for data analysis; useful for retrieving large datasets. Limited to datasets already available.

Documenting the Now, a group focused on collecting and preserving digital content, created the Hydrator tool to comply with Twitter’s Terms of Service. Download and distribution of full tweets to third parties is not allowed, but distribution of tweet IDs is allowed. The organization manages a Tweet Catalog with files that can be downloaded and run through the Hydrator to view the full tweets. Researchers are also invited to submit their own datasets of tweet IDs, but this requires use of other software to download them. This tool does not offer any data visualization, but it is useful for studying and sharing large datasets (the file for the 115th US Congress contains 1,430,133 tweets!). Researchers are limited to what has already been collected, but multiple organizations provide publicly downloadable tweet ID datasets, such as Harvard’s Dataverse. Note that the rate of hydration is limited by Twitter’s API; the Hydrator tool manages that for you. Some of these datasets contain millions of tweet IDs and will take days to be transformed into full tweets.

Strengths: Provides full tweets for analysis, straightforward interface

Weaknesses: No data analysis tools
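
If you would rather script the hydration step yourself, Documenting the Now also maintains twarc, a Python library for working with the Twitter API. Here is a minimal sketch, assuming the twarc 1.x interface, your own Twitter API credentials, and a hypothetical tweet_ids.txt file with one tweet ID per line:

    # Minimal sketch of hydrating tweet IDs with twarc (another DocNow project).
    # Assumes twarc 1.x ("pip install twarc") and your own Twitter API keys.
    import json
    from twarc import Twarc

    t = Twarc(consumer_key="...", consumer_secret="...",
              access_token="...", access_token_secret="...")

    # twarc batches the IDs and waits out Twitter's rate limits for you.
    with open("tweet_ids.txt") as ids, open("tweets.jsonl", "w") as out:
        for tweet in t.hydrate(ids):
            out.write(json.dumps(tweet) + "\n")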

Crimson Hexagon

If you’re looking for more robust analytics tools, Crimson Hexagon is a data analytics platform that specializes in social media. Not limited to Twitter, it can retrieve data from Facebook, Instagram, YouTube, and basically any other online source, like blogs or forums. The company has a partnership with Twitter and pays for greater access to their data, giving the researcher higher download limits and a longer time range than they would receive from either NodeXL or TAGS. One can access tweets going back to Twitter’s inception, but these features cost money! The University of Illinois at Urbana-Champaign is one such entity paying for this platform, so researchers affiliated with our university can request access. One of the Scholarly Commons interns, Matt Pitchford, uses this tool in his research on Twitter responses to terrorism.

Whether you’re an experienced text analyst or just want to play around, these open source tools are worth considering for different uses, all without you spending a dime.

If you’d like to know more, researcher Rebekah K. Tromble recently gave a lecture at the Data Scientist Training for Librarians (DST4L) conference regarding how different (paid) platforms influence or bias analyses of social media data. As you start a real project analyzing social media, you’ll want to know how the data you have gathered may be limited, so you can adjust your analysis accordingly.

Preparing Your Data for Topic Modeling

In keeping with my series of blog posts on my research project, this post is about how to prepare your data for input into a topic modeling package. I used Twitter data in my project, which is relatively sparse at only 140 characters per tweet, but the principles can be applied to any document or set of documents that you want to analyze.

Topic Models:

Topic models work by identifying and grouping words that co-occur into “topics.” As David Blei writes, Latent Dirichlet allocation (LDA) topic modeling makes two fundamental assumptions: “(1) There are a fixed number of patterns of word use, groups of terms that tend to occur together in documents. Call them topics. (2) Each document in the corpus exhibits the topics to varying degree. For example, suppose two of the topics are politics and film. LDA will represent a book like James E. Combs and Sara T. Combs’ Film Propaganda and American Politics: An Analysis and Filmography as partly about politics and partly about film.”

Topic models do not have any actual semantic knowledge of the words, and so do not "read" the sentences. Instead, topic models use math: tokens/words that tend to co-occur are statistically likely to be related to one another. However, that also means the model is susceptible to "noise," falsely identifying patterns of co-occurrence when unimportant but highly repeated terms are present. As with most computational methods, "garbage in, garbage out."
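
To make the idea concrete, here is a toy example using the gensim library (just an illustration; MALLET, used later in this series, is another popular option). The three miniature "documents" and the topic count are invented for demonstration:

    # Toy LDA example with gensim ("pip install gensim"); purely illustrative.
    from gensim import corpora, models

    documents = [
        "the senator proposed a new gun control bill".split(),
        "the film won awards for its director and cast".split(),
        "the director lobbied the senator about film subsidies".split(),
    ]

    dictionary = corpora.Dictionary(documents)               # token -> id map
    corpus = [dictionary.doc2bow(doc) for doc in documents]  # word counts

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=50)
    for topic_id, words in lda.print_topics():
        print(topic_id, words)

    # Each document is a mixture of topics; the third one should come out
    # as partly "politics" and partly "film".
    print(lda[corpus[2]])

On a corpus this small the topics will be noisy, but the principle is the same at scale: words that co-occur across documents get pulled into the same topic.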

In order to make sure that the topic model is identifying interesting or important patterns instead of noise, I had to accomplish the following pre-processing or “cleaning” steps.

  • First, I removed the punctuation marks, like “,.;:?!”. Without this step, commas started showing up in all of my results. Since they didn’t add to the meaning of the text, they were not necessary to analyze.
  • Second, I removed the stop-words, like “I,” “and,” and “the,” because those words are so common in any English sentence that they tend to be over-represented in the results. Many of my tweets were emotional responses, so many authors wrote in the first person. This tended to skew my results, although you should be careful about what stop words you remove. Simply removing stop-words without checking them first means that you can accidentally filter out important data.
  • Finally, I removed overly common words that were specific to my data. For example, many of my tweets were retweets and therefore contained the word "rt." I also ended up removing mentions of other authors, because highly retweeted texts meant that I was getting Twitter user handles as significant words in my results.

Cleaning the Data:

My original data set was 10 Excel files of 10,000 tweets each. In order to clean and standardize all these data points, as well as combine my files into one single document, I used OpenRefine. OpenRefine is a powerful tool, and it makes it easy to work with all your data at once, even if it is a large number of entries. I uploaded all of my datasets, then performed some quick cleaning available under the "Common Transformations" option under the triangle dropdown at the head of each column: I changed everything to lowercase, unescaped HTML characters (to make sure that I didn’t get errors when trying to run it in Python), and removed extra white spaces between words.

OpenRefine also lets you use regular expressions, a kind of search pattern for finding specific strings of characters inside other text. This allowed me to remove punctuation, hashtags, and author mentions by running find-and-replace commands (a rough Python equivalent appears after the list below).

  • Remove punctuation: grel:value.replace(/(\p{P}(?<!')(?<!-))/, "")
    • Any punctuation character is removed, except apostrophes and hyphens, which the lookbehinds skip.
  • Remove users: grel:value.replace(/(@\S*)/, "")
    • Any string that begins with an @ is removed. It ends at the space following the word.
  • Remove hashtags: grel:value.replace(/(#\S*)/, "")
    • Any string that begins with a # is removed. It ends at the space following the word.
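
If you would rather script these replacements, here is a rough Python equivalent of the three GREL expressions above. It is an approximation: like the GREL lookbehinds, the character class below leaves apostrophes and hyphens alone.

    # Rough Python equivalent of the GREL replacements above.
    import re

    def clean_tweet(text):
        text = re.sub(r"@\S*", "", text)   # remove @user mentions
        text = re.sub(r"#\S*", "", text)   # remove hashtags
        # Remove punctuation but keep apostrophes and hyphens,
        # mirroring the (?<!') and (?<!-) lookbehinds in the GREL version.
        text = re.sub(r"[^\w\s'-]", "", text)
        return re.sub(r"\s+", " ", text).strip()  # tidy leftover whitespace

    print(clean_tweet("rt @drlawyercop gun control is a steep climb! #guncontrolnow"))
    # -> "rt gun control is a steep climb"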

Regular expressions, commonly abbreviated as "regex," can take a little getting used to before you understand how they work. Fortunately, OpenRefine itself has some solid documentation on the subject, and I also found this cheatsheet valuable as I was trying to get it to work. If you want to create your own regex search strings, regex101.com has a tool that lets you test your expression before you actually deploy it in OpenRefine.

After downloading the entire data set as a Comma Separated Value (.csv) file, I then used the Natural Language ToolKit (NLTK) for Python to remove stop-words. The code itself can be found here, but I first saved the content of the tweets as a single text file, and then I told NLTK to go over every line of the document and remove words that are in its common stop word dictionary. The output is then saved in another text file, which is ready to be fed into a topic modeling package, such as MALLET.
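
The linked code is the authoritative version, but the gist of that stop-word pass looks something like this minimal sketch (the file names are hypothetical, and NLTK’s stop-word list must be downloaded once beforehand):

    # Minimal sketch of the NLTK stop-word removal step.
    # Assumes "pip install nltk" and a one-time nltk.download("stopwords").
    from nltk.corpus import stopwords

    stop_words = set(stopwords.words("english"))

    with open("tweets.txt") as infile, open("tweets_clean.txt", "w") as outfile:
        for line in infile:
            kept = [word for word in line.split() if word not in stop_words]
            outfile.write(" ".join(kept) + "\n")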

At the end of all these cleaning steps, my resulting data is essentially composed of unique nouns and verbs, so, for example, @Phoenix_Rises13’s tweet “rt @drlawyercop since sensible, national gun control is a steep climb, how about we just start with orlando? #guncontrolnow” becomes instead “since sensible national gun control steep climb start orlando.” This means that the topic modeling will be more focused on the particular words present in each tweet, rather than commonalities of the English language.

Now my data is cleaned from any additional noise, and it is ready to be input into a topic modeling program.
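
As one way to make that hand-off, gensim 3.x ships a Python wrapper around MALLET’s LDA trainer. This sketch assumes MALLET is installed locally (the path below is hypothetical) and that the cleaned tweets live in a tweets_clean.txt file, one tweet per line:

    # Feeding the cleaned file to MALLET via gensim's wrapper (gensim 3.x).
    # Assumes MALLET is installed and the path below points at its launcher.
    from gensim import corpora
    from gensim.models.wrappers import LdaMallet

    with open("tweets_clean.txt") as f:
        docs = [line.split() for line in f]

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = LdaMallet("/path/to/mallet-2.0.8/bin/mallet",
                    corpus=corpus, num_topics=10, id2word=dictionary)
    for topic in lda.show_topics(num_topics=10):
        print(topic)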

Interested in working with topic models? There are two Savvy Researcher topic modeling workshops, on December 6 and December 8, that focus on the theory and practice of using topic models to answer questions in the humanities. I hope to see you there!

Creating Quick and Dirty Web Maps to Visualize Your Data – Part 2

Welcome to part two of our two-part series on creating web maps! If you haven’t read part one yet, you can find it here. If you have read part one, we’re going to pick up right where we left off.

Now that we’ve imported our CSV into a web map, we can begin to play around with how the data is represented. You should be brought to the “Change Style” screen after importing your data, which presents you with a drop-down menu and three drawing styles to choose from:

Map Viewer Change Style Screen

Hover over each drawing style for more information, and click each one to see how they visualize your data. Don’t worry if you mess up — you can always return to this screen later. We’re going to use “Types (Unique symbols)” for this exercise because it gives us more options to fiddle with, but feel free to dive into the options for each of the other two drawing styles if you like how they represent your data. Click “select” under “Types (Unique symbols)” to apply the style, then select a few different attributes in the “Choose an attribute to show” dropdown menu to see how they each visualize your data. I’m choosing “Country” as my attribute to show simply because it gives us an even distribution of colors, but for your research data you will want to select this attribute carefully. Next, click “Options” on our drawing style and you can play with the color, shape, name, transparency, and visible range for all of your symbols. Click the three-color bar (pictured below) to change visual settings for all of your symbols at once. When you’re happy with the way your symbols look, click OK and then DONE.

Now is also a good time to select your basemap, so click "Basemap" on the toolbar and select one of the options provided — I’m using "Light Gray Canvas" in my examples here.

Change all symbols icon

Click the three-color bar to change visual settings for all of your symbols at once

Now that our data is visualized the way we want, we can do a lot of interesting things depending on what we want to communicate. As an example, let’s pretend that our IP addresses represent online access points for a survey we conducted on incarceration spending in the United States. We can add some visual insight to our data by inserting a layer from the web using “Add → Search for layers” and overlaying a relevant layer. I searched for “inmate spending” and found a tile layer created by someone at the Esri team that shows the ratio of education spending to incarceration spending per state in the US:

The “Search for Layers” screen

You might notice in the screenshot above that there are a lot of similar search results; I’m picking the “EducationVersusIncarceration” tile layer (circled) because it loads faster than the feature layer. If you want to learn why this happens, check out Esri’s documentation on hosted feature layers.

We can add this layer to our map by clicking “Add” then “Done Adding Layers,” and voilà, our data is enriched! There are many public layers created by Esri and the ArcGIS Online community that you can search through, and even more GIS data hosted elsewhere on the web. You can use the Scholarly Commons geospatial data page if you want to search for public geographic information to supplement your research.

Now that we’re done visualizing our data, it’s time to export it for presentation. There are a few different ways that we can do this: by sharing/embedding a link, printing to a pdf/image file, or creating a presentation. If we want to create a public link so people can access our map online, click “Share” in the toolbar to generate a link (note: you have to check the “Everyone (public)” box for this link to work). If we want to download our map as a pdf or image, click “Print” and then select whether or not we want to include a legend, and we’ll be brought to a printer-friendly page showing the current extent of our map. Creating an ArcGIS Online Presentation is a third option that allows you to create something akin to a PowerPoint, but I won’t get into the details here. Go to Esri’s Creating Presentations help page for more information.

Click to enlarge the GIFs below and see how to export your map as a link and as an image/pdf:

Share web map via public link

Note: you can also embed your map in a webpage by selecting “Embed In Website” in the Share menu.

Saving the map as an image/pdf using the "Print" button in the toolbar. Note: if you save your map as an image using "save image as..." you will only save the map, NOT the legend.

While there are a lot more tools that we can play with using our free ArcGIS Online accounts – clustering, pop-ups, bookmarks, labels, drawing styles, distance measuring – and even more tools with an organizational account – 25 different built-in analyses, directions, Living Atlas Layers – this is all that we have time for right now. Keep an eye out for future Commons Knowledge blog posts on GIS, and visit our GIS page for even more resources!

What is a Conspiracy Theory?

Part of my internship at the Scholarly Commons will be a series of blog posts to describe my research and the different tools that I’ll be using to pursue it. In this first post, I’ll begin to give an account of my overall research project. Future posts will deal with other parts of the research project, what sorts of tools I will be using, the ways I’m gaining facility with those tools, and the progress of the research itself.

What Is a Conspiracy Theory?

The first phase of my research involves developing an empirically informed definition of ‘conspiracy theory’. A naive definition might be “a theory that involves a conspiracy.” This leads to many things being called conspiracy theories that would not ordinarily be understood as such. For example, the official account of 9/11 would be a conspiracy theory: Al-Qaeda, working in secret (i.e., as a conspiracy), planned and carried out the attack. While such a capacious definition of ‘conspiracy theory’ might be appealing, it runs counter to many people’s sense of what the term means.

In the philosophical literature on conspiracy theories, several definitions have been floated, but there is no agreed upon way of understanding the term. As a result, it can be difficult to know whether there is a connection between what the philosopher in question is discussing and what is commonly taken to be a conspiracy theory. In the psychological and sociological literature on conspiracy theories, much less attention is paid to questions of definition, with certain “paradigmatic” theories normally being presented as conspiracy theories. In these cases, it is reasonable to wonder if the theories presented as “paradigmatic” are actually atypical in some respects, barring some evidence that they actually are typical. In both the philosophical and psychological/sociological cases, I am concerned that choices of particular conspiracy theories might be the result of unintentional “cherry-picking” of examples, which would threaten to skew accounts.

To solve this problem, I am inspired by Paul Thagard’s study presented in "Creative Combination of Representations: Scientific Discovery and Technological Innovation," in his collection "The Cognitive Science of Science: Explanation, Discovery, and Conceptual Change." In that study, Thagard investigates two texts, one an anthology of important scientific discoveries, the other an anthology of important inventions. For each text, he goes through each entry, coding for the presence of certain features. This allows him to give an empirically informed account of typical features of both scientific discovery and technological innovation (specifically with regard to their use of representational combination). While there are still reasons to be wary of treating these features as characteristic (e.g., it might be that the most important scientific discoveries are actually atypical cases of scientific discovery), this is at least a good effort at moving away from cherry-picking examples.

In my own study, I have selected an anthology of various conspiracy theories. The text is “Conspiracies and Secret Societies: The Complete Dossier” by Brad and Sherry Steiger.

I have selected several features to look for in the entries. In particular, my own hypothesis is that conspiracy theories typically utilize appeals to coincidence in order to motivate their own acceptance. An appeal to coincidence occurs when a theory criticizes an alternative theory for containing an explanation that involves coincidence. For example, some 9/11 conspiracy theories observe that a number of unusual stock market behaviors with regard to the airlines involved were exhibited in the days leading up to the attack, and that this led to a great deal of profit on the part of the investors. One way to explain this would be to say it was a coincidence. The conspiracy theorists insist instead that it is evidence of insider trading among people who had knowledge of the planned attack. This substitution of conspiracy for coincidence is, I predict, typical of conspiracy theories in general.

Two lab assistants and I are working through the book and coding for the presence of the chosen features. The hope is that we will be able to make some empirically informed judgments about what features are typical of conspiracy theories. In addition to this strategy, I will utilize some text mining strategies in order to both check our own conclusions and look for other typical features we may have missed. Although the amount of text in the book is fairly small, the hope is that a meaningful topic model might be developed in order to see if the groupings that we notice ourselves emerge in the model as well. This would give us some additional evidence to be satisfied with our own coding. It could also be the case that the model could reveal certain other groupings based around features we had not coded for that we could then independently check. In the end, the hope is that we will be able to give examples of paradigmatic conspiracy theories and have some empirical backing for our choices.

In my next post, I will discuss the second component of my research project: an investigation into why conspiracy theories are so appealing to people.

Spotlight: Library of Congress Labs

The Library of Congress Labs banner.

It’s always exciting when an organization with as much influence and reach as the Library of Congress decides to do something different. Library of Congress Labs is a new endeavor by the LoC, "a place to encourage innovation with Library of Congress digital collections". Launched on September 19, 2017, Labs is a place of experimentation, and will host a rotating selection of "experiments, projects, events and resources" as well as blog posts and video presentations.

In this post, I’ll just be faffing around the Labs website, focusing on the "Experiments" portion of the site. (We’ll look at "LC for Robots" in another post.) As of this writing (10/3/17), there are three "Experiments" on the site — Beyond Words, Calling All Storytellers, and the #AsData Poster Series. Right now, Calling All Storytellers is just asking for people’s ideas for the website, so I’ll briefly go over Beyond Words and the #AsData Poster Series and give my thoughts on them.

Beyond Words

Beyond Words is a crowd-sourced transcription system for the LoC’s Chronicling America digitized newspaper collection. Users are invited to mark, transcribe, and verify World War I newspapers. Tasks are split, so the user only does one task at a time. Overall, it’s pretty similar to other transcription efforts already on the Internet, though the tools tend to work better and feel less clunky than some other efforts I’ve seen.

#AsData Poster Series

The #AsData Poster Series is a set of posters by artist Oliver Baez Bendorf, commissioned by the LoC for their Collections as Data Summit in September 2016. The posters are beautiful and artistic, and represent the themes of the summit. One aspect I like about this page is that it’s not just the posters themselves; it also includes more information, like an interview with the artist. That being said, it does seem like a bit of a placeholder.

While I was excited to explore the experiments, I’m hoping to see more innovative ideas from the Library of Congress. The Labs "Experiments" have great potential, and it will be interesting to stay tuned and see where they go next.

Keep an eye on Commons Knowledge in the next few weeks, when we talk about the “LC for Robots” Labs page!

Studying Rhetorical Responses to Terrorism on Twitter

As a part of my internship at the Scholarly Commons, I’m going to write a series of posts describing the tools and methodologies that I’ve used to work on my dissertation project. This write-up serves as an introduction to my project, its larger goals, and the tools that I use to start working with my data.

The Dissertation Project

In general, my dissertation draws on computational methodologies to account for the digital circulation and fragmentation of political movement texts in new media environments. In particular, I will examine the rhetorical responses on Twitter to three terrorist attacks in the U.S.: the 2013 Boston Marathon Bombing, the 2015 San Bernardino Shooting, and the 2016 Orlando Nightclub shooting. I begin with the idea that terrorism is a kind of message directed at an audience, and I am interested in how digital audiences in the U.S. come to understand, make meaning of, and navigate uncertainty following a terrorist attack. I am interested in the patterns of narratives, community construction, and expressions of affect that characterize terrorism as a social media phenomenon.

I am interested in the following questions: What methods might rhetorical scholars use to better understand the vast numbers of texts, posts, and "tweets" that make up our social media? How do digital audiences construct meanings in light of terrorist attacks? How does the interwoven agency and materiality of digital spaces influence forms of rhetorical action, such as invention and style? In order to address such challenges, I turn to the tools and techniques of the Digital Humanities as computational modes of analysis for examining the digitally circulated rhetoric surrounding terror events. Investigating this rhetoric using topic models will help scholars not only understand particular aspects of terrorism as a social media phenomenon, but also better see the ways that community and identity are themselves formed amid digitally circulated texts.

At the beginning of this project, I had no experience working with textual data, so the following posts represent a cleaned and edited version of the learning process I went through. There was a lot of mess and exploration involved, but that meant I’ve come to understand a lot more.

Gathering The Tools

I use a Mac, so accessing the command line is as simple as firing up Terminal.app. Windows users have to do a bit more work to get all these tools, but plenty of tutorials can be found with a quick search.

Python (Anaconda)
The first big choice was to learn how to code in R or Python. I’d heard that Python was better for text and R was better for statistical work, but it seems that it mostly comes down to personal preference as you can find people doing both in either language. Both R and Python have a bit of a learning curve, but a quick search for topic modeling in Python gave me a ton of useful results, so I chose to start there.

Anaconda is a package management system for the Python language. What’s great about Anaconda is not only that it has a robust management system (so I can easily download the tools and libraries that I need without having to worry about dependencies or other errors), but also that it encourages the creation of "environments" for you to work in. This means that I can make mistakes or install and uninstall packages without having to worry about messing up my overall system or my other environments.

Instructions for downloading Anaconda can be found here, and I found this cheat-sheet very useful in setting up my initial environments. Python has a ton of documentation, so these pages are useful, and there are plenty of tutorials online. Each environment comes with a few default packages, and I quickly added some toolkits for processing text and plotting graphs.

Confirming the Conda installation in Terminal, activating an environment, and listing the installed packages.

StackOverflow
Lots of people working with Python have the same problems or issues that I did. Whenever my code encountered an error, or when I didn’t know how to do something like write to a .txt file, searching StackOverflow usually got me on the right track. Most answers link to the Python documentation that relates to the question, so not only did I fix what was wrong but I also learned why.

GitHub
Sometimes scholars put their code on GitHub for sharing, advancing research, and confirming their findings. I found code here for topic modeling in Python, and I also set up repositories for my own work. GitHub is a useful version control system, so it also meant that I never "lost" old code and could track changes over time.

Programming Historian
This is a site for scholars interested in learning how to use tools for Digital Humanities work. There are some great tutorials here on a range of topics, including how to set up and use Python. It’s approachable and does a good job of covering everything you need to know.

These tools, taken together, form the basis of my workspace for dealing with my data. Upcoming topics will cover Data Collection, Cleaning the Data, Topic Models, and Graphing the Results.