What Are the Digital Humanities?

Introduction

As new technology has revolutionized the ways all fields gather information, scholars have integrated digital software to enhance traditional models of research. While digital software may seem relevant only to scientific research, digital projects play a crucial role in disciplines not traditionally associated with computer science. One of the biggest digital initiatives actually takes place in fields such as English, History, and Philosophy, in what is known as the digital humanities. The digital humanities are an innovative way to incorporate digital data and computer science into humanities-based research. Although some aspects of the digital humanities are exclusive to specific fields, most digital humanities projects are interdisciplinary in nature. Below are three general ways that digital humanities projects have enhanced the approaches scholars in these fields take to humanities research.

Digital Access to Resources

Digital access is a way of taking items necessary for humanities research and creating a system where users can easily access those resources. This work involves digitizing physical items and formatting them for storage in a database that permits access to its contents. Since some of these databases may hold thousands or millions of items, digital humanists also work to ensure that users can locate specific items quickly and easily. Thus, digital access requires both digitizing physical items and storing them in a database, as well as creating a path for scholars to find them for research purposes.

Providing Tools to Enhance Interpretation of Data and Sources

The digital humanities can also change how we interpret sources and other items used in humanities research. Data visualization software, for example, helps simplify large, complex datasets and presents the data in more visually appealing ways. Likewise, text mining software uncovers trends in a body of text, potentially saving digital humanists hours or even days compared to analyzing the same text through analog methods. Finally, Geographic Information Systems (GIS) software allows users working on humanities projects to create specialized maps that assist in both visualizing and analyzing data. These software programs and more have dramatically transformed the ways digital humanists interpret and visualize their research.
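As a small illustration of the kind of work text mining automates, here is a minimal Python sketch that counts the most frequent words in a plain-text file ("novel.txt" is a hypothetical file name; real projects layer tokenization, stopword lists, and statistical models on top of this):

```python
# Minimal sketch of text mining's simplest building block: word frequency.
# "novel.txt" is a hypothetical plain-text source file.
import re
from collections import Counter

with open("novel.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

# Print the ten most common words and their counts
for word, count in Counter(words).most_common(10):
    print(f"{word}: {count}")
```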

Digital Publishing

The digital humanities have opened new opportunities for scholars to publish their work. In some cases, digital publishing simply means digitizing an article or item in print to extend the reach of a given publication to readers who may not have direct access to the physical version. Other digital publishing initiatives publish research that is only accessible in a digital format. In either case, the digital humanities provide scholars more opportunities to publish their research while also expanding the audience for their publications beyond print.

How Can I Learn More About the Digital Humanities?

There are many ways to get involved, both at the University of Illinois and around the globe. Here are a few examples that can help you get started on your own digital humanities project:

  • HathiTrust is a partnership through the Big Ten Academic Alliance that holds over 17 million items in its collection.
  • Internet Archive is a public, multimedia database that allows for open access to a wide range of materials.
  • The Scholarly Commons page on the digital humanities offers many of the tools used for data visualization, text mining, and GIS, along with other resources that enhance analysis within a humanities project. There are also a couple of upcoming Savvy Researcher workshops that will go over how to use software common in the digital humanities.
  • Sourcelab is an initiative through the History Department that works to publish and preserve digital history projects. Many other humanities fields have equivalents to Sourcelab that serve the specific needs of a given discipline.

Introductions: What is Digital Scholarship, anyways?

This is the beginning of a new series where we introduce you to the various topics that we cover in the Scholarly Commons. Maybe you’re new to the field, or maybe you’re at the point where you’re just too afraid to ask… Fear not! We are here to take it back to the basics!

What is digital scholarship, anyways?

Digital scholarship is an all-encompassing term and it can be used very broadly. Digital scholarship refers to the use of digital tools, methods, evidence, or any other digital materials to complete a scholarly project. So, if you are using digital means to construct, analyze, or present your research, you’re doing digital scholarship!

It seems really basic to say that digital scholarship is any project that uses digital means, because nowadays, isn’t that every project? Yes and no. We use the term digital quite liberally. If you used Microsoft Word just to write your essay about a lab you did during class, that is not digital scholarship. However, if you used specialized software to analyze the results of a survey you used to gather data, and then wrote about it in an essay that you typed in Microsoft Word, that is digital scholarship! If you then wanted to get this essay published and hosted in an online repository so that other researchers can find it, that is digital scholarship too!

Many higher education institutions have digital scholarship centers on their campuses that focus on providing specialized support for these types of projects. The Scholarly Commons is a digital scholarship space in the University Main Library! Digital scholarship centers often push for new and innovative means of discovery. They have access to specialized software and hardware and provide a space for collaboration and consultations with subject experts who can help you achieve your project goals.

At the Scholarly Commons, we support a wide array of topics in digital and data-driven scholarship, which this series will cover in the future. We have established partners throughout the library and across the wider University campus to support students, staff, and faculty in their digital scholarship endeavors.

You can find a list of all the software the Scholarly Commons has to support digital scholarship here, and a list of the Scholarly Commons hardware here. If you’re interested in learning more about the foundations of digital scholarship, follow along with our Introductions series as we go back to the basics.

As always, if you’re interested in learning more about digital scholarship and how to support your own projects, you can fill out a consultation request form, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Simple NetInt: A New Data Visualization Tool from Illinois Assistant Professor, Juan Salamanca

Juan Salamanca, Ph.D., Assistant Professor in the School of Art and Design at the University of Illinois Urbana-Champaign, recently created a new data visualization tool called Simple NetInt. Though developed from a tool he created a few years ago, this tool brings entirely new opportunities to digital scholarship! This week we had the chance to talk to Juan about this new tool in data visualization. Here’s what he said…

Simple NetInt is a JavaScript version of NetInt, a Java-based node-link visualization prototype designed to support the visual discovery of patterns across large datasets by displaying disjoint clusters of vertices that can be filtered, zoomed in on, or drilled down into interactively. The visualization strategy used in Simple NetInt is to place clustered nodes in independent 3D spaces and draw links between nodes across multiple spaces. The result is a simple graphical user interface that enables visual depth as an intuitive dimension for data exploration.

The Simple NetInt interface

Check out the Simple NetInt tool here!

In collaboration with Professor Eric Benson, Salamanca tested a prototype of Simple NetInt with a dataset about academic publications, episodes, and story locations of the sci-fi TV series Firefly. The tool shows a network of research relationships between these three sets of entities, similar to a citation map but on a timeline following the episodes’ chronology.

What inspired you to create this new tool?

This tool is an extension of a prototype I built five years ago for the visualization of financial transactions between bank clients. It is software for visualizing networks, based on the representation of entities and their relationships as nodes and edges. This new version is used for the visualization of a totally different dataset: scholarly work published in papers, episodes of a TV series, and the narrative of the series itself. So, the network representation portrays relationships between journal articles, episode scripts, and fictional characters. I am also using it to design a large mural for the Siebel Center for Design.

What are your hopes for the future use of this project?

The final goal of this project is to develop an augmented reality visualization of networks to be used in the field of digital humanities. This proof of concept shows that scholars in the humanities come across datasets with different dimensional systems that might not be compatible with one another. For instance, a timeline of scholarly publications may encompass 10 or 15 years, but the content of what is being discussed in that body of work may encompass centuries of history. Therefore, these two different temporal dimensions need to be represented in a way that helps scholars in their interpretations. I believe that an immersive visualization may drive new questions for researchers or convey new findings to the public.

What were the major challenges that came with creating this tool?

The major challenge was to find a way to represent three different systems of coordinates in the same space. The tool has a universal space that contains relative subspaces for each dataset loaded. So, the nodes instantiated from each dataset are positioned in their own coordinate system, which could be a timeline, a position relative to a map, or just clusters by proximities. But the edges that connect nodes jump from one coordinate system to the other. This creates the idea of a system of nested spaces that works well with few subspaces, but I am still figuring out what is the most intuitive way to navigate larger multidimensional spaces.
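To make the idea of nested spaces concrete, here is a hypothetical Python sketch (Simple NetInt itself is written in JavaScript, and this is not its actual code): each subspace positions its nodes in a local coordinate system, and an edge is drawn only after both endpoints are converted into the shared universal space.

```python
# Hypothetical sketch of the "nested spaces" idea: each subspace lays out
# its nodes in a local coordinate system (a timeline, map positions, or
# clusters), and edges resolve their endpoints in one universal space.
# This is an illustration, not Simple NetInt's actual code.
from dataclasses import dataclass

@dataclass
class Subspace:
    origin: tuple   # where this subspace sits in universal space
    nodes: dict     # node id -> local (x, y, z) coordinates

    def to_universal(self, node_id):
        local = self.nodes[node_id]
        return tuple(o + l for o, l in zip(self.origin, local))

# Two subspaces with incompatible local coordinates: publication years
# versus episode numbers.
papers = Subspace(origin=(0.0, 0.0, 0.0), nodes={"paper1": (2005.0, 1.0, 0.0)})
episodes = Subspace(origin=(0.0, 0.0, 10.0), nodes={"ep1": (3.0, 1.0, 0.0)})

# An edge "jumps" between coordinate systems by resolving each endpoint
# in universal space first.
edge = (papers.to_universal("paper1"), episodes.to_universal("ep1"))
print(edge)
```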

What are your own research interests and how does this project support those?

My research focuses on understanding how designed artifacts affect the viscosity of social action. What I do is investigate how the design of artifacts facilitates or hinders cooperation and collaboration between people. I use visual analytics methods to conduct my research, so the analysis of networks is an essential tool. I have built several custom-made tools for the observation of the interaction between people and things, and this is one of them.

If you would like to learn more about Simple NetInt, you can find contact information for Professor Juan Salamanca here, along with more information on his research!

If you’re interested in learning more about data visualizations for your own projects, check out our guide on visualizing your data, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Free, Open Source Optical Character Recognition with gImageReader

Optical Character Recognition (OCR) is a powerful tool for transforming scanned, static images of text into machine-readable data, making it possible to search, edit, and analyze text. If you’re using OCR, chances are you’re working with either ABBYY FineReader or Adobe Acrobat Pro. However, both ABBYY and Acrobat are proprietary software with a steep price tag, and while they are both available in the Scholarly Commons, you may want to perform OCR beyond your time at the University of Illinois.

Thankfully, there’s a free, open source alternative for OCR: Tesseract. By itself, Tesseract only works through the command line, which creates a steep learning curve for those unaccustomed to working with a command-line interface (CLI). Additionally, it is fairly difficult to transform a jpg into a searchable PDF with Tesseract.
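For readers comfortable with a little scripting, the pytesseract Python wrapper offers one middle ground. Here is a minimal sketch, assuming the Tesseract engine plus the pytesseract and Pillow packages are installed ("letter.jpg" is a hypothetical scanned image):

```python
# Minimal sketch of driving Tesseract from Python via pytesseract.
# Assumes the Tesseract engine, pytesseract, and Pillow are installed;
# "letter.jpg" is a hypothetical scanned image.
from PIL import Image
import pytesseract

image = Image.open("letter.jpg")

# Plain-text recognition
print(pytesseract.image_to_string(image))

# Render a searchable PDF (returned as bytes) and write it to disk
pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, extension="pdf")
with open("letter.pdf", "wb") as f:
    f.write(pdf_bytes)
```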

Fortunately, there are many free, open source programs that provide Tesseract with a graphical user interface (GUI). These not only make Tesseract much easier to use, but some of them also come with layout editors that make it possible to create searchable PDFs. You can see the full list of programs on this page.

The program logo for gImageReader

In this post, I will focus on one of these programs, gImageReader, but as you can see on that page, there are many options available across multiple operating systems. I tried all of the Windows-compatible programs and decided that gImageReader was the closest to what I was looking for: a free alternative to ABBYY FineReader that does a pretty good job of letting you correct OCR mistakes and export to a searchable PDF.

Installation

gImageReader is available for Windows and Linux. Though the list of releases does not include a Mac-compatible version, it may be possible to get it to work using a package manager for Mac such as Homebrew. I have not tested this, though, so I cannot make any guarantees about getting a working version of gImageReader on Mac.

To install gImageReader on Windows, go to the project’s releases page. From there, go to the most recent release of the program at the top and click Assets to expand the list of files included with the release. Then select the file that has the .exe extension to download it. You can then run that file to install the program.

Manual

The installation of gImageReader comes with a manual as an HTML file that can be opened by any browser. As of the date of this post, the Fossies software archive is hosting the manual on its website.

Setting OCR Mode

gImageReader has two OCR modes: “Plain Text” and “hOCR, PDF”. Plain Text is the default mode and only recognizes the text itself without any formatting or layout detection. You can export this to a text file or copy and paste it into another program. This may be useful in some cases, but if you want to export a searchable PDF, you will need to use hOCR, PDF mode. hOCR is a standard for formatting OCR text using either XML or HTML and includes layout information, font, OCR result confidence, and other formatting information.
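To give a sense of what hOCR output contains, here is a minimal, hypothetical Python sketch that pulls each recognized word and its confidence value out of an hOCR file (assuming the beautifulsoup4 package; "letter.hocr" is a made-up file name):

```python
# Minimal sketch: read word-level confidences out of a Tesseract hOCR file.
# Assumes the beautifulsoup4 package; "letter.hocr" is a hypothetical file.
import re
from bs4 import BeautifulSoup

with open("letter.hocr", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Tesseract marks each word with the class "ocrx_word"; the title
# attribute holds its bounding box and an "x_wconf" confidence value.
for word in soup.find_all(class_="ocrx_word"):
    match = re.search(r"x_wconf (\d+)", word.get("title", ""))
    confidence = int(match.group(1)) if match else None
    print(word.get_text(), confidence)
```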

To set the recognition to hOCR, PDF mode, go to the toolbar at the top. It includes a section for “OCR mode” with a dropdown menu. From there, click the dropdown and select hOCR, PDF:

gImageReader Toolbar

This is the toolbar for gImageReader. You can set OCR mode by using the dropdown that is the third option from the right.

Adding Images, Performing Recognition, and Setting Language

If you have images already scanned, you can add them to be recognized by clicking the Add Images button on the left panel, which looks like a folder. You can then select multiple images if you want to create a multipage PDF. You can always add more images later by clicking that folder button again.

On that left panel, you can also click the Acquire tab button, which allows you to get images directly from a scanner, if the computer you’re using has a scanner connected.

Once you have the images you want, click the Recognize button to recognize the text on the page. Please note that if you have multiple images added, you’ll need to click this button for every page.

If you want to perform recognition in a language other than English, click the arrow next to Recognize. You’ll need to have that language installed, but you can install additional languages by clicking “Manage Languages” in the dropdown that appears. If the language is already installed, you can go to the first option listed in the dropdown to select a different language.

Viewing the OCR Result

In this example, I will be performing OCR on this letter by Franklin D. Roosevelt:

Raw scanned image of a typewritten letter signed by Franklin Roosevelt

This 1928 letter from Franklin D. Roosevelt to D. H. Mudge Sr. is courtesy of Madison Historical: The Online Encyclopedia and Digital Archive for Madison County Illinois. https://madison-historical.siue.edu/archive/items/show/819

Once you’ve performed OCR, there will be an output panel on the right. There are a series of buttons above the result. Click the button on the far right to view the text result overlaid on top of the image:

The text result of performing OCR on the FDR letter overlaid on the original scan.

Here is the text overlaid on an image of the original scan. Note how the scan is slightly transparent now to make the text easier to read.

Correcting OCR

The OCR process did a pretty good job with this example, but there are a handful of errors. You can click on any of the words of text to show them on the right panel. I will click on the “eclnowledgment” at the end of the letter to correct it. The panel will then jump to that part of the hOCR “tree” on the right:

The hOCR tree in gImageReader, which shows the recognition result for each word in a tree-like structure.

Note that in this screenshot I have clicked the second button from the right to show the confidence values, where a higher number means Tesseract has higher confidence in the result. In this case, it is 67% sure that “eclnowledgment” is correct. Since it obviously isn’t, we can enter new text by double-clicking on the word in this panel and typing “acknowledgement.” You can do this for any errors on the page.

Other correction tips:

  1. If the program is recognizing any regions that are not text, you can right-click them in the panel on the right and delete them.
  2. You can change the recognized font and its size by going to the bottom area labeled “Properties.” Font size is controlled by the x_fsize field, and x_font has a dropdown where you can select a font.
  3. It is also possible to change the area of the blue word box once it is selected, simply by clicking and dragging the edges and corners.
  4. If there is an area of text that was not captured by the recognition, you can right-click in the hOCR “tree” to add text blocks, paragraphs, textlines, and words to the document. This allows you to draw a box on the image and then type what the text says.

Exporting to PDF

Once you are done making OCR corrections, you can export to a searchable PDF. To do so, click the Export button above the hOCR “tree,” which is the third button from the left. Then, select export to PDF. It then gives you several options to set the compression and quality of the PDF image, and once you click OK, it should export the PDF.

Conclusion

Unfortunately, there are some limitations to gImageReader, as can often be the case with free, open source software. Here are some potential problems you may have with this program:

  1. While you can add new areas to recognize with OCR, there is not a way to change the order of these elements inside the hOCR “tree,” which could be an issue if you are trying to make the reading order clear for accessibility reasons. One potential workaround could be to use the Reading Order options on Adobe Acrobat, which you can read about in this libguide.
  2. You cannot see which areas of the document fall inside a recognition box unless you click on a word, unlike ABBYY FineReader, which shows all recognition areas at once on the original image.
  3. You cannot perform recognition on all pages at once. You have to click the recognition button individually for each page.
  4. Though there are some image correction options to improve OCR, such as brightness, contrast, and rotation, it does not have as many options as ABBYY FineReader.

gImageReader is not nearly as user-friendly as ABBYY FineReader, nor does it have all of FineReader’s features, so you will probably want to use ABBYY if it is available to you. However, I find gImageReader a pretty good program that can meet most general OCR needs.

Illinois Digital Humanities Projects That Will Blow Your Mind

We are living in a moment where we get to discover the exciting possibilities of working, learning, and sharing in digital formats. I have decided to use this as an opportunity to appreciate the ways in which others have already embraced the power of digital platforms to enhance their research. In this post I will highlight three amazing digital humanities projects that researchers right here at the University of Illinois contributed to. For each project I will provide a link to its official web page, a brief description of the project, and the name and department of the UIUC researcher who contributed to it. Prepare to be wowed by the amazing digital work that has come out of our University research community.

Owen Wilson mouthing the word wow

“Prepare to be wowed”- Owen Wilson

Virtual Museums

There is no doubt that technology is changing the way we interact with the world, including centuries-old institutions: museums!

Historically, museums have been seen as sacred spaces of knowledge meant to bring communities together, and historically, this also meant a physical space. However, with technology constantly expanding its presence in our everyday lives, there was no doubt that it would eventually reach museums. While many museums have implemented technology in their education and resources, we are now beginning to see the emergence of what’s called a “virtual museum.” While the definition of what constitutes these new virtual museums can be slippery, they all have one thing in common: they exist electronically in cyberspace.

The vast empire of the digital humanities is allowing space for these virtual museums to flourish. Information seeking in a digital age is expanding its customs, and there is a wide spectrum of resources available—virtual museums being one example. These online organizations are made up of digital exhibitions and exist in their entirety on the World Wide Web.

Museums offer an experience. Unlike libraries or archives, museums are more often used as a form of tourism and entertainment, but they are also centers of research. Museums house information resources that are not otherwise accessible to the everyday scholar, and virtual museums are increasing this accessibility.

While there are arguments from museum scholars about the legitimacy of these online spaces, I do not think that should discount the ways in which people are using them to share knowledge. While there is still much to develop in virtual museums, the increasing popularity of the digital humanities is granting people an innovative way to interact with art and artifacts that were previously inaccessible. Museums are spaces of exhibition and research — so why limit that to a physical space? It will be interesting to keep an eye on where things may go and to consider what this format could contribute to scholarly research!

The Scholarly Commons has many resources that can help you create your own digital hub of information. You can digitize works on one of our high resolution scanners, turn them into searchable documents with OCR software, and publish them online with tools such as Omeka, a digital publishing platform.

You can also consult with our expert in Digital Humanities, Spencer Keralis, to find the right tools for your project. Check out last week’s blog post to learn more about him.

Maybe one day all museums will be available virtually? What are your thoughts?

Meet Spencer Keralis, Digital Humanities Librarian

Spencer Keralis teaches a class.

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features one of the newest members of our team, Spencer Keralis, Digital Humanities Librarian.


What is your background and work experience?

I have a Ph.D. in English and American Literature from New York University. I started working in libraries in 2011 as a Council on Library and Information Resources (CLIR) Fellow with the University of North Texas Libraries, doing research on data management policy and practice. This turned into a position as a Research Associate Professor working to catalyze digital scholarship on campus, which led to the development of Digital Frontiers, which is now an independent non-profit corporation. I serve as the Executive Director of the organization and help organize the annual conference. I have previous experience working as a project manager in telecom and non-profits. I’ve also taught in English and Communications at the university level since 2006.

What led you to this field?

My CLIR Fellowship really sparked the career change from English to libraries, but I had been considering libraries as an alternate career path prior to that. My doctoral research was heavily archives-based, and I initially thought I’d pursue something in rare books or special collections. My interest in digital scholarship evolved later.

What is your research agenda?

My current project explores how the HIV-positive body is reproduced and represented in ephemera and popular visual culture from the early years of the AIDS epidemic. In American popular culture, representations of the HIV-positive body have largely been defined by Therese Frare’s iconic 1990 photograph of gay activist David Kirby on his deathbed in an Ohio hospital, which was later used for a United Colors of Benetton ad. Against this image, and other representations which medicalized or stigmatized HIV-positive people, people living with AIDS and their allies worked to remediate the HIV-positive body in ephemera including safe sex pamphlets, zines, comics, and propaganda. In my most recent work, I’m considering the reclamation of the erotic body in zines and comics, and how the HIV-positive body is imagined as an object of desire differently in these underground publications than it is in mainstream queer comics representing safer sex. I also consider the preservation and digitization of zines and other ephemera as a form of remediation that requires a specific ethical positioning in relation to these materials and the community that produced them, engaging with the Zine Librarians’ Code of Conduct, folksonomies and other metadata schema, and collection and digitization policies regarding zines from major research libraries. This research feels very timely and urgent given rising rates of new infection among young people, but it’s also really fun because the materials are so eclectic and often provocative. You can check out a bit of this research on the UNT Comics Studies blog.

Do you have any favorite work-related duties?

I love working with students and helping them develop their research questions. Too often students (and sometimes faculty, let’s be honest) come to me and ask “What tools should I learn?” I always respond by asking them what their research question is. Not every research question is going to be amenable to digital tools, and not every tool works for every research question. But having a conversation about how digital methods can potentially enrich a student’s research is always rewarding, and I always learn so much from these conversations.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think comics and graphic novels are generally underappreciated in both pedagogy and research. There are comics on every topic, and historical comics go back much further than most people realize. I think the intersection of digital scholarship with comics studies has a lot of potential, and a lot of challenges that have yet to be met – the technical challenge of working with images is significant, and there has yet to be significant progress on what digital scholarship in comics might look like. I also think comics belong in more classes – all sorts of classes, from math and physics to art and literature – because they reach students differently than other kinds of texts do.

If you could recommend one book or resource to beginning researchers in your field, what would you recommend?

I’m kind of obsessed with Liz Losh and Jacque Wernimont’s edited collection Bodies of Information: Intersectional Feminism and Digital Humanities because it’s such an important intervention in the field. I’d rather someone new to DH start there than with some earlier, canonical works because it foregrounds alternative perspectives and methodologies without centering a white, male perspective. Better, I think, to start from the margins and trouble some of the traditional narratives in the discipline right out the gate. I’m way more interested in disrupting monolithic or hegemonic approaches to DH than I am in gatekeeping, and Liz and Jacque’s collection does a great job of constructively disrupting the field.

Digital Humanities Maps

Historically, maps were 2D, printed, and sometimes wildly inaccurate representations of space. Today, maps can still be wildly inaccurate, but digital tools provide a way to apply more data to a spatial representation. Displaying data on a map is not a completely new idea, however. W.E.B. DuBois’ 1899 sociological study “The Philadelphia Negro” was one of the first to present data in visual form, in maps as well as other formats.

map of the seventh ward of philadelphia, each household is drawn on the map and represented by a color corresponding to class standing

The colors on the map indicate the class standing of each household.

Digital maps can add an interesting, spatial dimension to your humanities or social science research. People respond well to visuals, and maps provide a way to display a visual that corresponds to real-life space. Today we’ll highlight some DH mapping projects, and point to some resources to create your own map!

(If you are interested in DH maps, attend our Mapping in the Humanities workshop next week!)

Sources of Digital Maps

Some sources of historical maps, like the ones below, openly provide access to georeferenced maps. “Georeferencing,” also called “georectifying,” is the process of aligning historical maps to precisely match a modern-day map. Completing this process allows historical maps to be used in digital tools, like GIS software. Think of it as taking an image of a map and assigning latitude/longitude pairs to points on the map that correspond to locations on a modern map. Currently, manually matching the points up is the only way to do this!
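To make the idea concrete, here is a minimal Python sketch of the math underneath georeferencing: given a few hand-matched control points, it fits an affine transform that converts any pixel coordinate on a scanned map to an approximate longitude/latitude. (The coordinates are made up, and real GIS tools fit more sophisticated transforms.)

```python
# Minimal sketch of the math behind georeferencing: fit an affine
# transform from scanned-map pixel coordinates to lon/lat using a few
# hand-matched control points. Coordinates below are made up; real GIS
# software supports more sophisticated transforms.
import numpy as np

# (pixel_x, pixel_y) -> (lon, lat) control-point pairs, matched by hand
pixels = np.array([[120.0, 340.0], [980.0, 310.0], [510.0, 1150.0]])
lonlat = np.array([[-87.65, 41.90], [-87.55, 41.91], [-87.60, 41.82]])

# Solve (lon, lat) = [px, py, 1] @ A by least squares
design = np.hstack([pixels, np.ones((len(pixels), 1))])
coeffs, *_ = np.linalg.lstsq(design, lonlat, rcond=None)

def pixel_to_lonlat(px, py):
    """Map a pixel on the scanned map to an approximate (lon, lat)."""
    return np.array([px, py, 1.0]) @ coeffs

print(pixel_to_lonlat(600, 700))
```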

A map from a book about Chicago placed over a modern map of Chicago.

A map of Chicago from 1891 overlaid on a modern map of the Chicago area.

David Rumsey Map Collection
The David Rumsey Map Collection is a mainstay in the world of historical maps. As of the time of writing, 68% of their total map collection has been georeferenced. There are other ways to interact with the collection, such as searching on a map for specific locations, or even viewing the maps in Second Life!

NYPL Map Warper
The New York Public Library’s Map Warper offers a large collection of historical maps georeferenced by users. Most maps have been georeferenced at this point, but users can still help out!

OpenStreetMap
OpenStreetMap is the open-source, non-proprietary version of Google Maps. Many tools used in DH, like Leaflet and Omeka’s Neatline, use OpenStreetMap’s data and applications to create maps.
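As a quick taste of what these building blocks enable, here is a minimal sketch using the folium Python package, a wrapper around Leaflet, to drop a marker on an OpenStreetMap base map (the coordinates approximate the University of Illinois Main Library; the output file name is made up):

```python
# Minimal sketch: an interactive Leaflet map over OpenStreetMap tiles
# via the folium package. Coordinates approximate the University of
# Illinois Main Library; the output file name is made up.
import folium

coords = [40.1047, -88.2287]
m = folium.Map(location=coords, zoom_start=16)
folium.Marker(coords, popup="University of Illinois Main Library").add_to(m)
m.save("library_map.html")  # open the saved file in any browser
```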

Digital Mapping Humanities Projects

Get inspired! Here are some DH mapping projects to help you think about applying mapping to your own research.

Maps provide the perfect medium for DH projects focused on social justice and decolonization. Native-land.ca is a fairly recent example of this application. The project, which started as a non-academic, private project in 2015, has since transformed into a not-for-profit organization. Native-land.ca attempts to visualize land belonging to native nations in the Americas and Australia, notably without following official or legal boundaries. The project also provides a teacher’s guide to assist in developing a curriculum around colonization in schools.

map of florida with data overlay indicating which native tribes have rights to the land

The state of Florida occupies the territory of multiple native tribes, notably those of the Seminole.

Other projects use digital tools that show a map in conjunction with another storytelling tool, like a timeline or a narrative. The levantCarta/Beirut project uses a timeline to filter which images show up on the connected map of Beirut. We can easily see the spatial representation of a place in a temporal context. A fairly easy tool for this kind of digital storytelling is TimeMapper.

For a more meta example, check out this map of digital humanities labs by Urszula Pawlicka-Deger. Of course these DH centers do projects other than mapping, but even the study of DH can make use of digital mapping!

If you’re interested in adding maps to your humanities research, check out our workshop this semester on humanities mapping. There are also great tutorials for more advanced mapping on The Programming Historian.

And as always, feel free to reach out to the Scholarly Commons (sc@library.illinois.edu) to get started on your digital humanities project.

Transformation in Digital Humanities

The opinions presented in this piece are solely those of the author and the referenced authors. This is meant to serve as a synthesis of arguments made in DH regarding transformation.

How do data and algorithms affect our lives? How does technology affect our humanity? Scholars and researchers in the digital humanities (DH) ask questions about how we can use DH to enact social change by making observations of the world around us. This kind of work is often called “transformative DH.”

The idea of transformative DH is an ongoing conversation. As Moya Bailey wrote in 2011, scholars’ experiences and identities affect and inform their theories and practices, which allows them to make worthwhile observations in diverse areas of humanities scholarship. Just as there is strong conflict about how DH itself is defined, there is also conflict regarding whether or not DH needs to be “transformed.” The theme of the 2011 Annual DH Conference held at Stanford was “Big Tent Digital Humanities,” a phrase symbolizing the welcoming nature of the DH field as a space for interdisciplinary scholarship. Still, those on the fringes found themselves unwelcome, or at least unacknowledged.

This conversation around what DH is and what it could be exploded at the Modern Language Association (MLA) Convention in 2011, which featured multiple digital humanities and digital pedagogy sessions aimed at defining the field and what “counts” as DH. During the convention, Stephen Ramsay, in a talk boldly titled “Who’s In and Who’s Out,” stated that all digital humanists must code in order to be considered digital humanists (he later softened “code” to “build”). These comments resulted in ongoing conversations online about gatekeeping in DH, concerning both what work counts as DH and who counts as a DHer or digital humanist. Moya Bailey also noted that certain scholars whose work focused on race, gender, or queerness and their relationships with technology were “doing intersectional digital humanities work in all but name.” This work, however, was not acknowledged as digital humanities.

Website Banner from transformdh.org

To address gatekeeping in the DH community more fully, the group #transformDH was formed in 2011, during this intense period of conversation and attempted definition. The group self-describes as an “academic guerrilla movement” aimed at re-defining DH as a tool for transformative, social justice scholarship. Its primary objective is to create space in the DH world for projects that push beyond traditional humanities research with digital tools. To achieve this, the group encourages and creates projects that have the ability to enact social change and bring conversations on race, gender, sexuality, and class into both the academy and the public consciousness. An excellent example of this ideology is the Torn Apart/Separados project, a rapid-response DH project completed after the United States enacted a “Zero Tolerance Policy” for immigrants attempting to cross the US/Mexico border. In order to visualize the reach and resources of ICE (the agency enforcing this policy), a cohort of scholars, programmers, and data scientists banded together and published the project in a matter of weeks. Projects such as these demonstrate the potential of DH as a tool for transformative scholarship and social change. That potential is dangerously disregarded when we set limits on who counts as a digital humanist and what counts as digital humanities work.

HathiTrust Research Center Expands Text Mining Corpus

Good news for text and data mining researchers! After years of court cases and policymaking, the entire 16-million-item collection of the HathiTrust Digital Library, including in-copyright content, is available for text and data mining. (Yay!)

Previously, only non-copyrighted, public domain materials could be used with HTRC Analytics’ suite of tools. The restriction obviously limited the ability to do quality computational research on modern history; most out-of-copyright items are texts created before 1923. With this update, everyone can perform text analysis on the full corpus with different tools. HathiTrust is membership-based, with over 140 member institutions (including the University of Illinois), so some restrictions apply to non-member institutions and independent scholars. Under the new policy, only one service, the HTRC Data Capsule (a virtual computing environment), retains members-only access to the full corpus for requesters with an established research need.

Here’s a quick overview of HTRC’s tools and access permissions (from HTRC’s Documentation).

  • HTRC Algorithms: a set of tools for assembling collections of digitized text from the HathiTrust corpus and performing text analysis on them. Includes copyrighted items for ALL USERS.
  • Extracted Features Dataset: a dataset allowing non-consumptive analysis of specific features extracted from the full text of the HathiTrust corpus (see the sketch after this list). Includes copyrighted items for ALL USERS.
  • HathiTrust+Bookworm: a tool for visualizing and analyzing word usage trends in the HathiTrust corpus. Includes copyrighted items for ALL USERS.
  • HTRC Data Capsule: a secure computing environment for researcher-driven text analysis on the HathiTrust corpus. All users may access public domain items. Access to copyrighted items is available ONLY to member-affiliated researchers.
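For a sense of what working with the Extracted Features Dataset looks like, here is a minimal, hypothetical sketch using the HTRC Feature Reader Python package (htrc-feature-reader) to list the most frequent tokens in a single downloaded volume ("sample.json.bz2" is a made-up file name; see HTRC’s documentation for obtaining extracted-features files):

```python
# Minimal sketch: non-consumptive analysis of one volume from the HTRC
# Extracted Features Dataset via the htrc-feature-reader package.
# "sample.json.bz2" is a hypothetical extracted-features file.
from htrc_features import Volume

vol = Volume("sample.json.bz2")
print(vol.title)

# Aggregate token counts across the whole volume (no page or
# part-of-speech breakdown, case-folded), then show the top ten.
tokens = vol.tokenlist(pages=False, pos=False, case=False)
print(tokens.sort_values("count", ascending=False).head(10))
```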

Fair Use to the Rescue!

How is this possible? Through both the Fair Use section of the Copyright Act and HathiTrust’s policy of allowing only non-consumptive research. Fair Use protects the use of copyrighted materials for educational, research, and transformative purposes. Non-consumptive research means that researchers can glean information about works without actually being able to read (consume) them. You can see the end result (topic models, word and phrase statistics, etc.) without seeing the entirety of the work for human reading. Allowing only computational research on a corpus protects rights holders and benefits researchers: a researcher can perform text analysis on thousands of texts without reading them all, which is the basis of computational text analysis anyway! Our Copyright Librarian, Sara Benson, recently discussed how Fair Use factors into HathiTrust’s definition of non-consumptive research.

Ready to use HTRC Analytics for text mining? Check out their Getting Started with HTRC Guide for some simple, guided start-up activities.

For general information about the digital library, see our guide on HathiTrust.