Mapping Native Land

Fall break is fast approaching and with it will be Thanksgiving! No matter what your traditions are, we all know that this year’s holiday season will look a little bit different. As we move into the Thanksgiving holiday, I wanted to share a mapping project to give thanks and recognize the native lands we live on.

Native Land is an open-source mapping project that shows the indigenous territories across the world. This interactive map allows you to enter your address or click and explore to determine what indigenous land you reside on. Native Land also shares educational information about these nations, their languages, and treaties. They also include a Teacher’s Guide for a wide range of ages, from children to adults. Users are able to export images of their map, too!

Native Land Map

NativeLand.ca Map Interface

Canadian-based and indigenous-led, Native Land Digital aims to educate and bring awareness to the complex histories of the land we inhabit. This platform strives to create conversations about indigenous communities between those with native heritage as well as those without. Native Land Digital values the sacredness of land, and they use this platform to honor the history of where we reside. Learn more about their mission and impact on their “Why It Matters” page.

Native Land uses MapBox and WordPress to generate their interactive map. MapBox is an open source mapping platform for custom designed maps. Native Land is available as an App for iOS and Android and they have a texting service, as well. You can find more information about how it works here.
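For readers curious how an address lookup on a territory map could work in principle, here is a minimal sketch of a point-in-polygon test. It is an illustration only, not Native Land's actual implementation: the function names and the two rectangular "territories" are made up for this example, and real territory shapes are far more complex and frequently overlap.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray going east crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def territories_at(point: Point, territories: Dict[str, List[Point]]) -> List[str]:
    """Return every territory whose polygon contains the point (overlaps are allowed)."""
    return sorted(
        name for name, polygon in territories.items() if point_in_polygon(point, polygon)
    )


# Hypothetical polygons invented for this sketch; they do not describe real territories.
EXAMPLE_TERRITORIES = {
    "Territory A": [(-90.0, 39.0), (-87.0, 39.0), (-87.0, 41.0), (-90.0, 41.0)],
    "Territory B": [(-88.5, 40.0), (-86.0, 40.0), (-86.0, 42.0), (-88.5, 42.0)],
}
```

Calling `territories_at((-88.2, 40.1), EXAMPLE_TERRITORIES)` returns both example territories, because the point sits where the two rectangles overlap. Returning a list rather than a single name reflects the fact that one location can fall within several nations' lands.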

If you’d like to learn more about mapping software, the Scholarly Commons has Geographic Information Systems (GIS) software, consultations, and workshops available. The Scholarly Commons webpage on GIS is a great place to get started.

The University of Illinois is a land-grant institution and resides on Kickapoo territory. Where do you stand?

University of Illinois Urbana-Champaign Land Acknowledgement Statement

As a land-grant institution, the University of Illinois at Urbana-Champaign has a responsibility to acknowledge the historical context in which it exists. In order to remind ourselves and our community, we will begin this event with the following statement. We are currently on the lands of the Peoria, Kaskaskia, Piankashaw, Wea, Miami, Mascoutin, Odawa, Sauk, Mesquaki, Kickapoo, Potawatomi, Ojibwe, and Chickasaw Nations. It is necessary for us to acknowledge these Native Nations and for us to work with them as we move forward as an institution. Over the next 150 years, we will be a vibrant community inclusive of all our differences, with Native peoples at the core of our efforts.

Tomorrow! Big Ten Academic Alliance GIS Conference 2020

Save the date! Tomorrow is the Big Ten Academic Alliance (BTAA) GIS Conference 2020. This event is 100% virtual and free of charge to anyone who wants to engage with the community of GIS specialists and researchers from Big Ten institutions.

The conference kicks off tonight with a GIS Day Trivia Night event at 5:30 PM CST! There is a Map Gallery that is open to view from now until November 13th, 2020. The gallery features research that incorporates GIS from Big Ten institutions, so be sure to check it out! There will be lightning talks, presentations, social hours, and a keynote address from Dr. Orhun Aydin, Senior Researcher at Esri. You can see the full schedule of events and register here.

This event is a great way to network and learn more applications of GIS for research. If you are interested in GIS but don’t know where to start, this event is a great place to get inspired. If you are an experienced GIS researcher, this event is an opportunity to meet colleagues and learn from your peers. Overall this is a great event for anyone interested in GIS and the perfect way to start Geography Awareness Week, which goes from November 15th-21st this year!

Free, Open Source Optical Character Recognition with gImageReader

Optical Character Recognition (OCR) is a powerful tool to transform scanned, static images of text into machine-readable data, making it possible to search, edit, and analyze text. If you’re using OCR, chances are you’re working with either ABBYY FineReader or Adobe Acrobat Pro. However, both ABBYY and Acrobat are proprietary software with a steep price tag, and while they are both available in the Scholarly Commons, you may want to perform OCR beyond your time at the University of Illinois.

Thankfully, there’s a free, open source alternative for OCR: Tesseract. By itself, Tesseract only works through the command line, which creates a steep learning curve for those unaccustomed to working with a command-line interface (CLI). Additionally, it is fairly difficult to transform a jpg into a searchable PDF with Tesseract.

Fortunately, many free, open source programs provide Tesseract with a graphical user interface (GUI). These not only make Tesseract much easier to use; some of them also come with layout editors that make it possible to create searchable PDFs. You can see the full list of programs on this page.

The program logo for gImageReader

In this post, I will focus on one of these programs, gImageReader, but as you can see on that page, there are many options available on multiple operating systems. I tried all of the Windows-compatible programs and decided that gImageReader was the closest to what I was looking for, a free alternative to ABBYY FineReader that does a pretty good job of letting you correct OCR mistakes and exporting to a searchable PDF.

Installation

gImageReader is available for Windows and Linux. Though they do not include a Mac compatible version in the list of releases, it may be possible to get it to work if you use a package manager for Mac such as Homebrew. I have not tested this though, so I do not make any guarantees about how possible it is to get a working version of gImageReader on Mac.

To install gImageReader on Windows, go to the project’s releases page on GitHub. From there, go to the most recent release of the program at the top and click Assets to expand the list of files included with the release. Then select the file that has the .exe extension to download it. You can then run that file to install the program.

Manual

The installation of gImageReader comes with a manual as an HTML file that can be opened by any browser. As of the date of this post, the Fossies software archive is hosting the manual on its website.

Setting OCR Mode

gImageReader has two OCR modes: “Plain Text” and “hOCR, PDF”. Plain Text is the default mode and only recognizes the text itself without any formatting or layout detection. You can export this to a text file or copy and paste it into another program. This may be useful in some cases, but if you want to export a searchable PDF, you will need to use hOCR, PDF mode. hOCR is a standard for formatting OCR text using either XML or HTML and includes layout information, font, OCR result confidence, and other formatting information.
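To make the hOCR format a little more concrete, here is a minimal sketch that reads the word-level information out of an hOCR fragment using only Python's standard library. In hOCR, each recognized word is typically a `span` with the class `ocrx_word`, and its `title` attribute carries properties such as `bbox` (the word's box on the page) and `x_wconf` (the recognition confidence). The sample fragment and the helper names below are invented for illustration; this is not gImageReader's own code.

```python
from html.parser import HTMLParser
from typing import Dict, List


class HocrWordParser(HTMLParser):
    """Collect the text, bounding box, and confidence of each ocrx_word element."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[Dict] = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            props = {}
            # The title attribute holds ";"-separated properties, e.g. "bbox ...; x_wconf 67".
            for part in (attr.get("title") or "").split(";"):
                fields = part.split()
                if fields:
                    props[fields[0]] = fields[1:]
            self._current = {
                "text": "",
                "bbox": [int(v) for v in props.get("bbox", [])],
                "confidence": float(props["x_wconf"][0]) if props.get("x_wconf") else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # This sketch assumes word spans do not contain nested spans.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(hocr: str) -> List[Dict]:
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Invented sample fragment; the coordinates are hypothetical.
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 100 200 640 230'>
  <span class='ocrx_word' title='bbox 100 200 180 230; x_wconf 96'>your</span>
  <span class='ocrx_word' title='bbox 190 200 420 230; x_wconf 67'>eclnowledgement</span>
</span>
"""
```

Running `parse_hocr_words(SAMPLE_HOCR)` gives one dictionary per word. Because the layout information travels with the text, a program can later place each word at the right spot on top of the page image, which is what makes a searchable PDF possible.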

To set the recognition to hOCR, PDF mode, go to the toolbar at the top. It includes a section for “OCR mode” with a dropdown menu. From there, click the dropdown and select hOCR, PDF:

gImageReader Toolbar

This is the toolbar for gImageReader. You can set OCR mode by using the dropdown that is the third option from the right.

Adding Images, Performing Recognition, and Setting Language

If you have images already scanned, you can add them to be recognized by clicking the Add Images button on the left panel, which looks like a folder. You can then select multiple images if you want to create a multipage PDF. You can always add more images later by clicking that folder button again.

On that left panel, you can also click the Acquire tab button, which allows you to get images directly from a scanner, if the computer you’re using has a scanner connected.

Once you have the images you want, click the Recognize button to recognize the text on the page. Please note that if you have multiple images added, you’ll need to click this button for every page.

If you want to perform recognition on a language other than English, click the arrow next to Recognize. You’ll need to have that language installed, but you can install additional languages by clicking “Manage Languages” in the dropdown that appears. If the language is already installed, you can go to the first option listed in the dropdown to select a different language.

Viewing the OCR Result

In this example, I will be performing OCR on this letter by Franklin D. Roosevelt:

Raw scanned image of a typewritten letter signed by Franklin Roosevelt

This 1928 letter from Franklin D. Roosevelt to D. H. Mudge Sr. is courtesy of Madison Historical: The Online Encyclopedia and Digital Archive for Madison County Illinois. https://madison-historical.siue.edu/archive/items/show/819

Once you’ve performed OCR, there will be an output panel on the right. There are a series of buttons above the result. Click the button on the far right to view the text result overlaid on top of the image:

The text result of performing OCR on the FDR letter overlaid on the original scan.

Here is the text overlaid on an image of the original scan. Note how the scan is slightly transparent now to make the text easier to read.

Correcting OCR

The OCR process did a pretty good job with this example, but there are a handful of errors. You can click on any of the words of text to show them on the right panel. I will click on the “eclnowledgment” at the end of the letter to correct it. It will then jump to that part of the hOCR “tree” on the right:

hOCR tree in gImageReader, which shows the recognition result of each word in a tree-like structure.

The hOCR tree in gImageReader, which also shows OCR result.

Note that in this screenshot I have clicked the second button from the right to show the confidence values, where the higher the number, the more confident Tesseract is in the result. In this case, it is 67% sure that eclnowledgement is correct. Since it obviously isn’t correct, we can enter new text by double-clicking on the word in this panel and typing “acknowledgement.” You can do this for any errors on the page.
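Confidence values can also help you decide where to look first when proofreading. The sketch below takes a list of words with their confidence values and returns the ones under a cutoff, least confident first. The cutoff of 80 and every sample word except the 67% example from this letter are assumptions made up for illustration, and, as the example above shows, a fairly high number does not guarantee a correct word.

```python
from typing import List, Tuple


def words_to_review(
    words: List[Tuple[str, float]], threshold: float = 80.0
) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs below the threshold, least confident first."""
    flagged = [(text, conf) for text, conf in words if conf < threshold]
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical OCR output; only the 67 for "eclnowledgement" comes from the post.
SAMPLE_WORDS = [
    ("grateful", 95.0),
    ("eclnowledgement", 67.0),
    ("of", 91.0),
    ("tbe", 54.0),
]
```

With the sample data, `words_to_review(SAMPLE_WORDS)` puts "tbe" ahead of "eclnowledgement". A list like this is only a starting point: words above the cutoff can still be wrong, so it does not replace reading through the page.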

Other correction tips:

  1. If there are any regions that are not text that it is still recognizing, you can right click them on the right and delete them.
  2. You can change the recognized font and its size by going to the bottom area labeled “Properties.” Font size is controlled by the x_fsize field, and x_font has a dropdown where you can select a font.
  3. It is also possible to change the area of the blue word box once it is selected, simply by clicking and dragging the edges and corners.
  4. If there is an area of text that was not captured by the recognition, you can also right click in the hOCR “tree” to add text blocks, paragraphs, textlines, and words to the document. This allows you to draw a box on the image and then type what the text says.

Exporting to PDF

Once you are done making OCR corrections, you can export to a searchable PDF. To do so, click the Export button above the hOCR “tree,” which is the third button from the left. Then, select export to PDF. It then gives you several options to set the compression and quality of the PDF image, and once you click OK, it should export the PDF.

Conclusion

Unfortunately, there are some limitations to gImageReader, as can often be the case with free, open source software. Here are some potential problems you may have with this program:

  1. While you can add new areas to recognize with OCR, there is not a way to change the order of these elements inside the hOCR “tree,” which could be an issue if you are trying to make the reading order clear for accessibility reasons. One potential workaround could be to use the Reading Order options on Adobe Acrobat, which you can read about in this libguide.
  2. You cannot show the areas of the document that are in a recognition box unless you click on a word, unlike ABBYY FineReader which shows all recognition areas at once on the original image.
  3. You cannot perform recognition on all pages at once. You have to click the recognition button individually for each page.
  4. Though there are some image correction options to improve OCR, such as brightness, contrast, and rotation, it does not have as many options as ABBYY FineReader.
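If the page-by-page limitation is a dealbreaker and you are comfortable leaving the GUI, one possible workaround is to call Tesseract itself once per image from a script. The sketch below only builds the commands, following Tesseract's documented `tesseract <image> <outputbase> -l <language> <config>` pattern with the `hocr` output config; the function names and folder names are hypothetical, and running the commands requires Tesseract to be installed and on your PATH. Note that this skips gImageReader's correction step entirely.

```python
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_commands(
    images: List[str], out_dir: str, lang: str = "eng", output: str = "hocr"
) -> List[List[str]]:
    """Build one Tesseract command per page image."""
    commands = []
    for image in images:
        # Tesseract appends the file extension to this output base itself.
        output_base = str(Path(out_dir) / Path(image).stem)
        commands.append(["tesseract", image, output_base, "-l", lang, output])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `build_tesseract_commands(["scans/page1.jpg", "scans/page2.jpg"], "out")` produces two commands that you can inspect before passing them to `run_all`. Keeping the build and run steps separate makes it easy to check the file names first.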

gImageReader is not nearly as user friendly as ABBYY FineReader, nor does it have all of its features, so you will probably want to use ABBYY if it is available to you. However, I find gImageReader a pretty good program that can meet most general OCR needs.

It Takes a Campus – Episode Two with Harriett Green

Image has the text supporting digital scholarship, it takes a campus with icons of microphone and broadcast symbol

 

 

Resources mentioned:

SPEC Kit No. 357

University of Illinois Library Copyright Guide

 

For the transcript, click on “Continue reading” below.

Continue reading

Illinois Digital Humanities Projects That Will Blow Your Mind

We are living in a moment where we get to discover the exciting possibilities of working, learning, and sharing on digital formats. I have decided to use this as an opportunity to appreciate the ways in which others have already embraced the power of digital platforms to enhance their research. In this post I will highlight three amazing digital humanities projects that researchers right here at the University of Illinois contributed to. For each project I will provide a link to its official web page, a brief description of the project, and the name and department of the UIUC researcher who contributed to it. Prepare to be wowed by the amazing digital work that has come out of our University research community.

Owen Wilson mouthing the word wow

“Prepare to be wowed”- Owen Wilson

Virtual Museums

There is no doubt that technology is changing the way we interact with the world, including with centuries-old institutions: museums!

Historically, museums have been seen as sacred spaces of knowledge meant to bring together communities, and historically this also meant a physical space. However, with the heroine that is technology constantly growing in our everyday lives, it was bound to reach museums eventually. While many museums have implemented technology into their education and resources, we are now beginning to see the emergence of what’s called a “virtual museum.” While the definition of what constitutes these new virtual museums can be precarious, they have one thing in common: they exist electronically in cyberspace.

The vast empire of Digital Humanities is allowing space for these virtual museums to grow. Information seeking in a digital age is expanding its customs, and there is a wide spectrum of resources available, virtual museums being one example. These online organizations are made up of digital exhibitions and exist in their entirety on the World Wide Web.

Museums offer an experience. Unlike libraries or archives, people more often utilize museums as a form of tourism and entertainment but within this, they are also centers of research. Museums house information resources that are not accessible to the everyday scholar. Virtual museums are increasing this accessibility.

Here are some examples of virtual museum spaces:

While there are arguments from museum scholars about the legitimacy of these online spaces, I do not think that should discount the ways in which people are using them to share knowledge. While there is still much to develop in virtual museums, the increasing popularity of the digital humanities is granting people an innovative way to interact with art and artifacts that were previously inaccessible. Museums are spaces of exhibition and research, so why limit that to a physical space? It will be interesting to keep an eye on where things may go and to ask how much this format can contribute to scholarly research!

The Scholarly Commons has many resources that can help you create your own digital hub of information. You can digitize works on one of our high resolution scanners, turn these into searchable documents with OCR software, and publish online with tools such as Omeka, a digital publishing software.

You can also consult with our expert in Digital Humanities, Spencer Keralis, to find the right tools for your project. Check out last week’s blog post to learn more about him.

Maybe one day all museums will be available virtually? What are your thoughts?

Meet Spencer Keralis, Digital Humanities Librarian

Spencer Keralis teaches a class.

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features one of the newest members of our team, Spencer Keralis, Digital Humanities Librarian.


What is your background and work experience?

I have a Ph.D. in English and American Literature from New York University. I started working in libraries in 2011 as a Council on Library and Information Resources (CLIR) Fellow with the University of North Texas Libraries, doing research on data management policy and practice. This turned into a position as a Research Associate Professor working to catalyze digital scholarship on campus, which led to the development of Digital Frontiers, which is now an independent non-profit corporation. I serve as the Executive Director of the organization and help organize the annual conference. I have previous experience working as a project manager in telecom and non-profits. I’ve also taught in English and Communications at the university level since 2006.

What led you to this field?

My CLIR Fellowship really sparked the career change from English to libraries, but I had been considering libraries as an alternate career path prior to that. My doctoral research was heavily archives-based, and I initially thought I’d pursue something in rare books or special collections. My interest in digital scholarship evolved later.

What is your research agenda?

My current project explores how the HIV-positive body is reproduced and represented in ephemera and popular culture in the visual culture of the early years of the AIDS epidemic. In American popular culture, representations of the HIV-positive body have largely been defined by Therese Frare’s iconic 1990 photograph of gay activist David Kirby on his deathbed in an Ohio hospital, which was later used for a United Colors of Benetton ad. Against this image, and other representations which medicalized or stigmatized HIV-positive people, people living with AIDS and their allies worked to remediate the HIV-positive body in ephemera including safe sex pamphlets, zines, comics, and propaganda. In my most recent work, I’m considering the reclamation of the erotic body in zines and comics, and how the HIV-positive body is imagined as an object of desire differently in these underground publications than they are in mainstream queer comics representing safer sex. I also consider the preservation and digitization of zines and other ephemera as a form of remediation that requires a specific ethical positioning in relation to these materials and the community that produced them, engaging with the Zine Librarians’ Code of Conduct, folksonomies and other metadata schema, and collection and digitization policies regarding zines from major research libraries. This research feels very timely and urgent given rising rates of new infection among young people, but it’s also really fun because the materials are so eclectic and often provocative. You can check out a bit of this research on the UNT Comics Studies blog.

Do you have any favorite work-related duties?

I love working with students and helping them develop their research questions. Too often students (and sometimes faculty, let’s be honest) come to me and ask “What tools should I learn?” I always respond by asking them what their research question is. Not every research question is going to be amenable to digital tools, and not every tool works for every research question. But having a conversation about how digital methods can potentially enrich a student’s research is always rewarding, and I always learn so much from these conversations.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think comics and graphic novels are generally underappreciated in both pedagogy and research. There are comics on every topic, and historical comics go back much further than most people realize. I think the intersection of digital scholarship with comics studies has a lot of potential, and a lot of challenges that have yet to be met – the technical challenge of working with images is significant, and there has yet to be significant progress on what digital scholarship in comics might look like. I also think comics belong more in classes – all sorts of classes, there are comics on every topic, from math and physics, to art and literature – than they are now because they reach students differently than other kinds of texts.

If you could recommend one book or resource to beginning researchers in your field, what would you recommend?

I’m kind of obsessed with Liz Losh and Jacque Wernimont’s edited collection Bodies of Information: Intersectional Feminism and Digital Humanities because it’s such an important intervention in the field. I’d rather someone new to DH start there than with some earlier, canonical works because it foregrounds alternative perspectives and methodologies without centering a white, male perspective. Better, I think, to start from the margins and trouble some of the traditional narratives in the discipline right out the gate. I’m way more interested in disrupting monolithic or hegemonic approaches to DH than I am in gatekeeping, and Liz and Jacque’s collection does a great job of constructively disrupting the field.

Digital Humanities Maps

Historically, maps were 2D, printed, sometimes wildly inaccurate representations of space. Today, maps can still be wildly inaccurate, but digital tools provide a way to apply more data to a spatial representation. However, displaying data on a map is not a completely new idea. W.E.B. Du Bois’ 1899 sociological research study “The Philadelphia Negro” was one of the first to present data in a visual format, both in map form and other forms.

map of the seventh ward of philadelphia, each household is drawn on the map and represented by a color corresponding to class standing

The colors on the map indicate the class standing of each household.

Digital maps can add an interesting, spatial dimension to your humanities or social science research. People respond well to visuals, and maps provide a way to display a visual that corresponds to real-life space. Today we’ll highlight some DH mapping projects, and point to some resources to create your own map!

(If you are interested in DH maps, attend our Mapping in the Humanities workshop next week!)

Sources of Digital Maps

Some sources of historical maps, like the ones below, openly provide access to georeferenced maps. “Georeferencing,” also called “georectifying,” is the process of aligning historical maps to precisely match a modern-day map. Completing this process allows historical maps to be used in digital tools, like GIS software. Think of it like taking an image of a map, and assigning latitude/longitude pairs to different points on the map that correspond to modern maps. Currently, manually matching the points up is the only way to do this!
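To illustrate the idea of matching points, here is a minimal sketch that takes three control points (a pixel position on the scanned map paired with a longitude/latitude on the modern map) and fits a simple affine transform that converts any other pixel to a coordinate. The control points are made-up values, and real georeferencing tools generally use more points and more flexible transformations than this, so treat it as a toy model of the concept rather than how any particular tool works.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float]   # (column, row) in the scanned image
LonLat = Tuple[float, float]  # (longitude, latitude) on the modern map


def _solve3(rows: List[Tuple[float, float, float]], values: List[float]) -> Tuple[float, ...]:
    """Solve a 3x3 linear system with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(rows)
    if abs(d) < 1e-12:
        raise ValueError("control points must not lie on one line")
    solution = []
    for col in range(3):
        replaced = [list(r) for r in rows]
        for i in range(3):
            replaced[i][col] = values[i]
        solution.append(det(replaced) / d)
    return tuple(solution)


def fit_affine(control_points: List[Tuple[Pixel, LonLat]]) -> Callable[[Pixel], LonLat]:
    """Fit lon = a*x + b*y + c and lat = d*x + e*y + f from exactly three control points."""
    if len(control_points) != 3:
        raise ValueError("this sketch needs exactly three control points")
    rows = [(px, py, 1.0) for (px, py), _ in control_points]
    lon_coeffs = _solve3(rows, [lonlat[0] for _, lonlat in control_points])
    lat_coeffs = _solve3(rows, [lonlat[1] for _, lonlat in control_points])

    def to_lonlat(pixel: Pixel) -> LonLat:
        x, y = pixel
        return (lon_coeffs[0] * x + lon_coeffs[1] * y + lon_coeffs[2],
                lat_coeffs[0] * x + lat_coeffs[1] * y + lat_coeffs[2])

    return to_lonlat


# Hypothetical control points: image corners matched to invented coordinates.
EXAMPLE_CONTROL_POINTS = [
    ((0.0, 0.0), (-87.9, 42.0)),
    ((1000.0, 0.0), (-87.5, 42.0)),
    ((0.0, 1000.0), (-87.9, 41.7)),
]
```

Calling `fit_affine(EXAMPLE_CONTROL_POINTS)` returns a function that converts any pixel on the scan to a longitude and latitude. An affine transform can only shift, scale, rotate, and skew the image, which is one reason old maps with uneven distortion need many hand-placed control points.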

A map from a book about Chicago placed over a modern map of Chicago.

A map of Chicago from 1891 overlaid on a modern map of the Chicago area.

David Rumsey Map Collection
The David Rumsey Map Collection is a mainstay in the world of historical maps. As of the time of writing, 68% of their total map collection has been georeferenced. There are other ways to interact with the collection, such as searching on a map for specific locations, or even viewing the maps in Second Life!

NYPL Map Warper
The New York Public Library’s Map Warper offers a large collection of historical maps georeferenced by users. Most maps have been georeferenced at this point, but users can still help out!

OpenStreetMap
OpenStreetMap is the open-source, non-proprietary version of Google Maps. Many tools used in DH, like Leaflet and Omeka’s Neatline, use OpenStreetMap’s data and applications to create maps.

Digital Mapping Humanities Projects

Get inspired! Here are some DH mapping projects to help you think about applying mapping to your own research.

Maps provide the perfect medium for DH projects focused on social justice and decolonization. Native-land.ca is a fairly recent example of this application. The project, started as a non-academic, private project in 2015, has now transformed into a not-for-profit organization. Native-land.ca attempts to visualize land belonging to native nations in the Americas and Australia, but notably not following the official or legal boundaries. The project also provides a teacher’s guide to assist in developing a curriculum around colonization in schools.

map of florida with data overlay indicating which native tribes have rights to the land

The state of Florida occupies the territory of multiple native tribes, notably those of the Seminole.

Other projects use digital tools that show a map in conjunction with another storytelling tool, like a timeline or a narrative. The levantCarta/Beirut project uses a timeline to filter which images show up on the connected map of Beirut. We can easily see the spatial representation of a place in a temporal context. A fairly easy tool for this kind of digital storytelling is TimeMapper.

For a more meta example, check out this map of digital humanities labs by Urszula Pawlicka-Deger. Of course these DH centers do projects other than mapping, but even the study of DH can make use of digital mapping!

If you’re interested in adding maps to your humanities research, check out our workshop this semester on humanities mapping. There are also great tutorials for more advanced mapping on The Programming Historian.

And as always, feel free to reach out to the Scholarly Commons (sc@library.illinois.edu) to get started on your digital humanities project.

Transformation in Digital Humanities

The opinions presented in this piece are solely those of the author and the referenced authors. This is meant to serve as a synthesis of arguments made in DH regarding transformation.

How do data and algorithms affect our lives? How does technology affect our humanity? Scholars and researchers in the digital humanities (DH) ask questions about how we can use DH to enact social change by making observations of the world around us. This kind of work is often called “transformative DH.”

The idea of transformative DH is an ongoing conversation. As Moya Bailey wrote in 2011, scholars’ experiences and identities affect and inform their theories and practices, which allows them to make worthwhile observations in diverse areas of humanities scholarship. Just as there is strong conflict about how DH itself is defined, there is also conflict regarding whether or not DH needs to be “transformed.” The theme of the 2011 Annual DH Conference held at Stanford was “Big Tent Digital Humanities,” a phrase symbolizing the welcoming nature of the DH field as a space for interdisciplinary scholarship. Still, those on the fringes found themselves unwelcome, or at least unacknowledged.

This conversation around what DH is and what it could be exploded at the Modern Language Association (MLA) Convention in 2011, which featured multiple digital humanities and digital pedagogy sessions aimed at defining the field and what “counts” as DH. During the convention Stephen Ramsay, in a talk boldly titled “Who’s In and Who’s Out,” stated that all digital humanists must code in order to be considered a digital humanist (he later softened “code” to “build”). These comments resulted in ongoing conversations online about gatekeeping in DH, which refers to both what work counts as DH and who counts as a DHer or digital humanist. Moya Bailey also noted that certain scholars whose work focused on race, gender, or queerness and relationships with technology were “doing intersectional digital humanities work in all but name.” This work, however, was not acknowledged as digital humanities.

logo

Website Banner from transformdh.org

To address gatekeeping in the DH community more fully, the group #transformDH was formed in 2011, during this intense period of conversation and attempts at defining. The group self-describes as an “academic guerrilla movement” aimed at re-defining DH as a tool for transformative, social justice scholarship. Their primary objective is to create space in the DH world for projects that push beyond traditional humanities research with digital tools. To achieve this, they encourage and create projects that have the ability to enact social change and bring conversations on race, gender, sexuality, and class into both the academy and the public consciousness. An excellent example of this ideology is the Torn Apart/Separados project, a rapid response DH project completed in response to the United States enacting a “Zero Tolerance Policy” for immigrants attempting to cross the US/Mexico border. In order to visualize the reach and resources of ICE (those enforcing this policy), a cohort of scholars, programmers, and data scientists banded together and published this project in a matter of weeks. Projects such as these demonstrate the potential of DH as a tool for transformative scholarship and to enact social change. The potential becomes dangerously disregarded when we set limits on who counts as a digital humanist and what counts as digital humanities work.

For further, in-depth reading on this topic, check out the articles below.

How We’re Celebrating the Sweet Public Domain

This is a guest blog by the amazing Kaylen Dwyer, a GA in Scholarly Communication and Publishing.

Collage of the Honey Bunch series

As William Tringali mentioned last week, 2019 marks an exciting shift in copyright law with hundreds of thousands of works entering the public domain every January 1st for the next eighteen years. We are setting our clocks back to the year of 1923—to the birth of the Harlem Renaissance with magazines like The Crisis, to first-wave feminists like Edith Wharton, Virginia Woolf, and Dorothy L. Sayers, back to the inter-war period.

Copyright librarian Sara Benson has been laying the groundwork to bring in the New Year and celebrate the wealth of knowledge now publicly available for quite some time, leading up to a digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, and the Re-Mix It! Competition to be held this spring.

A collaborative effort between Benson, graduate assistants, and several scholarly contributors, The Sweet Public Domain celebrates creative reuse and copyright law. Last year, GA Paige Kuester spent time scouring the Rare Book and Manuscript Library in search of something that had never been digitized before, something at risk of being forgotten forever, not because it is unworthy of attention, but because it has been captive to copyright for so long.

We found just the thing: the beloved Honey Bunch series, a best-selling girls’ series by the Stratemeyer Syndicate. The syndicate became known for its publication of Nancy Drew, the Hardy Boys, the Bobbsey Twins, and many others, but in 1923 they kicked off the adventures of Honey Bunch with Just a Little Girl, Her First Visit to the City, and Her First Days on the Farm.

Through the digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, you can explore all three books, introduced by Deidre Johnson (Edward Stratemeyer and the Stratemeyer Syndicate, 1993) and LuElla D’Amico (Girls Series Fiction and American Popular Culture, 2017). To hear more about copyright and creative reuse, you can find essays by Sara Benson, our copyright librarian, and Kirby Ferguson, filmmaker and producer of Everything is a Remix.

If you are a student at the University of Illinois at Urbana-Champaign, you can engage with the public domain by making new and innovative work out of something old and win up to $500 for your creation. Check out the Re-Mix It! Competition page for contest details and be sure to check out our physical exhibit in the Marshall Gallery (Main Library, first floor east entrance) for ideas.

Logo for the Remix It competition