Mapping Native Land

Posted on November 20, 2020 by Mallory E. Untch

Fall break is fast approaching, and with it comes Thanksgiving! No matter what your traditions are, we all know that this year’s holiday season will look a little bit different. As we move into the Thanksgiving holiday, I wanted to share a mapping project to give thanks and recognize the native lands we live on.

Native Land is an open-source mapping project that shows indigenous territories across the world. This interactive map lets you enter your address, or click and explore, to determine what indigenous land you reside on. Native Land also shares educational information about these nations, their languages, and their treaties. The site also includes a Teacher’s Guide suited to a wide age range, from children to adults. Users can export images of their map, too!

NativeLand.ca Map Interface

Canadian-based and indigenous-led, Native Land Digital aims to educate and bring awareness to the complex histories of the land we inhabit. This platform strives to create conversations about indigenous communities between those with native heritage and those without. Native Land Digital values the sacredness of land and uses this platform to honor the history of where we reside. Learn more about their mission and impact on their “Why It Matters” page.

Native Land uses Mapbox and WordPress to generate its interactive map. Mapbox is an open source mapping platform for custom-designed maps. Native Land is also available as an app for iOS and Android, and it has a texting service as well. You can find more information about how it works here.

If you’d like to learn more about mapping software, the Scholarly Commons has Geographic Information Systems (GIS) software, consultations, and workshops available. The Scholarly Commons webpage on GIS is a great place to get started.

The University of Illinois is a land-grant institution and resides on Kickapoo territory. Where do you stand?

University of Illinois Urbana-Champaign Land Acknowledgement Statement

As a land-grant institution, the University of Illinois at Urbana-Champaign has a responsibility to acknowledge the historical context in which it exists. In order to remind ourselves and our community, we will begin this event with the following statement. We are currently on the lands of the Peoria, Kaskaskia, Piankashaw, Wea, Miami, Mascoutin, Odawa, Sauk, Mesquaki, Kickapoo, Potawatomi, Ojibwe, and Chickasaw Nations. It is necessary for us to acknowledge these Native Nations and for us to work with them as we move forward as an institution. Over the next 150 years, we will be a vibrant community inclusive of all our differences, with Native peoples at the core of our efforts.

Tomorrow! Big Ten Academic Alliance GIS Conference 2020

Posted on November 12, 2020 by Abigail Sewall

Save the date! Tomorrow is the Big Ten Academic Alliance (BTAA) GIS Conference 2020. This event is 100% virtual and free of charge to anyone who wants to engage with the community of GIS specialists and researchers from Big Ten institutions.

The conference kicks off tonight with a GIS Day Trivia Night event at 5:30 PM CST! There is also a Map Gallery, open to view from now until November 13th, 2020. The gallery features research from Big Ten institutions that incorporates GIS, so be sure to check it out! There will be lightning talks, presentations, social hours, and a keynote address from Dr. Orhun Aydin, Senior Researcher at Esri. Check out the full schedule of events and register here.

This event is a great way to network and learn about more applications of GIS for research. If you are interested in GIS but don’t know where to start, this event is a great place to get inspired. If you are an experienced GIS researcher, this event is an opportunity to meet colleagues and learn from your peers. Overall, this is a great event for anyone interested in GIS and the perfect way to start Geography Awareness Week, which runs from November 15th to 21st this year!

Free, Open Source Optical Character Recognition with gImageReader

Posted on November 5, 2020 by Ben Ostermeier

Optical Character Recognition (OCR) is a powerful tool for transforming scanned, static images of text into machine-readable data, making it possible to search, edit, and analyze text. If you’re using OCR, chances are you’re working with either ABBYY FineReader or Adobe Acrobat Pro. However, both ABBYY and Acrobat are proprietary software with a steep price tag, and while both are available in the Scholarly Commons, you may want to perform OCR beyond your time at the University of Illinois.

Thankfully, there’s a free, open source alternative for OCR: Tesseract. By itself, Tesseract only works through the command line, which creates a steep learning curve for those unaccustomed to working with a command-line interface (CLI). Additionally, it is fairly difficult to transform a JPG into a searchable PDF with Tesseract.

Fortunately, many free, open source programs give Tesseract a graphical user interface (GUI). Not only do these programs make Tesseract much easier to use, but some of them also come with layout editors that make it possible to create searchable PDFs. You can see the full list of programs on this page.


In this post, I will focus on one of these programs, gImageReader, but as you can see on that page, there are many options available for multiple operating systems. I tried all of the Windows-compatible programs and decided that gImageReader was the closest to what I was looking for: a free alternative to ABBYY FineReader that does a pretty good job of letting you correct OCR mistakes and export to a searchable PDF.

Installation

gImageReader is available for Windows and Linux. Though the list of releases does not include a Mac-compatible version, it may be possible to get it working with a Mac package manager such as Homebrew. I have not tested this, though, so I cannot make any guarantees about getting a working version of gImageReader on a Mac.

To install gImageReader on Windows, go to the releases page on GitHub. From there, find the most recent release at the top and click Assets to expand the list of files included with the release. Then select the file with the .exe extension to download it, and run that file to install the program.

Manual

The installation of gImageReader comes with a manual as an HTML file that can be opened by any browser. As of the date of this post, the Fossies software archive is hosting the manual on its website.

Setting OCR Mode

gImageReader has two OCR modes: “Plain Text” and “hOCR, PDF”. Plain Text is the default mode and only recognizes the text itself without any formatting or layout detection. You can export this to a text file or copy and paste it into another program. This may be useful in some cases, but if you want to export a searchable PDF, you will need to use hOCR, PDF mode. hOCR is a standard for formatting OCR text using either XML or HTML and includes layout information, font, OCR result confidence, and other formatting information.
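To make the hOCR idea more concrete, here is a minimal Python sketch that reads word-level results out of hOCR markup using only the standard library. It assumes Tesseract-style markup, where each recognized word is an element with the class `ocrx_word` and a `title` attribute holding its bounding box (`bbox`) and confidence (`x_wconf`). The sample fragment and the names `WordCollector` and `parse_title` are hypothetical. They were written for illustration and are not taken from gImageReader or from real OCR output.

```python
from html.parser import HTMLParser

# Hand-written fragment modeled on Tesseract-style hOCR (illustrative only, not real output).
SAMPLE_HOCR = """
<div class='ocr_page' id='page_1' title='bbox 0 0 2550 3300'>
  <span class='ocr_line' id='line_1_1' title='bbox 100 200 900 240'>
    <span class='ocrx_word' id='word_1_1' title='bbox 100 200 260 240; x_wconf 96'>Optical</span>
    <span class='ocrx_word' id='word_1_2' title='bbox 280 200 520 240; x_wconf 91'>Character</span>
    <span class='ocrx_word' id='word_1_3' title='bbox 540 200 900 240; x_wconf 58'>Recognitlon</span>
  </span>
</div>
"""


def parse_title(title):
    """Split an hOCR title attribute ('bbox 1 2 3 4; x_wconf 96') into a dict of strings."""
    props = {}
    for part in title.split(";"):
        key, _, value = part.strip().partition(" ")
        if key:
            props[key] = value.strip()
    return props


class WordCollector(HTMLParser):
    """Collects the text, bounding box, and confidence of every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word currently being read, or None outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("class") == "ocrx_word":
            props = parse_title(attrs.get("title") or "")
            self._current = {
                "text": "",
                "bbox": tuple(int(n) for n in props.get("bbox", "").split()),
                "conf": int(props.get("x_wconf", "0")),
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumes word spans contain only text (and possibly inline formatting tags),
        # so the first closing </span> after a word starts is that word's end.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


if __name__ == "__main__":
    collector = WordCollector()
    collector.feed(SAMPLE_HOCR)
    for word in collector.words:
        print(word["text"], word["conf"], word["bbox"])
```

The structure that the Plain Text mode discards (positions and per-word confidence) is what makes it possible to place an invisible text layer over the scan in a searchable PDF.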

To set the recognition to hOCR, PDF mode, go to the toolbar at the top. It includes a section for “OCR mode” with a dropdown menu. From there, click the dropdown and select hOCR, PDF:

This is the toolbar for gImageReader. You can set OCR mode by using the dropdown that is the third option from the right.

Adding Images, Performing Recognition, and Setting Language

If you have images already scanned, you can add them to be recognized by clicking the Add Images button on the left panel, which looks like a folder. You can then select multiple images if you want to create a multipage PDF. You can always add more images later by clicking that folder button again.

On that left panel, you can also click the Acquire tab button, which allows you to get images directly from a scanner, if the computer you’re using has a scanner connected.

Once you have the images you want, click the Recognize button to recognize the text on the page. Please note that if you have multiple images added, you’ll need to click this button for every page.

If you want to perform recognition on a language other than English, click the arrow next to Recognize. You’ll need to have that language installed, but you can install additional languages by clicking “Manage Languages” in the dropdown that appears. If the language is already installed, you can select it from the first option listed in the dropdown.
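Because gImageReader is a front end for Tesseract, the same language choice exists on Tesseract's own command line. The `-l` option takes a language code such as `eng`, several codes can be joined with `+`, and `tesseract --list-langs` prints the languages you have installed. The sketch below only assembles that command in Python. `tesseract_command` is a hypothetical helper, the file names are placeholders, and the sketch assumes Tesseract's built-in `hocr` and `pdf` config names for hOCR and searchable-PDF output. It is not a description of how gImageReader itself calls Tesseract.

```python
import os
import shutil
import subprocess

# Plain text is Tesseract's default output, so it needs no extra config name.
OUTPUT_CONFIGS = {"txt": None, "hocr": "hocr", "pdf": "pdf"}


def tesseract_command(image_path, output_base, languages=("eng",), output_format="hocr"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG[+LANG...] [config]."""
    if output_format not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    cmd = ["tesseract", image_path, output_base, "-l", "+".join(languages)]
    config = OUTPUT_CONFIGS[output_format]
    if config is not None:
        cmd.append(config)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; English plus French recognition, hOCR output.
    cmd = tesseract_command("letter.png", "letter", languages=("eng", "fra"))
    print(" ".join(cmd))
    # Only run it if Tesseract is installed and the example image actually exists.
    if shutil.which("tesseract") and os.path.exists("letter.png"):
        subprocess.run(cmd, check=True)
```

With English plus French, the helper builds `tesseract letter.png letter -l eng+fra hocr`. As in gImageReader, each requested language must already be installed for recognition to work.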

Viewing the OCR Result

In this example, I will be performing OCR on this letter by Franklin D. Roosevelt:

Raw scanned image of a typewritten letter signed by Franklin Roosevelt

This 1928 letter from Franklin D. Roosevelt to D. H. Mudge Sr. is courtesy of Madison Historical: The Online Encyclopedia and Digital Archive for Madison County Illinois. https://madison-historical.siue.edu/archive/items/show/819

Once you’ve performed OCR, there will be an output panel on the right. There are a series of buttons above the result. Click the button on the far right to view the text result overlaid on top of the image:

The text result of performing OCR on the FDR letter overlaid on the original scan.

Here is the text overlaid on an image of the original scan. Note how the scan is now slightly transparent to make the text easier to read.

Correcting OCR

The OCR process did a pretty good job with this example, but there are a handful of errors. You can click on any word in the text to select it in the right panel. I will click on “eclnowledgment” at the end of the letter to correct it. The right panel will then jump to that part of the hOCR “tree”:

The hOCR tree in gImageReader, which shows the recognition result of each word in a tree-like structure.

Note that in this screenshot I have clicked the second button from the right to show the confidence values. The higher the number, the more confident Tesseract is in the result. In this case, it is 67% sure that “eclnowledgment” is correct. Since it obviously isn’t, we can double-click the word in this panel and type “acknowledgment.” You can do this for any errors on the page.
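The confidence values suggest a simple proofreading strategy: check the least confident words first. Below is a minimal sketch of that idea. `review_queue` is a hypothetical helper that takes (word, confidence) pairs, such as the `x_wconf` values stored in an hOCR file, and returns the pairs that fall below a cut-off. The 80% threshold is an arbitrary assumption. A word with a high score can still be wrong, so this only narrows down where to look first. It does not replace reading the page.

```python
def review_queue(words, threshold=80):
    """Return (word, confidence) pairs below threshold, least confident first.

    threshold=80 is an arbitrary cut-off chosen for illustration.
    """
    flagged = [(text, conf) for text, conf in words if conf < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical recognition results as (word, x_wconf) pairs.
    page = [("Optical", 96), ("Character", 91), ("Recognitlon", 58), ("engine", 74)]
    for text, conf in review_queue(page):
        print(f"{conf:3d}%  {text}")
```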

Other correction tips:

If gImageReader recognizes regions that are not actually text, you can right-click them in the right panel and delete them.
You can change the recognized font and its size in the area labeled “Properties” at the bottom. Font size is controlled by the x_fsize field, and x_font has a dropdown where you can select a font.
You can also resize the blue word box once it is selected by clicking and dragging its edges and corners.
If an area of text was not captured by the recognition, you can right-click in the hOCR “tree” to add text blocks, paragraphs, text lines, and words to the document. This lets you draw a box on the image and then type what the text says.
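The Properties fields correspond to properties in hOCR's `title` attribute, which stores them as `name value` pairs separated by semicolons, including `x_font` and `x_fsize`. As a rough sketch of what editing those fields amounts to, the hypothetical `set_font` helper below rewrites a word's title string. It assumes that values contain no semicolons, and it ignores the quoted values the hOCR specification permits. Treat it as an illustration, not as a general hOCR editor or as gImageReader's implementation.

```python
def parse_title(title):
    """Turn an hOCR title such as 'bbox 1 2 3 4; x_fsize 12' into an ordered dict.

    Assumes no semicolons inside values and no quoted strings.
    """
    props = {}
    for part in title.split(";"):
        key, _, value = part.strip().partition(" ")
        if key:
            props[key] = value.strip()
    return props


def format_title(props):
    """Serialize properties back into hOCR's 'name value; name value' form."""
    return "; ".join(f"{key} {value}" for key, value in props.items())


def set_font(title, font=None, size=None):
    """Return a new title with x_font and/or x_fsize replaced, or appended if missing."""
    props = parse_title(title)
    if font is not None:
        props["x_font"] = font
    if size is not None:
        props["x_fsize"] = str(size)
    return format_title(props)


if __name__ == "__main__":
    # Hypothetical word title; the values are illustrative.
    before = "bbox 540 200 900 240; x_wconf 58; x_font Courier; x_fsize 12"
    print(set_font(before, size=11))
```

Because Python dictionaries keep insertion order, an existing property stays in its original position when it is updated, and a new one is appended at the end.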

Exporting to PDF

Once you are done making OCR corrections, you can export to a searchable PDF. To do so, click the Export button above the hOCR “tree,” which is the third button from the left. Then, select export to PDF. It then gives you several options to set the compression and quality of the PDF image, and once you click OK, it should export the PDF.

Conclusion

Unfortunately, gImageReader has some limitations, as is often the case with free, open source software. Here are some potential problems you may have with this program:

While you can add new areas to recognize with OCR, there is no way to change the order of these elements inside the hOCR “tree,” which could be an issue if you are trying to make the reading order clear for accessibility reasons. One potential workaround is to use the Reading Order options in Adobe Acrobat, which you can read about in this libguide.
You cannot show the areas of the document that are in a recognition box unless you click on a word, unlike ABBYY FineReader which shows all recognition areas at once on the original image.
You cannot perform recognition on all pages at once. You have to click the recognition button individually for each page.
Though there are some image correction options to improve OCR, such as brightness, contrast, and rotation, it does not have as many options as ABBYY FineReader.

gImageReader is not nearly as user-friendly as ABBYY FineReader, nor does it have all of ABBYY’s features, so you will probably want to use ABBYY if it is available to you. However, I find gImageReader a pretty good program that can meet most general OCR needs.

An interview with Billy Tringali on JAMS and Open Access

Posted on October 30, 2020 by Abigail Sewall

This week I had the opportunity to talk to Billy Tringali. If you don’t know Billy, he worked in the Scholarly Commons as a graduate assistant from 2016 to 2018 and now works as a Law Librarian for Outreach at Emory University. Our conversation this week was about a passion project he started during his time here at Illinois: Billy is the founding editor-in-chief of a brand-new open access journal, The Journal of Anime and Manga Studies (JAMS). The first volume of JAMS came out recently, so be sure to take a look!

How does JAMS fit into a broader scholarly conversation? What gaps in scholarship are you addressing with this journal?

JAMS is currently the only open-access journal solely dedicated to publishing scholarly articles on anime, manga, cosplay, and their fandoms. While there are other journals which publish works about anime, like the incredible Mechademia, they are not open-access. Anime and manga studies is such a diverse field, and there is a lot out there being published. The goal of the Journal of Anime and Manga Studies is to provide a space for academics, students, and independent researchers examining the field of anime, manga, cosplay, and fandom studies to access high-quality research about these topics and share their research with others.

Tell us about your experience working with the Illinois Open Publishing Network (IOPN). What advice do you have for scholars interested in using this resource?

Working with IOPN has been a dream. Such a qualified, helpful, and truly brilliant staff. If you want to use this resource (and why wouldn’t you?!) come prepared to work! JAMS went through a one-year long notes process before being accepted into IOPN, and they don’t publish low-quality work.

Did you always envision the journal as open access? Why or why not?

There was no point in time in which JAMS wasn’t going to be open-access. While I was attending the University of Illinois at Urbana-Champaign, I had more than 14 million items at my fingertips. It was amazing. So much knowledge just a click away. In my coursework I learned how imperative information access is to scholarship, and I could only imagine how difficult it must be for scholars at smaller universities and outside the academe to find peer-reviewed research on this subject. JAMS aims to be part of that solution by publishing work that can be accessed by anyone, anywhere.

What unique challenges do you encounter as a new open access journal that you were not expecting?

The truly worst (and also funniest, looking back) was the professor who doubled-over in laughter when I told them I was trying to start up an open-access journal about anime and manga. But for every person that scoffed at JAMS, there was another who was so interested and excited to see this project succeed. A wonderful lesson to learn as a young scholar was to persevere!

What are the advantages for scholars who publish their work under a Creative Commons license?

Publishing under a Creative Commons license allows your work to be seen by everyone. It’s as simple as that. Do you want people to see what you’ve made? Then a Creative Commons license is a great choice!

I know anime and manga studies is a small area of academic research in the United States. How has this affected the peer review process?

It’s actually not all that small! There are a wide variety of researchers doing work on anime and manga studies, they just all happen to be spread out among a number of fields! We have peer reviewers from a diverse set of backgrounds – from education, to information science, to fandom studies – who are all so passionate about anime and manga studies. Our peer reviewers do an incredible job strengthening the papers submitted to JAMS, and I am incredibly grateful for their willingness to dedicate time to this journal.

What are your hopes for the future of this publication?

I mention this in my “Welcome from the Editor-in-Chief”, and I think I said it best there:
“I hope the Journal of Anime and Manga Studies can exist as a space that publishes high-quality scholarship about anime, manga, cosplay, and their fandoms. I hope that JAMS can bring visibility to the deeper meanings, understandings, and cultural significance of anime, manga, cosplay, and their fandoms. I hope that, in making JAMS open-access, scholarship about anime and manga can be accessible to everyone, regardless of university affiliation. As Aramata Hiroshi and the Kyoto International Museum of Manga imbued a burning desire in me, I hope that the papers you will read in this journal imbue the same sense in you to do all you can for this fantastic art form.”

Open with Purpose: Open Access Week 2020

Posted on October 22, 2020 by Sarah Appedu

International Open Access Week 2020 is upon us, and the need for equitable access to research has taken on a new sense of urgency. Every year, libraries celebrate Open Access Week to bring attention to issues related to scholarly communications. This year’s theme, “Open with Purpose: Taking Action to Build Structural Equity and Inclusion,” is intended to get us thinking about the ways our current systems marginalize and exclude.

This year, we celebrate amidst a pandemic that has completely changed how we do things. Usually, immediate access to scholarly research isn’t on many people’s minds. But research about COVID-19 has made the importance of open access to research clear. This urgency has led several publishers to open up their content related to COVID-19 and may be accelerating the shift toward open access as the default for scholarly publishing.

Making research about COVID-19 openly available speeds up the research process by allowing more people to access the data they need to find a solution to this crisis. The CDC, UNESCO, and the National Institutes of Health have all compiled open access information about COVID-19 for research and educational use to assist in this effort.

However, making research available for free is not enough. In her blog post “Opening up the Margins”, April Hathcock writes, “there are so many ways in which open access still reflects the biased systems of the scholarship in which it’s found, even as it can be used to open up scholarship at the margins” (Hathcock, 2016). Open access is still exclusionary if it maintains practices that privilege the publication of white, western, academic voices and centers those perspectives.


It is no secret that COVID-19 disproportionately affects African Americans. A quick search of “COVID-19 and African-Americans” in Google Scholar reveals tons of studies demonstrating that fact. While the pandemic has made visible the need to address the social inequalities that lead to higher vulnerability in Black populations, these problems are not new, and the solutions cannot be found under a microscope. The people living in the most affected communities are not the ones conducting research, and yet their perspective is invaluable to understanding how lived experiences of oppression contribute to this tragedy.

Researchers should not treat people as objects of study but as full people whose susceptibility to the disease cannot simply be linked to genetics. To address the pandemic, we must center the experiences of those most vulnerable. With open access advocacy, we must make sure to include voices that aren’t traditionally acknowledged as scholarly and recognize how those experiences inform the research process.

“Open with Purpose” means mindfully and intentionally creating systems that invite people in. The COVID-19 pandemic has highlighted the urgency of this movement, but the social, economic, and political viruses of racism, sexism, classism, etc. had already made this urgency visible to those who are the most marginalized. Open systems need to not only unlock research, but also to question the very structures that keep it closed to certain people in the first place and rebuild them into something better that can more fully address the world’s problems.

Commons Knowledge

Insights from the Scholarly Commons
