Meet our Graduate Assistants: Ben Ostermeier

What is your educational background and work experience?

I graduated from Southern Illinois University Edwardsville with a Bachelor of Arts in History and a minor in Computer Science. I was also the first SIUE student to receive an additional minor in Digital Humanities and Social Sciences. As an undergraduate I worked on a variety of digital humanities projects with the IRIS Center for the Digital Humanities, and after graduating I was hired as the center's technician. In that role, I was responsible for supporting the technical needs of digital humanities projects affiliated with the IRIS Center and for providing guidance to professors and students starting their own digital scholarship projects.

What led you to your field?

I have been drawn to applied humanities, particularly history, since high school, and I have long enjoyed tinkering with software and making information available online. When I was young this usually manifested in reading and writing information on fan wikis. More recently, I have particularly enjoyed working on digital archives that focus on local community history, such as the SIUE Madison Historical project at madison-historical.siue.edu.

What are your favorite projects you’ve worked on?

While working for the Scholarly Commons, I have had the opportunity to work with my fellow graduate assistant Mallory Untch to publish our new podcast, It Takes a Campus, on iTunes and other popular podcast libraries. Recently, I recorded and published an episode with Dr. Ted Underwood. Mallory and I also created an interactive timeline showcasing the history of the Scholarly Commons for the unit’s tenth anniversary last fall.

What are some of your favorite underutilized Scholarly Commons resources that you would recommend?

We offer consultations to patrons looking for in-depth assistance with their digital scholarship. You can request a consultation through our online form!

When you graduate, what would your ideal job position look like?

I would love to work as a Digital Archivist in some form, responsible for ensuring the long-term preservation of digital artifacts as well as finding the best ways to make those objects accessible to users. It is especially important to me that these digital spaces relate to, and are accessible to, the people and cultures represented in the items, so I hope I am able to make these sorts of community connections wherever I end up working.

Introductions: What is Digital Scholarship, anyways?

This is the beginning of a new series where we introduce you to the various topics that we cover in the Scholarly Commons. Maybe you're new to the field, or maybe you've reached the point where you're too afraid to ask… Fear not! We are here to take it back to the basics!

What is digital scholarship, anyways?

Digital scholarship is an all-encompassing term, and it is used very broadly: it refers to the use of digital tools, methods, evidence, or any other digital materials to complete a scholarly project. So, if you are using digital means to construct, analyze, or present your research, you're doing digital scholarship!

It seems really basic to say that digital scholarship is any project that uses digital means, because nowadays isn't that every project? Yes and no. We use the term "digital" quite liberally. If you used Microsoft Word just to write an essay about a lab you did in class, that is not digital scholarship. However, if you used specialized software to analyze the results of a survey you ran to gather data, and then wrote about it in an essay that you typed in Microsoft Word, that is digital scholarship! If you then wanted to get that essay published and hosted in an online repository so that other researchers could find it, that would be digital scholarship too!

Many higher education institutions have digital scholarship centers on their campuses that provide specialized support for these types of projects. The Scholarly Commons is a digital scholarship space in the University Main Library! Digital scholarship centers often push for new and innovative means of discovery. They have access to specialized software and hardware and provide a space for collaboration and consultations with subject experts who can help you achieve your project goals.

At the Scholarly Commons, we support a wide array of digital and data-driven scholarship topics that this series will cover in the future. We have established partners throughout the library and across the wider University campus to support students, staff, and faculty in their digital scholarship endeavors.

We support a number of digital scholarship service points across the library, from specialized software and hardware to consultations and workshops.

You can find a list of all the software the Scholarly Commons has to support digital scholarship here and a list of the Scholarly Commons hardware here. If you're interested in learning more about the foundations of digital scholarship, follow along with our Introductions series as we go back to the basics.

As always, if you’re interested in learning more about digital scholarship and how to  support your own projects you can fill out a consultation request form, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Simple NetInt: A New Data Visualization Tool from Illinois Assistant Professor Juan Salamanca

Juan Salamanca, Ph.D., Assistant Professor in the School of Art and Design at the University of Illinois Urbana-Champaign, recently created a new data visualization tool called Simple NetInt. Though developed from a tool he created a few years ago, this tool brings entirely new opportunities to digital scholarship! This week we had the chance to talk to Juan about this new tool in data visualization. Here's what he said…

Simple NetInt is a JavaScript version of NetInt, a Java-based node-link visualization prototype designed to support the visual discovery of patterns across large datasets by displaying disjoint clusters of vertices that can be filtered, zoomed in on, or drilled down into interactively. The visualization strategy used in Simple NetInt is to place clustered nodes in independent 3D spaces and draw links between nodes across multiple spaces. The result is a simple graphical user interface that enables visual depth as an intuitive dimension for data exploration.

Simple NetInt interface

Check out the Simple NetInt tool here!

In collaboration with Professor Eric Benson, Salamanca tested a prototype of Simple NetInt with a dataset about academic publications, episodes, and story locations of the sci-fi TV series Firefly. The tool shows a network of research relationships between these three sets of entities, similar to a citation map but arranged on a timeline following the episodes' chronology.

What inspired you to create this new tool?

This tool is an extension of a prototype I built five years ago for the visualization of financial transactions between bank clients. It is software for visualizing networks, representing entities and their relationships as nodes and edges. This new version is used for the visualization of a totally different dataset: scholarly work published in papers, episodes of a TV series, and the narrative of the series itself. So, the network representation portrays relationships between journal articles, episode scripts, and fictional characters. I am also using it to design a large mural for the Siebel Center for Design.

What are your hopes for the future use of this project?

The final goal of this project is to develop an augmented reality visualization of networks to be used in the field of digital humanities. This proof of concept shows that scholars in the humanities come across datasets with different dimensional systems that might not be compatible with one another. For instance, a timeline of scholarly publications may encompass 10 or 15 years, but the content of what is being discussed in that body of work may encompass centuries of history. Therefore, these two different temporal dimensions need to be represented in a way that helps scholars in their interpretations. I believe that an immersive visualization may drive new questions for researchers or convey new findings to the public.

What were the major challenges that came with creating this tool?

The major challenge was to find a way to represent three different systems of coordinates in the same space. The tool has a universal space that contains relative subspaces for each dataset loaded. So, the nodes instantiated from each dataset are positioned in their own coordinate system, which could be a timeline, a position relative to a map, or just clusters by proximity. But the edges that connect nodes jump from one coordinate system to the other. This creates the idea of a system of nested spaces that works well with a few subspaces, but I am still figuring out the most intuitive way to navigate larger multidimensional spaces.
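To make the nested-spaces idea concrete, here is a minimal sketch in Python (Simple NetInt itself is written in JavaScript, and these class and function names are illustrative, not Simple NetInt's actual API). Each dataset keeps its own local coordinate system, and a per-subspace origin maps local positions into one shared "universal" space so that edges can connect nodes across subspaces:

```python
# Illustrative sketch of nodes living in independent coordinate subspaces,
# with edges drawn between subspaces in a shared universal space.
class Subspace:
    def __init__(self, name, origin):
        self.name = name
        self.origin = origin  # where this subspace sits in the universal space
        self.nodes = {}       # node id -> local (x, y, z) position

    def add_node(self, node_id, local_pos):
        self.nodes[node_id] = local_pos

    def to_universal(self, node_id):
        """Translate a node's local coordinates into the shared space."""
        lx, ly, lz = self.nodes[node_id]
        ox, oy, oz = self.origin
        return (lx + ox, ly + oy, lz + oz)

# Two datasets, each positioned in its own 3D subspace: a timeline of
# episodes in front, a cluster of papers pushed back along the depth axis.
episodes = Subspace("episodes", origin=(0, 0, 0))
papers = Subspace("papers", origin=(0, 0, 500))

episodes.add_node("ep01", (0, 0, 0))        # position on the timeline
papers.add_node("smith2006", (40, 10, 0))   # position within the cluster

# An edge that "jumps" between coordinate systems is just a pair of
# universal-space endpoints.
edge = (episodes.to_universal("ep01"), papers.to_universal("smith2006"))
print(edge)  # ((0, 0, 0), (40, 10, 500))
```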

What are your own research interests and how does this project support those?

My research focuses on understanding how designed artifacts affect the viscosity of social action. What I do is investigate how the design of artifacts facilitates or hinders cooperation and collaboration between people. I use visual analytics methods to conduct my research, so the analysis of networks is an essential tool. I have built several custom-made tools for observing the interaction between people and things, and this is one of them.

If you would like to learn more about Simple NetInt, you can find contact information for Professor Juan Salamanca here, along with more information about his research!

If you’re interested in learning more about data visualizations for your own projects, check out our guide on visualizing your data, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

The Art Institute of Chicago Launches Public API

Application Programming Interfaces, or APIs, are a major feature of the web today. Almost every major website has one, including Google Maps, Facebook, Twitter, Spotify, Wikipedia, and Netflix. If you Google the name of your favorite website and API, chances are you will find an API for it.

Last week, another institution joined the ranks of organizations with public APIs: The Art Institute of Chicago. While they are not the first museum to release a public API, their blog article announcing the release states that it is the largest amount of data ever released to the public through a museum API. It is also the first museum API to hold all of the museum's public data in one location, including data about their art collection, every exhibition held by the Institute since 1879, blog articles, full publication texts, and more than 1,000 gift shop products.

But what exactly is an API, and why should we be excited that we can now interact with the Art Institute of Chicago in this way? An API is essentially a structured way for programs to interact with a software application, usually a website. Normally when you visit a website in a browser, such as wikipedia.org, the browser requests an HTML document and renders the images, fonts, text, and many other bits of data related to the appearance of the web page. This is a useful way to interact as a human consuming information, but if you wanted to perform data analysis on that information, it would be much more difficult to gather it this way. For example, answering even a simple question like "Which US president has the longest Wikipedia article?" would be time consuming the traditional way, viewing web pages one at a time.

Instead, an API allows you or other programs to request just the data from a web server. Using a programming language, you could use the Wikipedia API to request the text of each US president's Wikipedia page and then simply calculate which text is the longest. API responses usually come in the form of data objects with various attributes, and the format of these objects varies between websites.
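As a quick illustration, here is a short, hedged sketch of how that Wikipedia question could be approached in Python, using the public MediaWiki API and the requests library. The extracts endpoint may truncate very long pages, so treat the result as illustrative rather than authoritative:

```python
# Compare the plain-text length of a few presidents' Wikipedia articles
# by requesting data from the MediaWiki API instead of rendered web pages.
# Requires the third-party requests library (pip install requests).
import requests

API = "https://en.wikipedia.org/w/api.php"
presidents = ["George Washington", "Abraham Lincoln", "Franklin D. Roosevelt"]

lengths = {}
for title in presidents:
    response = requests.get(API, params={
        "action": "query",
        "prop": "extracts",   # plain-text page content
        "explaintext": 1,
        "format": "json",
        "titles": title,
    })
    pages = response.json()["query"]["pages"]
    page = next(iter(pages.values()))  # one page per request
    lengths[title] = len(page.get("extract", ""))

print(max(lengths, key=lengths.get))  # title with the longest extract
```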

“A Sunday on La Grande Jatte” by Georges Seurat, the data for which is now publicly available from the Art Institute of Chicago’s API.

The same is now true for the vast collections of the Art Institute of Chicago. As a human user you can view the web page for the work “A Sunday on La Grande Jatte” by Georges Seurat at this URL:

 https://www.artic.edu/artworks/27992/a-sunday-on-la-grande-jatte-1884

If you wanted to get the data for this work through an API to do data analysis though, you could make an API request at this URL:

https://api.artic.edu/api/v1/artworks/27992

Notice how both URLs contain “27992”, which is the unique ID for that artwork.

If you open that link in a browser, you will get a bunch of formatted text (if you’re interested, it’s formatted as JSON, a format that is designed to be manipulated by a programming language). If you were to request this data in a program, you could then perform all sorts of analysis on it.
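For example, here is a minimal sketch of requesting that same artwork record in Python. The field names ("data", "title", "artist_display") reflect the Art Institute's API responses at the time of writing; check their documentation for the authoritative list of fields:

```python
# Request the record for "A Sunday on La Grande Jatte" (artwork ID 27992)
# from the Art Institute of Chicago's public API.
# Requires the third-party requests library (pip install requests).
import requests

response = requests.get("https://api.artic.edu/api/v1/artworks/27992")
artwork = response.json()["data"]  # the artwork record itself

print(artwork["title"])           # the artwork's title
print(artwork["artist_display"])  # artist name and dates
```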

To get an idea of what’s possible with an art museum API, check out this FiveThirtyEight article about the collections of New York’s Metropolitan Museum of Art, which includes charts of which countries are most represented at the Met and which artistic mediums are most popular.

It is now possible to ask the same kinds of questions about the Art Institute of Chicago's collections, along with many others, such as "What is the average size of an impressionist painting?" or "In which years was surrealist art most popular?" The possibilities are endless.

To get started with their API, check out their documentation. If you're familiar with Python, and possibly Python's data analysis library pandas, you could check out this article about using APIs in Python to perform data analysis and start playing with the Art Institute's API. You may also want to look at our LibGuide about qualitative data analysis to see what you could do with the data once you have it.

Mapping Native Land

Fall break is fast approaching and with it will be Thanksgiving! No matter what your traditions are, we all know that this year’s holiday season will look a little bit different. As we move into the Thanksgiving holiday, I wanted to share a mapping project to give thanks and recognize the native lands we live on.

Native Land is an open-source mapping project that shows indigenous territories across the world. This interactive map allows you to input your address, or to click and explore, to determine what indigenous land you reside on. Not only that, but Native Land also shares educational information about these nations, their languages, and treaties. They also include a Teacher's Guide for a wide range of ages, from children to adults. Users are able to export images of their map, too!

NativeLand.ca Map Interface

Canadian-based and indigenous-led, Native Land Digital aims to educate and bring awareness to the complex histories of the land we inhabit. The platform strives to create conversations about indigenous communities between those with native heritage and those without. Native Land Digital values the sacredness of land, and they use this platform to honor the history of where we reside. Learn more about their mission and impact on their "Why It Matters" page.

Native Land uses Mapbox and WordPress to generate their interactive map. Mapbox is a mapping platform for building custom-designed maps. Native Land is also available as an app for iOS and Android, and they have a texting service as well. You can find more information about how it works here.

If you’d like to learn more about mapping software, the Scholarly Commons has Geographic Information Systems (GIS) software, consultations, and workshops available. The Scholarly Commons webpage on GIS is a great place to get started.

 The University of Illinois is a land-grant institution and resides on Kickapoo territory. Where do you stand?

University of Illinois Urbana-Champaign Land Acknowledgement Statement

As a land-grant institution, the University of Illinois at Urbana-Champaign has a responsibility to acknowledge the historical context in which it exists. In order to remind ourselves and our community, we will begin this event with the following statement. We are currently on the lands of the Peoria, Kaskaskia, Piankashaw, Wea, Miami, Mascoutin, Odawa, Sauk, Mesquaki, Kickapoo, Potawatomi, Ojibwe, and Chickasaw Nations. It is necessary for us to acknowledge these Native Nations and for us to work with them as we move forward as an institution. Over the next 150 years, we will be a vibrant community inclusive of all our differences, with Native peoples at the core of our efforts.

Tomorrow! Big Ten Academic Alliance GIS Conference 2020

Save the date! Tomorrow is the Big Ten Academic Alliance (BTAA) GIS Conference 2020. This event is 100% virtual and free of charge to anyone who wants to engage with the community of GIS specialists and researchers from Big Ten institutions.

The conference kicks off tonight with a GIS Day Trivia Night event at 5:30 PM CST! There is also a Map Gallery that is open to view from now until November 13th, 2020. The gallery features research from Big Ten institutions that incorporates GIS, so be sure to check it out! There will be lightning talks, presentations, social hours, and a keynote address from Dr. Orhun Aydin, Senior Researcher at Esri, so be sure to check out the full schedule of events and register here.

This event is a great way to network and learn more about applications of GIS in research. If you are interested in GIS but don't know where to start, this event is a great place to get inspired. If you are an experienced GIS researcher, it is an opportunity to meet colleagues and learn from your peers. Overall, this is a great event for anyone interested in GIS and the perfect way to start Geography Awareness Week, which runs November 15th-21st this year!

Free, Open Source Optical Character Recognition with gImageReader

Optical Character Recognition (OCR) is a powerful tool for transforming scanned, static images of text into machine-readable data, making it possible to search, edit, and analyze text. If you're using OCR, chances are you're working with either ABBYY FineReader or Adobe Acrobat Pro. However, both ABBYY and Acrobat are proprietary software with a steep price tag, and while they are both available in the Scholarly Commons, you may want to perform OCR beyond your time at the University of Illinois.

Thankfully, there’s a free, open source alternative for OCR: Tesseract. By itself, Tesseract only works through the command line, which creates a steep learning curve for those unaccustomed to working with a command-line interface (CLI). Additionally, it is fairly difficult to transform a jpg into a searchable PDF with Tesseract.

Fortunately, there are many free, open source programs that give Tesseract a graphical user interface (GUI), which not only makes Tesseract much easier to use, but also means that some of them come with layout editors that make it possible to create searchable PDFs. You can see the full list of programs on this page.

The program logo for gImageReader

In this post, I will focus on one of these programs, gImageReader, but as you can see on that page, there are many options available for multiple operating systems. I tried all of the Windows-compatible programs and decided that gImageReader was the closest to what I was looking for: a free alternative to ABBYY FineReader that does a pretty good job of letting you correct OCR mistakes and export to a searchable PDF.

Installation

gImageReader is available for Windows and Linux. Though the list of releases does not include a Mac-compatible version, it may be possible to get it to work using a package manager for Mac such as Homebrew. I have not tested this, though, so I make no guarantees about getting a working version of gImageReader on a Mac.

To install gImageReader on Windows, go to the releases page on GitHub. From there, go to the most recent release of the program at the top and click Assets to expand the list of files included with the release. Then select the file with the .exe extension to download it. You can then run that file to install the program.

Manual

The installation of gImageReader comes with a manual as an HTML file that can be opened by any browser. As of the date of this post, the Fossies software archive is hosting the manual on its website.

Setting OCR Mode

gImageReader has two OCR modes: "Plain Text" and "hOCR, PDF". Plain Text is the default mode and recognizes only the text itself, without any formatting or layout detection. You can export this to a text file or copy and paste it into another program. This may be useful in some cases, but if you want to export a searchable PDF, you will need to use hOCR, PDF mode. hOCR is an open standard for representing OCR output as HTML, and it includes layout information, font, OCR result confidence, and other formatting information.
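To give a sense of what hOCR looks like under the hood, here is a small Python sketch that parses a hand-written fragment mimicking Tesseract's hOCR output. Each recognized word is an HTML element whose title attribute carries its bounding box and a confidence value (x_wconf). This requires the beautifulsoup4 library (pip install beautifulsoup4):

```python
# Parse an hOCR fragment and pull out each word with its confidence value.
from bs4 import BeautifulSoup

# A hand-written fragment in the style of Tesseract's hOCR output.
hocr = """
<span class="ocrx_word" title="bbox 112 50 230 80; x_wconf 96">Dear</span>
<span class="ocrx_word" title="bbox 240 50 420 80; x_wconf 67">eclnowledgment</span>
"""

soup = BeautifulSoup(hocr, "html.parser")
for word in soup.find_all("span", class_="ocrx_word"):
    confidence = word["title"].split("x_wconf")[1].strip()
    print(f"{word.get_text()}: {confidence}% confidence")
```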

To set the recognition to hOCR, PDF mode, go to the toolbar at the top. It includes a section for “OCR mode” with a dropdown menu. From there, click the dropdown and select hOCR, PDF:

This is the toolbar for gImageReader. You can set OCR mode by using the dropdown that is the third option from the right.

Adding Images, Performing Recognition, and Setting Language

If you have images already scanned, you can add them to be recognized by clicking the Add Images button on the left panel, which looks like a folder. You can then select multiple images if you want to create a multipage PDF. You can always add more images later by clicking that folder button again.

On that left panel, you can also click the Acquire tab button, which allows you to get images directly from a scanner, if the computer you’re using has a scanner connected.

Once you have the images you want, click the Recognize button to recognize the text on the page. Please note that if you have multiple images added, you’ll need to click this button for every page.

If you want to perform recognition in a language other than English, click the arrow next to Recognize. You'll need to have that language installed, but you can install additional languages by clicking "Manage Languages" in the dropdown that appears. If the language is already installed, you can select it from the first option listed in the dropdown.

Viewing the OCR Result

In this example, I will be performing OCR on this letter by Franklin D. Roosevelt:

This 1928 letter from Franklin D. Roosevelt to D. H. Mudge Sr. is courtesy of Madison Historical: The Online Encyclopedia and Digital Archive for Madison County Illinois. https://madison-historical.siue.edu/archive/items/show/819

Once you’ve performed OCR, there will be an output panel on the right. There are a series of buttons above the result. Click the button on the far right to view the text result overlaid on top of the image:

Here is the text result overlaid on an image of the original scan. Note how the scan is slightly transparent now to make the text easier to read.

Correcting OCR

The OCR process did a pretty good job with this example, but there are a handful of errors. You can click on any of the words of text to show them in the right panel. I will click on the "eclnowledgment" at the end of the letter to correct it. The panel will then jump to that part of the hOCR "tree" on the right:

The hOCR tree in gImageReader, which shows the recognition result for each word in a tree-like structure.

Note that in this screenshot I have clicked the second button from the right to show the confidence values; the higher the number, the higher the confidence Tesseract has in the result. In this case, it is 67% sure that "eclnowledgment" is correct. Since it obviously isn't, we can fix it by double-clicking on the word in this panel and typing "acknowledgement." You can do this for any errors on the page.

Other correction tips:

  1. If there are any regions that are not text but are still being recognized, you can right click them in the right panel and delete them.
  2. You can change the recognized font and its size by going to the bottom area labeled “Properties.” Font size is controlled by the x_fsize field, and x_font has a dropdown where you can select a font.
  3. It is also possible to change the area of the blue word box once it is selected, simply by clicking and dragging the edges and corners.
  4. If there is an area of text that was not captured by the recognition, you can also right click in the hOCR "tree" to add text blocks, paragraphs, textlines, and words to the document. This allows you to draw a box on the image and then type what the text says.

Exporting to PDF

Once you are done making OCR corrections, you can export to a searchable PDF. To do so, click the Export button above the hOCR “tree,” which is the third button from the left. Then, select export to PDF. It then gives you several options to set the compression and quality of the PDF image, and once you click OK, it should export the PDF.

Conclusion

Unfortunately, there are some limitations to gImageReader, as can often be the case with free, open source software. Here are some potential problems you may have with this program:

  1. While you can add new areas to recognize with OCR, there is not a way to change the order of these elements inside the hOCR "tree," which could be an issue if you are trying to make the reading order clear for accessibility reasons. One potential workaround could be to use the Reading Order options in Adobe Acrobat, which you can read about in this LibGuide.
  2. You cannot show the areas of the document that are in a recognition box unless you click on a word, unlike ABBYY FineReader which shows all recognition areas at once on the original image.
  3. You cannot perform recognition on all pages at once. You have to click the recognition button individually for each page.
  4. Though there are some image correction options to improve OCR, such as brightness, contrast, and rotation, it does not have as many options as ABBYY FineReader.

gImageReader is not nearly as user friendly as ABBYY FineReader, nor does it have all of ABBYY's features, so you will probably want to use ABBYY if it is available to you. However, I find gImageReader a pretty good program that can meet most general OCR needs.

It Takes a Campus – Episode Two with Harriett Green

Podcast logo with the text "Supporting Digital Scholarship: It Takes a Campus" and microphone and broadcast icons

Resources mentioned:

SPEC Kit No. 357

University of Illinois Library Copyright Guide


Illinois Digital Humanities Projects That Will Blow Your Mind

We are living in a moment where we get to discover the exciting possibilities of working, learning, and sharing in digital formats. I have decided to use this as an opportunity to appreciate the ways in which others have already embraced the power of digital platforms to enhance their research. In this post I will highlight three amazing digital humanities projects that researchers right here at the University of Illinois contributed to. For each project I will provide a link to its official web page, a brief description of the project, and the name and department of the UIUC researcher who contributed to it. Prepare to be wowed by the amazing digital work that has come out of our University research community.

“Prepare to be wowed” - Owen Wilson


Virtual Museums

There is no doubt that technology is changing the way we interact with the world, including centuries-old institutions: museums!

Historically, museums have been seen as sacred spaces of knowledge meant to bring communities together, and historically this also meant a physical space. However, with technology playing an ever-greater role in our everyday lives, there was no doubt it would eventually reach museums. While many museums have implemented technology into their education and resources, we are now beginning to see the emergence of what's called a "virtual museum." While the definition of what constitutes these new virtual museums can be murky, they all have one thing in common: they exist electronically in cyberspace.

The vast empire of the digital humanities is allowing space for these virtual museums to flourish. Information seeking in a digital age is expanding its customs, and there is a wide spectrum of resources available, virtual museums being one example. These online organizations are made up of digital exhibitions and exist in their entirety on the World Wide Web.

Museums offer an experience. Unlike libraries or archives, museums are more often used as a form of tourism and entertainment, but they are also centers of research. Museums house information resources that are not otherwise accessible to the everyday scholar, and virtual museums are increasing this accessibility.

Here are some examples of virtual museum spaces:

While there are arguments from museum scholars about the legitimacy of these online spaces, I do not think that should discount the ways in which people are using them to share knowledge. While there is still much to develop in virtual museums, the increasing popularity of the digital humanities is granting people an innovative way to interact with art and artifacts that were previously inaccessible. Museums are spaces of exhibition and research, so why limit that to a physical space? It will be interesting to keep an eye on where things may go and to consider what this format could contribute to scholarly research!

The Scholarly Commons has many resources that can help you create your own digital hub of information. You can digitize works on one of our high resolution scanners, turn them into searchable documents with OCR software, and publish them online with a tool such as Omeka, a digital publishing platform.

You can also consult with our expert in Digital Humanities, Spencer Keralis, to find the right tools for your project. Check out last week’s blog post to learn more about him.

Maybe one day all museums will be available virtually? What are your thoughts?