Lightning Review: Optical Character Recognition: An Illustrated Guide to the Frontier


Picture of OCR Book

Stephen V. Rice, George Nagy, and Thomas A. Nartker’s work on OCR, though written in 1999, is still a remarkably valuable bedrock text for diving into the technology. Though OCR systems have evolved, and continue to evolve, with each passing day, the study presented in their book still highlights some of the major issues one faces when performing optical character recognition: text is in an unusual typeface or contains stray marks, or print is too heavy or too light. This text gives those interested in learning the general problems that arise in OCR a great guide to what they and their patrons might encounter.

The book opens with a quote from C-3PO, and a discussion of how our collective sci-fi imagination believes technology will have “cognitive and linguistic abilities” that match and perhaps even exceed our own (Rice et al., 1999, p. 1).

C3PO Gif

 

The human eye is the most powerful character identifier to exist. As the authors note, “A seven year old child can identify characters with far greater accuracy than the leading OCR systems” (Rice et al., 1999, p. 165). I found this simple explanation so helpful for when I get questions here in the Scholarly Commons from patrons who are confused as to why their document, even after being run through OCR software, is not perfectly recognized. It is very easy, with our human eyes, to discern when a mark on a page is nothing of importance, and when it is a letter. Ninety-nine percent character accuracy doesn’t mean ninety-nine percent page accuracy.
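To put rough numbers on that last point, here is a minimal Python sketch. The figure of 2,000 characters per page is an assumption chosen for illustration, not a number from the book, and the second function assumes that errors on each character are independent, which real OCR errors rarely are.

```python
def expected_errors(char_accuracy: float, chars_per_page: int) -> float:
    """Average number of misread characters on one page."""
    return (1.0 - char_accuracy) * chars_per_page


def prob_error_free_page(char_accuracy: float, chars_per_page: int) -> float:
    """Chance that every character on the page is read correctly,
    assuming (unrealistically) that errors are independent."""
    return char_accuracy ** chars_per_page


accuracy = 0.99          # 99% character accuracy
chars_per_page = 2000    # assumed page length, for illustration only

print(round(expected_errors(accuracy, chars_per_page)))   # about 20 errors per page
print(prob_error_free_page(accuracy, chars_per_page))     # a vanishingly small probability
```

Under these assumptions, a page read at 99% character accuracy still carries about twenty mistakes, and a perfectly recognized page is extremely unlikely.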

Look with your special eyes Gif

In summary, this work presents a great starting point for those with an interest in understanding OCR technology, even at almost two decades old.

Give it, and the many other fabulous books in our reference collection, a read!


Lightning Review: Text Analysis with R for Students of Literature

Cover of Text Analysis with R book

My undergraduate degree is in Classical Humanities and French, and like many humanities and liberal arts students, I mostly used computers for accessing Oxford Reference Online and double checking that “bonjour” meant “hello” before term papers were turned in. Actual critical analysis of literature came from my mind and my research, and nothing else. Recently, scholars in the humanities began seeing the potential of computational methods for their study, and named these methods “digital humanities.” Computational text analysis provides insights that, in many cases, a human mind could not reach on its own. When was the last time you read 100 books to count occurrences of a certain word, or looked at thousands of documents to group their contents by topic? In Text Analysis with R for Students of Literature, Matthew Jockers presents programming concepts specifically as they relate to literature study, with plenty of help to make even the most technophobic English student a digital humanist.

Jockers’ book caters to the beginning coder. You download practice text from his website that is already formatted to use in the tutorials presented, and he doesn’t dwell too much on pounding programming concepts into your head. I came into this text having already taken a course on Python, where we did edit text and complete exercises similar to the ones in this book, but even a complete beginner would find Jockers’ explanations perfect for diving into computational text analysis. There are some advanced statistical concepts presented which may turn off those less mathematically inclined, but these are mentioned only as furthering understanding of what R does in the background, and can be left to the computer scientists. Practice-based and easy to get through, Text Analysis with R for Students of Literature serves its primary purpose of bringing the possibilities of programming to those used to traditional literature research methods.
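For a taste of what counting a word across many texts looks like, here is a minimal sketch. It is written in Python rather than R and is not taken from Jockers’ book; the two “books” are made-up snippets standing in for full texts, and the tokenizing rule is a deliberately simple assumption.

```python
import re
from collections import Counter


def count_word(text: str, word: str) -> int:
    """Count occurrences of one word, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[word.lower()]


# Hypothetical mini-corpus; in practice each value would be a whole book.
corpus = {
    "book_one": "The whale surfaced. The crew watched the whale.",
    "book_two": "No whale was seen that day.",
}

counts = {title: count_word(text, "whale") for title, text in corpus.items()}
print(counts)  # {'book_one': 2, 'book_two': 1}
```

The same loop works unchanged whether the corpus holds two snippets or a hundred novels, which is exactly the kind of task that is tedious by hand and trivial for a computer.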

Ready to start using a computer to study literature? Visit the Scholarly Commons to view the physical book, or download the eBook through the Illinois library.


Exploring Data Visualization #7

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A collection of six different radar charts, each showing one student's test scores in multiple subjects

From “The Radar Chart and its Caveats” by Yan Holtz

1) Data analyst Yan Holtz and designer Conor Healy have helpfully compiled a list of visualization caveats at their site From Data to Viz. Among the common data visualization pitfalls they discuss is the use of radar charts, as in the image above.

Two elementary school floor plans generated by computer modeling, optimized to minimize traffic flow between classes and material usage. The floor plans look biological, with the hallways branching to smaller hallways and the rooms shaped as all sorts of polygons instead of rectangular.

From “Evolving Floorplans,” created by Joel Simon

2) Bioinformaticist Joel Simon “grew” an elementary school floor plan using advanced computer science methods. As he points out, “The results were biological in appearance, intriguing in character and wildly irrational in practice.” The project certainly demonstrates that computer models are only as good as the data that humans give them (in this case, there were no constraints based on architecture or engineering rules). On the other hand, imagine your school was laid out like this! Read all about the project at Simon’s website.

A demonstration of a chart makeover. The before chart shows two pie charts. Each slice of the pie chart is the percentage of U.S. population within an age group. The first pie chart is 2010, the second is 2013. The makeover, or "after" chart, is a slope graph that shows the change in millions of people within each age group, which are each represented by a line.

Chart makeover created by Patricia Manasan for Storytelling With Data

3) Want to feel inspired? Dozens of people submitted data visualization makeovers to Storytelling With Data. Take a look at what people changed for ideas about how to make your own visualizations better.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.


Beginning again!

Hello students, faculty, and the amazing people of the University of Illinois at Urbana-Champaign! Your home for qualitative and quantitative research assistance, the Scholarly Commons, is re-opening with brand new hours!

That’s right, for the entirety of this beautiful fall semester we will be open from 8:30 am to 6 pm!

Will the Scholarly Commons still be hosting all its fantastic services this fall?

Why yes – yes they will!

The Scholarly Commons will be hosting:

Statistical Consulting:

Mondays: 10-4

Tuesdays: 10-4

Wednesdays: 10-1, 2-5

Thursdays: 10-4

Fridays: 10-4

The Survey Research Lab from 1-4 on Thursdays


And GIS Consultations

Mondays 9-2

Tuesdays 12-4

Wednesdays 9-1

Thursdays 11-1


The Scholarly Commons is hosting a Data Visualization Competition!

Make your data something beautiful – and you could win big!

We’re also hosting an Open House on October 9th!

Stop by Main Library 220 from 4-5:30!


So much to see! So much to do!

We hope to see you all soon!


Lightning Review: the truthful art by Alberto Cairo

Image of the truthful art

Hailed by one of our librarians as a brilliant and seminal text for understanding data visualization, the truthful art is a text that can serve both novices and masters in the field of visualization.

The book is packed with detailed descriptions, explanations, and images of just how Cairo wants readers to understand and engage with knowledge and data. Nearly every page of this work, in fact, offers examples of the methods Cairo is trying to connect his readers to.

Cairo’s work not only teaches readers how to best design their own visualizations, but also explains how to *read* data visualizations themselves. Portions of chapters are devoted to the necessity of ‘truthful’ visualizations, not only because “if someone hides data from you, they probably have something to hide” (Cairo, 2016, p. 49), but also because the exact same data, when presented in different ways, can completely change the audience’s perspective on what the ‘truth’ of the matter is.

The more I read through the truthful art, the harder time I had putting it down. I was struck by how vastly Cairo’s presentations of data could differ depending upon the medium through which they were visualized. It was amazing how Cairo could instantly pick apart a bad visualization, replacing it with one that was simultaneously more truthful and more beautiful.

There is a specific portion of Chapter 2 where Cairo gives a very interesting visualization of “How Chicago Changed the Course of Its Rivers”. It’s detailed, informative, and very much a classic data visualization.

Then he compared it to a fountain.

The fountain was beautiful, and designed to tell the same story as the maps Cairo had created. It was fascinating to see data presented this way, and I hadn’t fully considered that data could be represented in such a unique form.

the truthful art is here on our shelves in the Scholarly Commons, and we hope you’ll stop and give it a read! It’s certainly a worthwhile one!


Exploring Data Visualization #6

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

U.S. immigration represented by concentric rings like a tree, where outermost ring is the most recent, with colors denoting immigrants' origin primarily by continent

from National Geographic, “200 Years of U.S. Immigration Looks Like the Rings of a Tree”

1) Two Northeastern University professors visualized immigration data for National Geographic by creating a fascinating chart that looks a lot like the growth rings of a tree. They write, “Like countries, trees can be hundreds, even thousands, of years old. Cells grow slowly, and the pattern of growth influences the shape of the trunk. Just as these cells leave an informational mark in the tree, so too do incoming immigrants contribute to the country’s shape.”

two line graphs, one with a legend and one with direct line labeling, demonstrating the advantage of the latter

from StorytellingWithData, “Accessible data viz is better data viz”

2) Accessibility is important in all kinds of communication, and data visualization is no exception. But it’s not always obvious how to make visualizations more accessible. You can find several tips for improving your visualization in “Accessible data viz is better data viz.”

Polar histograms of the streets in major cities across the U.S.

by Geoff Boeing, “Comparing City Street Orientations”

3) Urban planning postdoc Geoff Boeing used open map data to create a series of polar histograms that demonstrate how the streets in various U.S. cities do or don’t follow a neat grid. It’s a great example of a visualization that looks intriguing and also packs a lot of information. Learn more about it in his blog post, Comparing City Street Orientations.
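To show the idea behind a polar histogram of street orientations, here is a minimal Python sketch of the binning step only. It is an illustration under stated assumptions, not Boeing’s actual code: the function name, the bin count, and the sample bearings are all hypothetical, and drawing the chart itself is left out.

```python
def orientation_histogram(bearings_deg, num_bins=36):
    """Count street bearings (degrees clockwise from north) per compass bin.

    Each street is counted in both directions, since a street that runs
    north also runs south. Bin 0 is centered on due north.
    """
    width = 360.0 / num_bins
    counts = [0] * num_bins
    for bearing in bearings_deg:
        for angle in (bearing, bearing + 180.0):
            # Shift by half a bin so that bin 0 straddles north.
            index = int(((angle + width / 2) % 360.0) // width) % num_bins
            counts[index] += 1
    return counts


# Hypothetical bearings for a perfectly gridded city: only north and east.
grid_city = [0, 90, 0, 90, 0]
print(orientation_histogram(grid_city, num_bins=8))
# [3, 0, 2, 0, 3, 0, 2, 0]
```

A gridded city piles its counts into four evenly spaced bins, while a city with winding streets would spread them around the whole circle; plotting those counts as bars around a compass gives the polar histogram.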

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email me and set up an appointment at the Scholarly Commons.


Our Graduate Assistants: Billy Tringali

This interview is part of a new series introducing our graduate assistants to our online community. These are some of the people you will see when you visit our space, who will greet you with a smile and a willingness to help! Say hello to Billy Tringali!

 


What is your background education and work experience?

I graduated summa cum laude from Bridgewater State University with dual majors in Anthropology and English and dual minors in Gender Studies and U.S. Ethnic/Indigenous Studies. I was able to gain a lot of great research experience during my undergrad, completing over a half-dozen funded research projects and speaking at nearly a dozen regional, national, and international conferences, which really made me fall in love with research.

After I graduated, I started working for Ventress Memorial Library and the Osterville Village Library Reference and Children’s Departments.

What led you to your field?

I’ve been volunteering in libraries since I was in the 5th grade, starting with my hometown library back in my home state of Massachusetts. I actually co-founded my library’s first Teen Advisory Board.

As a Reference Assistant I was happy to serve the diverse populations of these different towns, and as a Children’s Assistant I was able to aid in managing and developing collections that met the needs of children from 1 to 17! They were amazing experiences, and why I moved out here to pursue my Master’s in Library and Information Science.

What are your research interests?

I adore instruction, especially working with undergrads! Teaching students from a wide variety of backgrounds and getting students engaged with library resources has been so rewarding. I love looking into how information is presented and taught. I’ve been lucky enough to teach in our Savvy Researcher Series, so I’ve been able to get a lot of great experience there.

I’m also very interested in scholarly communication and publishing, which I think ties in well with my interest in engagement! From open-access to copyright, libraries are on the front line of getting students connected with research and making sure information is available to them. We have the opportunity to present information and make it accessible, which is such a powerful thing.

What are your favorite projects you’ve worked on?

Working with undergraduate research. Teaching classes on presentation and assisting with finding/pairing resources felt so rewarding. Part of my work includes helping undergraduates create and manage their own academic journals, which has been such an incredible combination of working with undergraduates and publishing!

Libraries can and should be engaged spaces of connection across departments and entire universities.

What are some of your favorite underutilized resources that you would recommend?

My work in the Scholarly Commons has also extended into collection development – so I’d say our reference collection! We keep a well-updated library of works about everything from qualitative research techniques to the digital humanities.

Stop by and check it out!

When you graduate, what would your ideal job position look like? 

I’d like to continue to work in an academic library. Working for both the Main Library’s reference and instruction service and the Scholarly Commons as a specialized library has taught me that I’m most interested in positions that allow for teaching and engagement.

What is the one thing you would want people to know about your field?

Libraries are for everyone. Libraries are doing so many things that there really is something here for everyone!


Puentes/Bridges: Highlights from DH2018

At the end of June, the Alliance of Digital Humanities Organizations (ADHO) coordinated their annual international DH conference, Digital Humanities 2018, in Mexico City. DH2018 was the first conference in the organization’s history to be held in Latin America and in the global south. With a theme of Puentes/Bridges, DH2018 emphasized transnational discourse and inclusivity. Here are some highlights from the event!

Latin@ voices in the Midwest: Ohio Habla Podcast
Elena Foulis of Ohio State University discussed Ohio Habla, a podcast project that seeks to educate others on the Latin@ experience in the Midwest with interviews conducted in English and Spanish (and a mixture of the two).

Visualizing the Digital Humanities Community
What does the DH community look like? Researchers from University College London’s Centre for Digital Humanities visualized how authors of DH articles cite each other and interact with each other on Twitter, and compared the two networks.

Network Analysis of Javanese Traditional Theatre
How do characters in Javanese traditional theatre relate to one another? In an excellent example of non-traditional digital publishing, Miguel Escobar Varela of the National University of Singapore communicates his research findings on an interactive webpage.

Mayan hieroglyphs as a computer font


Achieving Machine-Readable Mayan Text Via Unicode
Carlos Pallan Gayol of the University of Bonn and Deborah Anderson of UC Berkeley are working to create Unicode equivalents of Mayan hieroglyphs, producing a machine-readable version of the script and ensuring reliable access to this language across devices.

Hurricane Memorial: Chronicling the Hurricane of 1928
A massive hurricane devastated Florida, Puerto Rico, and other parts of the Caribbean in 1928, but the story of this storm shifts depending on who you ask. Most of the storm’s victims were black migrant workers from Puerto Rico and Caribbean islands, whose deaths are minimized in most accounts. Christina Boyles of Trinity College seeks to “bring the stories of the storm’s underrepresented victims back into our cultural memory.”

Does “Late Style” Exist? New Stylometric Approaches to Variation in Single-Author Corpora
Jonathan Pearce Reeve presented some preliminary findings of his research investigating whether or not an author has a true “late style.” Late style is a term best known from the works of Edward Said, referring to an author’s shift later in life to a writing style that is distinct from their “early” style. Read a review of Said’s book, On Late Style. Code and other supplemental materials from Reeve’s research are available on GitHub.

screenshot from 4 rios webpage, shows drawings of people

4 Ríos: El Naya
A digital storytelling project about the impacts of armed conflict in Colombia, 4 Ríos is a transmedia project that includes a website, short film, and an interactive web-comic.

Researchers from our own University of Illinois participated in the conference, including Megan Senseney and Dan Tracy. Senseney, along with other Illinois researchers, presented “Audiences, Evidence, and Living Documents: Motivating Factors in Digital Humanities Monograph Publishing,” a survey of the motivations behind humanities scholars’ digital publishing actions and needs. Senseney also participated in a panel, “Unanticipated Afterlives: Resurrecting Dead Projects and Research Data for Pedagogical Use,” a discussion about how we might use unmaintained DH projects and data for learning purposes.

Tracy and other Illinois researchers presented a poster, “Building a Bridge to Next Generation DH Services in Libraries with a Campus Needs Assessment,” a report of results gathered while surveying the need for future DH services at research institutions, and how the library might facilitate this evolution. View Tracy’s poster in IDEALS.

ADHO gathered all of the resources tweeted out during the conference for you to view. You can also view a detailed schedule of presentations with descriptions here, or see paper abstracts here. Or, search #DH2018 on Twitter to see all the happenings!


Our Graduate Assistants: Kayla Abner

This interview is part of a new series introducing our graduate assistants to our online community. These are some of the people you will see when you visit our space, who will greet you with a smile and a willingness to help! Say hello to Kayla Abner!

What is your background education and work experience?

I have a Bachelor’s degree in Classical Humanities and French from Wright State University in Dayton (Go Raiders!). My original plan was to teach high school French or Latin, but after completing a student teaching practicum, I decided that wasn’t for me. During undergrad and after graduation, I always wound up in a job role that involved research or customer service in some capacity, which I really enjoyed.

What led you to your field?

Knowing that I enjoyed working on research, I considered going back to school for Library Science, but wanted to be sure before taking the jump. It was always interesting to see the results of the research I helped conduct, and I enjoyed helping people find answers, whether it was a coworker or a client. After a visit to an American Library Association conference in 2016, I fell in love with the collaborative and share-alike nature of librarianship, and was accepted to this program the next year!

What are your research interests?

Library science has so many interesting topics, it’s hard to choose one. But, I like looking at how people seek information, and how libraries can use that knowledge to enhance their services and resources. I’m a browser when it comes to bookshelves, and it’s interesting to see how libraries are succeeding/failing at bringing that experience to the digital realm.

What are your favorite projects you’ve worked on?

I have two positions here in the library, one in the Scholarly Commons, and one working with our (current) Digital Humanities librarian, Dan Tracy. In both roles, I’ve worn a lot of hats, so to speak. My favorites have been creating resources like library guides, and assisting with creating content for our Savvy Researcher workshop series. Maintaining our library guides requires some experience with the software, so I enjoy learning new cool things that our programs can do. I also do a lot of graphic design work, which is a lot of fun!

Completing some of these tasks let me use some Python knowledge from my coursework, which is sort of like a fun puzzle (how do I get this to work??). I’m really interested in using digital methods and tools in research, like text mining and data visualization. Coming from a humanities background, it is very exciting to see the cool things humanists can do beyond traditional scholarship. Digital humanities is a really interesting field that bridges the gap between computer science and the humanities.

What are some of your favorite underutilized resources that you would recommend?

Our people! They aren’t underutilized, but I love an opportunity to let campus know that we are an excellent point of contact between you and an expert. If you have a weird research question in one of our service areas, we can put you in contact with the best person to help you.

When you graduate, what would your ideal job position look like?

I would love to work in an academic research library in a unit similar to the Scholarly Commons, where researchers can get the support they need to use digital methods and data in their research, especially in the humanities. There is such a breadth of digital techniques that humanities researchers can utilize, and they don’t necessarily replace traditional research methods. Distant reading a text puts forth different observations than traditional close reading, and both are equally useful.

What is the one thing you would want people to know about your field?

Librarians are happy to help you; don’t let a big desk intimidate you away from asking a question. That’s why we’re here!


Lightning Review: How to Use SPSS

“A nice step-by-step explanation!”

“Easy, not too advanced!”

“A great start!”

Real, live reviews of Brian C. Cronk’s How to Use SPSS: A Step-By-Step Guide to Analysis and Interpretation by some of our patrons! This book, the Tenth Edition of this nine-chapter text published by Taylor and Francis, is packed with walkthroughs, images, and simple explanations that demystify the process of learning this statistical software. It also contains six appendixes, and our patrons sang its praises after a two-hour research session here in the Scholarly Commons!

SPSS, described on IBM’s webpage as “the world’s leading statistical software used to solve business and research problems by means of ad-hoc analysis, hypothesis testing, geospatial analysis and predictive analytics. Organizations use IBM SPSS Statistics to understand data, analyze trends, forecast and plan to validate assumptions and drive accurate conclusions,” is one of many tools CITL Statistical Consulting uses on a day-to-day basis in assisting Scholarly Commons patrons. Schedule a consultation with them from 10 am to 2 pm, Monday through Thursday, for the rest of the summer!

We’re thrilled to hear this 2018 title is a hit with the researchers we serve! Cronk’s book and many more works on software, digital publishing, data analysis, and other topics make up our reference collection – free to use by anyone and everyone in the Scholarly Commons!
