Explore the Possibilities with ArcGIS StoryMaps

ArcGIS StoryMaps is a handy tool for combining narrative, images, and maps to present information in an engaging way. Organizations have used StoryMaps for everything from celebrating their conservation achievements on their 25th anniversary to exploring urban diversity in Prague. The possibilities are vast, which can be both exciting and intimidating for people who are just getting started. I want to share some of my favorite StoryMap examples, which will demonstrate how certain StoryMap tools can be used and hopefully provide inspiration for your project.

A Homecoming for Gonarezhou’s Black Rhinos

Screenshot of a storymap with text about and an image of rhinos.

If GIS and map creation are a bit outside your wheelhouse, no worries! A Homecoming for Gonarezhou’s Black Rhinos is a StoryMap created by the Rhino Recovery Fund that is a great example of how a StoryMap can be made without using any maps. It also shows off the timeline feature and makes great use of a custom theme by incorporating the nonprofit’s signature pink into the story’s design.

Sounds of the Wild West

Screenshot of a storymap with text about and an image of the Yellowstone River.

Sounds of the Wild West is a StoryMap created by Acoustic Atlas that takes you on an audio tour of four different Montana ecosystems. This StoryMap is a lovely example of how powerful images and audio can immerse people in a location, enhancing their understanding of the information presented. The authors also made great use of the StoryMap sidecar, layering text, images, and audio to create their tour.

California’s Superbloom

Header of the California's Superbloom StoryMap

Speaking of beautiful photos, this StoryMap about California’s Superbloom is full of them! It’s a great example of the StoryMap image gallery and “swipe” tools. The StoryMap swipe tool allows you to juxtapose different maps or images, revealing the difference between, for example, historical and modern photos, or satellite imagery during different times of year in the same region.

The Surprising State of Africa’s Giraffes

Screenshot of The Surprising State of Africa’s Giraffes StoryMap with a map highlighting the habitat of the Northern Giraffe

The Surprising State of Africa’s Giraffes is a StoryMap created by Esri’s StoryMaps team that demonstrates another great use for the sidecar. As users scroll through the sidecar pictured above, different regions of the map are highlighted in an almost animated effect. This not only provides geographic context to the information, but does so in a dynamic way. This StoryMap also includes a great example of an express map, which is an easy way to make an interactive map without any GIS experience or complicated software.

Map Tour Examples

StoryMaps also features a tool that allows you to take users on a tour around the world – or just around your hometown. The map tour comes in two forms: a guided tour, like the one exemplified in Crowded Skies, Expanding Airports; and an explorer tour, such as The Things that Stay with Us.

StoryMaps Gallery

There are so many different forms a StoryMap can take! To see even more possibilities, check out the StoryMaps Gallery to explore nearly a hundred different examples. If you’re ready to get your feet wet but want a bit more support, keep an eye on the Savvy Researcher calendar for upcoming StoryMap workshops at the UIUC Main Library.

Tech Teaching: Pedagogies for Teaching Technology and other Software Tools to Learners

When I began my role as a graduate assistant at the Scholarly Commons, my background in technology was extremely limited. As I have worked in this space, however, I have had the opportunity not only to learn how to use technology myself but also to teach others how to use these same tools through consultations and workshops. As technology begins to encompass more of our lives, I wanted to share a few tips for providing instruction focused on digital technology and software. While these pedagogies also apply to other teaching contexts, the specific examples in this post will cater to digital technology.

Photo of two people working together on laptops.
Photo Credit: Christina Morillo

Active Learning

A crucial component of learning technology is allowing learners to directly engage with the technology they are looking to understand. By engaging directly with the tool, learners will get a better grasp of how it works than they would from just hearing about its functions in the abstract. If possible, have learners access the technology or software during the instructional session, so that they can follow along as they experience how to navigate the tool. If that is not possible, alternatives include having the instructor conduct a live demonstration or showing a video of the technology at work.

Scaffolding

Since technology is often complex, it is very easy for learners to feel overwhelmed by the sheer number of options and possibilities of what certain resources can do. Scaffolding, as an instructional concept, is the practice of designing a lesson that segments information into smaller sections that build upon each other. When providing instruction for a software program, for example, scaffolding may look like first helping users navigate the options of the tool, following that up with a basic function of the program, then performing a more complex task. These steps are meant to build on one another and guide the learner by showing them new aspects of the topic while incorporating previously acquired knowledge.

Photo of two people in front of a computer, with one older person guiding the hand of the younger person.
Photo Credit: August de Richelieu

Inclusive Learning

While inclusivity is valued in every learning environment, it is especially vital that instructors provide inclusive environments for teaching digital technology. Neglecting these principles will ultimately create barriers for certain users learning new technology. For general instruction sessions, applying universal design models will help streamline the process so that the session is accessible and meaningful for all types of learners. Choosing a large enough font size when presenting in a workshop or classroom setting, for example, helps those with visual impairments follow along more easily, whereas overlooking it makes the learning process more difficult for them. Accommodating specific needs also helps to create an equitable environment that fosters learning for those whose needs may not be accounted for otherwise.

Using These Pedagogies in Personal Learning

Even if you are not planning on teaching others how to use technology, these same methods can also help you learn. Finding opportunities to engage with a particular tool hands-on will help you learn how to use it, rather than just reading articles about it in the abstract. Likewise, breaking the content into smaller sections will help prevent overload and help you progress toward mastery of the tool. Finally, recognizing your needs as a learner and finding tools that are relevant to those needs will lift some of the barriers to learning certain technologies. As you seek to learn and teach new technology, be creative and have fun with it!

*hacker voice* “I’m in” – Coding and Software for Data Analysis

While data analysis has existed in one form or another for centuries, its modern concept is closely tied to a digital environment, which means that people who are looking to move into the data science field will undoubtedly need some technology skills. In the data field, the primary coding languages include Python, R, and SQL. Software is a bit more complicated, with numerous different programs and services used depending on the situation, including Power BI, Spark, SAS, and Excel, to name a few. While this is overwhelming, remember that it is not important to become an expert in all of the languages and software. Becoming skilled in one language and a few of the software options, depending on your interest or on the in-demand skills on job listings, will give you the transferable skills to quickly pick up the other languages and software as needed. If this still seems to be an overwhelming prospect, remember that the best way to eat an elephant is one bite at a time. Take your time, break up the task, and focus on one step at a time!

LinkedIn Learning

  1. Python for Data Science Essential Training Part 1 
    1. This 6 hour course guides users through an entire data science project that includes building web scrapers, cleaning and reformatting data, generating visualizations, performing simple data analysis, and creating interactive graphs. The project will have users coding in Python with confidence and give learners a foundation in the Plotly library. Once completed, learners will be able to design and run their own data science projects.
  1. R for Excel Users 
    1. With Excel being a familiar platform for many interested in data, it is an ideal bridge to more technical skills, like coding in the R language. This course is specifically designed for data analytics with its focus on statistical tasks and operations. It will take users’ Excel skills to another level while also laying a solid foundation for their new R skills. Users will be able to switch between Excel and the R DescTools package to complete tasks seamlessly, using the best of each software to calculate descriptive statistics, run bivariate analyses, and more. This course is for people who are truly proficient in Excel but new to R, so if you need to brush up your Excel skills, go back to the first post in this series and go over the Excel resources!
  1. SQL Essential Training 
    1. SQL is the language of relational databases, so it is of interest to anyone looking to expand their data handling skills. This training is designed to give data wranglers the tools they need to use SQL effectively using the SQLiteStudio software. Learners will soon be able to create tables, define relationships, manipulate strings, use triggers to automate actions, and use subselects and views. Real world examples are used throughout the course and learners will finish the course by building their own SQL application. If you want a gentler introduction to SQL, check out our earlier post on SQL Murder Mystery.
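
To give a feel for the SQL skills named above (tables, relationships, string manipulation, triggers, subselects, and views), here is a small sketch using Python's built-in sqlite3 module. It is not taken from the course; every table, column, and trigger name is made up for illustration.

```python
import sqlite3

# All table, column, and trigger names here are hypothetical examples.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Relationship: each book points at one author.
CREATE TABLE book (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author(id)
);
CREATE TABLE audit_log (
    message TEXT NOT NULL
);
-- Trigger: write a log entry automatically whenever a book is added.
-- upper() and || are simple string manipulation.
CREATE TRIGGER log_new_book AFTER INSERT ON book
BEGIN
    INSERT INTO audit_log (message) VALUES ('added: ' || upper(NEW.title));
END;
-- View: a saved query that joins the two related tables.
CREATE VIEW book_with_author AS
    SELECT book.title, author.name
    FROM book JOIN author ON author.id = book.author_id;
""")

conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO book (title, author_id) VALUES ('Notes', 1)")
conn.execute("INSERT INTO book (title, author_id) VALUES ('Sketches', 1)")

# Messages written by the trigger, in insertion order.
log_messages = [
    row[0] for row in conn.execute("SELECT message FROM audit_log ORDER BY rowid")
]

# Rows read through the view.
view_rows = conn.execute(
    "SELECT title, name FROM book_with_author ORDER BY title"
).fetchall()

# Subselect: authors who have more than one book.
prolific = conn.execute(
    "SELECT name FROM author WHERE id IN "
    "(SELECT author_id FROM book GROUP BY author_id HAVING COUNT(*) > 1)"
).fetchall()

print(log_messages, view_rows, prolific)
```

Because the database lives in memory, you can rerun and change the script freely without leaving files behind.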

O’Reilly Books and Videos (Make sure to follow these instructions for logging in!) 

  1. Data Analysts Toolbox – Excel, Python, Power BI, Alteryx, Qlik Sense, R, Tableau 
    1. This 46 hour course is not for the faint of heart, but by the end, users will be a Swiss army knife data analyst. This isn’t for true beginners, but rather people who are already familiar with the basic data analysis concepts and have a good grasp of Excel. It is included in this list because it is a great source for learning the basics of the myriad of software and programming languages that data analysts are expected to know, all in one place. The course starts with teaching users about advanced pivot tables, so if users have already mastered the basic pivot table, they should be ready for this course.  
  1. Programming for Data Science: Beginner to Intermediate 
    1. This is an expert curated playlist of courses and book chapters that is designed to help people who are familiar with the math side of data analysis, but not the computer science side. This playlist gives users an introduction to NumPy, Pandas, Python, Spark and other technical data skills. Some previous experience with coding may be helpful in this course, but patience will make up for lack of experience.  

In the Catalog

  1. Python crash course : a hands-on, project-based introduction to programming 
    1. Python is often lauded as one of the most approachable coding languages to learn and its functionality makes it popular in the data science field. So it is no surprise that there are a lot of resources on and off campus for learning Python. This approachable guide is just one of the many resources available to UIUC students, but it stands out with its contents and overall outcomes. “Python Crash Course” covers general programming concepts, Python fundamentals, and problem solving. Unlike some other resources, this guide focuses on many of Python’s uses, not just its data analytics capabilities, which can be appealing to people who want to be more versatile with their skills. However, it is the three projects that make this resource stand out from the rest. Readers will be guided in how to create a simple video game, use data visualization techniques to make graphs and charts, and build an interactive web application.  
  1. The Book of R : a first course in programming and statistics 
    1. R is the most popular coding language for statistical analysis, so it’s clearly important for data analysts to learn. The Book of R is a comprehensive and beginner friendly guide designed for readers who have no previous programming experience or a shaky mathematical foundation, as readers will learn both concurrently through the book’s lessons. Starting with simple programs and data handling skills, learners move on to producing statistical summaries of data, performing statistical tests and modeling, and creating visualizations with contributed packages like ggplot2 and ggvis. Along the way, the book covers how to write data frames, create functions, and use variables, statements, and loops; statistical concepts like exploratory data analysis, probabilities, hypothesis tests, and regression modeling, and how to execute them in R; how to access R’s thousands of functions, libraries, and data sets; how to draw valid and useful conclusions from your data; and how to create publication-quality graphics of your results.

Join us next week for our final installment of the Winter Break Data Analysis series: “You can’t analyze data if you ain’t cute: Data Visualization for Data Analysis”    

Free, Open Source Optical Character Recognition with gImageReader

Optical Character Recognition (OCR) is a powerful tool to transform scanned, static images of text into machine-readable data, making it possible to search, edit, and analyze text. If you’re using OCR, chances are you’re working with either ABBYY FineReader or Adobe Acrobat Pro. However, both ABBYY and Acrobat are proprietary software with a steep price tag, and while they are both available in the Scholarly Commons, you may want to perform OCR beyond your time at the University of Illinois.

Thankfully, there’s a free, open source alternative for OCR: Tesseract. By itself, Tesseract only works through the command line, which creates a steep learning curve for those unaccustomed to working with a command-line interface (CLI). Additionally, it is fairly difficult to transform a jpg into a searchable PDF with Tesseract.
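
For readers curious what the command-line route looks like, here is a minimal sketch in Python that assembles a Tesseract command for one image. Tesseract does ship a `pdf` output option, but it writes the PDF straight away with no chance to review or correct the recognized text first. The file name `letter.jpg` and the helper function are assumptions made for illustration, and the command is only run if Tesseract and the image are actually present.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, language="eng"):
    """Return the argument list for OCRing one image into a searchable PDF.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    image = Path(image_path)
    output_base = image.with_suffix("")  # Tesseract appends ".pdf" itself
    # Usage pattern: tesseract <image> <output base> -l <language> pdf
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


command = build_tesseract_command("letter.jpg")  # hypothetical file name

if shutil.which("tesseract") and Path("letter.jpg").exists():
    # Only attempt OCR when Tesseract is installed and the image exists.
    subprocess.run(command, check=True)
else:
    print("Would run:", " ".join(command))
```

Seeing the command spelled out also makes clear why a GUI is appealing: every option has to be remembered and typed, and mistakes in the OCR result can only be fixed after the fact.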

Fortunately, there are many free, open source programs that provide Tesseract with a graphical user interface (GUI). These not only make Tesseract much easier to use, but some of them also come with layout editors that make it possible to create searchable PDFs. You can see the full list of programs on this page.

The program logo for gImageReader

In this post, I will focus on one of these programs, gImageReader, but as you can see on that page, there are many options available on multiple operating systems. I tried all of the Windows-compatible programs and decided that gImageReader was the closest to what I was looking for, a free alternative to ABBYY FineReader that does a pretty good job of letting you correct OCR mistakes and exporting to a searchable PDF.

Installation

gImageReader is available for Windows and Linux. Though they do not include a Mac compatible version in the list of releases, it may be possible to get it to work if you use a package manager for Mac such as Homebrew. I have not tested this though, so I do not make any guarantees about how possible it is to get a working version of gImageReader on Mac.

To install gImageReader on Windows, go to the project’s releases page on GitHub. From there, go to the most recent release of the program at the top and click Assets to expand the list of files included with the release. Then select the file that has the .exe extension to download it. You can then run that file to install the program.

Manual

The installation of gImageReader comes with a manual as an HTML file that can be opened by any browser. As of the date of this post, the Fossies software archive is hosting the manual on its website.

Setting OCR Mode

gImageReader has two OCR modes: “Plain Text” and “hOCR, PDF”. Plain Text is the default mode and only recognizes the text itself without any formatting or layout detection. You can export this to a text file or copy and paste it into another program. This may be useful in some cases, but if you want to export a searchable PDF, you will need to use hOCR, PDF mode. hOCR is a standard for formatting OCR text using either XML or HTML and includes layout information, font, OCR result confidence, and other formatting information.
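
To make the hOCR description more concrete, the sketch below reads a tiny hOCR fragment with Python's standard library and pulls out each word's text, bounding box, and confidence. The fragment is hand-written for illustration (the coordinates and confidence values are invented), and the class and function names are my own, not part of gImageReader or Tesseract.

```python
from html.parser import HTMLParser

# A tiny hand-written hOCR fragment; coordinates and confidences are invented.
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 30 90 400 120'>
  <span class='ocrx_word' title='bbox 30 92 96 116; x_wconf 96'>your</span>
  <span class='ocrx_word' title='bbox 104 92 300 116; x_wconf 67'>eclnowledgment</span>
</span>
"""


def parse_title(title):
    """Split an hOCR title such as 'bbox 1 2 3 4; x_wconf 67' into fields."""
    fields = {"bbox": None, "confidence": None}
    for part in title.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        if tokens[0] == "bbox":
            fields["bbox"] = tuple(int(value) for value in tokens[1:5])
        elif tokens[0] == "x_wconf":
            fields["confidence"] = float(tokens[1])
    return fields


class WordCollector(HTMLParser):
    """Collect text, bounding box, and confidence for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._pending = None  # properties of the word span currently being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "span" and attributes.get("class") == "ocrx_word":
            self._pending = parse_title(attributes.get("title", ""))

    def handle_data(self, data):
        # The first non-blank text after a word span opens is that word's text.
        if self._pending is not None and data.strip():
            self.words.append({"text": data.strip(), **self._pending})
            self._pending = None


collector = WordCollector()
collector.feed(SAMPLE_HOCR)

# Words whose confidence falls below an arbitrary threshold of 80.
low_confidence = [
    word["text"]
    for word in collector.words
    if word["confidence"] is not None and word["confidence"] < 80
]
print(low_confidence)
```

The layout and confidence information carried in each `title` attribute is what lets a program place invisible text over the right part of the scanned page, which is why a searchable PDF needs hOCR rather than plain text.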

To set the recognition to hOCR, PDF mode, go to the toolbar at the top. It includes a section for “OCR mode” with a dropdown menu. From there, click the dropdown and select hOCR, PDF:

gImageReader Toolbar

This is the toolbar for gImageReader. You can set OCR mode by using the dropdown that is the third option from the right.

Adding Images, Performing Recognition, and Setting Language

If you have images already scanned, you can add them to be recognized by clicking the Add Images button on the left panel, which looks like a folder. You can then select multiple images if you want to create a multipage PDF. You can always add more images later by clicking that folder button again.

On that left panel, you can also click the Acquire tab button, which allows you to get images directly from a scanner, if the computer you’re using has a scanner connected.

Once you have the images you want, click the Recognize button to recognize the text on the page. Please note that if you have multiple images added, you’ll need to click this button for every page.

If you want to perform recognition on a language other than English, click the arrow next to Recognize. You’ll need to have that language installed, but you can install additional languages by clicking “Manage Languages” in the dropdown that appears. If the language is already installed, you can go to the first option listed in the dropdown to select a different language.

Viewing the OCR Result

In this example, I will be performing OCR on this letter by Franklin D. Roosevelt:

Raw scanned image of a typewritten letter signed by Franklin Roosevelt

This 1928 letter from Franklin D. Roosevelt to D. H. Mudge Sr. is courtesy of Madison Historical: The Online Encyclopedia and Digital Archive for Madison County Illinois. https://madison-historical.siue.edu/archive/items/show/819

Once you’ve performed OCR, there will be an output panel on the right, with a series of buttons above the result. Click the button on the far right to view the text result overlaid on top of the image:

The text result of performing OCR on the FDR letter overlaid on the original scan.

Here is the text overlaid on an image of the original scan. Note how the scan is slightly transparent now to make the text easier to read.

Correcting OCR

The OCR process did a pretty good job with this example, but there are a handful of errors. You can click on any of the words of text to show them on the right panel. I will click on the “eclnowledgment” at the end of the letter to correct it. It will then jump to that part of the hOCR “tree” on the right:

hOCR tree in gImageReader, which shows the recognition result of each word in a tree-like structure.

The hOCR tree in gImageReader, which also shows OCR result.

Note that in this screenshot I have clicked the second button from the right to show the confidence values, where the higher the number, the more confident Tesseract is in the result. In this case, it is 67% sure that eclnowledgement is correct. Since it obviously isn’t correct, we can enter new text by double-clicking on the word in this panel and typing “acknowledgement.” You can do this for any errors on the page.

Other correction tips:

  1. If there are any regions that are not text that it is still recognizing, you can right click them on the right and delete them.
  2. You can change the recognized font and its size by going to the bottom area labeled “Properties.” Font size is controlled by the x_fsize field, and x_font has a dropdown where you can select a font.
  3. It is also possible to change the area of the blue word box once it is selected, simply by clicking and dragging the edges and corners.
  4. If there is an area of text that was not captured by the recognition, you can also right click in the hOCR “tree” to add text blocks, paragraphs, textlines, and words to the document. This allows you to draw a box on the image and then type what the text says.

Exporting to PDF

Once you are done making OCR corrections, you can export to a searchable PDF. To do so, click the Export button above the hOCR “tree,” which is the third button from the left. Then, select export to PDF. It then gives you several options to set the compression and quality of the PDF image, and once you click OK, it should export the PDF.

Conclusion

Unfortunately, there are some limitations to gImageReader, as can often be the case with free, open source software. Here are some potential problems you may have with this program:

  1. While you can add new areas to recognize with OCR, there is not a way to change the order of these elements inside the hOCR “tree,” which could be an issue if you are trying to make the reading order clear for accessibility reasons. One potential workaround could be to use the Reading Order options on Adobe Acrobat, which you can read about in this libguide.
  2. You cannot show the areas of the document that are in a recognition box unless you click on a word, unlike ABBYY FineReader which shows all recognition areas at once on the original image.
  3. You cannot perform recognition on all pages at once. You have to click the recognition button individually for each page.
  4. Though there are some image correction options to improve OCR, such as brightness, contrast, and rotation, it does not have as many options as ABBYY FineReader.

gImageReader is not nearly as user friendly as ABBYY FineReader, nor does it have all of its features, so you will probably want to use ABBYY if it is available to you. However, I find gImageReader a pretty good program that can meet most general OCR needs.

Statistical Analysis at the Scholarly Commons

The Scholarly Commons is a wonderful resource if you are working on a project that involves statistical analysis. In this post, I will highlight some of the great resources the Scholarly Commons has for our researchers. No matter what point you are at in your project, whether you need to find and analyze data or just need to figure out which software to use, the Scholarly Commons has what you need!
