Lightning Review: How to Use SPSS

“A nice step-by-step explanation!”

“Easy, not too advanced!”

“A great start!”

Real, live reviews of Brian C. Cronk’s How to Use SPSS: A Step-By-Step Guide to Analysis and Interpretation from some of our patrons! Now in its tenth edition, this nine-chapter text published by Taylor and Francis is rife with walkthroughs, images, and simple explanations that demystify the process of learning this statistical software. It also contains six appendices, and our patrons sang its praises after a two-hour research session here in the Scholarly Commons!

SPSS, described on IBM’s webpage as “the world’s leading statistical software used to solve business and research problems by means of ad-hoc analysis, hypothesis testing, geospatial analysis and predictive analytics. Organizations use IBM SPSS Statistics to understand data, analyze trends, forecast and plan to validate assumptions and drive accurate conclusions,” is one of many tools CITL Statistical Consulting uses on a day-to-day basis in assisting Scholarly Commons patrons. Schedule a consultation with them from 10 am to 2 pm, Monday through Thursday, for the rest of the summer!

We’re thrilled to hear this 2018 title is a hit with the researchers we serve! Cronk’s book, along with many other works on software, digital publishing, data analysis, and more, makes up part of our reference collection, free to use by anyone and everyone in the Scholarly Commons!

Using an Art Museum’s Open Data

*Original idea and original piece by C. Berman; edits by Billy Tringali

As a former art history student, I’m incredibly interested in how the study of art history can be aided by the digital humanities. More and more museums have started allowing the public to access a portion of their data. When it comes to open data, museums seem to be lagging a bit behind other cultural heritage institutions, but many are providing great open data for humanists.

The amount and kind of data art museums provide varies widely. Some museums are going the extra mile and giving a lot of their metadata to the public. Others are picking and choosing aspects of their collection, such as the Museum of Modern Art’s Exhibition and Staff Histories.

Many museums, especially those that collect modern and contemporary art, can have their hands tied by copyright laws when it comes to the data they present. A few of the data sets currently available from art museums are the Cooper Hewitt’s Collection Data, the Minneapolis Institute of Arts metadata, the Rijksmuseum API, the Tate Collection metadata, and the Getty Vocabularies.
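
To give a sense of how these datasets can be used, here is a minimal sketch of querying the Rijksmuseum API from Python. Treat it as an illustration rather than a definitive recipe: the API key is a placeholder you would need to request from the museum, and the parameter and field names should be checked against the current API documentation.

    import requests

    # Placeholder key: the Rijksmuseum issues free API keys on request.
    API_URL = "https://www.rijksmuseum.nl/api/en/collection"
    params = {
        "key": "YOUR_API_KEY",  # placeholder, not a real key
        "q": "Rembrandt",       # free-text search term
        "ps": 10,               # page size: how many results to return
    }

    response = requests.get(API_URL, params=params)
    response.raise_for_status()

    # Field names below are the commonly documented ones; verify them
    # against the API documentation before relying on them.
    for art_object in response.json().get("artObjects", []):
        print(art_object.get("title"), "by", art_object.get("principalOrFirstMaker"))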

The Metropolitan Museum of Art has recently released all images of the museum’s public domain works under a Creative Commons Zero license.

More museum data can be found here!

What Storify Shutting Down Means to Us

The Storify logo.

You may have heard that popular social media story platform Storify will be shutting down on May 16, 2018. Open to the public since 2011, it has hosted everything from academic conference tweet round-ups to “Dear David”, the ongoing saga of Buzzfeed writer Adam Ellis and the ghost that haunts his apartment. So it shocked long-time users in December when Storify suddenly announced that it would be shutting down in just a few months.

Already, Storify is no longer allowing new accounts to be created, and by May 1st, users won’t be able to create new stories. On May 16th, everything disappears. Storify will continue on as Storify 2, a feature of Livefyre, but you will need to purchase a Livefyre license for access. The fact is that many users cannot or will not pay for Livefyre. Essentially, for most people, Storify will cease to exist on May 16th.

So… what does this mean?

Of course, it means that you need to export anything that you have stored on Storify and want to save. (They provide instructions for exporting content on their shutting down FAQ.) More than that, however, we need to talk about how we are relying on services to archive our materials online and how that is a dangerous long-term preservation strategy.

The fact is, free Internet services can change in an instant, and without consulting their user base. As we have seen with Storify, as well as other services like Google Reader, what seems permanent can disappear quickly. When it comes to long-term digital preservation, we cannot depend on these services as our only means of keeping our materials safe.

That is not to say that we cannot use free digital tools like Storify. Storify was a great way to collect Tweets, present stories, and get information out to the public. And if you or your institution did not have the funds or support to create a long-term preservation plan, Storify was a great stop-gap in the meantime. But digital preservation is a marathon, not a sprint, and we will need to continue to find new, innovative ways to ensure that digital material remains accessible.

When I heard Storify was shutting down, I went to our Scholarly Commons intern Matt Pitchford, whose research is on social media and who has a real stake in maintaining digital preservation, for his take on the issue. (You can read about Matt’s research here and here.) Here’s what Matt had to say:

Thinking about [Storify shutting down] from a preservation perspective, I think it reinforces the need to develop better archival tools along two dimensions: first, along the lines of navigating the huge amounts of data and information online (like how the Library of Congress has that huge Twitter archive, but no means to access it, and which they recently announced they will stop adding to). Just having all of Storify’s data wouldn’t make it navigable. Second, that archival tools need to be able to “get back” to older forms of data. There is no such thing as a “universally constant” medium. PDFs, twitter, Facebook posts, or word documents all may disappear over time too, despite how important they seem to our lives right now. Floppy disks, older computer games or programs, and even recently CDs, aren’t “accessible” in the way they used to be. I think the same is eventually going to be true of social media.

Matt brings up some great issues here. Storify shutting down could simply be a harbinger of more change online. Social media spaces come and go (who else remembers MySpace and LiveJournal?), and even the nature of posts changes (who else remembers when Tweets were just 140 characters?). As archivists, librarians, and scholars, we will have to adopt, adapt, and think quickly in order to stay ahead of forces that are out of our control.

And most importantly, we’ll have to save backups of everything we do.

Spotlight: Unexpected Surprises in the Internet Archive

Image of data banks with the Internet Archive logo on them.

The Internet Archive.

For most of us, our introduction to the Internet Archive was the Wayback Machine, a search engine that can show you snapshots of websites from the past. It’s always fun to watch a popular webpage like Google evolve from November 1998 to July 2004 to today, but there is so much more that the Internet Archive has to offer. Today, I’m going to go through a few highlights from the Internet Archive’s treasure trove of material, just to get a glimpse at all of the things you can do and see on this amazing website.

Folksoundomy: A Library of Sound

Folksoundomy is the Internet Archive’s collection of sounds, music, and speech. The collection is collaboratively created and tagged, with many participants from outside the library sphere. There are more than 155,960 items in the Folksoundomy collection, with items dating back to Thomas Edison’s invention of recorded sound in 1877. From hip hop mixtapes to Russian audiobooks, sermons to stand-up comedy, music, podcasts, radio shows, and more, Folksoundomy is an incredible resource for scholars looking at the history of recorded sound.
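
If you would like to explore a collection like Folksoundomy programmatically, the Internet Archive also has a search API with an official Python client, internetarchive. Here is a minimal sketch; the collection identifier is an assumption on my part, so confirm it from the collection’s archive.org URL before relying on it.

    from internetarchive import search_items

    # "folksoundomy" is assumed to be the collection identifier on archive.org;
    # check the collection page's URL to confirm.
    query = "collection:folksoundomy AND mediatype:audio"

    # Preview the first ten item identifiers that match the query.
    for i, result in enumerate(search_items(query)):
        print(result["identifier"])
        if i >= 9:
            break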

TV News Archive

With over 1,572,000 clips collected since 2009, the TV News Archive includes everything from this morning’s news to curated series of fact-checked clips. Special collections within the TV News Archive include Understanding 9/11, Political Ads, and the TV NSA Clip Library. With the ability to search closed captions from US TV news shows, the Internet Archive provides a unique research opportunity for those studying modern US media.

Software Library: MS-DOS Games

Ready to die of dysentery on The Oregon Trail again? Now is your chance! The Internet Archive’s MS-DOS Games Software Library uses the EM-DOSBOX in-browser emulator to let you play games that would otherwise seem lost to time. Relive your childhood memories or start researching trends in video games throughout the years with this incredible collection of playable games!

National Security Internet Archive (NSIA)

Created in March 2015, the NSIA collects files from muckraking and national security organizations, as well as historians and activists. With over 2 million files split into 36 collections, the NSIA helps collect everything from CIA Lessons Learned from Czechoslovakia to the UFO Files, a collection of declassified UFO files from around the world. Having these files accessible and together is incredibly helpful to researchers studying the history of national security, both in the US and elsewhere in the world.

University of Illinois at Urbana-Champaign

That’s right! We’re also on the Internet Archive. The U of I adds content in several areas: Illinois history, culture and natural resources; US railroad history; rural studies and agriculture; works in translation; as well as 19th century “triple-decker” novels and emblem books. Click on the above link to see what your alma mater is contributing to the Internet Archive today!

Of course, this is nowhere near everything! With Classic TV Commercials, Grateful Dead recordings, Community Software, and more, it’s definitely worth your time to see what the Internet Archive has to offer your research!

Digital Timeline Tools

Everyone has a story to tell. For many of us doing work in the humanities and social sciences, presenting our research as a timeline can bring it new depth and a wider audience. Today, I’ll be talking about two unique digital storytelling options that you can use to add dimension to your research project.

Timeglider

An image of Timeglider's sample timeline on the Wright Brothers

Timeglider is an interactive timeline application. It allows you to move in and out of time, letting you see time in large or small spans. It also allows events to overlap, so you can show the relationship of things in time. Timeglider also gives some great aesthetic options, including what they call their “special sauce”: the way they relate the size of an event to its importance. This option emphasizes certain events in the timeline to the user, and can make getting important ideas across simpler.

Started in 2002 as a Flash-based app, Timeglider is one of the older timeline options on the web. After a major redesign in 2010, it is now written in HTML5 and JavaScript. The basic package is free for students; non-students can pay either $5/month or $50/year.

Overall, Timeglider is an interesting timeline application with numerous options. Give it a try!

myHistro

A screenshot from a myHistro project on the Byzantine Empire.

myHistro uses text, video, and pictures on maps and timelines to tell stories. Some of the power of myHistro comes from the sheer amount of information you can provide in one presentation. Presentations can include introductory text, an interactive timeline, a Google Maps-powered annotated map, and a comment section, among other features. The social aspect, in particular, makes myHistro powerful. You can open your work up to a large audience, or simply ask students and scholars to comment on your work for an assignment. Another interesting aspect of myHistro is the sheer variety of projects people have created with it, from histories of the French Revolution to the biography of Justin Bieber and everything in between!

myHistro is free, and you can sign up using your email or social network information.

Got Bad Data? Check Out The Quartz Guide

Photograph of a man working on a computer.

If you’re working with data, chances are there will be at least a few times when you encounter the “nightmare scenario”. Things go awry: values are missing, your sample is biased, there are inexplicable outliers, or the sample wasn’t as random as you thought. Some issues you can solve; others are less clear-cut. But before you tear your hair out (or before you tear all of your hair out), check out The Quartz guide to bad data. Hosted on GitHub, The Quartz guide lists possible problems with data and how to solve them, so that researchers have an idea of what their next steps can be when their data doesn’t behave as planned.

With translations into six languages and a Creative Commons 4.0 license, The Quartz guide divides problems into four categories: issues that your source should solve, issues that you should solve, issues a third-party expert should help you solve, and issues a programmer should help you solve. From there, the guide lists specific issues and explains how they can or cannot be solved.

One of the greatest things about The Quartz guide is the language. Rather than pontificating and making an already frustrating problem more confusing, the guide lays out options in plain terms. While you may not get everything you need for fixing your specific problem, chances are you will at least figure out how you can start moving forward after this setback.

The Quartz guide does not mince words. In the “Data were entered by humans” entry, for instance, it shows some messy data entry and then says, “Even with the best tools available, data this messy can’t be saved. They are effectively meaningless… Beware human-entered data.” Even if it’s probably not what a researcher wants to hear, sometimes the cold, hard truth can lead someone to a new step in their research.

So if you’ve hit a block with your data, check out The Quartz guide. It may be the thing that will help you move forward with your data! And if you’re working with data, feel free to contact the Scholarly Commons or Research Data Service with your questions!

Open Source Tools for Social Media Analysis

Photograph of a person holding an iPhone with various social media icons.

This post was guest authored by Kayla Abner.


Interested in social media analytics, but don’t want to shell out the bucks to get started? There are a few open source tools you can use to dabble in this field, and some even integrate data visualization. Recently, we at the Scholarly Commons tested a few of these tools, and as expected, each one has strengths and weaknesses. For our exploration, we exclusively analyzed Twitter data.

NodeXL

NodeXL’s graph for #halloween (2,000 tweets)

tl;dr: Light system footprint and provides some interesting data visualization options. Useful if you don’t have a pre-existing data set, but the one generated here is fairly small.

NodeXL is essentially a complex Excel template (it’s classified as a Microsoft Office customization), which means it doesn’t take up a lot of space on your hard drive. It has its advantages: it’s easy to use, requiring only a simple search to retrieve tweets for you to analyze. However, its capabilities for large-scale analysis are limited; the user is restricted to retrieving the most recent 2,000 tweets. For example, searching Twitter for #halloween imported 2,000 tweets, every single one from the date of this writing. It is worth mentioning that there is a fancy, paid version that expands your limit to 18,000 tweets (the maximum allowed by Twitter’s API) or roughly the past 7 to 8 days, whichever limit you hit first. Even then, you cannot restrict your data retrieval by date. NodeXL is a tool that would be most successful at pulling recent social media data. In addition, if you want to study something besides Twitter, you will have to pay to get any other type of dataset, e.g., Facebook, Youtube, Flickr.

Strengths: Good for a beginner, differentiates between Mentions/Retweets and original Tweets, provides a dataset, some light data visualization tools, offers Help hints on hover

Weaknesses: 2,000 Tweet limit, free version restricted to Twitter Search Network

TAGS

TAGSExplorer’s data graph (2,902 tweets). It must mean something…

tl;dr: Add-on for Google Sheets, giving it a light system footprint as well. Higher restriction for number of tweets. TAGS has the added benefit of automated data retrieval, so you can track trends over time. Data visualization tool in beta, needs more development.

TAGS is another complex spreadsheet template, this time created for use with Google Sheets. TAGS does not have a paid version with more social media options; it can only be used for Twitter analysis. However, it does not have the same tweet retrieval limit as NodeXL. The only limits are 18,000 tweets or tweets from the past seven days, and those are dictated by Twitter’s Terms of Service, not the creators of this tool. My same search for #halloween with a limit set at 10,000 retrieved 9,902 tweets from the past seven days.

TAGS also offers a data visualization tool, TAGSExplorer, that is promising but still needs work to realize its potential. As it stands now in beta, even a dataset of 2,000 records puts so much strain on the program that it cannot keep up with the user. It can be used with smaller datasets, but it isn’t there yet. It does offer a few interesting analysis parameters that NodeXL lacks, such as the ability to see Top Tweeters and Top Hashtags, which work better than the graph.

Image of a hashtag search. These graphs have meaning!

Strengths: More data fields, such as the user’s follower and friend count, location, and language (if available), better advanced search (Boolean capabilities, restrict by date or follower count), automated data retrieval

Weaknesses: data visualization tool needs work

Hydrator

Simple interface for Documenting the Now’s Hydrator

tl;dr: A tool used for “re-hydrating” tweet IDs into full tweets, to comply with Twitter’s Terms of Service. Not used for data analysis; useful for retrieving large datasets. Limited to datasets already available.

Documenting the Now, a group focused on collecting and preserving digital content, created the Hydrator tool to comply with Twitter’s Terms of Service. Download and distribution of full tweets to third parties is not allowed, but distribution of tweet IDs is allowed. The organization manages a Tweet Catalog with files that can be downloaded and run through the Hydrator to view the full Tweet. Researchers are also invited to submit their own dataset of Tweet IDs, but this requires use of other software to download them. This tool does not offer any data visualization, but is useful for studying and sharing large datasets (the file for the 115th US Congress contains 1,430,133 tweets!). Researchers are limited to what has already been collected, but multiple organizations provide publicly downloadable tweet ID datasets, such as Harvard’s Dataverse. Note that the rate of hydration is also limited by Twitter’s API, and the Hydrator tool manages that for you. Some of these datasets contain millions of tweet IDs, and will take days to be transformed into full tweets.
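
The Hydrator itself is a point-and-click desktop application, but Documenting the Now also maintains a Python library, twarc, that can hydrate tweet IDs from a script. Here is a minimal sketch, assuming you have your own Twitter API credentials and a text file of tweet IDs (one per line); the credential values and file names are placeholders.

    from twarc import Twarc

    # Placeholder credentials: supply your own Twitter API keys and tokens.
    t = Twarc(
        consumer_key="CONSUMER_KEY",
        consumer_secret="CONSUMER_SECRET",
        access_token="ACCESS_TOKEN",
        access_token_secret="ACCESS_TOKEN_SECRET",
    )

    # ids.txt: one tweet ID per line, e.g. from a public tweet ID dataset.
    with open("ids.txt") as id_file:
        for tweet in t.hydrate(line.strip() for line in id_file):
            # Deleted or protected tweets simply won't come back from the API.
            print(tweet.get("full_text") or tweet.get("text"))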

Strengths: Provides full tweets for analysis, straightforward interface

Weaknesses: No data analysis tools

Crimson Hexagon

If you’re looking for more robust analytics tools, Crimson Hexagon is a data analytics platform that specializes in social media. Not limited to Twitter, it can retrieve data from Facebook, Instagram, Youtube, and basically any other online source, like blogs or forums. The company has a partnership with Twitter and pays for greater access to their data, giving the researcher higher download limits and a longer time range than they would receive from either NodeXL or TAGS. One can access tweets starting from Twitter’s inception, but these features cost money! The University of Illinois at Urbana-Champaign is one such entity paying for this platform, so researchers affiliated with our university can request access. One of the Scholarly Commons interns, Matt Pitchford, uses this tool in his research on Twitter response to terrorism.

Whether you’re an experienced text analyst or just want to play around, these open source tools are worth considering for different uses, all without you spending a dime.

If you’d like to know more, researcher Rebekah K. Tromble recently gave a lecture at the Data Scientist Training for Librarians (DST4L) conference on how different (paid) platforms influence or bias analyses of social media data. As you start a real project analyzing social media, you’ll want to know how the data you have gathered may be limited so you can adjust your analysis accordingly.

Preparing Your Data for Topic Modeling

In keeping with my series of blog posts on my research project, this post is about how to prepare your data for input into a topic modeling package. I used Twitter data in my project, which is relatively sparse at only 140 characters per tweet, but the principles can be applied to any document or set of documents that you want to analyze.

Topic Models:

Topic models work by identifying and grouping words that co-occur into “topics.” As David Blei writes, Latent Dirichlet allocation (LDA) topic modeling makes two fundamental assumptions: “(1) There are a fixed number of patterns of word use, groups of terms that tend to occur together in documents. Call them topics. (2) Each document in the corpus exhibits the topics to varying degree. For example, suppose two of the topics are politics and film. LDA will represent a book like James E. Combs and Sara T. Combs’ Film Propaganda and American Politics: An Analysis and Filmography as partly about politics and partly about film.”

Topic models do not have any actual semantic knowledge of the words, and so do not “read” the sentence. Instead, topic models use math. The tokens/words that tend to co-occur are statistically likely to be related to one another. However, that also means that the model is susceptible to “noise,” or falsely identifying patterns of co-occurrence if non-important but highly-repeated terms are used. As with most computational methods, “garbage in, garbage out.”
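
As a toy illustration of the idea (not the exact workflow described below, which relies on OpenRefine, NLTK, and a package such as MALLET), here is a minimal sketch using the gensim Python library; the documents and the number of topics are invented for the example.

    from gensim import corpora
    from gensim.models import LdaModel

    # Two invented "documents", already tokenized and cleaned.
    documents = [
        ["film", "propaganda", "politics", "election", "campaign"],
        ["film", "camera", "director", "cinema", "propaganda"],
    ]

    # Map each unique token to an integer id, then represent each document
    # as a bag of words: a list of (token_id, count) pairs.
    dictionary = corpora.Dictionary(documents)
    corpus = [dictionary.doc2bow(doc) for doc in documents]

    # Fit an LDA model that assumes the corpus mixes two underlying topics.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                   passes=20, random_state=1)

    # Each topic prints as a weighted list of its most probable words.
    for topic_id, words in lda.print_topics():
        print(topic_id, words)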

In order to make sure that the topic model is identifying interesting or important patterns instead of noise, I had to accomplish the following pre-processing or “cleaning” steps.

  • First, I removed the punctuation marks, like “,.;:?!”. Without this step, commas started showing up in all of my results. Since they didn’t add to the meaning of the text, they were not necessary to analyze.
  • Second, I removed the stop-words, like “I,” “and,” and “the,” because those words are so common in any English sentence that they tend to be over-represented in the results. Many of my tweets were emotional responses, so many authors wrote in the first person. This tended to skew my results, although you should be careful about what stop words you remove. Simply removing stop-words without checking them first means that you can accidentally filter out important data.
  • Finally, I removed overly common words that were unique to my data. For example, many of my tweets were retweets and therefore contained the word “rt.” I also ended up removing mentions of other authors, because highly retweeted texts meant that Twitter user handles were showing up as significant words in my results.

Cleaning the Data:

My original data set was 10 Excel files of 10,000 tweets each. In order to clean and standardize all these data points, as well as combine my files into one single document, I used OpenRefine. OpenRefine is a powerful tool, and it makes it easy to work with all your data at once, even if it is a large number of entries. I uploaded all of my datasets, then performed some quick cleaning available under the “Common Transformations” option under the triangle dropdown at the head of each column: I changed everything to lowercase, unescaped HTML characters (to make sure that I didn’t get errors when trying to run it in Python), and removed extra white spaces between words.

OpenRefine also lets you use regular expressions, which is a kind of search tool for finding specific strings of characters inside other text. This allowed me to remove punctuation, hashtags, and author mentions by running a find and replace command.

  • Remove punctuation: grel:value.replace(/(\p{P}(?<!')(?<!-))/, "")
    • Any punctuation character (other than apostrophes and hyphens) is removed.
  • Remove users: grel:value.replace(/(@\S*)/, "")
    • Any string that begins with an @ is removed. It ends at the space following the word.
  • Remove hashtags: grel:value.replace(/(#\S*)/, "")
    • Any string that begins with a # is removed. It ends at the space following the word.

Regular expressions, commonly abbreviated as “regex,” can take a little getting used to. Fortunately, OpenRefine itself has some solid documentation on the subject, and I also found this cheatsheet valuable as I was trying to get it to work. If you want to create your own regex search strings, regex101.com has a tool that lets you test your expression before you actually deploy it in OpenRefine.
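
If you would rather do this cleaning step outside OpenRefine, the same transformations can be sketched in plain Python with the re module. This is an illustrative equivalent of the GREL expressions above, not the exact cleaning pipeline used here.

    import re

    def clean_tweet(text):
        """Remove user mentions, hashtags, and most punctuation."""
        text = re.sub(r"@\S*", "", text)          # user mentions
        text = re.sub(r"#\S*", "", text)          # hashtags
        text = re.sub(r"[^\w\s'-]", "", text)     # punctuation, keeping apostrophes and hyphens
        return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

    print(clean_tweet("rt @drlawyercop since sensible, national gun control "
                      "is a steep climb, how about we just start with orlando? #guncontrolnow"))
    # -> rt since sensible national gun control is a steep climb how about we just start with orlando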

After downloading the entire data set as a Comma Separated Value (.csv) file, I then used the Natural Language ToolKit (NLTK) for Python to remove stop-words. The code itself can be found here, but I first saved the content of the tweets as a single text file, and then I told NLTK to go over every line of the document and remove words that are in its common stop word dictionary. The output is then saved in another text file, which is ready to be fed into a topic modeling package, such as MALLET.
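
For a sense of what that step looks like, here is a minimal sketch of stop-word removal with NLTK. It is not the exact script linked above, and the file names are placeholders.

    import nltk
    from nltk.corpus import stopwords

    # Download the stop-word list the first time this runs.
    nltk.download("stopwords")
    stop_words = set(stopwords.words("english"))

    # tweets.txt: one cleaned tweet per line (placeholder file names).
    with open("tweets.txt") as infile, open("tweets_no_stopwords.txt", "w") as outfile:
        for line in infile:
            kept = [word for word in line.split() if word not in stop_words]
            outfile.write(" ".join(kept) + "\n")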

At the end of all these cleaning steps, my resulting data is essentially composed of unique nouns and verbs, so, for example, @Phoenix_Rises13’s tweet “rt @drlawyercop since sensible, national gun control is a steep climb, how about we just start with orlando? #guncontrolnow” becomes instead “since sensible national gun control steep climb start orlando.” This means that the topic modeling will be more focused on the particular words present in each tweet, rather than commonalities of the English language.

Now my data is cleaned of any additional noise, and it is ready to be input into a topic modeling program.

Interested in working with topic models? There are two Savvy Researcher topic modeling workshops, on December 6 and December 8, that focus on the theory and practice of using topic models to answer questions in the humanities. I hope to see you there!

Creating Quick and Dirty Web Maps to Visualize Your Data – Part 2

Welcome to part two of our two-part series on creating web maps! If you haven’t read part one yet, you can find it here. If you have read part one, we’re going to pick up right where we left off.

Now that we’ve imported our CSV into a web map, we can begin to play around with how the data is represented. You should be brought to the “Change Style” screen after importing your data, which presents you with a drop-down menu and three drawing styles to choose from:

Map Viewer Change Style Screen


Hover over each drawing style for more information, and click each one to see how they visualize your data. Don’t worry if you mess up — you can always return to this screen later. We’re going to use “Types (Unique symbols)” for this exercise because it gives us more options to fiddle with, but feel free to dive into the options for each of the other two drawing styles if you like how they represent your data. Click “select” under “Types (Unique symbols)” to apply the style, then select a few different attributes in the “Choose an attribute to show” dropdown menu to see how they each visualize your data. I’m choosing “Country” as my attribute to show simply because it gives us an even distribution of colors, but for your research data you will want to select this attribute carefully. Next, click “Options” on our drawing style and you can play with the color, shape, name, transparency, and visible range for all of your symbols. Click the three-color bar (pictured below) to change visual settings for all of your symbols at once. When you’re happy with the way your symbols look, click OK and then DONE.

Now is also a good time to select your basemap, so click “Basemap” on the toolbar and select one of the options provided (I’m using “Light Gray Canvas” in my examples here).

Change all symbols icon

Click the three-color bar to change visual settings for all of your symbols at once

Now that our data is visualized the way we want, we can do a lot of interesting things depending on what we want to communicate. As an example, let’s pretend that our IP addresses represent online access points for a survey we conducted on incarceration spending in the United States. We can add some visual insight to our data by inserting a layer from the web using “Add → Search for layers” and overlaying a relevant layer. I searched for “inmate spending” and found a tile layer created by someone on the Esri team that shows the ratio of education spending to incarceration spending per state in the US:

"Search for Layers" screen


You might notice in the screenshot above that there are a lot of similar search results; I’m picking the “EducationVersusIncarceration” tile layer (circled) because it loads faster than the feature layer. If you want to learn why this happens, check out Esri’s documentation on hosted feature layers.

We can add this layer to our map by clicking “Add” then “Done Adding Layers,” and voilà, our data is enriched! There are many public layers created by Esri and the ArcGIS Online community that you can search through, and even more GIS data hosted elsewhere on the web. You can use the Scholarly Commons geospatial data page if you want to search for public geographic information to supplement your research.

Now that we’re done visualizing our data, it’s time to export it for presentation. There are a few different ways that we can do this: by sharing/embedding a link, printing to a pdf/image file, or creating a presentation. If we want to create a public link so people can access our map online, click “Share” in the toolbar to generate a link (note: you have to check the “Everyone (public)” box for this link to work). If we want to download our map as a pdf or image, click “Print” and then select whether or not we want to include a legend, and we’ll be brought to a printer-friendly page showing the current extent of our map. Creating an ArcGIS Online Presentation is a third option that allows you to create something akin to a PowerPoint, but I won’t get into the details here. Go to Esri’s Creating Presentations help page for more information.

Click to enlarge the GIFs below and see how to export your map as a link and as an image/pdf:

Share web map via public link

Note: you can also embed your map in a webpage by selecting “Embed In Website” in the Share menu.



Save your map as an image/pdf. NOTE: if you save your map as an image using “save image as…” you can only save the map, NOT the legend.

While there are a lot more tools that we can play with using our free ArcGIS Online accounts – clustering, pop-ups, bookmarks, labels, drawing styles, distance measuring – and even more tools with an organizational account – 25 different built-in analyses, directions, Living Atlas Layers – this is all that we have time for right now. Keep an eye out for future Commons Knowledge blog posts on GIS, and visit our GIS page for even more resources!

Spotlight: Library of Congress Labs

The Library of Congress Labs banner.

It’s always exciting when an organization with as much influence and reach as the Library of Congress decides to do something different. Library of Congress Labs is a new endeavor by the LoC, “a place to encourage innovation with Library of Congress digital collections”. Launched on September 19, 2017, Labs is a place of experimentation, and will host a rotating selection of “experiments, projects, events and resources” as well as blog posts and video presentations.

In this post, I’ll just be poking around the Labs website, focusing on the “Experiments” portion of the site. (We’ll look at “LC for Robots” in another post.) As of writing (10/3/17), there are three “Experiments” on the site: Beyond Words, Calling All Storytellers, and #AsData Poster Series. Right now, Calling All Storytellers is just asking for people’s ideas for the website, so I’ll briefly go over Beyond Words and #AsData Poster Series and give my thoughts on them.

Beyond Words

Beyond Words is a crowd-sourced transcription system for the LoC’s Chronicling America digitized newspaper collection. Users are invited to mark, transcribe, and verify World War I newspapers. Tasks are split, so the user only does one task at a time. Overall, it’s pretty similar to other transcription efforts already on the Internet, though the tools tend to work better, feel less clunky, and are clearer than some other efforts I’ve seen.

#AsData Poster Series

The #AsData Poster Series is a poster series by artist Oliver Baez Bendorf, commissioned by the LoC for their Collections as Data Summit in September 2016. The posters are beautiful and artistic, and represent the themes of the summit. One aspect that I like about this page is that it doesn’t just present the posters themselves; it includes more information, like an interview with the artist. That being said, it does seem like a bit of a placeholder.

While I was excited to explore the experiments, I’m hoping to see more innovative ideas from the Library of Congress. The Labs “Experiments” have great potential, and it will be interesting to stay tuned and see where they go next.

Keep an eye on Commons Knowledge in the next few weeks, when we talk about the “LC for Robots” Labs page!