Written by Heidi Imker, Director of the Library Research Data Service
When you think of data, your mind probably doesn’t jump right to Pride and Prejudice. That is, unless you’re Heidi Imker, Director of the Research Data Service and amateur Jane Austen internet sleuth. “In late 2020,” Heidi says, “I was in desperate need of a post-Outlander spiritual cleanse. Naturally, I turned to Pride and Prejudice. Over a year later, I’m still in the midst of a fantastic, out-of-control Jane Austen binge, and I’ve got oodles of related resources worthy of Love Data Week.”
Join Heidi on a virtual tour of some of her favorite data resources about Austen, her works, and historical England.
In the fabulous janeaustenr R package, data scientist Julia Silge used text data for the Austen novels available from the also fabulous Project Gutenberg. The package offers cleaned data, documentation, and scripts to play with and analyze the novels.
Randomly, sifting through the janeaustenr dataset gave me a new level of appreciation for the word “ignore.” Austen didn’t use “ignore” once in any of her novels. It turns out that no one was really using it because it hadn’t caught on yet. In fact, according to Google’s ngram viewer, “ignore” didn’t start getting traction until circa 1845. And now you might be thinking word frequency data is fun, and it is! Like this word frequencies dataset available from the HathiTrust Research Center.
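If you want to try this kind of word hunt yourself, here is a minimal sketch in Python rather than R. It does not use the janeaustenr package: the `word_counts` helper and the `sample` string are illustrative assumptions, and you would swap in the full text of a novel (for example, a plain-text file from Project Gutenberg) to check a word like “ignore” for real.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase word tokens, keeping internal apostrophes."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Opening sentence of Pride and Prejudice, used here as a tiny stand-in corpus.
sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

counts = word_counts(sample)
print(counts["of"])      # how often "of" appears in the sample
print(counts["ignore"])  # a Counter reports 0 for words it never saw
```

A `Counter` is handy here because asking for a word that never appears simply gives zero instead of raising an error, which is exactly the question being asked about “ignore.”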
One of the things I learned during this binge was that dating the events in Pride and Prejudice has been a subject of debate for some time (as in, about a century). I found it downright fascinating that scholars could map parts of the book to the 1811 calendar year and others to the year 1794. I had never really thought about the characters existing in a specific year, but now I wondered: what else was happening in those years? I discovered the Waterloo Association, a community of military historians behind the Napoleon Series. This immense archive contains articles on military history, biographies, and documentation of thousands of officers and soldiers (such as Challis’s Peninsula Roll Call).
London Lives provides searchable access to more than 240,000 digitized pages of archival documents, with a special focus on crime, poverty, and social policy. Not only is the source material available, but the people behind London Lives have made it a point to keep humanity at the forefront by constructing biographies of the individuals caught in the crime and poverty cycle in London between 1690 and 1800.
My favorite dataset of all time, the Calendar of London Concerts was thoughtfully and painstakingly created by Professor Simon McVeigh at Goldsmiths, University of London over many decades. It lists 4,001 concert events, found by locating and documenting adverts in archival newspapers, by hand. When Lady Catherine tells Elizabeth that “it will be in my power to take one of you as far as London, for I am going there early in June, for a week,” what could that self-professed music aficionado have heard in June 1794? Voilà! Perhaps it was Handel’s Messiah at St Margaret’s Church in Westminster on Thursday, June 5th.
I appreciate the Calendar of London Concerts dataset for my odd little hobby, but I love it as an information professional. The sheer dedication it took to assemble the data, especially with such strict attention to detail, is incredible. Let me explicitly gush about the documentation for a moment. Context! References! Abbreviations! All explained! What’s “HM”? His Majesty’s something or other? No, it’s the Half-Moon Tavern in Cheapside. Currency conversions! Syntax for nearly-impossible-to-standardize programme content! It’s forty-four glorious pages! Swoon!
Related resources on London concerts
- McVeigh’s data formatted as an online database
- Article from the creator about curating the database, “Rescuing a Heritage Database: Some Lessons from London Concert Life in the Eighteenth Century”
- Digital exhibit showcasing the data
- Statistics in Historical Musicology series
What started out as a casual, online-friendly hobby ended up introducing me to a wealth of enlightening open data resources, and I’m in love with every one of them. Since my Austen binge is apparently nowhere near over, you may well get another link-laden post for next year’s Love Data Week. <3
Heidi Imker is the Director of the Research Data Service (RDS) and an Associate Professor at the University of Illinois at Urbana-Champaign. The RDS helps researchers across the Urbana-Champaign campus manage and share research data, and in her role as Director, she ensures the RDS takes a collaborative, user-oriented, and practical approach to research support. Heidi holds a Ph.D. in Biochemistry from the University of Illinois and did her postdoctoral research at Harvard Medical School.