Going Down the Jane Austen Rabbit Hole

This post is part of a series for Love Data Week, which takes place February 14-18 2022.

Written by Heidi Imker, Director of the Library Research Data Service

When you think of data, your mind probably doesn’t jump right to Pride and Prejudice. That is, unless you’re Heidi Imker, Director of the Research Data Service and amateur Jane Austen internet sleuth. “In late 2020,” Heidi says, “I was in desperate need of a post-Outlander spiritual cleanse. Naturally, I turned to Pride and Prejudice. Over a year later, I’m still in the midst of a fantastic, out-of-control Jane Austen binge, and I’ve got oodles of related resources worthy of Love Data Week.”

Join Heidi on a virtual tour of some of her favorite data resources about Austen, her works, and historical England.

  1. janeaustenr: Jane Austen’s Complete Novels

In this fabulous R package, data scientist Julia Silge used text data for the Austen novels available from the also fabulous Project Gutenberg. The package offers cleaned data, documentation, and scripts to play with and analyze the novels.

  1. Word Frequencies in English-Language Literature, 1700-1922

Randomly, sifting through the janeaustenr dataset gave me a new level of appreciation for the word “ignore.” Austen didn’t use “ignore” once in any of her novels. It turns out that no one was really using it because it hadn’t caught on yet. In fact, according to Google’s ngram viewer, “ignore” didn’t start getting traction until circa 1845. And now you might be thinking word frequency data is fun, and it is! Like this word frequencies dataset available from the HathiTrust Research Center.

  1. Napoleon Series

One of the things I learned during this binge was that dating the events in Pride and Prejudice has been a subject of debate for some time (as in, about a century). I found it downright fascinating that scholars could map parts of the book to the 1811 calendar year and others to the year 1794. I had never really thought about the characters existing in a specific year, but now I wondered what else was happening in those years? I discovered the Waterloo Association, a community of military historians behind the Napoleon Series. This immense archive contains articles on military history, biographies, and documentation of thousands of officers and soldiers (such as Challis’s Peninsula Roll Call).

  1. London Lives

London Lives provides searchable access to more than 240,000 digitized pages of archival documents, with special focus on crime, poverty, and social policy. Not only is the source material available, but the people behind London Lives have made it a point to keep humanity at the forefront by constructing biographies of the individuals caught in the crime and poverty cycle in London between 1690 and 1800.

  1. Calendar of London Concerts 1750-1800

My favorite dataset of all time, it was thoughtfully and painstakingly created by Professor Simon McVeigh at Goldsmiths, University of London over many decades. It lists 4,001 concert events, as found through locating and documenting adverts in archival newspapers—by hand. When Lady Catherine tells Elizabeth that “it will be in my power to take one of you as far as London, for I am going there early in June, for a week,” what could that self-professed music aficionado have heard in June 1794? Voilà! Perhaps it was Handel’s Messiah at St Margaret’s Church in Westminster on Thursday, June 5th.

I appreciate the Calendar of London Concerts dataset for my odd little hobby, but I love it as an information professional. The sheer dedication it took to assemble the data, especially with such strict attention to detail, is incredible. Let me explicitly gush about the documentation for a moment. Context! References! Abbreviations! All explained! What’s “HM”? His Majesty’s something or other? No, it’s the Half-Moon Tavern in Cheapside. Currency conversions! Syntax for nearly impossible to standardize programme content! It’s forty-four glorious pages! Swoon!

Related resources on London concerts

What started out as a casual, online-friendly hobby ended up introducing me to a wealth of enlightening open data resources, and I’m in love with every one of them. Since my Austen binge is apparently nowhere near over, you may well get another link-laden post for next year’s Love Data Week. <3

Headshot of Heidi

Heidi Imker is the Director of the Research Data Service (RDS) and an Associate Professor at the University of Illinois at Urbana-Champaign. The RDS helps researchers across the Urbana-Champaign campus manage and share research data, and in her role as Director, she ensures the RDS takes a collaborative, user-oriented, and practical approach to research support. Heidi holds a Ph.D. in Biochemistry from the University of Illinois and did her postdoctoral research at the Harvard Medical School.

OpenRefine: a Cinderella Story but for Data

This post is part of a series for Love Data Week, which takes place February 14-18 2022.

Written by Dena Strong

Ever wish you could call on a fairy godmother who could wave a magic wand and make all your data problems disappear? Luckily for us at the University of Illinois, we can call on Senior Information Design Specialist Dena Strong. Dena can solve data problems so fast it seems downright magical. For Love Data Week 2022, check out Dena’s story about OpenRefine, the data tool she loves beyond all reason:

“I once had a consultation with a person who presented me with two Excel files and a data cleaning dilemma that he estimated was going to take him 200 hours of manual labor to repair. It took me 15 minutes of conversation to understand what he needed to do with the files to get them clean and integrated – and then it took me 5 minutes in OpenRefine to do the data cleaning and teach him how to do the same so he could do it again whenever he wanted. The other 199.6 hours of his time went to more productive uses. He and I have both been OpenRefine cheerleaders ever since. When I did a Caffeine Break session about it, an attendee said it was the most useful 45 minutes of training he’d ever had.”

As of the time of this writing, none of Dena’s datasets have turned back into pumpkins.

Headshot of Dena

Dena Strong (MLIS) is a member of the Web Hosting team at Technology Services; she also serves as a liaison with the Research Data Service at the Library. With 20 years of experience in usability, accessibility, information architecture, and workflows, Dena enjoys collaborating and consulting with people across campus. She’s also been spotted studying six languages, reproducing Heian-era Japanese dye techniques, and occasionally burning Kool-aid in search of new fabric colors.

When did you first fall in love with data?

This post is part of a series for Love Data Week, which takes place February 14-18 2022.

Written by Lauren Phegley

Picture it – North Central College, Illinois, 2018. Twenty-one-year-old sociology major Lauren Phegley takes her seat in Professor Corsino’s class with no idea that she’s about to fall in love…with data. At the time, Dr. Corsino studied occupational attainment of Italian immigrants in Chicago Heights during the 1900s. Lauren and her classmates sifted through census data to piece together the career tracks of (mostly male) Italian Americans. These data weren’t just checkmarks on a form. They were glimpses into entire families, glimpses that when pieced together told a story about how the American dream operates on the basis of social class. “For me, tracking the individuals through the census was a large puzzle,” Lauren says. Since then, Lauren has focused on helping other researchers solve their data puzzles. “Social science students are often not taught about data management because they don’t see their research as relating to ‘data’. I make a concerted effort now in my work and teaching to target fields that are often forgotten about in terms of data management. Research is a labor of love. It is well worth a few hours of time to make sure that your data stays useable and understandable!”

Headshot of Lauren

Lauren Phegley is a graduate assistant for the Library Research Data Service pursuing her Master of Science in Library and Information Science at the University of Illinois iSchool. Once she graduates in May 2022, she hopes to work as an academic librarian helping researchers manage their data and research.

Welcome to the new Scholarly Commons!

The staff at the Scholarly Commons are excited to welcome you to our new location in Room 220 of the Main Library! Over the past year of remote work, we worked to get Room 220 ready for patron use by the start of the Fall semester, and we officially opened our new space on August 9th.

Study tables arranged in two rows with students studying.

Study tables located in Room 220

The new Scholarly Commons in Room 220 is a much bigger space that can accommodate more patrons, support individual and group study, host research consultations, and more. We have brand new soft furniture that patrons can lounge in, as well as several study tables that come with screen-casting monitors for easy collaboration.

Individual study pod with clear glass doors

Rooms available to reserve for individual or group study

Our patrons have also been excited about the group collaboration rooms, which are brand new to the Scholarly Commons. These rooms can be reserved for individual study or group meetings. They are glass-enclosed spaces with adjustable lighting, a monitor for screen-casting, and air conditioning. The pods can be reserved for two hours at a time through the library’s online reservation portal.

The Scholarly Commons mission of supporting the advanced research needs of the University of Illinois at Urbana-Champaign community continues in our new space, where we have 14 desktop computers equipped with specialized research software. A full list of software available in Room 220 is available on the Scholarly Commons website. You can also receive statistical consulting services through the Center for Innovation in Teaching and Learning in Room 220 during their drop-in hours. Our scanning equipment is also now located in Room 220, including our new KIC Bookeye Book Scanning Station.

Bookeye scanner with touchscreen and two flatbed scanners

Bookeye and Flatbed scanners

The Scholarly Commons service desk is also now located in Room 220 and is the best way to get immediate help from one of our staff members. We will also be available via our online chat and through email (sc@library.illinois.edu) during our operating hours, Monday-Thursday, 10am-4pm and Fridays 10am-noon. You can also visit Room 220 outside of these hours – the room will be available for use whenever the Main Library building is open.

We are so excited to be back on campus and in our new space. We look forward to seeing you at the new Scholarly Commons!

Automated Live Captions for Virtual and In-Person Meetings

At the University of Illinois at Urbana-Champaign, we are starting to think about what life will look like with a return of in-person services, meetings, and events. Many of us are considering what lessons we want to keep from our time conducting these activities online to make the return to in-person as inclusive as possible.

Main library reading room

“Mainlibraryreadingroom.jpg.” C. E. Crane, licensed under a CC-BY 2.0 Attribution license.

One way to make your meetings and presentations accessible is the use of live, automated captions. Captions benefit those who are hard-of-hearing, those who prefer to read the captions while listening to help focus, people whose first language is not English, and others. Over the course of the last year, several online platforms have introduced or enhanced features that create live captions for both virtual and in-person meetings.

Live Captions for Virtual Meetings and Presentations

Most of the major virtual meeting platforms have implemented automated live captioning services.

Zoom

Zoom gives you the option of using either live, automated captions or assigning someone to create manual captions. Zoom’s live transcriptions only support US English and can be affected by background noise, so they recommend using a manual captioner to ensure you are meeting accessibility guidelines. You can also integrate third-party captioning software if you prefer.

Microsoft Teams

MS Teams offers live captions in US English and includes some features that allow captions to be attributed to individual speakers. Their live captioning service automatically filters out profane language and is available on the mobile app.

Google Meet

Unlike Zoom and Teams, Google Meet offers live captions in French, German, Portuguese, and Spanish (both Latin America and Spain). This feature is also available on the Google Meet app for Android, iPhone, and iPad.

Slack

Slack currently does not offer live automated captions during meetings.

Icon of laptop open with four people in different quadrants representing an online meeting

“Meeting” by Nawicon from the Noun Project.

Live Captions for In-Person Presentations

After our meetings and presentations return to in-person, we can still incorporate live captions whenever possible to make our meetings more accessible. This works best when a single speaker is presenting to a group.

PowerPoint

PowerPoint’s live captioning feature allows your live presentation to be automatically transcribed and displayed on your presentation slides. The captions can be displayed in either the speaker’s native language or translated into other languages. Presenters can also adjust how the captions display on the screen.

Google Slides

The captioning feature in Google Slides is limited to US English and works best with a single speaker. Captions can be turned on during the presentation but do not allow for the presenter to customize their appearance.

Icon of four figures around a table in front of a blank presentation screen

“Meeting” by IconforYou from the Noun Project.

As we return to some degree of normalcy, we can push ourselves to imagine creative ways to take the benefits of online gathering with us into the future. The inclusive practices we have adopted don’t need to just disappear, especially as technology and our ways of working continue to adapt.

Data Feminism and Data Justice

“Data” can seem like an abstract term – What counts as data? Who decides what is counted? How is data created? What is it used for?

Outline of a figure surrounded by a pie chart, speech bubble, book, bar chart, and Venn diagram to represent different types of data

“Data”. Olena Panasovska. Licensed under a CC BY license. https://thenounproject.com/search/?q=data&i=3819883

These questions are some of the ones you might ask when applying a Data Feminist framework to your research. Data Feminism goes beyond looking at the mechanics and logistics of data collection and analysis to uncover the influences of structural power and erasure in the collection, analysis, and application of data.

Data Feminism was developed by Catherine D’Ignazio and Lauren Klein, authors of the book Data Feminism. Their ideas are grounded in the work of Kimberlé Crenshaw, the legal scholar credited with developing the concept of intersectionality. Using this lens, they seek to uncover the ways data science has caused harm to marginalized communities and the ways data justice can be used to remedy those harms in partnership with the communities we aim to help.

The Seven Principles of Data Feminism include:

  • Examine power
  • Challenge power
  • Rethink binaries and hierarchies
  • Elevate emotion and embodiment
  • Embrace pluralism
  • Consider context
  • Make labor visible

Applying data feminist principles to your research might involve working with local communities to co-create consent forms, using data collection to fill gaps in available data about marginalized groups, prioritizing the use of open source, community-created tools, and properly acknowledging and compensating people involved in all stages of the research process. At the heart of this work is the questioning of whose interests drive research and how we can reorient those interests around social justice, equity, and community.

The Feminist Data Manifest-No, authored in part by Anita Say Chan, Associate Professor in the School of Information Sciences and the College of Media, provides additional principles to commit to in data feminist research. These resources, and the scholars and communities engaged in this work, demonstrate how data and research can be used to advance justice, reject neutrality, and prioritize those who have historically experienced the greatest harm at the hands of researchers.

The Data + Feminism Lab at the Massachusetts Institute of Technology, directed by D’Ignazio, is a research organization that “uses data and computational methods to work towards gender and racial equity, particularly as they relate to space and place”. They are members of the Design Justice Network, which seeks to bring together people interested in research that centers marginalized people and aims to address the ways research and data are used to cause harm. These groups provide examples for how to engage in data feminist and data-justice inspired research and action.

Learning how to use tools like SPSS and NVivo is an important aspect of data-related research, but thinking about the seven principles of Data Feminism can inspire us to think critically about our work and engage more fully in our communities. For more information about data feminism, check out these resources:

Happy Open Education Week 2021!

Every March, librarians around the world celebrate Open Education Week, a time to raise awareness of the need for and use of Open Educational Resources on our campuses. Many libraries are engaged in promoting these resources to faculty and administrators in order to help reduce the cost of course materials for students.

OEWeek 2021 Logo

“Open Education Week Logo.” OEWeek. https://www.openeducationweek.org/page/materials. Licensed under a CC-BY 4.0 license.

Open Educational Resources are learning materials that are published without copyright restrictions, meaning they can be freely distributed, reused, and modified. Faculty who assign Open Educational Resources in their classes help eliminate the barriers to academic success students can face when they cannot afford their course materials. Over several iterations of its survey, the Florida Virtual Campus has demonstrated how these costs negatively impact students – whether it’s dropping or failing a course, changing majors, or struggling academically.

OpenStax is one of the most well-known publishers of OER and is often used by librarians as an example of high-quality, low-cost textbooks. While librarians often work as OER advocates on their campus, we are not always the ones publishing our own, original OER. This makes the publishing of Instruction in Libraries and Information Centers: An Introduction in July 2020 a unique and exciting accomplishment that will benefit Library and Information Science students for years to come.

Front cover of Instruction in Libraries by Saunders and Wong

This textbook, authored by Laura Saunders, Associate Professor of Library and Information Science at Simmons College, and Melissa Wong, Adjunct Lecturer of Library and Information Sciences at UIUC, is freely available for students to read online, download, and print. The book is the first open access textbook to be published by Windsor and Downs Press through IOPN, the University Library’s publishing unit. Other open access books available through the press include Sara Benson’s The Sweet Public Domain: Celebrating Copyright Expiration with the Honey Bunch Series.

Interested in the ways libraries are celebrating these accomplishments and bringing attention to the need to continue our advocacy? Check out the Twitter hashtag #OEWeek to join the conversation.

Thinking Beyond the Four Factors

Every year, libraries and other information professionals recognize Fair Use Week, a week dedicated to educating our communities about the power of Fair Use to help them make informed and responsible decisions about their use of copyrighted materials.

Fair Use week in white text on black background

For example, the University Library at the University of Illinois will be sponsoring a Fair Use Week Game Show, hosted by Copyright Librarian Sara Benson. This event will teach participants about how to conduct a Fair Use analysis in a fun and engaging manner in hopes of getting our campus excited about the possibilities that Fair Use opens.

When considering whether your use of a copyrighted work is a Fair Use, there are 4 main factors to consider: Purpose, Nature, Amount, and Effect.

Purpose refers to your intended use of a work and specifically considers whether you are using it for educational purposes, which is more likely to be considered a Fair Use, or for profit, which weighs against Fair Use. Nature refers to the work itself. Using factual and published works is more likely to be considered a Fair Use than using creative or unpublished works.

Amount considers how much of the work you intend to use. Using a small or less important portion of the work is more likely to be a Fair Use, while using the whole work or the “heart” of the work is less likely to be a Fair Use. Lastly, Effect looks at the potential market impact of your use of the work. If it is likely your use would impact the original creator’s ability to profit off their work, your use is less likely to be considered a Fair Use.

In order to make a Fair Use determination, courts weigh each of the four factors holistically to decide whether your use of a copyrighted work is allowed. However, could there be more to a fair use than the four factors used by the courts?

Graphic image of balance scales

“File:Johnny-automatic-scales-of-justice.svg” by johnny_automatic is marked with CC0 1.0

Using another person’s copyrighted material may not just be a legal question, but an ethical one. For example, many libraries make cultural artifacts taken from indigenous people available to the world. As these items get digitized, libraries are typically the copyright owners for the digital version. While doing your Fair Use analysis, it may be worthwhile to also consider whether the community these items were taken from would approve of your use of the material, even if a court would rule that your use is fair.

Another example is the use of personal photos, which the internet makes readily available online. While your use of these photos may be considered a Fair Use after weighing the four factors, is it ethical to include images of other people’s faces in your work without their permission?

Fair Use gives us guidance about how to avoid being sued for copyright infringement and arguments to defend ourselves if we are. But Fair Use may not always be enough to tell you whether your use is ethical. When in doubt, you can ask your local librarian for tips and resources on using someone else’s copyrighted materials ethically and responsibly.

In the meantime, you can check out the Fair Use page on our Copyright Reference Guide, which contains several resources to help you think through your own Fair Use analysis. Happy Fair Use Week!

Open with Purpose: Open Access Week 2020

International Open Access Week 2020 is upon us, and the need for equitable access to research has taken on a new sense of urgency. Every year, libraries celebrate Open Access Week to bring attention to issues related to scholarly communications. This year’s theme, “Open with Purpose: Taking Action to Build Structural Equity and Inclusion,” is intended to get us thinking about the ways our current systems marginalize and exclude.

Banner for Open Access week. Blue background with white text that says "open with purpose: taking action to build structural equity and inclusion"

This year, we celebrate amidst a pandemic that has completely changed how we do things. Usually, immediate access to scholarly research isn’t on many people’s minds. But, research about COVID-19 has made clear the importance of open access to research. This urgency has caused several publishers to open up their content related to COVID-19 and may be accelerating the shift towards open access as the default for scholarly publishing.

Making research about COVID-19 openly available speeds up the research process by allowing more people to access the data they need to find a solution to this crisis. The CDC, UNESCO, and National Institutes of Health have all compiled open access information about COVID-19 for research and educational use to assist in this effort.

However, making research available for free is not enough. In her blog post “Opening up the Margins”, April Hathcock writes, “there are so many ways in which open access still reflects the biased systems of the scholarship in which it’s found, even as it can be used to open up scholarship at the margins” (Hathcock, 2016). Open access is still exclusionary if it maintains practices that privilege the publication of white, western, academic voices and centers those perspectives.

open access logo. orange open padlock

It is no secret that COVID-19 disproportionately affects African-Americans. A quick search of “COVID-19 and African-Americans” in Google Scholar reveals tons of studies demonstrating that fact. While the pandemic has made visible the need to address social inequalities that lead to higher vulnerability in black populations, these problems are not new and the solutions cannot be found under a microscope. The people living in these areas are not the ones conducting research, and yet their perspective is invaluable to knowing how the lived experiences of oppression contribute to this tragedy.

Researchers should not treat people as objects of study but as full people whose susceptibility to the disease cannot simply be linked to genetics. To address the pandemic, we must center the experiences of those most vulnerable. With open access advocacy, we must make sure to include voices that aren’t traditionally acknowledged as scholarly and recognize how those experiences inform the research process.

“Open with Purpose” means mindfully and intentionally creating systems that invite people in. The COVID-19 pandemic has highlighted the urgency of this movement, but the social, economic, and political viruses of racism, sexism, classism, etc. had already made this urgency visible to those who are the most marginalized. Open systems need to not only unlock research, but also to question the very structures that keep it closed to certain people in the first place and rebuild them into something better that can more fully address the world’s problems.

U of I System Weighs in on Sovereign Immunity

In June 2020, the United States Copyright Office put out a request for public input on issues related to states’ liability in cases of copyright infringement. This topic was brought to public attention in March during Allen v. Cooper, where the Supreme Court found it unconstitutional to repeal states’ sovereign immunity in cases of copyright infringement since there was not enough evidence to justify this action. This means that creators whose copyright is violated by the state do not have clear next steps for how to proceed with litigation.

To determine how to move forward, the U.S. Copyright Office was asked to study the extent to which states violate copyright, whether there is a remedy for the creator, and whether the violation is a result of intentional or reckless behavior. The study will inform the decision to repeal this immunity enjoyed by states, which would certainly have consequences for institutions like universities and libraries.

Reading room of the Main Library at the University of Illinois. Large room with tall white ceiling, large windows, light fixtures, and wooden tables and chairs.

“Main Library”. wabisabi2015. Licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 license. https://flic.kr/p/238q6et

As a state-funded, land-grant institution, the University of Illinois system is a major stakeholder in this conversation. The University system both consumes and creates a huge amount of copyrighted material and has a responsibility for making sure our community is following copyright law. We also need to make sure we have the freedom to use and share copyrighted materials to help foster the scholarly and educational mission of the institution.

So, Sara Benson, Copyright Librarian and interim head of the Scholarly Commons, and Scott Rice, Deputy University Counsel, submitted their own response to the United States Copyright Office on behalf of the University of Illinois system. They are currently awaiting a response, which is due by October 22, 2020. In this document, Sara describes some of the ways she educates our community on issues of copyright in her role at the library to help us all contribute to a culture of copyright awareness. This is because the responsibility for following copyright law primarily falls to individual people to make the right choices.

And, for the most part, we do! Sara and Scott say that the University system only experiences 3-6 copyright infringements a year, and that these infringements are not the result of intentional or reckless behavior. The University of Illinois community makes a good-faith effort not to infringe copyright, and will continue to be diligent in the face of potential legislation that might increase our liability for copyright violations.

Maintaining our ability to use copyrighted materials in our teaching and research is a group effort. So what can you do to be a good copyright actor? Here are a few tips to get you started:

  • Cite your sources! Including attribution shows a good-faith effort to credit the original creator. While this doesn’t necessarily protect you from claims of infringement, it is helpful for showing that the work wasn’t used maliciously.
  • Learn about Fair Use! Fair Use is a great way to think through whether your use of copyrighted materials is permissible. But, keep in mind that only a lawyer can give you advice on whether your use is a fair use.
  • Ask for help! When in doubt, asking for a second opinion is a good way to avoid copyright infringement. Email Sara Benson at srbenson@illinois.edu with your copyright questions (please note that Sara cannot provide legal counsel).

Check out the library’s Copyright Reference Guide for even more tips on how to be a good copyright actor!