How We’re Celebrating the Sweet Public Domain

This is a guest blog by the amazing Kaylen Dwyer, a GA in Scholarly Communication and Publishing.

Collage of the Honey Bunch series

As William Tringali mentioned last week, 2019 marks an exciting shift in copyright law with hundreds of thousands of works entering the public domain every January 1st for the next eighteen years. We are setting our clocks back to the year of 1923—to the birth of the Harlem Renaissance with magazines like The Crisis, to first-wave feminists like Edith Wharton, Virginia Woolf, and Dorothy L. Sayers, back to the inter-war period.

Copyright librarian Sara Benson has been laying the groundwork to bring in the New Year and celebrate the wealth of knowledge now publicly available for quite some time, leading up to a digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, and the Re-Mix It! Competition to be held this spring.

A collaborative effort between Benson, graduate assistants, and several scholarly contributors, The Sweet Public Domain celebrates creative reuse and copyright law. Last year, GA Paige Kuester spent time scouring the Rare Book and Manuscript Library in search of something that had never been digitized before, something at risk of being forgotten forever, not because it is unworthy of attention, but because it has been captive to copyright for so long.

We found just the thing: the beloved Honey Bunch series, a best-selling girls’ series by the Stratemeyer Syndicate. The syndicate became known for its publication of Nancy Drew, the Hardy Boys, the Bobbsey Twins, and many others, but in 1923 they kicked off the adventures of Honey Bunch with Just a Little Girl, Her First Visit to the City, and Her First Days on the Farm.

Through the digital exhibit, The Sweet Public Domain: Honey Bunch and Copyright, you can explore all three books, introduced by Deidre Johnson (Edward Stratemeyer and the Stratemeyer Syndicate, 1993) and LuElla D’Amico (Girls Series Fiction and American Popular Culture, 2017). To hear more about copyright and creative reuse, you can find essays by Sara Benson, our copyright librarian, and Kirby Ferguson, filmmaker and producer of Everything is a Remix.

If you are a student at the University of Illinois at Urbana-Champaign, you can engage with the public domain by making new and innovative work out of something old and win up to $500 for your creation. Check out the Re-Mix It! Competition page for contest details and be sure to check out our physical exhibit in the Marshall Gallery (Main Library, first floor east entrance) for ideas.

Logo for the Remix It competition

A Beautiful Year for Copyright!

Hello, researchers! And welcome to the bright, bold world of 2019! All around the United States, Copyright Librarians are rejoicing in this amazing year! But why, you might ask?

Cover page of “Leaves From A Grass House” from Don Landing

Well, for the first time in 20 years, formally published works are entering the public domain. That’s right, the amazing, creative works of 1923 will belong to the public as a whole.

Though fascinating works like Virginia Woolf’s Jacob’s Room are just entering the public domain, some works entered the public domain years ago. The holiday classic “It’s a Wonderful Life” entered the public domain because, according to Duke Law School’s Center for the Study of the Public Domain (2019), its copyright was not renewed after its “first 28 year term” (Paragraph 13). In a fascinating turn of events, though, the original copyright holder “reasserted copyright based on its ownership of the film’s musical score and the short story on which the film was based” after the film became such a success (Duke Law School’s Center for the Study of the Public Domain, 2019, Paragraph 13).

An image of a portion of Robert Frost’s poem “New Hampshire”

But again, why all the fuss? Don’t items enter the public domain every year?

That answer is, shockingly, no! Though 1922 classics like Nosferatu entered the public domain in 1998, 1923’s crop of works is only entering this year, making this the first time in 20 years that a massive crop of works has become public, according to Verge writer Jon Porter (2018). The delay dates to 1998, the year lawmakers “extended the length of copyright from 75 years to 95, or from 50 to 70 years after the author’s death” (Porter, 2018, Paragraph 2).

Table of contents for “Tarzan and the Golden Lion”

What’s most tragic about this long wait time for the release of these works is that, after almost 100 years, so many of them are lost. Film has decayed, text has vanished, and music has stopped being played. We cannot know the number of creative works lost to time, but here are a few places that can help you find public domain works from 1923!

Duke Law School’s Center for the Study of the Public Domain has an awesome blog post with even more information about copyright law and the works now available to the public.

If you want to know which books are included in this mass public domain-ifying of so many amazing creative works, you can check out HathiTrust, which has released more than 53,000 of them, readable online for free!

Screenshot of the HathiTrust search page for items published in the year 1923.

Finally, the Public Domain Review has a great list of links to works now available!

Sources:

Duke Law School’s Center for the Study of the Public Domain. (2019, Jan. 1). Public Domain Day 2019. Retrieved from https://law.duke.edu/cspd/publicdomainday/2019/

Porter, Jon. (2018, December 31). After a 20 year delay, works from 1923 will finally enter the public domain tomorrow. The Verge. Retrieved from https://www.theverge.com/2018/12/31/18162933/public-domain-day-2019-the-pilgrim-jacobs-room-charleston-copyright-expiration

Cool Text Data – Music, Law, and News!

Computational text analysis can be done in virtually any field, from biology to literature. You may use topic modeling to determine which areas are the most heavily researched in your field, or attempt to determine the author of an orphan work. Where can you find text to analyze? So many places! Read on for sources to find unique text content.

Woman with microphone

Genius – the song lyrics database

Genius started as Rap Genius, a site where rap fans could gather to annotate and analyze rap lyrics. It expanded to include other genres in 2014, and now manages a massive database covering Ariana Grande to Fleetwood Mac, and includes both lyrics and fan-submitted annotations. All of this text can be downloaded and analyzed using the Genius API. Using Genius and a text mining method, you could see how themes present in popular music changed over recent years, or understand a particular artist’s creative process.

homepage of case.law, with Ohio highlighted, 147,692 unique cases. 31 reporters. 713,568 pages scanned.

Case.law – the case law database

The Caselaw Access Project (CAP) is a fairly recent project that is still ongoing, and publishes machine-readable text digitized from over 40,000 bound volumes of case law from the Harvard Law School Library. The earliest case is from 1658, with the most recent cases from June 2018. An API and bulk data downloads make it easy to get this text data. What can you do with huge amounts of case law? Well, for starters, you can generate a unique case law limerick:

Wheeler, and Martin McCoy.
Plaintiff moved to Illinois.
A drug represents.
Pretrial events.
Rocky was just the decoy.

Check out the rest of their gallery for more project ideas.

Newspapers and More

There are many places you can get text from digitized newspapers, both recent and historical. Some newspapers are hundreds of years old, so there can be problems with the OCR (Optical Character Recognition) that will make it difficult to get accurate results from your text analysis. Making newspaper text machine readable requires special attention, since newspapers are printed on thin paper and have possibly been stacked up in a dusty closet for 60 years! See OCR considerations here, but the newspaper text described here is already machine-readable and ready for text mining. However, with any text mining project, you must pay close attention to the quality of your text.

The Chronicling America project sponsored by the Library of Congress contains digital copies of newspapers with machine-readable text from all over the United States and its territories, from 1690 to today. Using newspaper text data, you can analyze how topics discussed in newspapers change over time, among other things.
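To make "how topics change over time" a little more concrete, here is a minimal Python sketch. It assumes you have already downloaded some machine-readable page text and know the year for each page; the sample pages, the function name `term_share_by_year`, and the choice of a single search term are all made up for illustration, and it does not call any Chronicling America service.

```python
import re
from collections import Counter

def term_share_by_year(pages, term):
    """Return {year: share of all tokens that equal `term`}.

    `pages` is an iterable of (year, text) pairs. This is a hypothetical
    helper for illustration, not part of any library API.
    """
    hits = Counter()
    totals = Counter()
    for year, text in pages:
        # Lowercase and keep alphabetic runs only; OCR noise will still leak in.
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Invented sample text standing in for downloaded newspaper pages.
sample_pages = [
    (1918, "Influenza closes schools. Influenza cases rise."),
    (1918, "City council meets on Tuesday."),
    (1923, "New radio station opens downtown."),
]

shares = term_share_by_year(sample_pages, "influenza")
for year, share in shares.items():
    print(year, round(share, 3))
```

Dividing by the total number of tokens per year matters because some years have far more digitized pages than others; raw counts alone would mostly track how much was scanned.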

newspapers being printed quickly on a rolling press

Looking for newspapers from a different region? The library has contracts with several vendors to conduct text mining, including Gale and ProQuest. Both provide newspaper text suitable for text mining, from The Daily Mail of London (Gale), to the Chinese Newspapers Collection (ProQuest). The way you access the text data itself will differ between the two vendors, and the library will certainly help you navigate the collections. See the Finding Text Data library guide for more information.

The sources mentioned above are just highlights of our text data collection! The Illinois community has access to a huge amount of text, including newspapers and primary sources, but also research articles and books! Check out the Finding Text Data library guide for a more complete list of sources. And, when you’re ready to start your text mining project, contact the Scholarly Commons (sc@library.illinois.edu), and let us help you get started!

Wikidata and Wikidata Human Gender Indicators (WHGI)

Wikipedia is a central player in online knowledge production and sharing. Since its founding in 2001, Wikipedia has been committed to open access and open editing, which has made it the most popular reference work on the web. Though students are still warned away from using Wikipedia as a source in their scholarship, it presents well-researched information in an accessible and ostensibly democratic way.

Most people know Wikipedia from its high ranking in most internet searches and tend to use it for its encyclopedic value. The Wikimedia Foundation—which runs Wikipedia—has several other projects which seek to provide free access to knowledge. Among those are Wikimedia Commons, which offers free photos; Wikiversity, which offers free educational materials; and Wikidata, which provides structured data to support the other wikis.

The Wikidata logo

Wikidata provides structured data to support Wikipedia and other Wikimedia Foundation projects

Wikidata is a great tool to study how Wikipedia is structured and what information is available through the online encyclopedia. Since it is presented as structured data, it can be analyzed quantitatively more easily than Wikipedia articles. This has led to many projects that allow users to explore data through visualizations, queries, and other means. Wikidata offers a page of Tools that can be used to analyze Wikidata more quickly and efficiently, as well as Data Access instructions for how to use data from the site.

The home page for the Wikidata Human Gender Indicators project

An example of a project born out of Wikidata is the Wikidata Human Gender Indicators (WHGI) project. The project uses metadata from Wikidata entries about people to analyze trends in gender disparity over time and across cultures. The project presents the raw data for download, as well as charts and an article written about the discoveries the researchers made while compiling the data. Some of the visualizations they present are confusing (perhaps they could benefit from reading our Lightning Review of Data Visualization for Success), but they succeed in conveying important trends that reveal a bias toward articles about men, as well as an interesting phenomenon surrounding celebrities. Some regions will have a better ratio of women to men biographies due to many articles being written about actresses and female musicians, which reflects cultural differences surrounding fame and gender.

Of course, like many data sources, Wikidata is not perfect. The creators of the WHGI project frequently discovered that articles did not have complete metadata related to gender or nationality, which greatly influenced their ability to analyze the trends present on Wikipedia related to those areas. Since Wikipedia and Wikidata are open to editing by anyone and are governed by practices that the community has agreed upon, it is important for Wikipedians to consider including more metadata in their articles so that researchers can use that data in new and exciting ways.
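Here is a rough Python sketch of why missing metadata matters for this kind of analysis. It is not the WHGI project's code: the record layout, the `gender_summary` function, and the sample entries are assumptions made up for illustration. It reports the share of women among records with a known gender alongside the share of records with no gender recorded at all.

```python
from collections import Counter

def gender_summary(records):
    """Summarize gender metadata for a list of biography records.

    Each record is a dict with an optional "gender" value. Hypothetical
    structure for illustration, not the Wikidata data model.
    """
    counts = Counter(r.get("gender") or "unknown" for r in records)
    known = sum(n for g, n in counts.items() if g != "unknown")
    return {
        "counts": dict(counts),
        # Share of women among records where gender is recorded.
        "share_women_of_known": counts["female"] / known if known else None,
        # Share of records with no gender recorded.
        "share_missing": counts["unknown"] / len(records) if records else None,
    }

# Invented sample records.
sample = [
    {"name": "A", "gender": "female"},
    {"name": "B", "gender": "male"},
    {"name": "C", "gender": "male"},
    {"name": "D", "gender": None},
]

summary = gender_summary(sample)
print(summary)
```

If the missing share is large, the ratio computed from the known records may say more about who gets complete metadata than about who gets written about.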

An animated gif of the Wikipedia logo bouncing like a ball

HathiTrust Research Center Expands Text Mining Corpus

Good news for text and data mining researchers! After years of court cases and policymaking, the entire 16-million-item collection of the HathiTrust Digital Library, including in-copyright content, is available for text and data mining. (Yay!)

Previously, only non-copyrighted, public domain materials could be used with HTRC Analytics’ suite of tools. The restriction obviously limited the ability to do quality computational research on modern history; most out-of-copyright items are texts created before 1923. With this update, everyone can perform text analysis on the full corpus with different tools. HathiTrust is membership-based, so some restrictions apply to non-member institutions and independent scholars alike. With the passage of this new policy, only one service, the HTRC Data Capsule (a virtual computing environment), retains members-only access to the full corpus for requesters with an established research need. There are over 140 member institutions, including the University of Illinois.

Here’s a quick overview of HTRC’s tools and access permissions (from HTRC’s Documentation).

  • HTRC Algorithms: a set of tools for assembling collections of digitized text from the HathiTrust corpus and performing text analysis on them. Including copyrighted items for ALL USERS.
  • Extracted Features Dataset: dataset allowing non-consumptive analysis on specific features extracted from the full text of the HathiTrust corpus. Including copyrighted items for ALL USERS.
  • HathiTrust+Bookworm: a tool for visualizing and analyzing word usage trends in the HathiTrust corpus. Including copyrighted items for ALL USERS.
  • HTRC Data Capsule: a secure computing environment for researcher-driven text analysis on the HathiTrust corpus. All users may access public domain items. Access to copyrighted items is available ONLY to member-affiliated researchers.

Fair Use to the Rescue!

How is this possible? Through both the Fair Use section of the Copyright Act and HathiTrust’s policy of allowing only non-consumptive research. Fair Use protects use of copyrighted materials for educational, research, and transformative purposes. Non-consumptive research means that researchers can glean information about works without actually being able to read (consume) them. You can see the end result (topic models, word and phrase statistics, etc.), without seeing the entirety of the work for human reading. Allowing computational research only on a corpus protects rights holders, and benefits researchers. A researcher can perform text analysis on thousands of texts without reading them all, which is the basis of computational text analysis anyway! Our Copyright Librarian, Sara Benson, recently discussed how Fair Use factors into HathiTrust’s definition of non-consumptive research.
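To give a feel for what "non-consumptive" features look like, here is a small Python sketch that reduces each page of a text to word counts and throws away word order. It is an illustration only: the `extract_features` function and its output layout are assumptions for this example, not the actual format of HTRC's Extracted Features Dataset.

```python
import re
from collections import Counter

def extract_features(pages):
    """Turn a list of page texts into per-page token counts.

    Word order is discarded, so the result supports counting and
    modeling but cannot be read like the original page.
    """
    features = []
    for number, text in enumerate(pages, start=1):
        tokens = re.findall(r"[a-z']+", text.lower())
        features.append({
            "page": number,
            "token_count": len(tokens),
            # Sort alphabetically so no trace of the original order remains.
            "tokens": dict(sorted(Counter(tokens).items())),
        })
    return features

# Invented sample pages.
pages = [
    "The bees made honey. The honey was sweet.",
    "Honey Bunch went to the city.",
]

features = extract_features(pages)
for page in features:
    print(page)
```

Counts like these are enough for word statistics or topic models across thousands of volumes, even though no one could sit down and read the book from them.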

Ready to use HTRC Analytics for text mining? Check out their Getting Started with HTRC Guide for some simple, guided start-up activities.

For general information about the digital library, see our guide on HathiTrust.

Analyze and Visualize Your Humanities Data with Palladio

How do you make sense of hundreds of years of handwritten scholarly correspondence? Humanists at Stanford University had the same question, and developed the project Mapping the Republic of Letters to answer it. The project maps scholarly social networks in a time when exchanging ideas meant waiting months for a letter to arrive from across the Atlantic, not mere seconds for a tweet to show up in your feed. The tools used in this project inspired the Humanities + Design lab at Stanford University to create a set of free tools specifically designed for historical data, which can be multi-dimensional and not suitable for analysis with statistical software. Enter Palladio!

To start mapping connections in Palladio, you first need some structured, tabular data. A spreadsheet saved in CSV format with data that is categorized and sorted is sufficient. Once you have your data, just upload it and get analyzing. Palladio likes data about two types of things: people and places. The sample data Palladio provides is information about influential people who visited or were otherwise connected with the itty bitty country of Monaco. Read on for some cool things you can do with historical data.
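If your data lives in a script rather than a spreadsheet, a CSV like this can be produced with Python's built-in csv module. This is a minimal sketch: the column names, the sample person, and the choice to store coordinates as a single "latitude,longitude" field are assumptions for illustration, so check Palladio's own documentation for the format it expects.

```python
import csv
import io

# Hypothetical column names chosen for this example.
FIELDS = ["name", "occupation", "birthplace", "birth_coordinates"]

def to_palladio_csv(people):
    """Serialize a list of person dicts to CSV text, one row per person."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for person in people:
        lat, lon = person["birth_lat_lon"]
        writer.writerow({
            "name": person["name"],
            "occupation": person["occupation"],
            "birthplace": person["birthplace"],
            # Assumption: coordinates go in one field as "latitude,longitude".
            "birth_coordinates": f"{lat},{lon}",
        })
    return buffer.getvalue()

# Invented sample record.
people = [
    {
        "name": "Example Person",
        "occupation": "Author",
        "birthplace": "Paris",
        "birth_lat_lon": (48.8566, 2.3522),
    }
]

csv_text = to_palladio_csv(people)
print(csv_text)
```

The csv module quotes the coordinate field automatically because it contains a comma, which keeps each row at four columns.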

Mapping

Use the Map feature to mark coordinates and connections between them. Using the sample data that HD Lab provided, I created the map below, which shows birthplaces and arrival points. Hovering over the connection shows you the direction of the move. You can also switch the base map to a standard style like satellite or terrain, or even to just land masses with no human-created geography, like roads or place names.

Map of Mediterranean sea and surrounding lands of Europe, red lines across map show movement, all end in Monaco

One person in our dataset was born in Galicia, and later arrived in Monaco.

But, what if you want to combine this new-fangled spatial analysis with something actually historic? You’re in luck! Palladio allows you to use other maps as bases, provided that the map has been georeferenced (assigned coordinates based on locations represented on the image). The New York Public Library’s Map Warper is a collection of some georeferenced maps. Now you can show movement on a map that’s actually from the time period you’re studying!

Same red lines across map as above, but image of map itself is a historical map

The same birthplace to arrival point data, but now with an older map!

Network Graphs

Perhaps the connections you want to see don’t make sense to be on a map, like those between people. This is where the Graph feature comes in. Graph allows you to create network visualizations based on different facets of your data. In general, network graphs display relationships between entities, and work best if all your nodes (dots) are the same type of information. They are especially useful to show connections between people, but our sample data doesn’t have that information. Instead, we can visualize our people’s occupations by gender.
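Under the hood, a graph like this boils down to a list of connections between two columns of your table. Here is a rough Python sketch of that idea; it is not how Palladio is implemented, and the `edge_weights` function and sample records are made up for illustration.

```python
from collections import Counter

def edge_weights(records, source, target):
    """Count how often each (source value, target value) pair occurs.

    Each distinct pair is one edge in the network; the count is its weight.
    Rows missing either value are skipped.
    """
    edges = Counter()
    for record in records:
        if record.get(source) and record.get(target):
            edges[(record[source], record[target])] += 1
    return edges

# Invented sample rows.
records = [
    {"occupation": "Author", "gender": "Male"},
    {"occupation": "Author", "gender": "Male"},
    {"occupation": "Artist", "gender": "Female"},
    {"occupation": "Artist", "gender": "Male"},
    {"occupation": "", "gender": "Female"},
]

edges = edge_weights(records, "occupation", "gender")
for (occupation, gender), weight in sorted(edges.items()):
    print(occupation, "--", gender, "x", weight)
```

In this made-up sample, "Author" connects only to "Male" while "Artist" connects to both, which is the kind of pattern the graph makes visible at a glance.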

network graph shows connections between peoples' occupations and their gender

Most occupations have both males and females, but only males are Monegasque, Author, Gambler, or Journalist, and only females are Aristocracy or Spouse.

The network graph makes it especially visible that there are some slight inconsistencies in the data; at least one person has “Aristocracy” as an occupation, while others have “Aristocrat.” Cleaning and standardizing your data is key! That sounds like a job for…OpenRefine!
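For a small dataset, this kind of standardizing can also be sketched in a few lines of Python. The mapping below is an assumption for illustration (which label counts as the "right" one is your call as the researcher), and the `standardize` function is a hypothetical helper, not an OpenRefine feature.

```python
def standardize(values, canonical):
    """Replace known variants with one canonical label.

    Matching ignores case and surrounding spaces; values with no entry
    in `canonical` are kept, minus stray whitespace.
    """
    cleaned = []
    for value in values:
        key = value.strip().lower()
        cleaned.append(canonical.get(key, value.strip()))
    return cleaned

# Assumed mapping: treat both spellings as "Aristocrat".
canonical = {
    "aristocracy": "Aristocrat",
    "aristocrat": "Aristocrat",
}

occupations = ["Aristocracy", " aristocrat", "Author"]
cleaned = standardize(occupations, canonical)
print(cleaned)
```

After cleaning, the two variants collapse into a single node in the network graph instead of splitting the same group of people in two.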

Timelines

All of the tools in Palladio have the same Timeline functionality. This basically allows you to filter the data used in your visualization by a date, whether that’s birthdate, date of death, publication date, or whatever timey wimey stuff you have in your dataset. Other types of data can be filtered using the Facet function, right next to the Timeline. Play around with filtering, and watch your visualization change.

Try Palladio today! If you need more direction, check out this step-by-step tutorial by Miriam Posner. The tutorial is a few years old and the interface has changed slightly, so don’t panic if the buttons look different!

Did you create something cool in Palladio? Post a comment below, or tell us about it on Twitter!

Lightning Review: Text Analysis with R for Students of Literature

Cover of Text Analysis with R book

My undergraduate degree is in Classical Humanities and French, and like many humanities and liberal arts students, I mostly used computers for accessing Oxford Reference Online and double checking that “bonjour” meant “hello” before turning in term papers. Actual critical analysis of literature came from my mind and my research, and nothing else. Recently, scholars in the humanities began seeing the potential of computational methods for their study, and coined the term “digital humanities” for these methods. Computational text analysis provides insights that, in many cases, a human mind could not feasibly produce alone. When was the last time you read 100 books to count occurrences of a certain word, or looked at thousands of documents to group their contents by topic? In Text Analysis with R for Students of Literature, Matthew Jockers presents programming concepts specifically as they relate to literature study, with plenty of help to make even the most technophobic English student a digital humanist.

Jockers’ book caters to the beginning coder. You download practice text from his website that is already formatted to use in the tutorials presented, and he doesn’t dwell too much on pounding programming concepts into your head. I came into this text having already taken a course on Python, where we did edit text and complete exercises similar to the ones in this book, but even a complete beginner would find Jockers’ explanations perfect for diving into computational text analysis. There are some advanced statistical concepts presented which may turn away those less mathematically inclined, but these are mentioned only as furthering understanding of what R does in the background, and can be left to the computer scientists. Practice-based and easy to get through, Text Analysis with R for Students of Literature serves its primary purpose of bringing the possibilities of programming to those used to traditional literature research methods.

Ready to start using a computer to study literature? Visit the Scholarly Commons to view the physical book, or download the eBook through the Illinois library.

Puentes/Bridges: Highlights from DH2018

At the end of June, the Alliance of Digital Humanities Organizations (ADHO) coordinated their annual international DH conference, Digital Humanities 2018, in Mexico City. DH2018 was the first conference in the organization’s history to be held in Latin America and in the global south. With a theme of Puentes/Bridges, DH2018 emphasized transnational discourse and inclusivity. Here are some highlights from the event!

Latin@ voices in the Midwest: Ohio Habla Podcast
Elena Foulis of Ohio State University discussed Ohio Habla, a podcast project that seeks to educate others on the Latin@ experience in the Midwest with interviews conducted in English and Spanish (and a mixture of the two).

Visualizing the Digital Humanities Community
What does the DH community look like? Researchers from University College London’s Centre for Digital Humanities visualized how authors of DH articles cite each other and interact with each other on Twitter, and compared the two networks.

Network Analysis of Javanese Traditional Theatre
How do characters in Javanese traditional theatre relate to one another? In an excellent example of non-traditional digital publishing, Miguel Escobar Varela of the National University of Singapore communicates his research findings on an interactive webpage.

Mayan hieroglyphs as a computer font

Achieving Machine-Readable Mayan Text Via Unicode
Carlos Pallan Gayol of the University of Bonn and Deborah Anderson of UC Berkeley work to create Unicode equivalents of Mayan hieroglyphs to create a machine-readable version, ensuring reliable access to this language across devices.

Hurricane Memorial: Chronicling the Hurricane of 1928
A massive hurricane devastated Florida, Puerto Rico, and other parts of the Caribbean in 1928, but the story of this storm shifts depending on who you ask. Most of the storm’s victims were black migrant workers from Puerto Rico and Caribbean islands, whose deaths are minimized in most accounts. Christina Boyles of Trinity College seeks to “bring the stories of the storm’s underrepresented victims back into our cultural memory.”

Does “Late Style” Exist? New Stylometric Approaches to Variation in Single-Author Corpora
Jonathan Pearce Reeve presented some preliminary findings of his research investigating whether or not an author has a true “late style.” Late style is a term best known from the works of Edward Said, alluding to an author’s shift later in life to a writing style that is distinct from their “early” style. Read a review of his book, On Late Style. Code and other supplemental materials from Reeve’s research are available on GitHub.

screenshot from 4 rios webpage, shows drawings of people

4 Ríos: El Naya
A digital storytelling project about the impacts of armed conflict in Colombia, 4 Ríos is a transmedia project that includes a website, short film, and an interactive web-comic.

Researchers from our own University of Illinois participated in the conference, including Megan Senseney and Dan Tracy. Senseney, along with other Illinois researchers, presented “Audiences, Evidence, and Living Documents: Motivating Factors in Digital Humanities Monograph Publishing,” a survey of the motivations behind humanities scholars’ digital publishing actions and needs. Senseney also participated in a panel, “Unanticipated Afterlives: Resurrecting Dead Projects and Research Data for Pedagogical Use,” a discussion about how we might use unmaintained DH projects and data for learning purposes.

Tracy and other Illinois researchers presented a poster, Building a Bridge to Next Generation DH Services in Libraries with a Campus Needs Assessment, a report of results gathered while surveying the need for future DH services at research institutions, and how the library might facilitate this evolution. View Tracy’s poster in IDEALS.

ADHO gathered all the resources tweeted out during the conference in one place for you to view. You can also view a detailed schedule of presentations with descriptions here, or see paper abstracts here. Or, search #DH2018 on Twitter to see all the happenings!

Our Graduate Assistants: Kayla Abner

This interview is part of a new series introducing our graduate assistants to our online community. These are some of the people you will see when you visit our space, who will greet you with a smile and a willingness to help! Say hello to Kayla Abner!

What is your background education and work experience?

I have a Bachelor’s degree in Classical Humanities and French from Wright State University in Dayton (Go Raiders!). My original plan was to teach high school French or Latin, but after completing a student teaching practicum, I decided that wasn’t for me. During undergrad and after graduation, I always wound up in a job role that involved research or customer service in some capacity, which I really enjoyed.

What led you to your field?

Knowing that I enjoyed working on research, I considered going back to school for Library Science, but wanted to be sure before taking the jump. It was always interesting to see the results of the research I helped conduct, and I enjoyed helping people find answers, whether it was a coworker or a client. After a visit to an American Library Association conference in 2016, I fell in love with the collaborative and share-alike nature of librarianship, and was accepted to this program the next year!

What are your research interests?

Library science has so many interesting topics, it’s hard to choose one. But, I like looking at how people seek information, and how libraries can use that knowledge to enhance their services and resources. I’m a browser when it comes to book shelves, and it’s interesting to see how libraries are succeeding/failing at bringing that experience to the digital realm.

What are your favorite projects you’ve worked on?

I have two positions here in the library, one in the Scholarly Commons, and one working with our (current) Digital Humanities librarian, Dan Tracy. In both roles, I’ve worn a lot of hats, so to speak. My favorites have been creating resources like library guides, and assisting with creating content for our Savvy Researcher workshop series. Maintaining our library guides requires some experience with the software, so I enjoy learning new cool things that our programs can do. I also do a lot of graphic design work, which is a lot of fun!

Completing some of these tasks let me use some Python knowledge from my coursework, which is sort of like a fun puzzle (how do I get this to work??). I’m really interested in using digital methods and tools in research, like text mining and data visualization. Coming from a humanities background, it is very exciting to see the cool things humanists can do beyond traditional scholarship. Digital humanities is a really interesting field that bridges the gap between computer science and the humanities.

What are some of your favorite underutilized resources that you would recommend?

Our people! They aren’t underutilized, but I love an opportunity to let campus know that we are an excellent point of contact between you and an expert. If you have a weird research question in one of our service areas, we can put you in contact with the best person to help you.

When you graduate, what would your ideal job position look like?

I would love to work in an academic research library in a unit similar to the Scholarly Commons, where researchers can get the support they need to use digital methods and data in their research, especially in the humanities. There is such a breadth of digital techniques that humanities researchers can utilize, and they don’t necessarily replace traditional research methods. Distant reading a text puts forth different observations than traditional close reading, and both are equally useful.

What is the one thing you would want people to know about your field?

Librarians are happy to help you; don’t let a big desk intimidate you away from asking a question. That’s why we’re here!

New Digital Humanities Books in the Scholarly Commons!

Is there anything quite as satisfying as a new book? We just got a new shipment of books here in the Scholarly Commons that complement all our services, including digital humanities. Our books are non-circulating, so you cannot check them out, but these DH books are always available for your perusal in our space.

Stack of books in the Scholarly Commons

Two brand new and two mostly new DH books

Digital Humanities: Knowledge and Critique in a Digital Age by David M. Berry and Anders Fagerjord

Two media studies scholars examine the history and future of digital humanities. DH is a relatively new field, and one that is still not clearly defined. Berry and Fagerjord take a deep dive into the methods that digital humanists gravitate towards, and critique their use in relation to the broader cultural context. They are more critical of the “digital” than the “humanities,” meaning they consider more how use of digital tools affects the society as a whole (there’s that media studies!) than how scholars use digital methods in humanities work. They caution against using digital tools just because they are “better,” and instead encourage the reader to examine their role in the DH field to contribute to its ongoing growth. Berry has previously edited Understanding Digital Humanities (eBook available through Illinois library), which discusses similar issues. For a theoretical understanding of digital humanities, and to examine the issues in the field, read Digital Humanities.

Text Mining with R: A Tidy Approach by Julia Silge and David Robinson

Working with data can be messy, and text even messier. It never behaves how you expect it to, so approaching text analysis in a “tidy” manner is crucial. In Text Mining with R, Silge and Robinson present their tidytext framework for R, and instruct the reader in applying this package to natural language processing (NLP). NLP can be applied to derive meaning from unstructured text by way of unsupervised machine learning (wherein you train the computer to organize or otherwise analyze your text and then you go get coffee while it does all the work). This book is most helpful for those with programming experience, but no knowledge of text mining or natural language processing is required. With practical examples and easy to follow, step-by-step guides, Text Mining with R serves as an excellent introduction to tidying text for use in sentiment analysis, topic modeling, and classification.

No programming or R experience? Try some of our other books, like R Cookbook for an in-depth introduction, or Text Analysis with R for Students of Literature for a step-by-step learning experience focused on humanities people.

Visit us in the Scholarly Commons, 306 Main Library, to read some of our new books. Summer hours are Monday through Friday, 10 AM-5 PM. Hope to see you soon!