HathiTrust Research Center Expands Text Mining Corpus

Good news for text and data mining researchers! After years of court cases and policymaking, the entire 16-million-item collection of the HathiTrust Digital Library, including in-copyright content, is available for text and data mining. (Yay!)

Previously, only public domain materials could be used with HTRC Analytics’ suite of tools. The restriction limited researchers’ ability to do quality computational research on modern history; most out-of-copyright items are texts created before 1923. With this update, everyone can perform text analysis on the full corpus with different tools. HathiTrust is membership-based, so some restrictions apply to non-member institutions and independent scholars alike. Under the new policy, only one service, the HTRC Data Capsule (a virtual computing environment), retains members-only access to the full corpus, for requesters with an established research need. There are over 140 member institutions, including the University of Illinois.

Here’s a quick overview of HTRC’s tools and access permissions (from HTRC’s Documentation).

  • HTRC Algorithms: a set of tools for assembling collections of digitized text from the HathiTrust corpus and performing text analysis on them. Includes copyrighted items for ALL USERS.
  • Extracted Features Dataset: dataset allowing non-consumptive analysis on specific features extracted from the full text of the HathiTrust corpus. Includes copyrighted items for ALL USERS.
  • HathiTrust+Bookworm: a tool for visualizing and analyzing word usage trends in the HathiTrust corpus. Includes copyrighted items for ALL USERS.
  • HTRC Data Capsule: a secure computing environment for researcher-driven text analysis on the HathiTrust corpus. All users may access public domain items. Access to copyrighted items is available ONLY to member-affiliated researchers.

Fair Use to the Rescue!

How is this possible? Through both the Fair Use section of the Copyright Act and HathiTrust’s policy of allowing only non-consumptive research. Fair Use protects use of copyrighted materials for educational, research, and transformative purposes. Non-consumptive research means that researchers can glean information about works without actually being able to read (consume) them. You can see the end result (topic models, word and phrase statistics, etc.) without seeing the entirety of the work for human reading. Allowing only computational research on a corpus protects rights holders and benefits researchers. A researcher can perform text analysis on thousands of texts without reading them all, which is the basis of computational text analysis anyway! Our Copyright Librarian, Sara Benson, recently discussed how Fair Use factors into HathiTrust’s definition of non-consumptive research.

Ready to use HTRC Analytics for text mining? Check out their Getting Started with HTRC Guide for some simple, guided start-up activities.

For general information about the digital library, see our guide on HathiTrust.

Analyze and Visualize Your Humanities Data with Palladio

How do you make sense of hundreds of years of handwritten scholarly correspondence? Humanists at Stanford University had the same question, and developed the project Mapping the Republic of Letters to answer it. The project maps scholarly social networks in a time when exchanging ideas meant waiting months for a letter to arrive from across the Atlantic, not mere seconds for a tweet to show up in your feed. The tools used in this project inspired the Humanities + Design lab at Stanford University to create a set of free tools specifically designed for historical data, which can be multi-dimensional and not suitable for analysis with statistical software. Enter Palladio!

To start mapping connections in Palladio, you first need some structured, tabular data. A spreadsheet saved in CSV format, with data that is categorized and sorted, is sufficient. Once you have your data, just upload it and get analyzing. Palladio likes data about two types of things: people and places. The sample data Palladio provides is information about influential people who visited or were otherwise connected with the itty bitty country of Monaco. Read on for some cool things you can do with historical data.
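If you are wondering what "structured, tabular data" might look like in practice, here is a minimal Python sketch that writes a small people-and-places table as CSV text. Everything in it is an assumption for illustration: the column names and the two rows are made up (they are not Palladio's sample data), and putting coordinates in a single "latitude,longitude" field is my understanding of what Palladio expects, so check its documentation before relying on it.

```python
import csv
import io

# Hypothetical rows: the people, column names, and values are invented for
# illustration and are not taken from Palladio's sample dataset.
rows = [
    {
        "name": "Person A",
        "birthplace": "Paris",
        "birthplace_coords": "48.8566,2.3522",
        "arrival_point": "Monaco",
        "arrival_coords": "43.7384,7.4246",
    },
    {
        "name": "Person B",
        "birthplace": "Vienna",
        "birthplace_coords": "48.2082,16.3738",
        "arrival_point": "Monaco",
        "arrival_coords": "43.7384,7.4246",
    },
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, one row per person."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Because each coordinate pair contains a comma, the csv module wraps it in quotes so that it stays in one column when the file is read back in.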

Mapping

Use the Map feature to mark coordinates and connections between them. Using the sample data that HD Lab provided, I created the map below, which shows birthplaces and arrival points. Hovering over the connection shows you the direction of the move. You can also switch the base map to a standard style like satellite or terrain, or even just land masses with no human-created geography, like roads or place names.

Map of Mediterranean sea and surrounding lands of Europe, red lines across map show movement, all end in Monaco

One person in our dataset was born in Galicia, and later arrived in Monaco.

But, what if you want to combine this new-fangled spatial analysis with something actually historic? You’re in luck! Palladio allows you to use other maps as bases, provided that the map has been georeferenced (assigned coordinates based on locations represented on the image). The New York Public Library’s Map Warper is a collection of some georeferenced maps. Now you can show movement on a map that’s actually from the time period you’re studying!

Same red lines across map as above, but image of map itself is a historical map

The same birthplace to arrival point data, but now with an older map!

Network Graphs

Perhaps the connections you want to see don’t make sense on a map, like those between people. This is where the Graph feature comes in. Graph allows you to create network visualizations based on different facets of your data. In general, network graphs display relationships between entities, and work best if all your nodes (dots) are the same type of information. They are especially useful to show connections between people, but our sample data doesn’t have that information. Instead, we can visualize our people’s occupations by gender.

network graph shows connections between peoples' occupations and their gender

Most occupations have both males and females, but only males are Monegasque, Author, Gambler, or Journalist, and only females are Aristocracy or Spouse.

The network graph makes it especially visible that there are some slight inconsistencies in the data; at least one person has “Aristocracy” as an occupation, while others have “Aristocrat.” Cleaning and standardizing your data is key! That sounds like a job for…OpenRefine!

Timelines

All of the tools in Palladio have the same Timeline functionality. This basically allows you to filter the data used in your visualization by a date, whether that’s birthdate, date of death, publication date, or whatever timey wimey stuff you have in your dataset. Other types of data can be filtered using the Facet function, right next to the Timeline. Play around with filtering, and watch your visualization change.

Try Palladio today! If you need more direction, check out this step-by-step tutorial by Miriam Posner. The tutorial is a few years old and the interface has changed slightly, so don’t panic if the buttons look different!

Did you create something cool in Palladio? Post a comment below, or tell us about it on Twitter!

 

Puentes/Bridges: Highlights from DH2018

At the end of June, the Alliance of Digital Humanities Organizations (ADHO) coordinated their annual international DH conference, Digital Humanities 2018, in Mexico City. DH2018 was the first conference in the organization’s history to be held in Latin America and in the Global South. With a theme of Puentes/Bridges, DH2018 emphasized transnational discourse and inclusivity. Here are some highlights from the event!

Latin@ voices in the Midwest: Ohio Habla Podcast
Elena Foulis of Ohio State University discussed Ohio Habla, a podcast project that seeks to educate others on the Latin@ experience in the Midwest with interviews conducted in English and Spanish (and a mixture of the two).

Visualizing the Digital Humanities Community
What does the DH community look like? Researchers from University College London’s Centre for Digital Humanities visualized how authors of DH articles cite each other and interact with each other on Twitter, and compared the two networks.

Network Analysis of Javanese Traditional Theatre
How do characters in Javanese traditional theatre relate to one another? In an excellent example of non-traditional digital publishing, Miguel Escobar Varela of the National University of Singapore communicates his research findings on an interactive webpage.

Mayan hieroglyphs as a computer font

Achieving Machine-Readable Mayan Text Via Unicode
Carlos Pallan Gayol of the University of Bonn and Deborah Anderson of UC Berkeley are working to create Unicode equivalents of Mayan hieroglyphs, producing a machine-readable version that ensures reliable access to this language across devices.

Hurricane Memorial: Chronicling the Hurricane of 1928
A massive hurricane devastated Florida, Puerto Rico, and other parts of the Caribbean in 1928, but the story of this storm shifts depending on who you ask. Most of the storm’s victims were black migrant workers from Puerto Rico and Caribbean islands, whose deaths are minimized in most accounts. Christina Boyles of Trinity College seeks to “bring the stories of the storm’s underrepresented victims back into our cultural memory.”

Does “Late Style” Exist? New Stylometric Approaches to Variation in Single-Author Corpora
Jonathan Pearce Reeve presented some preliminary findings from his research into whether or not an author has a true “late style.” Late style is a term best known from the works of Edward Said, alluding to an author’s shift later in life to a writing style that is distinct from their “early” style. Read a review of his book, On Late Style. Code and other supplemental materials from Reeve’s research are available on GitHub.

screenshot from 4 rios webpage, shows drawings of people

4 Ríos: El Naya
A digital storytelling project about the impacts of armed conflict in Colombia, 4 Ríos is a transmedia project that includes a website, short film, and an interactive web-comic.

Researchers from our own University of Illinois participated in the conference, including Megan Senseney and Dan Tracy. Senseney, along with other Illinois researchers, presented “Audiences, Evidence, and Living Documents: Motivating Factors in Digital Humanities Monograph Publishing,” a survey of the motivations behind humanities scholars’ digital publishing actions and needs. Senseney also participated in a panel, “Unanticipated Afterlives: Resurrecting Dead Projects and Research Data for Pedagogical Use,” a discussion about how we might use unmaintained DH projects and data for learning purposes.

Tracy and other Illinois researchers presented a poster, Building a Bridge to Next Generation DH Services in Libraries with a Campus Needs Assessment, a report of results gathered while surveying the need for future DH services at research institutions, and how the library might facilitate this evolution. View Tracy’s poster in IDEALS.

ADHO gathered all the resources tweeted out during the conference for you to view. You can also view a detailed schedule of presentations with descriptions here, or see paper abstracts here. Or, search #DH2018 on Twitter to see all the happenings!

Our Graduate Assistants: Kayla Abner

This interview is part of a new series introducing our graduate assistants to our online community. These are some of the people you will see when you visit our space, who will greet you with a smile and a willingness to help! Say hello to Kayla Abner!

What is your background education and work experience?

I have a Bachelor’s degree in Classical Humanities and French from Wright State University in Dayton (Go Raiders!). My original plan was to teach high school French or Latin, but after completing a student teaching practicum, I decided that wasn’t for me. During undergrad and after graduation, I always wound up in a job role that involved research or customer service in some capacity, which I really enjoyed.

What led you to your field?

Knowing that I enjoyed working on research, I considered going back to school for Library Science, but wanted to be sure before taking the jump. It was always interesting to see the results of the research I helped conduct, and I enjoyed helping people find answers, whether it was a coworker or a client. After a visit to an American Library Association conference in 2016, I fell in love with the collaborative and share-alike nature of librarianship, and was accepted to this program the next year!

What are your research interests?

Library science has so many interesting topics, it’s hard to choose one. But, I like looking at how people seek information, and how libraries can use that knowledge to enhance their services and resources. I’m a browser when it comes to bookshelves, and it’s interesting to see how libraries are succeeding/failing at bringing that experience to the digital realm.

What are your favorite projects you’ve worked on?

I have two positions here in the library, one in the Scholarly Commons, and one working with our (current) Digital Humanities librarian, Dan Tracy. In both roles, I’ve worn a lot of hats, so to speak. My favorites have been creating resources like library guides, and assisting with creating content for our Savvy Researcher workshop series. Maintaining our library guides requires some experience with the software, so I enjoy learning new cool things that our programs can do. I also do a lot of graphic design work, which is a lot of fun!

Completing some of these tasks let me use some Python knowledge from my coursework, which is sort of like a fun puzzle (how do I get this to work??). I’m really interested in using digital methods and tools in research, like text mining and data visualization. Coming from a humanities background, it is very exciting to see the cool things humanists can do beyond traditional scholarship. Digital humanities is a really interesting field that bridges the gap between computer science and the humanities.

What are some of your favorite underutilized resources that you would recommend?

Our people! They aren’t underutilized, but I love an opportunity to let campus know that we are an excellent point of contact between you and an expert. If you have a weird research question in one of our service areas, we can put you in contact with the best person to help you.

When you graduate, what would your ideal job position look like?

I would love to work in an academic research library in a unit similar to the Scholarly Commons, where researchers can get the support they need to use digital methods and data in their research, especially in the humanities. There is such a breadth of digital techniques that humanities researchers can utilize, and they don’t necessarily replace traditional research methods. Distant reading a text puts forth different observations than traditional close reading, and both are equally useful.

What is the one thing you would want people to know about your field?

Librarians are happy to help you; don’t let a big desk intimidate you away from asking a question. That’s why we’re here!

Project Forum: Meeting 1

A logo for the Scholarly Commons Project Forum.

On Monday, March 5, the Scholarly Commons Interns (Matt and Clay) hosted the first Project Forum Discussion. In order to address the variety of projects and scholarly backgrounds, we decided that our conversations should be organized around presentations of projects and related readings from other Digital Humanities scholars or related research.

We began by discussing some consistent topics or questions that are present in each of our Digital Humanities projects and how we conceptualize them. These questions will not only guide our reading discussion on this article, but also further conversations as we read work under the DH umbrella.

1. How does the article make its DH work legible to other scholars / fields?
2. How does the article display information?
3. What affordances or impact does the digital platform (artifact) have on the study?
4. How does the article conceptualize gaps in the data?

If you would like to participate in our next discussion, please join us Monday, March 26, at 2 pm in Library 220.

Meet Dan Tracy, Information Sciences and Digital Humanities Librarian

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Dan Tracy, Information Sciences and Digital Humanities Librarian.


What is your background and work experience?

I originally come from a humanities background and completed a PhD in literature specializing in 20th century American literature, followed by teaching as a lecturer for two years. I had worked a lot with librarians during that time with my research and teaching. When you’re a PhD student in English, you teach a lot of rhetoric, and I also taught some literature classes. As a rhetoric instructor I worked closely with the Undergraduate Library’s instruction services, which exposed me to the work librarians do with instruction.

Then I did a Master’s in Library and Information Science here, knowing that I was interested in being an academic librarian, probably something in the area of being a subject librarian in the humanities. And then I began this job about five years ago. So I’ve been here about five years now in this role. And just began doing Digital Humanities over the summer. I had previously done some liaison work related to digital humanities, especially related to digital publishing, and I had been doing some research related to user experience and digital publishing as related to DH publishing tools.

What led you to this field?

A number of things. One was having known quite a number of people who went into librarianship who really liked it and talked about their work. Another was my experience working with librarians in terms of their instruction capacity. I was interested in working in an academic environment and I was interested in academic librarianship and teaching. And also, especially as things evolved, after I went back for the degree in library and information science, I also found a lot of other things to be interested in as well, including things like digital humanities and data issues.

What is your research agenda?

My research looks at user experience in digital publishing. Primarily in the context of both ebook formats and newer experimental forms of publication such as web and multi-modal publishing with tools like Scalar, especially from the reader side, but also from the creator side of these platforms.

Do you have any favorite work-related duties?

As I mentioned before, instruction was an initial draw to librarianship. I like anytime I can teach and work with students, or faculty for that matter, and help them learn new things. That would probably be a top thing. And I think increasingly the chances I get to work with digital collections issues as well. I think there’s a lot of exciting work to do there in terms of delivering our digital collections to scholars to complete both traditional and new forms of research projects.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think there’s a lot. I think researchers are already aware of digital primary sources in general, but I do think there’s a lot more for people to explore in terms of collections we’ve digitized and things we can do with those through our digital library, and through other digital library platforms, like DPLA (Digital Public Library of America).

I think that a lot of our digital image collections are especially underutilized. I think people are more aware that we have digitized text sources, but not aware of our digitized primary sources that are images and that have value as research objects, including for computational analysis. We also have more and more access to the text data behind our various vendor platforms, which is a resource various researchers on campus increasingly need but don’t always know is available.

If you could recommend one book to beginning researchers in your field, what would you recommend?

If you’re just getting started, I think a good place to look is at the Debates in the Digital Humanities books, which are collections of essays that touch on a variety of critical issues in digital humanities research and teaching. This is a good place to start if you want to get a taste of the ongoing debates and issues. There are open access copies of them available online, so they are easy to get to.

Dan Tracy can be reached at dtracy@illinois.edu.

Preparing Your Data for Topic Modeling

In keeping with my series of blog posts on my research project, this post is about how to prepare your data for input into a topic modeling package. I used Twitter data in my project, which is relatively sparse at only 140 characters per tweet, but the principles can be applied to any document or set of documents that you want to analyze.

Topic Models:

Topic models work by identifying and grouping words that co-occur into “topics.” As David Blei writes, Latent Dirichlet allocation (LDA) topic modeling makes two fundamental assumptions: “(1) There are a fixed number of patterns of word use, groups of terms that tend to occur together in documents. Call them topics. (2) Each document in the corpus exhibits the topics to varying degree. For example, suppose two of the topics are politics and film. LDA will represent a book like James E. Combs and Sara T. Combs’ Film Propaganda and American Politics: An Analysis and Filmography as partly about politics and partly about film.”

Topic models do not have any actual semantic knowledge of the words, and so do not “read” the sentence. Instead, topic models use math. The tokens/words that tend to co-occur are statistically likely to be related to one another. However, that also means that the model is susceptible to “noise,” or falsely identifying patterns of co-occurrence if unimportant but highly repeated terms are used. As with most computational methods, “garbage in, garbage out.”
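To make the idea of co-occurrence a little more concrete, here is a minimal Python sketch that simply counts how often each pair of words appears in the same document. This is not LDA and not what a topic modeling package does internally; it is only meant to illustrate the raw signal that such models build on. The three short "documents" are made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for doc in documents:
        # Use each word once per document, sorted so that pairs are
        # always stored in alphabetical order.
        tokens = sorted(set(doc.lower().split()))
        counts.update(combinations(tokens, 2))
    return counts


# Made-up documents for illustration only.
docs = ["gun control debate", "gun control vote", "film debate"]
pairs = cooccurrence_counts(docs)

for pair, count in pairs.most_common(3):
    print(pair, count)
```

If a filler word such as "rt" appeared in every document, it would pair with everything and crowd out the more meaningful pairs, which is the kind of noise the cleaning steps below are meant to remove.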

In order to make sure that the topic model is identifying interesting or important patterns instead of noise, I had to accomplish the following pre-processing or “cleaning” steps.

  • First, I removed the punctuation marks, like “,.;:?!”. Without this step, commas started showing up in all of my results. Since they didn’t add to the meaning of the text, they were not necessary to analyze.
  • Second, I removed the stop-words, like “I,” “and,” and “the,” because those words are so common in any English sentence that they tend to be over-represented in the results. Many of my tweets were emotional responses, so many authors wrote in the first person. This tended to skew my results, although you should be careful about what stop words you remove. Simply removing stop-words without checking them first means that you can accidentally filter out important data.
  • Finally, I removed overly common words that were specific to my data. For example, many of my tweets were retweets and therefore contained the word “rt.” I also ended up removing mentions of other authors, because highly retweeted texts meant that I was getting Twitter user handles as significant words in my results.

Cleaning the Data:

My original data set was 10 Excel files of 10,000 tweets each. In order to clean and standardize all these data points, as well as combine my files into a single document, I used OpenRefine. OpenRefine is a powerful tool, and it makes it easy to work with all your data at once, even if it is a large number of entries. I uploaded all of my datasets, then performed some quick cleaning available under the “Common Transformations” option under the triangle dropdown at the head of each column: I changed everything to lowercase, unescaped HTML characters (to make sure that I didn’t get errors when trying to run it in Python), and removed extra white spaces between words.

OpenRefine also lets you use regular expressions, which are a kind of search pattern for finding specific strings of characters inside other text. This allowed me to remove punctuation, hashtags, and author mentions by running a find and replace command.

  • Remove punctuation: grel:value.replace(/(\p{P}(?<!’)(?<!-))/, "")
    • Any punctuation character is removed, except apostrophes and hyphens, which the two negative lookbehinds skip.
  • Remove users: grel:value.replace(/(@\S*)/, "")
    • Any string that begins with an @ is removed. It ends at the space following the word.
  • Remove hashtags: grel:value.replace(/(#\S*)/, "")
    • Any string that begins with a # is removed. It ends at the space following the word.
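For readers who would rather do this step outside OpenRefine, here is a rough Python equivalent of the three replacements above. It is a sketch under a few assumptions: Python's built-in re module does not support \p{P}, so punctuation is detected with unicodedata instead; the function names are my own; and the lowercasing and whitespace collapsing stand in for the "Common Transformations" described earlier. In this sketch the mention and hashtag patterns run before punctuation is stripped, because @ and # are themselves punctuation characters.

```python
import re
import unicodedata

# Characters the punctuation step leaves alone: straight and curly
# apostrophes, plus the hyphen.
KEEP = {"'", "\u2019", "-"}


def strip_punctuation(text):
    """Drop every Unicode punctuation character except those in KEEP."""
    return "".join(
        ch for ch in text
        if ch in KEEP or not unicodedata.category(ch).startswith("P")
    )


def clean_tweet(text):
    """Lowercase a tweet, remove mentions, hashtags, and punctuation."""
    text = text.lower()
    text = re.sub(r"@\S*", "", text)   # remove user mentions
    text = re.sub(r"#\S*", "", text)   # remove hashtags
    text = strip_punctuation(text)
    return " ".join(text.split())      # collapse extra whitespace


tweet = ("rt @drlawyercop since sensible, national gun control is a steep "
         "climb, how about we just start with orlando? #guncontrolnow")
print(clean_tweet(tweet))
```

Stop words such as "rt," "is," and "a" are still present at this point; they are handled in the next step.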

Regular expressions, commonly abbreviated as “regex,” can take a little getting used to. Fortunately, OpenRefine itself has some solid documentation on the subject, and I also found this cheatsheet valuable as I was trying to get it to work. If you want to create your own regex search strings, regex101.com has a tool that lets you test your expression before you actually deploy it in OpenRefine.

After downloading the entire data set as a Comma Separated Value (.csv) file, I then used the Natural Language ToolKit (NLTK) for Python to remove stop-words. The code itself can be found here, but I first saved the content of the tweets as a single text file, and then I told NLTK to go over every line of the document and remove words that are in its common stop word dictionary. The output is then saved in another text file, which is ready to be fed into a topic modeling package, such as MALLET.
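As a rough illustration of the stop-word step (not the code linked above), here is a minimal Python sketch that goes over each line and drops any word found in a stop list. To keep it runnable without extra downloads, it uses a tiny hand-written stop list as a stand-in, which is an assumption of this sketch; with NLTK installed, you could instead use nltk.corpus.stopwords.words("english") after running nltk.download("stopwords"). The function names and the second sample line are hypothetical.

```python
# Stand-in stop list (an assumption of this sketch, far shorter than NLTK's).
STOP_WORDS = {"i", "and", "the", "is", "a", "how", "about", "we", "just", "with"}

# Words that are overly common in this particular dataset, such as "rt".
CUSTOM_STOP_WORDS = {"rt"}


def remove_stop_words(line, stop_words):
    """Return the line with every stop word dropped."""
    return " ".join(word for word in line.split() if word not in stop_words)


def clean_lines(lines, stop_words):
    """Yield each cleaned line, skipping lines that end up empty."""
    for line in lines:
        cleaned = remove_stop_words(line, stop_words)
        if cleaned:
            yield cleaned


# In practice these lines would be read from the text file of tweets.
lines = [
    "rt since sensible national gun control is a steep climb "
    "how about we just start with orlando",
    "rt i and the",  # made-up line that consists only of stop words
]

result = list(clean_lines(lines, STOP_WORDS | CUSTOM_STOP_WORDS))
print(result)
```

Keeping the dataset-specific words in their own set makes it easier to review exactly what is being filtered out, which matters given how easy it is to remove important data by accident.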

At the end of all these cleaning steps, my resulting data is essentially composed of unique nouns and verbs, so, for example, @Phoenix_Rises13’s tweet “rt @drlawyercop since sensible, national gun control is a steep climb, how about we just start with orlando? #guncontrolnow” becomes instead “since sensible national gun control steep climb start orlando.” This means that the topic modeling will be more focused on the particular words present in each tweet, rather than commonalities of the English language.

Now my data is cleaned of any additional noise, and it is ready to be input into a topic modeling program.

Interested in working with topic models? There are two Savvy Researcher topic modeling workshops, on December 6 and December 8, that focus on the theory and practice of using topic models to answer questions in the humanities. I hope to see you there!

An Introduction to Traditional Knowledge Labels and Licenses

NOTE: While we are discussing matters relating to the law, this post is not meant as legal advice.

Overview

Fans of Mukurtu CMS, a digital archeology platform, as well as intellectual property nerds may already be familiar with Traditional Knowledge labels and licenses, but for everyone else here’s a quick introduction. Traditional Knowledge labels and licenses were specifically created for researchers and artists working with or thinking of digitizing materials created by indigenous groups. Although created more for educational than legal value, these labels aim to allow indigenous groups to take back some control over their cultural heritage and to educate users about how to incorporate these digital heritage items in a more just and culturally sensitive way. The content that TK licenses and labels cover extends beyond digitized visual arts and design to recorded, written, and oral histories and stories. TK licenses and labels are also a standard to consider when working with any cultural heritage created by marginalized communities. They also provide an interesting way to recognize ownership and the proper use of work that is in the public domain. These labels and licenses are administered by Local Contexts, an organization directed by Jane Anderson, a professor at New York University, and Kim Christen, a professor at Washington State University. Local Contexts is dedicated to helping Native Americans and other indigenous groups gain recognition for, and control over, the way their intellectual property is used. This organization has received funding from sources including the National Endowment for the Humanities and the World Intellectual Property Organization.

Traditional knowledge, or TK, labels and licenses are a way to incorporate protocols for cultural practices into your humanities data management and presentation strategies. This is especially relevant because indigenous cultural heritage items are traditionally viewed by Western intellectual property laws as part of the public domain. And, of course, there is a long and troubling history of dehumanizing treatment of Native Americans by American institutions, as well as a lack of formal recognition of their cultural practices, which is only starting to be addressed. Things have been slowly improving; for example, the Native American Graves Protection and Repatriation Act of 1990 was a law specifically created to address institutions, such as museums, which owned and displayed people’s remains and related funerary art without the permission of their surviving relatives (McManamon, 2000). The World Intellectual Property Organization’s Intergovernmental Committee on Intellectual Property and Genetic Resources, Traditional Knowledge and Folklore (IGC) has begun to address and open up conversations about these issues in hopes of coming up with a more consistent legal framework for countries to work with; though, confusingly, most of what Traditional Knowledge labels and licenses apply to are considered “Traditional Cultural Expressions” by WIPO (“Frequently Asked Questions,” n.d.).

To see these labels and licenses in action, take a look at how they are used in the Mira Canning Stock Route Project Archive from Australia (“Mira Canning Stock Route Project Archive,” n.d.).

The main difference between TK labels and licenses is that TK labels are an educational tool for suggested use with indigenous materials, whether or not they are legally owned by an indigenous community, while TK licenses are similar to Creative Commons licenses — though less recognized — and serve as a customizable supplement to traditional copyright law for materials owned by indigenous communities (“Does labeling change anything legally?,” n.d.).

The default types of TK licenses are: TK Education, TK Commercial, TK Attribution, TK Noncommercial.

Four proposed TK licenses

TK Licenses so far (“TK Licenses,” n.d.)

Each license and label, as well as a detailed description can be found on the Local Contexts site and information about each label is available in English, French, and Spanish.

The types of TK labels are: TK Family, TK Seasonal, TK Outreach, TK Verified, TK Attribution, TK Community Use Only, TK Secret/Sacred, TK Women General, TK Women Restricted, TK Men General, TK Men Restricted, TK Noncommercial, TK Commercial, TK Community Voice, TK Culturally Sensitive (“Traditional Knowledge (TK) Labels,” n.d.).

Example:

TK Women Restricted (TK WR) Label

A TK Women Restricted Label.

“This material has specific gender restrictions on access. It is regarded as important secret and/or ceremonial material that has community-based laws in relation to who can access it. Given its nature it is only to be accessed and used by authorized [and initiated] women in the community. If you are an external third party user and you have accessed this material, you are requested to not download, copy, remix or otherwise circulate this material to others. This material is not freely available within the community and it therefore should not be considered freely available outside the community. This label asks you to think about whether you should be using this material and to respect different cultural values and expectations about circulation and use.” (“TK Women Restricted (TK WR),” n.d.)

Wait, so is this a case where a publicly-funded institution is allowed to restrict content from certain users by gender and other protected categories?

The short answer is that this is not what these labels and licenses are used for. Local Contexts, Mukurtu, and many of the projects and universities associated with the Traditional Knowledge labels and licensing movement are publicly funded. From what I’ve seen, the restrictions are optional, especially for those outside the community (“Does labeling change anything legally?,” n.d.). It’s more a way to point out when something is meant only for members of a certain gender, or to be viewed during a certain time of year, than to actually restrict something only to members of a certain gender. In other words, the gender-based labels, for example, are meant to encourage the kind of self-censorship in viewing materials that is often found in archival spaces. That being said, some universities have what is called a Memorandum of Understanding with an indigenous community, in which the university agrees to respect Native American culture. The extent to which this applies to digitized cultural heritage held in university archives, for example, is unclear, though most Memoranda of Understanding are not legally binding (“What is a Memorandum of Understanding or Memorandum of Agreement?,” n.d.). Overall, this raises lots of interesting questions about balancing conflicting views of intellectual property, access, and the public domain.

Works Cited:

Does labeling change anything legally? (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/project/does-labeling-change-anything-legally/
Frequently Asked Questions. (n.d.). Retrieved August 3, 2017, from http://www.wipo.int/tk/en/resources/faqs.html
McManamon, F. P. (2000). NPS Archeology Program: The Native American Graves Protection and Repatriation Act (NAGPRA). In L. Ellis (Ed.), Archaeological Method and Theory: An Encyclopedia. New York and London: Garland Publishing Co. Retrieved from https://www.nps.gov/archeology/tools/laws/nagpra.htm
Mira Canning Stock Route Project Archive. (n.d.). Retrieved August 3, 2017, from http://mira.canningstockrouteproject.com/
TK Licenses. (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/tk-licenses/
TK Women Restricted (TK WR). (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/tk/wr/1.0
What is a Memorandum of Understanding or Memorandum of Agreement? (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/project/what-is-a-memorandum-of-understandingagreement/

Further Reading:

Christen, K., Merrill, A., & Wynne, M. (2017). A Community of Relations: Mukurtu Hubs and Spokes. D-Lib Magazine, 23(5/6). https://doi.org/10.1045/may2017-christen
Educational Resources. (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/educational-resources/
Lord, P. (n.d.). Unrepatriatable: Native American Intellectual Property and Museum Digital Publication. Retrieved from http://www.academia.edu/7770593/Unrepatriatable_Native_American_Intellectual_Property_and_Museum_Digital_Publication
Project Description. (n.d.). Retrieved August 3, 2017, from http://www.sfu.ca/ipinch/about/project-description/

Acknowledgements:

Thank you to the Rare Book and Manuscript Library and Melissa Salrin in the iSchool for helping me with my questions about indigenous and religious materials in archives and special collections at public institutions. You are the best!

Finding Digital Humanities Tools in 2017

Here at the Scholarly Commons we want to make sure our patrons know what options are out there for conducting and presenting their research. The digital humanities are becoming increasingly accepted and expected. In fact, you can even play an online game about creating a digital humanities center at a university. After a year of exploring a variety of digital humanities tools, one theme has emerged throughout: taking advantage of the capabilities of new technology to truly revolutionize scholarly communications is actually a really hard thing to do. Please don’t lose sight of this.

Finding digital humanities tools can be quite challenging. To start, many of your options will be open source tools that you need a server and IT skills to run ($500+ per machine, or a cloud service with a slightly lower or comparable cost over the long term). Even when they aren’t expensive, be prepared to find yourself in the command line or having to write code, even when a tool is advertised as beginner-friendly.

Mukurtu Help Page Screen Shot

I think this has been taken down because even they aren’t kidding themselves anymore.

There is also the issue of maintenance. While free and open source projects are where young computer nerds go to make a name for themselves, not every project is going to have the paid staff or the organized and dedicated community needed to keep it maintained over the years. What’s more, many digital humanities tool-building projects are initiatives from humanists who don’t know what’s possible or what they are doing, with wildly vacillating amounts of grant money available at any given time. This is exacerbated by rapid technological change and by the fact that many projects were created without sustainability or digital preservation in mind from the get-go. And finally, for digital humanists, failure is not considered a rite of passage to the extent it is in Silicon Valley, which is part of why sometimes you find projects that no longer work still listed as viable resources.

Finding Digital Humanities Tools Part 1: DiRT and TAPoR

Yes, we have talked about DiRT here on Commons Knowledge. Although the Digital Research Tools directory is an extensive resource full of useful reviews, over time it has increasingly become a graveyard of failed digital humanities projects (and sometimes randomly switches to Spanish). The DiRT Directory itself comes from Project Bamboo, “… a humanities cyber-infrastructure initiative funded by the Andrew W. Mellon Foundation between 2008 and 2012, in order to enhance arts and humanities research through the development of infrastructure and support for shared technology services” (Dombrowski, 2014). If you are confused about what that means, it’s okay. A lot of people were too, which led to many problems.

TAPoR 3, the Text Analysis Portal for Research, is DiRT’s Canadian counterpart. Despite keeping text analysis in the name, it also contains reviews of a variety of digital humanities tools. Like DiRT, it lists outdated sources.

Part 2: Data Journalism, digital versions of your favorite disciplines, digital pedagogy, and other related fields.

A lot of data journalism tools cross over with digital humanities; in fact, there are even joint Digital Humanities and Data Journalism conferences! You may have even noticed how The Knight Foundation is to data journalism what the Mellon Foundation is to digital humanities. Journalism Tools (along with its list version on Medium), from the Tow-Knight Center for Entrepreneurial Journalism at CUNY Graduate School of Journalism, and the Resources page from Data Driven Journalism, an initiative from the European Journalism Centre partially funded by the Dutch government, are both good places to look for resources. As with DiRT and TAPoR, these lists have trouble staying up-to-date. Also, data journalism resources tend to list more proprietary tools.

Also, be sure to check out resources for “digital” + [insert humanities/social science discipline], such as digital archeology and digital history. And of course, another subset of digital humanities is digital pedagogy, which focuses on using technology to augment the educational experiences of both K-12 and university students. A lot of tools and techniques developed for digital pedagogy can also be used outside the classroom for research and presentation purposes. Even digital science resources can have a lot of useful tools if you are willing to scroll past an occasional plasmid sharing platform. Just remember to be creative and try to think of other disciplines that are tackling issues similar to yours in their research!

Part 3: There is a lot of out-of-date advice out there.

There are librarians who write overviews of digital humanities tools and don’t bother to test whether the tools still work or are still updated. I am very aware of how hard things are to use and how quickly things change, and I’m not at all talking about the people who couldn’t keep their websites and curated lists updated. Rather, I’m talking about how the “Top Tools for Digital Humanities Research” article in the January/February 2017 issue of “Computers in Libraries” mentions Sophie, an interactive eBook creator (Herther, 2017). However, Sophie has not been updated since 2011, and the link for the fully open source version goes to “Watch King Kong 2 for Free”.

Screenshot of announcement for 2010 Sophie workshop at Scholarly Commons

Looks like we all missed the Scholarly Commons Sophie workshop by only 7 years.

The fact that no one caught that error shows either how slowly magazines edit or that no one else bothered to check. If no one seems to have created any projects with the software in the past three years, it’s probably best to assume it’s no longer happening, though the best route is always to check for yourself.
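For anyone keeping their own list of tools, the three-year rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not an established method: the function names are made up for this example, and because the post only says Sophie was last updated in 2011, the sketch assumes the most generous possible date (December 31, 2011).

```python
from datetime import date


def years_since(last_update: date, today: date) -> int:
    """Return the number of whole years between last_update and today."""
    years = today.year - last_update.year
    # Subtract one if this year's anniversary of the update has not arrived yet.
    if (today.month, today.day) < (last_update.month, last_update.day):
        years -= 1
    return years


def looks_abandoned(last_update: date, today: date, threshold_years: int = 3) -> bool:
    """Apply the rule of thumb: no activity for threshold_years or more."""
    return years_since(last_update, today) >= threshold_years


# Assumption: the exact day of Sophie's last update is unknown, so use the
# latest possible day in 2011. The check date is when the links were retrieved.
sophie_last_update = date(2011, 12, 31)
checked_on = date(2017, 5, 18)

print(years_since(sophie_last_update, checked_on))
print(looks_abandoned(sophie_last_update, checked_on))
```

A date check like this only flags candidates for a closer look. It does not replace opening the tool and trying it.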

Long term solutions:

Save your work in other formats for long term storage. Take your data management and digital preservation seriously. We have resources that can help you find the best options for saving your research.

If you are serious about digital humanities you should really consider learning to code. We have a lot of resources for teaching yourself these skills here at the Scholarly Commons, as well as a wide range of workshops during the school year. As far as coding languages go, HTML/CSS, JavaScript, and Python are probably the most widely used in the digital humanities, and the most helpful. Depending on how much time you put into this, learning to code can help you troubleshoot and customize your tools, as well as allow you to contribute to and help maintain the open source projects that you care about.

Works Cited:

100 tools for investigative journalists. (2016). Retrieved May 18, 2017, from https://medium.com/@Journalism2ls/75-tools-for-investigative-journalists-7df8b151db35

Center for Digital Scholarship Portal Mukurtu CMS.  (2017). Support. Retrieved May 11, 2017 from http://support.mukurtu.org/?b_id=633

DiRT Directory. (2015). Retrieved May 18, 2017 from http://dirtdirectory.org/

Digital tools for researchers. (2012, November 18). Retrieved May 31, 2017, from http://connectedresearchers.com/online-tools-for-researchers/

Dombrowski, Q. (2014). What Ever Happened to Project Bamboo? Literary and Linguistic Computing. https://doi.org/10.1093/llc/fqu026

Herther, N.K. (2017). Top Tools for Digital Humanities Research. Retrieved May 18, 2017, from http://www.infotoday.com/cilmag/jan17/Herther–Top-Tools-for-Digital-Humanities-Research.shtml

Journalism Tools. (2016). Retrieved May 18, 2017 from http://journalismtools.io/

Lord, G., Nieves, A.D., and Simons, J. (2015). dhQuest. http://dhquest.com/

Resources Data Driven Journalism. (2017). Retrieved May 18, 2017, from http://datadrivenjournalism.net/resources
TAPoR 3. (2015). Retrieved May 18, 2017 from http://tapor.ca/home

Visel, D. (2010). Upcoming Sophie Workshops. Retrieved May 18, 2017, from http://sophie2.org/trac/blog/upcomingsophieworkshops

Neatline 101: Getting Started

Here at Commons Knowledge we love easy-to-use interactive map creation software! We’ve compared and contrasted different tools, and talked about StoryMap JS and Shanti Interactive. The Scholarly Commons is a great place to get help on GIS projects, from ArcGIS StoryMaps and beyond. But if you want something where you can have both a map and a timeline, and if you are willing to spend money on your own server, definitely consider using Neatline.

Neatline is a plugin created by the Scholars’ Lab at the University of Virginia that lets you create interactive maps and timelines in Omeka exhibits. My personal favorite example is the demo site by Paul Mawyer, “‘I am it and it is I’: Lovecraft in Providence,” with map tiles from Stamen Design under a CC-BY 3.0 license.

Screenshot of Lovecraft Neatline exhibit

*As far as the location of Lovecraft’s most famous creation, let’s just say “Ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn.”

Now one caveat: Neatline requires a server. I used Reclaim Hosting, which is straightforward, and which I have used for Scalar and Mukurtu. The cheapest plan available on Reclaim Hosting was $32 a year. Once I signed up for the website and domain name, I took advantage of one nice feature of Reclaim Hosting, which lets you one-click install the Omeka.org content management system (CMS). The Omeka CMS is a popular choice for digital humanities users. Other popular content management systems include WordPress and Scalar.

One click install of Omeka through Reclaim Hosting

BUT WAIT, WHAT ABOUT OMEKA THROUGH SCHOLARLY COMMONS?

Here at the Scholarly Commons we can set up an Omeka.net site for you. You can find more information on setting up an Omeka.net site through the Scholarly Commons here. This is a great option for people who want to create a regular Omeka exhibit. However, Neatline is only available as a plugin on Omeka.org, which needs a server to host. As far as I know, there is currently no Neatline plugin for Omeka.net and I don’t think that will be happening anytime soon. On Reclaim you can install Omeka on any LAMP server. And some side advice from your very forgetful blogger: write down whatever username and password you make up when you set up your Omeka site. That will save you a lot of trouble later, especially considering how many accounts you end up with when you use a server to host a site.

Okay, I’m still interested, but what do I do once I have Omeka.org installed? 

So back to the demo. I used the instructions on the documentation page on Neatline, which were good for defining a lot of the terms but not so good at explaining exactly what to do. I am focusing on the original Neatline plugin, but there are other Neatline plugins, like NeatlineText, depending on your needs. However, all plugins are installed in a similar way. You can follow the official instructions here at Installing Neatline.

But I have also provided my own instructions because the official ones just didn’t do it for me.

So first off, download the Neatline zip file.

Go to your Control Panel, cPanel in Reclaim Hosting, and click on “File Manager.”

File Manager circled on Reclaim Hosting

Sorry this looks so goofy, Windows snipping tool free form is only for those with a steady hand.

Navigate to the Plugins folder.

arrow points at plugins folder in file manager

Double click to open the folder. Click Upload Files.

more arrows pointing at tiny upload option in Plugins folder

If you’re using Reclaim Hosting, IGNORE THE INSTRUCTIONS. DO NOT UNZIP THE ZIP FILE ON YOUR COMPUTER. JUST PLOP THAT PUPPY RIGHT INTO YOUR PLUGINS FOLDER.

Upload the entire zip file

Plop it in!

Go back to the Plugins folder. Right click the Neatline zip file and click extract. Save extracted files in Plugins.

Extract Neatline files in File Manager
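If you have command-line access to your server rather than a point-and-click File Manager, the same extraction step could be sketched in Python roughly as follows. This is not from the Neatline or Reclaim Hosting documentation: the helper name `extract_plugin`, the file name `Neatline.zip`, and the folder layout are all assumptions, and the demo builds a small stand-in archive in a temporary folder instead of touching a real Omeka installation.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_plugin(zip_path: Path, plugins_dir: Path) -> list:
    """Extract zip_path into plugins_dir and return the top-level names it created."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(plugins_dir)
        names = archive.namelist()
    # Entry names inside a zip always use forward slashes.
    return sorted({name.split("/")[0] for name in names})


# Demo with a stand-in archive so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    plugins_dir = root / "plugins"  # hypothetical stand-in for Omeka's Plugins folder
    plugins_dir.mkdir()

    stand_in_zip = root / "Neatline.zip"  # hypothetical file name
    with zipfile.ZipFile(stand_in_zip, "w") as archive:
        # Placeholder contents only; the real download contains the full plugin.
        archive.writestr("Neatline/plugin.ini", "; placeholder contents\n")

    created = extract_plugin(stand_in_zip, plugins_dir)
    extracted_ok = (plugins_dir / "Neatline" / "plugin.ini").is_file()
    print(created, extracted_ok)
```

Either way, the goal is the same as in the File Manager steps: the zip goes into the Plugins folder and gets extracted there, not on your own computer.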

Sign into your Omeka site at [yourdomainname].[com/name/whatever]/admin if you aren’t already.

Omeka dashboard with arrows pointing at Plugins

Install Neatline for real.

Omeka Plugins page

Still confused or having trouble with setup?

Check out these tutorials as well!

OpenStreetMap is great and all, but what if I want to create a fancy historical map?

To create historical maps on Neatline you have two options, only one of which is included in the actual documentation for Neatline.

Officially, you are supposed to use GeoServer. GeoServer is an open source server application built in Java. Even if you have your own server, it has a lot more dependencies to run than what’s required for Omeka / Neatline.

If you want one-click Neatline installation with GeoServer and have money to spend, you might want to check out AcuGIS Neatline Cloud Hosting, which is recommended in the Neatline documentation. The lowest cost plan starts at $250 a year.

Unofficially, there is a tutorial for this available at Lincoln Mullen’s blog “The Backward Glance,” specifically his 2015 post “How to Use Neatline with Map Warper Instead of Geoserver.”

Let us know about the ways you incorporate geospatial data in your research! And stay tuned for Neatline 102: Creating a simple exhibit!

Works Cited:

Extending Omeka with Plugins. (2016, July 5). Retrieved May 23, 2017, from http://history2016.doingdh.org/week-1-wednesday/extending-omeka-with-plugins/

Installing Neatline Neatline Documentation. (n.d.). Retrieved May 23, 2017, from http://docs.neatline.org/installing-neatline.html

Mawyer, Paul. (n.d.). “I am it and it is I”: Lovecraft in Providence. Retrieved May 23, 2017, from http://lovecraft.neatline.org/neatline-exhibits/show/lovecraft-in-providence/fullscreen

Mullen, Lincoln. (2015). “How to Use Neatline with Map Warper Instead of Geoserver.” Retrieved May 23, 2017, from http://lincolnmullen.com/blog/how-to-use-neatline-with-map-warper-instead-of-geoserver/

Uploading Plugins to Omeka. (n.d.). Retrieved May 23, 2017, from https://community.reclaimhosting.com/t/uploading-plugins-to-omeka/195

Working with Omeka. (n.d.). Retrieved May 23, 2017, from https://community.reclaimhosting.com/t/working-with-omeka/194