Meet Dan Tracy, Information Sciences and Digital Humanities Librarian

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Dan Tracy, Information Sciences and Digital Humanities Librarian.


What is your background and work experience?

I originally come from a humanities background and completed a PhD in literature specializing in 20th century American literature, followed by teaching as a lecturer for two years. During that time I worked a lot with librarians on my research and teaching. When you’re a PhD student in English, you teach a lot of rhetoric, and I also taught some literature classes. As a rhetoric instructor I worked closely with the Undergraduate Library’s instruction services, which exposed me to the work librarians do with instruction.

Then I did a Master’s in Library and Information Science here, knowing that I was interested in being an academic librarian, probably as a subject librarian in the humanities. I began this job about five years ago, so I’ve been in this role about five years now, and I just began doing digital humanities work over the summer. I had previously done some liaison work related to digital humanities, especially digital publishing, and I had been doing research on user experience and digital publishing as related to DH publishing tools.

What led you to this field?

A number of things. One was knowing quite a number of people who went into librarianship, really liked it, and talked about their work. Another was my experience working with librarians in their instruction capacity. I was interested in working in an academic environment, and in academic librarianship and teaching. And after I went back for the degree in library and information science, I found a lot of other things to be interested in as well, including digital humanities and data issues.

What is your research agenda?

My research looks at user experience in digital publishing, primarily in the context of both ebook formats and newer experimental forms of publication, such as web and multimodal publishing with tools like Scalar. I look especially at the reader side, but also at the creator side of these platforms.

Do you have any favorite work-related duties?

As I mentioned before, instruction was an initial draw to librarianship. I like anytime I can teach and work with students, or faculty for that matter, and help them learn new things. That would probably be a top thing. And I think increasingly the chances I get to work with digital collections issues as well. I think there’s a lot of exciting work to do there in terms of delivering our digital collections to scholars to complete both traditional and new forms of research projects.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think there’s a lot. I think researchers are already aware of digital primary sources in general, but I do think there’s a lot more for people to explore in terms of collections we’ve digitized and things we can do with those through our digital library, and through other digital library platforms, like DPLA (Digital Public Library of America).

I think that a lot of our digital image collections are especially underutilized. People are more aware that we have digitized text sources, but less aware of our digitized primary sources that are images, which have value as research objects in their own right, including for computational analysis. We also have more and more access to the text data behind our various vendor platforms, which is a resource researchers on campus increasingly need but don’t always know is available.

If you could recommend one book to beginning researchers in your field, what would you recommend?

If you’re just getting started, I think a good place to look is at the Debates in the Digital Humanities books, which are collections of essays that touch on a variety of critical issues in digital humanities research and teaching. This is a good place to start if you want to get a taste of the ongoing debates and issues. There are open access copies of them available online, so they are easy to get to.

Dan Tracy can be reached at dtracy@illinois.edu.

Preparing Your Data for Topic Modeling

In keeping with my series of blog posts on my research project, this post is about how to prepare your data for input into a topic modeling package. I used Twitter data in my project, which is relatively sparse at only 140 characters per tweet, but the principles can be applied to any document or set of documents that you want to analyze.

Topic Models:

Topic models work by identifying and grouping words that co-occur into “topics.” As David Blei writes, Latent Dirichlet allocation (LDA) topic modeling makes two fundamental assumptions: “(1) There are a fixed number of patterns of word use, groups of terms that tend to occur together in documents. Call them topics. (2) Each document in the corpus exhibits the topics to varying degree. For example, suppose two of the topics are politics and film. LDA will represent a book like James E. Combs and Sara T. Combs’ Film Propaganda and American Politics: An Analysis and Filmography as partly about politics and partly about film.”

Topic models do not have any actual semantic knowledge of the words, and so do not “read” the sentence. Instead, topic models use math: tokens/words that tend to co-occur are statistically likely to be related to one another. However, that also means the model is susceptible to “noise,” falsely identifying patterns of co-occurrence when unimportant but highly repeated terms appear. As with most computational methods, “garbage in, garbage out.”

In order to make sure that the topic model is identifying interesting or important patterns instead of noise, I had to accomplish the following pre-processing or “cleaning” steps.

  • First, I removed the punctuation marks, like “,.;:?!”. Without this step, commas started showing up in all of my results. Since they didn’t add to the meaning of the text, they were not necessary to analyze.
  • Second, I removed the stop-words, like “I,” “and,” and “the,” because those words are so common in any English sentence that they tend to be over-represented in the results. Many of my tweets were emotional responses, so many authors wrote in the first person, which tended to skew my results. Be careful about which stop words you remove, though: simply removing stop-words without checking the list first means that you can accidentally filter out important data.
  • Finally, I removed overly common words that were specific to my data. For example, many of my tweets were retweets and therefore contained the word “rt.” I also ended up removing mentions of other users, because highly retweeted texts meant that Twitter user handles were showing up as significant words in my results.

Cleaning the Data:

My original data set was 10 Excel files of 10,000 tweets each. In order to clean and standardize all these data points, as well as combine my files into one single document, I used OpenRefine. OpenRefine is a powerful tool, and it makes it easy to work with all your data at once, even with a large number of entries. I uploaded all of my datasets, then performed some quick cleaning available under the “Common Transformations” option in the triangle dropdown at the head of each column: I changed everything to lowercase, unescaped HTML characters (to make sure that I didn’t get errors when trying to run it in Python), and removed extra white spaces between words.
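
For anyone who prefers scripting, those same three transformations can be approximated in Python. This is a hypothetical snippet of the general idea, not what OpenRefine does internally:

    import html

    def common_transformations(text):
        """Approximate OpenRefine's to-lowercase, unescape-HTML, and collapse-whitespace steps."""
        text = html.unescape(text)              # turn entities like &amp; back into characters
        return ' '.join(text.lower().split())   # lowercase and remove extra white space

    print(common_transformations("Gun  Control &amp; Public Safety"))  # -> "gun control & public safety"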

OpenRefine also lets you use regular expressions, a pattern-matching syntax for finding specific strings of characters inside text. This allowed me to remove punctuation, hashtags, and author mentions by running find and replace commands (a rough Python equivalent is sketched after the list below).

  • Remove punctuation: grel:value.replace(/(\p{P}(?<!')(?<!-))/, "")
    • Any punctuation character is removed, except apostrophes and hyphens.
  • Remove users: grel:value.replace(/(@\S*)/, "")
    • Any string that begins with an @ is removed. It ends at the space following the word.
  • Remove hashtags: grel:value.replace(/(#\S*)/, "")
    • Any string that begins with a # is removed. It ends at the space following the word.
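
If you would rather do this cleaning in Python instead of OpenRefine, a rough equivalent of the three commands above might look like the sketch below. This is a minimal, hypothetical version; like the GREL patterns, it keeps apostrophes and hyphens, and it may need adjusting for your own data:

    import re

    def clean_tweet(text):
        """Roughly mirror the three GREL commands: drop mentions, hashtags, and punctuation."""
        text = re.sub(r"@\S*", "", text)        # remove user mentions (@handle up to the next space)
        text = re.sub(r"#\S*", "", text)        # remove hashtags (#tag up to the next space)
        text = re.sub(r"[^\w\s'-]", "", text)   # remove punctuation, keeping apostrophes and hyphens
        return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

    print(clean_tweet("rt @drlawyercop since sensible, national gun control is a steep climb #guncontrolnow"))
    # -> "rt since sensible national gun control is a steep climb"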

Regular expressions, commonly abbreviated as “regex,” can take a little getting used to. Fortunately, OpenRefine itself has some solid documentation on the subject, and I also found this cheatsheet valuable as I was trying to get it to work. If you want to create your own regex search strings, regex101.com has a tool that lets you test your expression before you actually deploy it in OpenRefine.

After downloading the entire data set as a Comma Separated Value (.csv) file, I then used the Natural Language ToolKit (NLTK) for Python to remove stop-words. The code itself can be found here, but I first saved the content of the tweets as a single text file, and then I told NLTK to go over every line of the document and remove words that are in its common stop word dictionary. The output is then saved in another text file, which is ready to be fed into a topic modeling package, such as MALLET.
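
My exact script is linked above, but a minimal sketch of the same approach looks roughly like this. The file names and the extra “rt” stop word are assumptions based on my project; swap in your own:

    from nltk.corpus import stopwords
    # import nltk; nltk.download('stopwords')  # run once if the stop word list is not installed

    stop_words = set(stopwords.words('english'))
    stop_words.add('rt')  # project-specific addition: the retweet marker

    with open('tweets.txt', encoding='utf-8') as infile, \
         open('tweets_no_stopwords.txt', 'w', encoding='utf-8') as outfile:
        for line in infile:
            kept = [word for word in line.split() if word.lower() not in stop_words]
            outfile.write(' '.join(kept) + '\n')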

At the end of all these cleaning steps, my resulting data is essentially composed of unique nouns and verbs, so, for example, @Phoenix_Rises13’s tweet “rt @drlawyercop since sensible, national gun control is a steep climb, how about we just start with orlando? #guncontrolnow” becomes instead “since sensible national gun control steep climb start orlando.” This means that the topic modeling will be more focused on the particular words present in each tweet, rather than commonalities of the English language.

Now my data is cleaned from any additional noise, and it is ready to be input into a topic modeling program.

Interested in working with topic models? There are two Savvy Researcher topic modeling workshops, on December 6 and December 8, that focus on the theory and practice of using topic models to answer questions in the humanities. I hope to see you there!

An Introduction to Traditional Knowledge Labels and Licenses

NOTE: While we are discussing matters relating to the law, this post is not meant as legal advice.

Overview

Fans of Mukurtu CMS, a digital archive platform for indigenous cultural heritage, as well as intellectual property nerds, may already be familiar with Traditional Knowledge labels and licenses, but for everyone else here’s a quick introduction. Traditional Knowledge labels and licenses were specifically created for researchers and artists working with, or thinking of digitizing, materials created by indigenous groups. Although created more for educational than legal value, these labels aim to allow indigenous groups to take back some control over their cultural heritage and to educate users about how to incorporate these digital heritage items in a more just and culturally sensitive way. The content that TK licenses and labels cover extends beyond digitized visual arts and design to recorded, written, and oral histories and stories. TK licenses and labels are also a standard to consider when working with any cultural heritage created by marginalized communities, and they provide an interesting way to recognize ownership and the proper use of work that is in the public domain.

These labels and licenses are administered by Local Contexts, an organization directed by Jane Anderson, a professor at New York University, and Kim Christen, a professor at Washington State University. Local Contexts is dedicated to helping Native Americans and other indigenous groups gain recognition for, and control over, the way their intellectual property is used. The organization has received funding from sources including the National Endowment for the Humanities and the World Intellectual Property Organization.

Traditional knowledge, or TK, labels and licenses are a way to incorporate protocols for cultural practices into your humanities data management and presentation strategies. This is especially relevant because indigenous cultural heritage items are traditionally viewed by Western intellectual property law as part of the public domain. And, of course, there is a long and troubling history of dehumanizing treatment of Native Americans by American institutions, as well as a lack of formal recognition of their cultural practices, which is only starting to be addressed. Things have been slowly improving; for example, the Native American Graves Protection and Repatriation Act of 1990 was a law specifically created to address institutions, such as museums, that owned and displayed people’s relatives’ remains and related funerary art without their permission or the permission of their surviving relatives (McManamon, 2000). The World Intellectual Property Organization’s Intergovernmental Committee on Intellectual Property and Genetic Resources, Traditional Knowledge and Folklore (IGC) has begun to address and open up conversations about these issues in hopes of arriving at a more consistent legal framework for countries to work with; though, confusingly, most of what Traditional Knowledge labels and licenses apply to is considered “Traditional Cultural Expressions” by WIPO (“Frequently Asked Questions,” n.d.).

To see these labels and licenses in action, take a look at how they are used in the Mira Canning Stock Route Project Archive from Australia (“Mira Canning Stock Route Project Archive,” n.d.).

The main difference between TK labels and licenses is that TK labels are an educational tool for suggested use with indigenous materials, whether or not they are legally owned by an indigenous community, while TK licenses are similar to Creative Commons licenses — though less recognized — and serve as a customizable supplement to traditional copyright law for materials owned by indigenous communities (“Does labeling change anything legally?,” n.d.).

The default types of TK licenses are: TK Education, TK Commercial, TK Attribution, TK Noncommercial.

Four proposed TK licenses

TK Licenses so far (“TK Licenses,” n.d.)

Each license and label, as well as a detailed description of it, can be found on the Local Contexts site, and information about each label is available in English, French, and Spanish.

The types of TK labels are: TK Family, TK Seasonal, TK Outreach, TK Verified, TK Attribution, TK Community Use Only, TK Secret/Sacred, TK Women General, TK Women Restricted, TK Men General, TK Men Restricted, TK Noncommercial, TK Commercial, TK Community Voice, TK Culturally Sensitive (“Traditional Knowledge (TK) Labels,” n.d.).

Example:

TK Women Restricted (TK WR) Label

A TK Women Restricted Label.

“This material has specific gender restrictions on access. It is regarded as important secret and/or ceremonial material that has community-based laws in relation to who can access it. Given its nature it is only to be accessed and used by authorized [and initiated] women in the community. If you are an external third party user and you have accessed this material, you are requested to not download, copy, remix or otherwise circulate this material to others. This material is not freely available within the community and it therefore should not be considered freely available outside the community. This label asks you to think about whether you should be using this material and to respect different cultural values and expectations about circulation and use.” (“TK Women Restricted (TK WR),” n.d.)

Wait, so is this a case where a publicly-funded institution is allowed to restrict content from certain users by gender and other protected categories?

The short answer is that this is not what these labels and licenses are used for. Local Contexts, Mukurtu, and many of the projects and universities associated with the Traditional Knowledge labels and licensing movement are publicly funded. From what I’ve seen, the restrictions are optional, especially for those outside the community (“Does labeling change anything legally?,” n.d.). The labels are more a way to point out when something is meant only for members of a certain gender, or meant to be viewed only during a certain time of year, than a way to actually restrict access. In other words, the gender-based labels, for example, are meant to encourage the kind of voluntary self-restriction in viewing materials that is often found in archival spaces. That being said, some universities have what is called a Memorandum of Understanding with an indigenous community, in which the university agrees to respect the indigenous community’s culture. The extent to which this covers digitized cultural heritage held in university archives, for example, is unclear, though most Memoranda of Understanding are not legally binding (“What is a Memorandum of Understanding or Memorandum of Agreement?,” n.d.). Overall, this raises lots of interesting questions about balancing conflicting views of intellectual property, access, and the public domain.

Works Cited:

Does labeling change anything legally? (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/project/does-labeling-change-anything-legally/
Frequently Asked Questions. (n.d.). Retrieved August 3, 2017, from http://www.wipo.int/tk/en/resources/faqs.html
McManamon, F. P. (2000). NPS Archeology Program: The Native American Graves Protection and Repatriation Act (NAGPRA). In L. Ellis (Ed.), Archaeological Method and Theory: An Encyclopedia. New York and London: Garland Publishing Co. Retrieved from https://www.nps.gov/archeology/tools/laws/nagpra.htm
Mira Canning Stock Route Project Archive. (n.d.). Retrieved August 3, 2017, from http://mira.canningstockrouteproject.com/
TK Licenses. (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/tk-licenses/
TK Women Restricted (TK WR). (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/tk/wr/1.0
What is a Memorandum of Understanding or Memorandum of Agreement? (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/project/what-is-a-memorandum-of-understandingagreement/

Further Reading:

Christen, K., Merrill, A., & Wynne, M. (2017). A Community of Relations: Mukurtu Hubs and Spokes. D-Lib Magazine, 23(5/6). https://doi.org/10.1045/may2017-christen
Educational Resources. (n.d.). Retrieved August 3, 2017, from http://www.localcontexts.org/educational-resources/
Lord, P. (n.d.). Unrepatriatable: Native American Intellectual Property and Museum Digital Publication. Retrieved from http://www.academia.edu/7770593/Unrepatriatable_Native_American_Intellectual_Property_and_Museum_Digital_Publication
Project Description. (n.d.). Retrieved August 3, 2017, from http://www.sfu.ca/ipinch/about/project-description/

Acknowledgements:

Thank you to the Rare Book and Manuscript Library and to Melissa Salrin in the iSchool for helping me with my questions about indigenous and religious materials in archives and special collections at public institutions. You are the best!

Finding Digital Humanities Tools in 2017

Here at the Scholarly Commons we want to make sure our patrons know what options are out there for conducting and presenting their research. The digital humanities are becoming increasingly accepted and expected. In fact, you can even play an online game about creating a digital humanities center at a university. After a year of exploring a variety of digital humanities tools, one theme has emerged throughout: taking advantage of the capabilities of new technology to truly revolutionize scholarly communications is actually a really hard thing to do.  Please don’t lose sight of this.

Finding digital humanities tools can be quite challenging. To start, many of your options will be open source tools that require a server and IT skills to run (expect $500+ per machine, or slightly less or comparable costs for cloud hosting over the long term). Even when the tools aren’t expensive, be prepared to find yourself in the command line or having to write code, even when a tool is advertised as beginner-friendly.

Mukurtu Help Page Screen Shot

I think this has been taken down because even they aren’t kidding themselves anymore.

There is also the issue of maintenance. While free and open source projects are where young computer nerds go to make a name for themselves, not every project is going to have the paid staff or the organized, dedicated community needed to keep it maintained over the years. What’s more, many digital humanities tool-building projects are initiatives from humanists who don’t know what’s possible or what they are doing, with wildly vacillating amounts of grant money available at any given time. This is exacerbated by rapid technological change, and by the fact that many projects were created without sustainability or digital preservation in mind from the get-go. And finally, for digital humanists, failure is not considered a rite of passage to the extent it is in Silicon Valley, which is part of why you sometimes find projects that no longer work still listed as viable resources.

Finding Digital Humanities Tools Part 1: DiRT and TAPoR

Yes, we have talked about DiRT here on Commons Knowledge. Although the Digital Research Tools directory is an extensive resource full of useful reviews, over time it has increasingly become a graveyard of failed digital humanities projects (and sometimes randomly switches to Spanish). The DiRT directory itself comes from Project Bamboo, “… a humanities cyberinfrastructure initiative funded by the Andrew W. Mellon Foundation between 2008 and 2012, in order to enhance arts and humanities research through the development of infrastructure and support for shared technology services” (Dombrowski, 2014). If you are confused about what that means, it’s okay; a lot of people were too, which led to many problems.

TAPoR 3, the Text Analysis Portal for Research, is DiRT’s Canadian counterpart; despite keeping text analysis in the name, it also contains reviews of a variety of digital humanities tools. Like DiRT, it lists some outdated sources.

Part 2: Data Journalism, digital versions of your favorite disciplines, digital pedagogy, and other related fields.

A lot of data journalism tools cross over with digital humanities; in fact, there are even joint Digital Humanities and Data Journalism conferences! You may have even noticed that the Knight Foundation is to data journalism what the Mellon Foundation is to digital humanities. Journalism Tools (and its list version on Medium) from the Tow-Knight Center for Entrepreneurial Journalism at the CUNY Graduate School of Journalism, and the Resources page from Data Driven Journalism, an initiative from the European Journalism Centre partially funded by the Dutch government, are both good places to look for resources. As with DiRT and TAPoR, there are similar issues with staying up to date, and data journalism resources tend to list more proprietary tools.

Also, be sure to check out resources for “digital” + [insert humanities/social science discipline], such as digital archeology and digital history. Another subset of digital humanities is digital pedagogy, which focuses on using technology to augment the educational experiences of both K-12 and university students; a lot of tools and techniques developed for digital pedagogy can also be used outside the classroom for research and presentation purposes. Even digital science resources can have a lot of useful tools, if you are willing to scroll past the occasional plasmid-sharing platform. Just remember to be creative and think of other disciplines that tackle issues similar to yours in their research!

Part 3: There is a lot of out-of-date advice out there.

There are librarians who write overviews of digital humanities tools and don’t bother to test whether they still work or are still being updated. I am very aware of how hard these things are to use and how quickly they change, and I’m not at all talking about the people who couldn’t keep their websites and curated lists updated. Rather, I’m talking about how the “Top Tools for Digital Humanities Research” article in the January/February 2017 issue of Computers in Libraries mentions Sophie, an interactive ebook creator (Herther, 2017). However, Sophie has not been updated since 2011, and the link for the fully open source version goes to “Watch King Kong 2 for Free.”

Screenshot of announcement for 2010 Sophie workshop at Scholarly Commons

Looks like we all missed the Scholarly Commons Sophie workshop by only 7 years.

The fact that no one caught that error shows either how slowly magazines edit or that no one else bothered to check. If no one seems to have created any projects with the software in the past three years, it’s probably best to assume it’s no longer maintained; though the best route is always to check for yourself.

Long term solutions:

Save your work in other formats for long term storage. Take your data management and digital preservation seriously. We have resources that can help you find the best options for saving your research.

If you are serious about digital humanities, you should really consider learning to code. We have a lot of resources for teaching yourself these skills here at the Scholarly Commons, as well as a wide range of workshops during the school year. As far as coding languages go, HTML/CSS, JavaScript, and Python are probably the most widely used in the digital humanities, and the most helpful. Depending on how much time you put in, learning to code can help you troubleshoot and customize your tools, as well as allow you to contribute to and help maintain the open source projects that you care about.

Works Cited:

100 tools for investigative journalists. (2016). Retrieved May 18, 2017, from https://medium.com/@Journalism2ls/75-tools-for-investigative-journalists-7df8b151db35

Center for Digital Scholarship Portal Mukurtu CMS.  (2017). Support. Retrieved May 11, 2017 from http://support.mukurtu.org/?b_id=633

DiRT Directory. (2015). Retrieved May 18, 2017 from http://dirtdirectory.org/

Digital tools for researchers. (2012, November 18). Retrieved May 31, 2017, from http://connectedresearchers.com/online-tools-for-researchers/

Dombrowski, Q. (2014). What Ever Happened to Project Bamboo? Literary and Linguistic Computing. https://doi.org/10.1093/llc/fqu026

Herther, N.K. (2017). Top Tools for Digital Humanities Research. Retrieved May 18, 2017, from http://www.infotoday.com/cilmag/jan17/Herther–Top-Tools-for-Digital-Humanities-Research.shtml

Journalism Tools. (2016). Retrieved May 18, 2017 from http://journalismtools.io/

Lord, G., Nieves, A.D., and Simons, J. (2015). dhQuest. http://dhquest.com/

Resources Data Driven Journalism. (2017). Retrieved May 18, 2017, from http://datadrivenjournalism.net/resources

TAPoR 3. (2015). Retrieved May 18, 2017 from http://tapor.ca/home

Visel, D. (2010). Upcoming Sophie Workshops. Retrieved May 18, 2017, from http://sophie2.org/trac/blog/upcomingsophieworkshops

Neatline 101: Getting Started

Here at Commons Knowledge we love easy-to-use interactive map creation software! We’ve compared and contrasted different tools, and talked about StoryMap JS and Shanti Interactive. The Scholarly Commons is a great place to get help on GIS projects, from ArcGIS StoryMaps and beyond. But if you want something where you can have both a map and a timeline, and if you are willing to spend money on your own server, definitely consider using Neatline.

Neatline is a plugin created by the Scholars’ Lab at the University of Virginia that lets you create interactive maps and timelines in Omeka exhibits. My personal favorite example is Paul Mawyer’s demo site “‘I am it and it is I’: Lovecraft in Providence,” with map tiles from Stamen Design under a CC BY 3.0 license.

Screenshot of Lovecraft Neatline exhibit

*As far as the location of Lovecraft’s most famous creation, let’s just say “Ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn.”

Now, one caveat — Neatline requires a server. I used Reclaim Hosting, which is straightforward and which I have also used for Scalar and Mukurtu. The cheapest plan available on Reclaim Hosting was $32 a year. Once I signed up for the website and domain name, I took advantage of one nice feature of Reclaim Hosting, which lets you one-click install the Omeka.org content management system (CMS). The Omeka CMS is a popular choice for digital humanities users; other popular content management systems include WordPress and Scalar.

One click install of Omeka through Reclaim Hosting

BUT WAIT, WHAT ABOUT OMEKA THROUGH SCHOLARLY COMMONS?

Here at the Scholarly Commons we can set up an Omeka.net site for you. You can find more information on setting up an Omeka.net site through the Scholarly Commons here. This is a great option for people who want to create a regular Omeka exhibit. However, Neatline is only available as a plugin for Omeka.org, which needs a server to host it. As far as I know, there is currently no Neatline plugin for Omeka.net, and I don’t think that will be happening anytime soon. On Reclaim you can install Omeka on any LAMP server. And some side advice from your very forgetful blogger: write down whatever username and password you make up when you set up your Omeka site. That will save you a lot of trouble later, especially considering how many accounts you end up with when you use a server to host a site.

Okay, I’m still interested, but what do I do once I have Omeka.org installed? 

So back to the demo. I used the instructions on the Neatline documentation page, which were good for defining a lot of the terms but not so good at explaining exactly what to do. I am focusing on the original Neatline plugin, but there are other Neatline plugins, like NeatlineText, depending on your needs; however, all plugins are installed in a similar way. You can follow the official instructions here at Installing Neatline.

But I have also provided some steps of my own below, because the official instructions just didn’t do it for me.

So first off, download the Neatline zip file.

Go to your Control Panel, cPanel in Reclaim Hosting, and click on “File Manager.”

File Manager circled on Reclaim Hosting

Sorry this looks so goofy; the Windows Snipping Tool’s free-form mode is only for those with a steady hand.

Navigate to the Plugins folder.

arrow points at plugins folder in file manager

Double click to open the folder. Click Upload Files.

more arrows pointing at tiny upload option in Plugins folder

If you’re using Reclaim Hosting, IGNORE THE INSTRUCTIONS: DO NOT UNZIP THE ZIP FILE ON YOUR COMPUTER, JUST PLOP THAT PUPPY RIGHT INTO YOUR PLUGINS FOLDER.

Upload the entire zip file

Plop it in!

Go back to the Plugins folder. Right click the Neatline zip file and click extract. Save extracted files in Plugins.

Extract Neatline files in File Manager

Sign into your Omeka site at [yourdomainname].[com/name/whatever]/admin if you aren’t already.

Omeka dashboard with arrows pointing at Plugins

Install Neatline for real.

Omeka Plugins page

Still confused or having trouble with setup?

Check out these tutorials as well!

OpenStreetMap is great and all, but what if I want to create a fancy historical map?

To create historical maps on Neatline you have two options, only one of which is included in the actual documentation for Neatline.

Officially, you are supposed to use GeoServer. GeoServer is an open source server application built in Java. Even if you have your own server, it has a lot more dependencies to run than what’s required for Omeka / Neatline.

If you want one-click Neatline installation with GeoServer and have money to spend, you might want to check out AcuGIS Neatline Cloud Hosting, which is recommended in the Neatline documentation; its lowest-cost plan starts at $250 a year.

Unofficially, there is a tutorial for this on Lincoln Mullen’s blog “The Backward Glance,” specifically his 2015 post “How to Use Neatline with Map Warper Instead of Geoserver.”

Let us know about the ways you incorporate geospatial data in your research!  And stay tuned for Neatline 102: Creating a simple exhibit!

Works Cited:

Extending Omeka with Plugins. (2016, July 5). Retrieved May 23, 2017, from http://history2016.doingdh.org/week-1-wednesday/extending-omeka-with-plugins/

Installing Neatline Neatline Documentation. (n.d.). Retrieved May 23, 2017, from http://docs.neatline.org/installing-neatline.html

Mawyer, Paul. (n.d.). “I am it and it is I”: Lovecraft in Providence. Retrieved May 23, 2017, from http://lovecraft.neatline.org/neatline-exhibits/show/lovecraft-in-providence/fullscreen

Mullen, Lincoln. (2015).  “How to Use Neatline with Map Warper Instead of Geoserver.” Retrieved May 23, 2017 from http://lincolnmullen.com/blog/how-to-use-neatline-with-map-warper-instead-of-geoserver/

Uploading Plugins to Omeka. (n.d.). Retrieved May 23, 2017, from https://community.reclaimhosting.com/t/uploading-plugins-to-omeka/195

Working with Omeka. (n.d.). Retrieved May 23, 2017, from https://community.reclaimhosting.com/t/working-with-omeka/194

Adventures at the Spring 2017 Library Hackathon

This year I participated in an event called HackCulture: A Hackathon for the Humanities, which was organized by the University Library. This interdisciplinary hackathon brought together participants and judges from a variety of fields.

This event is different from your average campus hackathon. For one, it’s about expanding humanities knowledge. In this event, teams of undergraduate and graduate students — typically affiliated with the iSchool in some way — spend a few weeks working on data-driven projects related to humanities research topics. This year, in celebration of the sesquicentennial of the University of Illinois at Urbana-Champaign, we looked at data about a variety of facets of university life provided by the University Archives.

This was a good experience. We got firsthand experience working with data, though my teammates and I struggled with OpenRefine and ended up coding data by hand. I now know way too much about the majors that are available at UIUC and how many of them have only come into existence in the last thirty years. It is always cool to see how much has changed and how much has stayed the same.

The other big challenge was that not everyone on the team had experience with design, so trying to convince folks not to fall into certain traps was tricky.

For an idea of how our group functioned, I outlined how we were feeling during the various checkpoints across the process.

Opening:

We had grand plans and great dreams and all kinds of data to work with. How young and naive we were.

Midpoint Check:

Laura was working on the Python script and sent a well-timed email about what was and wasn’t possible to get done in the time we were given. I find public speaking challenging, so that was not my favorite workshop. I would say it went alright.

Final:

We prevailed and presented something that worked in public. Laura wrote a great Python script and cleaned up a lot of the data; you can even find it here. One day in the near future it will be in IDEALS as well, where you can already check out projects from our fellow humanities hackers.

Key takeaways:

  • Choose your teammates wisely; try to pick a team of folks you’ve worked with before. Working with a mix of new and not-so-new people in a short time frame is hard.
  • Talk to your potential client base! This was definitely something we should have done more of.
  • Go to workshops and ask for help. I wish we had asked for more help.
  • Practicing your presentation in advance, as well as usability testing, is key. Yes, using the actual Usability Lab at the Scholarly Commons is ideal, but at the very least take time to make sure the instructions for using what you created are accurate. It’s amazing what steps you will leave out when you have used an app more than twice. Similarly, make sure that your program can run alongside another program at the same time, because if it can’t, chances are it will crash someone’s browser when they use it.

Overall, if you get a chance to participate in a library hackathon, go for it, it’s a great way to do a cool project and get more experience working with data!

TiddlyWiki Review

Here at Commons Knowledge we like to talk about all of the various options out there for personal and information management tools, so today we’re talking about TiddlyWiki!

“It’s like a hypertext card index system from the future” -Jeremy Ruston, in the TiddlyWiki intro video

To summarize: this is a British-made, somewhat tricky to use, free and open source note-taking and information management wiki platform written in JavaScript. TiddlyWiki is mostly used for task management. Still, if you’re looking for a way to manage all of your information and are feeling particularly adventurous (and not at all into aesthetics, as TiddlyWiki is an ugly website — though CSS customization is possible), you might enjoy TiddlyWiki.

Everything in TiddlyWiki is a small piece, a tiddler — a British word for a small fish — which you can stack, arrange, and link however you like. Tiddlers are individual units that you can incorporate into larger tiddlers through a process called “transclusion.” All a tiddler needs is a title. This is very similar to Scalar CMS, where all content is equal and can be linked or embedded to tell both linear and nonlinear stories. However, TiddlyWiki is not as pretty and is focused more on note-taking and information management than presentation.

An example of a Tiddler

There are a lot of options for customization, as well as an active community that keeps the project alive and adds new customization options for different purposes (such as writing a thesis). There is a WYSIWYG editor and formatting options, though you will still need to become familiar with the WikiText language in order to use more interesting formatting and customization. The WikiText language is similar to Markdown, and there is also a plugin that will let you write your tiddlers in Markdown if you are more familiar and comfortable with that. You can add images and scribble all over them, as well as save links to websites, with a download and some difficulty. TiddlyWiki includes search functionality and tagging, which is especially useful: clicking on a tag gives you a list of pages that have that tag. There are encryption plugins, which I have not tested, to create password-protected tiddlers and offer some basic security (though neither I nor the creators of TiddlyWiki endorse putting sensitive information on one of these sites).

You can use TiddlyWiki with TiddlySpot, TiddlyDesktop, or various browsers, as well as with Node.js or a variety of other options for saving your work. Get started here.

Setting up where your files save, so you can find them again, is probably the hardest part of setting up a TiddlyWiki. It creates one HTML file that you update as you save. If you’re using Firefox with the Firefox plugin, I recommend downloading an empty wiki, copying it from your Downloads folder, and pasting it to your G: Drive or another place where files aren’t deleted automatically. Afterward, you can click on the cat icon and set it to automatically save your changes to your file on the Desktop.

Clicking on the cat icon to set up automatic saving.

Note: Don’t save things to the Desktop on Scholarly Commons computers long-term, as files are routinely erased.

Let us know in the comments if you have any other personal information management systems that need more love!

Writing the next great American novel, or realistically, finding the “write” tools to finish your thesis

The Scholarly Commons is a great place to write the next great American novel; in fact, I’m surprised it has not happened yet (no pressure, dear patrons — we understand that you have a lot on your plates). We’re open Monday through Friday from 9 to 6 and offer a well-lit, fairly quiet, and overall ideal working space, with Espresso Royale and the Writing Center nearby. But actually getting that writing done, that’s the real challenge. Luckily, we have suggestions for tools and software you can use to keep writing and stay on track this semester!

Writing Your First Draft:

Yes, MS Word can be accessed for free by University students through the Web Store, and you can set it up to better address your research needs with features like the Zotero and Mendeley plugins for incorporating your references. And don’t forget you can go to Word > File > Options > Proofing > Writing Style, select Grammar and Style, and open Settings to choose what spellcheck looks for, so that passive voice gets underlined. However, believe it or not, there are word processors other than MS Word that are better for organizing and creating large writing projects, such as novels, theses, or even plays!

Scrivener

Scrivener is a word processor created with novelists in mind that lets you organize your research and notes while you are writing. With an education discount, a license for Scrivener costs $38.25. Scrivener is very popular and highly recommended by two of the GAs here at Scholarly Commons (you can email Claire Berman with any questions you may have about the software at cberman2 [at] illinois.edu). To really get started, check out our online copies of Scrivener: An Absolute Beginner’s Guide and  Scrivener for Dummies!

Mellel

Unfortunately, Mellel is only available on Mac. An educational license for the software costs $29. To some extent Mellel is similar in style and price to Pages for Mac, but it also shares similarities with MS Word for Mac. However, this word processor offers more options for customizing your word processing experience than Pages or MS Word. It also provides more options for outlining your work and dividing sections in a way that even MS Word’s Notebook view does not, which is great if you have a large written work with many sections, such as a novel or a thesis! Mellel also partners with the citation managers Bookends and Sente.

Markdown Editors like Ulysses

Ulysses is a simple and straightforward word processor for Mac, but you do have to write in Markdown, without a WYSIWYG editor. It costs $44.99 for Mac and $24.99 for iOS. However, it has many great features for writers, such as built-in word count goals for sections of a paper, and Markdown makes outlining work very easy and simple. We have discussed the value and importance of Markdown elsewhere on the blog, specifically in our posts Digital Preservation and the Power of Markdown and Getting Started with Markdown, and of course we want to remind all of our lovely readers to consider doing their writing in Markdown. Learning Markdown can open up writing and digital publishing opportunities across the web (for example, Programming Historian tutorials are written in Markdown). Plus, writing in Markdown converts easily for simple web design without the headache of having to write in HTML.

Staying Focused:

Maybe you don’t want to buy a whole new word processor. That’s fine! Here are some tools that can help create the “write” environment to get work done:

Freedom: costs $2.50 a month, so Freedom is not free, indeed. This is an app that allows you to block websites, and even the internet as a whole, and is available for Mac, Windows, and iOS devices. This app also has a lock feature that will not allow you to make changes to what is blocked for a set period of time.

RescueTime: another app option. Taking a slightly different approach from the rest here, the lite version of this app helps you track how you use your time and which apps and websites you use the most, so that you can have a better sense of what you are doing instead of writing. The premium version, which costs $54 a year, allows you to block distracting websites.

SelfControl: a Mac option, but open source, with community-built Linux and PC versions, and most importantly it’s free! This app allows you to block websites, based on their server, for a set period of time, during which there is basically NOTHING you can do on your computer to access these sites. So choose which sites to block, and the time limit, wisely.

Editing Tools:

Hemingway

Named after Ernest Hemingway, this text editor is supposed to help you adopt his style of writing: “bold and clear.” When you paste your text into the free web version, the applet gives you the text’s reading level and points out instances of awkward grammar, unnecessary or complicated words and adverbs, and sentences that are too long or too complicated. There’s a desktop version available for $20, though I honestly don’t think it’s worth the money; it does, however, give you another simple space on your computer to write and get feedback.

A note about Grammarly 

This is an alternative to MS Word’s spellcheck, with a free version you can add to your browser. As a browser add-in, it automatically checks for critical spelling and grammar mistakes (checking for advanced ones costs a monthly fee) everywhere you write, except in the situations where you’d really want extra spellcheck, such as Google Docs, and it can be wonky with WordPress. You can always copy and paste into the Grammarly window, but at that point you’re probably better off doing spellcheck in MS Word. There are also only two versions of English available, American and British (take that, Australia!). If you are trying to learn English and want instantaneous feedback while writing on the internet, are studying for high school standardized tests, or are perhaps a frequent YouTube commenter in need of a quick check before posting, then Grammarly is for you. For most people at the Scholarly Commons, this is a plugin they can skip, though I can’t speak for the paid version, which is supposed to be a little bit better. If you uninstall the app, they try to guilt-trip you, so heads up.

SpellCheckPlus: It’s BonPatron in English! Brought to you by Nadaclair Language Technologies, this web-based text editor goes beyond MS Word’s spellcheck to help identify grammar errors and ways to make your writing sound more natural to a native (Canadian) English speaker. There is a paid version, but if you don’t paste in more than the allotted 250 words of text at one time, you will be fine using the free version.

Let us know what you think and any tools we may have missed! Happy writing!

And to learn more and find more great productivity tools, check out:

Personal Information Management LibGuide

Use Sifter for Twitter Research

For many academics, Twitter is an increasingly important source. Whether you love it or hate it, Twitter dominates information dissemination and discourse, and will continue to do so for the foreseeable future. However, actually sorting through Twitter — especially for large-scale projects — can be deceptively difficult, and a deterrent for would-be Twitter scholars. That is where Sifter comes in: it will go through Twitter for you.

Sifter is a paid service (pricing is discussed in greater detail below) that provides search and retrieval access to undeleted tweets. Retrieved tweets are stored in an Enterprise DiscoverText account, which allows the user to perform data analytics on them. The DiscoverText account comes with a fourteen-day free trial, but for prolonged use the user will have to pay for account access.

However, Sifter can become prohibitively expensive. Each user gets three free estimates a day. Beyond that, it is $20 per day of data retrieved and $30 per 100,000 tweets. Larger purchases (over $500 and $1,500, respectively) receive longer DiscoverText trials, with access added for additional users. There are no refunds, so prior to making your purchase, make sure that you have done enough research to know exactly what data you want and which filters you’d like to use.
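
To make that pricing concrete, here is a small back-of-the-envelope calculator using the rates quoted above. It assumes the per-100,000-tweet charge is rounded up to the next block; check Sifter’s current pricing before you budget:

    import math

    def estimate_sifter_cost(days_of_data, expected_tweets, per_day=20, per_100k=30):
        """Rough cost estimate: $20 per day of data plus $30 per 100,000 tweets retrieved."""
        return days_of_data * per_day + math.ceil(expected_tweets / 100_000) * per_100k

    # Example: one week of data expected to return about 450,000 tweets
    print(estimate_sifter_cost(7, 450_000))  # 7 * 20 + 5 * 30 = 290 dollars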

Possible filters that you can request when using Sifter.

Have you used Sifter? Or DiscoverText? What was your experience like? Alternatively, do you have a free resource that you prefer to use for Twitter data analytics? Please let us know in the comments!

Topic Modeling and the Future of Ebooks

Ebook by Daniel Sancho CC BY 2.0

This semester I’ve had the pleasure of taking a course on Issues in Scholarly Communication with Dr. Maria Bonn at the University of Illinois iSchool. While we’ve touched on a number of fascinating issues in this course, I’ve been particularly interested in JSTOR Labs’ Reimagining the Monograph Project.

This project was inspired by the observation that, while scholarly journal articles have been available in digital form for some time now, scholarly books are only now beginning to become available in this format. However, the nature of long-form arguments, that is, the kinds of arguments you find in books, differs in some important ways from the sorts of material you’ll find in journal articles. Moreover, the ways that scholars and researchers engage with books are often different from the ways in which they interact with papers. In light of this, JSTOR Labs has spearheaded an effort to better understand the different ways that scholarly books are used, with an eye toward developing digital monographs that better suit these uses.

Topicgraph logo

In pursuit of this project, the JSTOR Labs team created Topicgraph, a tool that allows researchers to see, at a glance, what topics are covered within a monograph. Users can also navigate directly to pages that cover the topics in which they are interested. While Topicgraph is presented as a beta level tool, it provides us with a clear example of the untapped potential of digital books.

A topic graph for Suburban Urbanites

Topicgraph uses a method called topic modeling, a technique from natural language processing. A topic model examines a text and infers the topics discussed in it based on the terms used: terms that frequently appear near one another are taken as an indication that they relate to a common topic.
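
If you want to experiment with the same technique yourself, here is a minimal, hypothetical sketch using scikit-learn in Python. The toy corpus and the choice of two topics are placeholders, and Topicgraph’s own pipeline is certainly more sophisticated:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    documents = [
        "the election campaign and the senate vote dominated the news",
        "the film festival screened a new documentary about the election",
        "critics praised the director and the cinematography of the film",
    ]

    # Turn the documents into a word-count matrix, then fit a two-topic LDA model
    vectorizer = CountVectorizer(stop_words='english')
    counts = vectorizer.fit_transform(documents)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

    # Show the most heavily weighted words in each inferred topic
    words = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top_words = [words[j] for j in topic.argsort()[-5:][::-1]]
        print(f"Topic {i}: {', '.join(top_words)}")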

Users can explore Topicgraph by using JSTOR Labs’ small collection of open access scholarly books, which span a number of different disciplines, or by uploading their own PDFs for Topicgraph to analyze.

If you would like to learn how to incorporate topic modeling or other forms of text analysis into your research, contact the Scholarly Commons or visit us in the Main Library, room 306.