For the transcript, click on “Continue reading” below.
Did you know that Endangered Data Week is happening from February 26 to March 2? Endangered Data Week is a collaborative effort to highlight public datasets that are in danger of being deleted, repressed, mishandled, or lost. Inspired by recent events that have shown how fragile publicly administered data can be, Endangered Data Week aims to promote care for endangered collections by publicizing datasets, increasing engagement with them, and advocating for political action.
The Endangered Data Week organizers hope to cultivate a broad community of supporters who advocate for access to public data and open data policies, and who help build data skills and competencies among students and colleagues. During Endangered Data Week, librarians, scholars, and activists will use the #EndangeredData hashtag on Twitter, as well as host events across the country.
While this is the first year of Endangered Data Week, the organizers hope both to build on the momentum of similar movements, such as Sunshine Week, Open Access Week, and #DataRescue, and to continue organizing events in the future.
What are you doing during Endangered Data Week? Let us know in the comments!
If you’re working with data, chances are there will be at least a few times when you encounter the “nightmare scenario.” Things go awry: values are missing, your sample is biased, there are inexplicable outliers, or the sample wasn’t as random as you thought. Some issues you can solve; others are less clear. But before you tear your hair out (or, before you tear all of your hair out), check out The Quartz guide to bad data. Hosted on GitHub, The Quartz guide lists possible problems with data and how to solve them, so that researchers have an idea of what their next steps can be when their data doesn’t work as planned.
With translations into six languages and a Creative Commons 4.0 license, The Quartz guide divides problems into four categories: issues that your source should solve, issues that you should solve, issues a third-party expert should help you solve, and issues a programmer should help you solve. From there, the guide lists specific issues and explains how they can or cannot be solved.
One of the greatest things about The Quartz guide is the language. Rather than pontificating and making an already frustrating problem more confusing, the guide lays out options in plain terms. While you may not get everything you need for fixing your specific problem, chances are you will at least figure out how you can start moving forward after this setback.
The Quartz guide does not mince words. In the “Data were entered by humans” section, for example, it shows a sample of messy data entry and then says, “Even with the best tools available, data this messy can’t be saved. They are effectively meaningless… Beware human-entered data.” Even if it’s probably not what a researcher wants to hear, sometimes the cold, hard truth can lead someone to a new step in their research.
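One cheap first check for human-entered mess is to normalize and count the distinct values in a column: anything that should be one category but survives as several is a red flag. A minimal Python sketch in that spirit (the sample values below are invented for illustration, not from the Quartz guide):

```python
from collections import Counter

# Hypothetical column of human-entered state names (invented sample)
values = ["Illinois", "illinois ", "IL", "Ilinois", "Illinois"]

# Normalize whitespace and case, then count what's left; any category
# that still appears under several keys needs human review
counts = Counter(v.strip().lower() for v in values)
# counts: {"illinois": 3, "il": 1, "ilinois": 1} -- two suspect variants remain
```

Normalization merges the trivial variants; the abbreviation and the typo survive as separate keys, which is exactly the kind of problem the guide says tools alone can’t fix.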
So if you’ve hit a block with your data, check out The Quartz guide. It may be the thing that will help you move forward with your data! And if you’re working with data, feel free to contact the Scholarly Commons or Research Data Service with your questions!
This post was guest authored by Kayla Abner.
Interested in social media analytics, but don’t want to shell out the bucks to get started? There are a few open source tools you can use to dabble in this field, and some even integrate data visualization. Recently, we at the Scholarly Commons tested a few of these tools, and as expected, each one has strengths and weaknesses. For our exploration, we exclusively analyzed Twitter data.
tl;dr: Light system footprint and provides some interesting data visualization options. Useful if you don’t have a pre-existing data set, but the one generated here is fairly small.
NodeXL is essentially a complex Excel template (it’s classified as a Microsoft Office customization), which means it doesn’t take up a lot of space on your hard drive. It does have advantages: it’s easy to use, requiring only a simple search to retrieve tweets for you to analyze. However, its capabilities for large-scale analysis are limited; the user is restricted to retrieving the most recent 2,000 tweets. For example, searching Twitter for #halloween imported 2,000 tweets, every single one from the date of this writing. It is worth mentioning that there is a paid version that will expand your limit to 18,000 (the maximum allowed by Twitter’s API) or 7 to 8 days back, whichever comes first. Even then, you cannot restrict your data retrieval by date. NodeXL is thus a tool best suited to pulling recent social media data. In addition, if you want to study something besides Twitter, you will have to pay to get any other type of dataset, e.g., Facebook, YouTube, or Flickr.
Strengths: Good for a beginner, differentiates between Mentions/Retweets and original Tweets, provides a dataset, some light data visualization tools, offers Help hints on hover
Weaknesses: 2,000 Tweet limit, free version restricted to Twitter Search Network
tl;dr: An add-on for Google Sheets, giving it a light system footprint as well. Higher limit on the number of tweets. TAGS has the added benefit of automated data retrieval, so you can track trends over time. Its data visualization tool is in beta and needs more development.
TAGS is another complex spreadsheet template, this time created for use with Google Sheets. TAGS does not have a paid version with more social media options; it can only be used for Twitter analysis. However, it does not have the same tweet retrieval limit as NodeXL. The only limit is 18,000 tweets or seven days back, which is dictated by Twitter’s Terms of Service, not by the creators of this tool. My same search for #halloween with a limit set at 10,000 retrieved 9,902 tweets from the past seven days.
TAGS also offers a data visualization tool, TAGSExplorer, that is promising but still needs work to realize its potential. As it stands now in beta, even a dataset of 2,000 records puts so much strain on the program that it cannot keep up with the user. It can be used with smaller datasets, but it still needs work. It does offer a few interesting analysis parameters that NodeXL lacks, such as the ability to see Top Tweeters and Top Hashtags, which work better than the graph view.
Strengths: More data fields, such as the user’s follower and friend count, location, and language (if available), better advanced search (Boolean capabilities, restrict by date or follower count), automated data retrieval
Weaknesses: data visualization tool needs work
tl;dr: A tool used for “re-hydrating” tweet IDs into full tweets, to comply with Twitter’s Terms of Service. Not used for data analysis; useful for retrieving large datasets. Limited to datasets already available.
Documenting the Now, a group focused on collecting and preserving digital content, created the Hydrator tool to comply with Twitter’s Terms of Service. Download and distribution of full tweets to third parties is not allowed, but distribution of tweet IDs is allowed. The organization manages a Tweet Catalog with files that can be downloaded and run through the Hydrator to view the full Tweet. Researchers are also invited to submit their own dataset of Tweet IDs, but this requires use of other software to download them. This tool does not offer any data visualization, but is useful for studying and sharing large datasets (the file for the 115th US Congress contains 1,430,133 tweets!). Researchers are limited to what has already been collected, but multiple organizations provide publicly downloadable tweet ID datasets, such as Harvard’s Dataverse. Note that the rate of hydration is also limited by Twitter’s API, and the Hydrator tool manages that for you. Some of these datasets contain millions of tweet IDs, and will take days to be transformed into full tweets.
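For readers curious about what hydration involves under the hood: tweet IDs are exchanged for full tweets in batches, since Twitter’s lookup endpoint accepts at most 100 IDs per request (which is why huge datasets take days). A hedged sketch of just the batching step, independent of any particular Twitter client:

```python
def batch_ids(tweet_ids, batch_size=100):
    """Split a list of tweet IDs into chunks sized for Twitter's lookup endpoint."""
    return [tweet_ids[i:i + batch_size]
            for i in range(0, len(tweet_ids), batch_size)]

# e.g., 250 IDs -> 3 requests (100 + 100 + 50); a real hydrator would
# also pause between requests to stay under the API's rate limits
batches = batch_ids(list(range(250)))
```

The Hydrator handles this batching and rate limiting for you; the sketch is only to show why hydration time scales with dataset size.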
Strengths: Provides full tweets for analysis, straightforward interface
Weaknesses: No data analysis tools
If you’re looking for more robust analytics tools, Crimson Hexagon is a data analytics platform that specializes in social media. Not limited to Twitter, it can retrieve data from Facebook, Instagram, YouTube, and basically any other online source, such as blogs or forums. The company has a partnership with Twitter and pays for greater access to their data, giving the researcher higher download limits and a longer time range than they would receive from either NodeXL or TAGS. One can access tweets going back to Twitter’s inception, but these features cost money! The University of Illinois at Urbana-Champaign is one such entity paying for this platform, so researchers affiliated with our university can request access. One of the Scholarly Commons interns, Matt Pitchford, uses this tool in his research on Twitter response to terrorism.
Whether you’re an experienced text analyst or just want to play around, these open source tools are worth considering for different uses, all without you spending a dime.
If you’d like to know more, researcher Rebekah K. Tromble recently gave a lecture at the Data Scientist Training for Librarians (DST4L) conference regarding how different (paid) platforms influence or bias analyses of social media data. As you start a real project analyzing social media, you’ll want to know how the data you have gathered may be limited to adjust your analysis accordingly.
We’ve all been there. You’ve been searching for a file for an hour, sure that you named it ‘draft 2.docx’ or ‘essay.docx’ or ‘FINAL DRAFT I SWEAR.docx’. There’s an hour until your deadline and the print queue is pretty backed up and you cannot find the file.
Again, we’ve all been there. But we don’t have to be.
Creating a naming convention for your files can save you the hassle of searching through folders full of ‘essay’s and ‘draft’s. Instead, you’ll be able to find files with ease. While everyone should create a system that works for them, here are a few suggestions to think about before choosing a system for naming your files.
Naming conventions are only useful if they actually help you find what you’re looking for. So, create a naming convention that works for how you think about your files! For example, if you’re working with lab data that you save daily, create a system based on the date so your files will be in chronological order.
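As a sketch of what a date-based convention buys you: putting an ISO-8601 date (YYYY-MM-DD) first makes an alphabetical file listing chronological for free. The function and names below are just one possible convention, not a prescription:

```python
from datetime import date

def data_filename(experiment, run_date, ext="csv"):
    """Build an ISO-8601, date-first file name so files sort chronologically."""
    return f"{run_date.isoformat()}_{experiment}.{ext}"

# "2018-02-26_titration.csv" sorts before "2018-03-01_titration.csv",
# so your file browser's alphabetical order doubles as a timeline
name = data_filename("titration", date(2018, 2, 26))
```

The same idea works by hand, of course; the point is only that year-month-day order sorts correctly, while "26-02-2018" or "Feb26" does not.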
If you know that you’re not going to want to type out long file names, then don’t choose long file names. Or, if you know that a format will be more difficult for you in the long run, don’t use it in the short run! There are few things more irritating than having to go through and change things because you’ve created a system that’s too complicated.
This is something that I’ve had trouble with — if your system stops working, don’t be afraid to change it up to make things work for you. If your file names are getting too long, or you’re finding that you have trouble differentiating between dates, save yourself a headache by investing some time in creating another style sooner rather than later. That’s not to say that you should go changing all your file names willy-nilly whenever the mood strikes you, but it’s important that you find a way that you can commit to long term.
If you’re inspired and want to create a new system for naming your files, here are a few resources that you should check out:
Here at the Scholarly Commons we want to make sure our patrons know what options are out there for conducting and presenting their research. The digital humanities are becoming increasingly accepted and expected. In fact, you can even play an online game about creating a digital humanities center at a university. After a year of exploring a variety of digital humanities tools, one theme has emerged throughout: taking advantage of the capabilities of new technology to truly revolutionize scholarly communications is actually a really hard thing to do. Please don’t lose sight of this.
Finding digital humanities tools can be quite challenging. To start, many of your options will be open source tools that require a server and IT skills to run ($500+ per machine, or a cloud service at slightly less or comparable cost over the long term). Even when tools aren’t expensive, be prepared to find yourself in the command line or having to write code, even when a tool is advertised as beginner-friendly.
I think this has been taken down because even they aren’t kidding themselves anymore.
There is also the issue of maintenance. While free and open source projects are where young computer nerds go to make a name for themselves, not every project will have the paid staff or the organized, dedicated community needed to keep it maintained over the years. What’s more, many digital humanities tool-building projects are initiatives from humanists who don’t know what’s possible or what they are doing, with wildly vacillating amounts of grant money available at any given time. This is exacerbated by rapid technological change, and by the fact that many projects were created without sustainability or digital preservation in mind from the get-go. And finally, for digital humanists, failure is not considered a rite of passage to the extent it is in Silicon Valley, which is part of why you sometimes find projects that no longer work still listed as viable resources.
Finding Digital Humanities Tools Part 1: DiRT and TAPoR
Yes, we have talked about DiRT here on Commons Knowledge. Although the Digital Research Tools directory is an extensive resource full of useful reviews, over time it has increasingly become a graveyard of failed digital humanities projects (and sometimes randomly switches to Spanish). The DiRT directory itself comes from Project Bamboo, “… a humanities cyberinfrastructure initiative funded by the Andrew W. Mellon Foundation between 2008 and 2012, in order to enhance arts and humanities research through the development of infrastructure and support for shared technology services” (Dombrowski, 2014). If you are confused about what that means, it’s okay, a lot of people were too, which led to many problems.
TAPoR 3, the Text Analysis Portal for Research, is DiRT’s Canadian counterpart; it also contains reviews of a variety of digital humanities tools, despite keeping “text analysis” in the name. Like DiRT, it lists outdated sources.
Part 2: Data Journalism, digital versions of your favorite disciplines, digital pedagogy, and other related fields.
A lot of data journalism tools cross over with digital humanities; in fact, there are even joint Digital Humanities and Data Journalism conferences! You may have even noticed how the Knight Foundation is to data journalism what the Mellon Foundation is to digital humanities. Journalism Tools (and the list version on Medium) from the Tow-Knight Center for Entrepreneurial Journalism at the CUNY Graduate School of Journalism, and the Resources page from Data Driven Journalism, an initiative from the European Journalism Centre partially funded by the Dutch government, are both good places to look for resources. As with DiRT and TAPoR, there are similar issues with staying up to date. Also, data journalism resources tend to list more proprietary tools.
Also, be sure to check out resources for “digital” + [insert humanities/social science discipline], such as digital archeology and digital history. And of course, another subset of digital humanities is digital pedagogy, which focuses on using technology to augment the educational experiences of both K-12 and university students. A lot of tools and techniques developed for digital pedagogy can also be used outside the classroom for research and presentation purposes. Even digital science resources can include a lot of useful tools, if you are willing to scroll past the occasional plasmid-sharing platform. Just remember to be creative and think about how other disciplines are tackling issues similar to the ones in your own research!
Part 3: There is a lot of out-of-date advice out there.
There are librarians who write overviews of digital humanities tools and don’t bother to test whether they still work or are still updated. I am very aware of how hard things are to use and how quickly things change, and I’m not at all talking about the people who couldn’t keep their websites and curated lists updated. Rather, I’m talking about how the “Top Tools for Digital Humanities Research” article in the January/February 2017 issue of Computers in Libraries mentions Sophie, an interactive eBook creator (Herther, 2017). However, Sophie has not been updated since 2011, and the link for the fully open source version goes to “Watch King Kong 2 for Free.”
Looks like we all missed the Scholarly Commons Sophie workshop by only 7 years.
The fact that no one caught that error shows either how slowly magazines edit or that no one else bothered to check. If no one seems to have created any projects with the software in the past three years, it’s probably best to assume development is no longer happening; though the best route is always to check for yourself.
Long term solutions:
Save your work in other formats for long term storage. Take your data management and digital preservation seriously. We have resources that can help you find the best options for saving your research.
100 tools for investigative journalists. (2016). Retrieved May 18, 2017, from https://medium.com/@Journalism2ls/75-tools-for-investigative-journalists-7df8b151db35
Center for Digital Scholarship Portal Mukurtu CMS. (2017). Support. Retrieved May 11, 2017 from http://support.mukurtu.org/?b_id=633
DiRT Directory. (2015). Retrieved May 18, 2017 from http://dirtdirectory.org/
Dombrowski, Q. (2014). What Ever Happened to Project Bamboo? Literary and Linguistic Computing. https://doi.org/10.1093/llc/fqu026
Herther, N.K. (2017). Top Tools for Digital Humanities Research. Retrieved May 18, 2017, from http://www.infotoday.com/cilmag/jan17/Herther–Top-Tools-for-Digital-Humanities-Research.shtml
Journalism Tools. (2016). Retrieved May 18, 2017 from http://journalismtools.io/
Lord, G., Nieves, A.D., and Simons, J. (2015). dhQuest. http://dhquest.com/
Visel, D. (2010). Upcoming Sophie Workshops. Retrieved May 18, 2017, from http://sophie2.org/trac/blog/upcomingsophieworkshops
Here at Commons Knowledge we like to talk about all of the various options out there for personal and information management tools, so today we’re talking about TiddlyWiki!
“It’s like a hypertext card index system from the future” -Jeremy Ruston, in the TiddlyWiki intro video
Everything in TiddlyWiki is a small piece, a tiddler (a British word for a small fish), which you can stack, arrange, and link however you like. Tiddlers are individual units that you can incorporate into larger tiddlers through a process called “transclusion.” To have a tiddler, all you need is a title. This is very similar to Scalar CMS, where all content is equal and can be linked or embedded to tell both linear and nonlinear stories. However, TiddlyWiki is not as pretty and is focused more on note-taking and information management than on presentation.
There are a lot of options for customization, as well as an active community that keeps the project alive and adds new customization options for different purposes (such as writing a thesis). There is a WYSIWYG editor with formatting options, though you will still need to become familiar with the WikiText language, which is similar to Markdown, in order to use the more interesting formatting and customization features. There is also a plugin that will let you write your tiddlers in Markdown if you are more familiar and comfortable with that. You can add images and scribble all over them, as well as save links to websites, though with a download and some difficulty. TiddlyWiki includes search functionality and tagging, which is especially useful: clicking on a tag gives you a list of the pages that carry it. There are encryption plugins, which I have not tested, that create password-protected tiddlers and offer some basic security (though neither I nor the creators of TiddlyWiki endorse putting sensitive information on one of these sites).
Setting up where your files save, so you can find them again, is probably the hardest part of setting up TiddlyWiki. It creates a single HTML file that is updated as you save. If you’re using Firefox with the Firefox plugin, I recommend downloading an empty wiki, copying it from your Downloads folder, and pasting it to your G: Drive or another place where files aren’t deleted automatically. Afterward, you can click on the cat icon and set it to automatically save your changes to your file on the Desktop.
Note: Don’t save things to the Desktop on Scholarly Commons computers long-term, as files are routinely erased.
Let us know in the comments if you have any other personal information management systems that need more love!
Are you sitting around thinking to yourself, golly, the bloggers at Commons Knowledge have not tried to convince me to learn Python in a few weeks, what’s going on over there? Well, no worries! We’re back with another post going over the reasons why you should learn Python. And to answer your next question no, the constant Python promotion isn’t us taking orders from some sinister serpentine society. We just really like playing with Python and coding here at the Scholarly Commons.
Why should I learn Python?
Python is a coding language with many applications for data science, bioinformatics, digital humanities, GIS, and even video games! Python is a great way to get started with coding and beef up your resume. It’s also considered one of the easier coding languages to learn and whether or not you are a student in LIS 452, we have resources here for you! And if you need help you can always email the Scholarly Commons with questions!
Where can I get started at Scholarly Commons?
We have a small section of great books aimed at new coders and those working on specific projects here in the space and online through the library catalog. Along with the classic Think Python book, some highlights include:
Python Crash Course is an introductory textbook for Python, which goes over programming concepts and is full of examples and practice exercises. One unique feature of this book is that it also includes three multi-step longer projects: a game, a data visualization, and a web app, which you can follow for further practice. One nice thing is that with these instructions available you have something to base your own long-term Python projects on, whether for your research or a course. Don’t forget to check out the updates to the book at their website.
Automate the Boring Stuff with Python is a solid introduction to Python with lots of examples. The target audience is non-programmers who plan to stay non-programmers; the author aims to provide the minimum amount of information necessary so that users can ultimately use Python for useful tasks, such as batch organizing files. It is still a lot of information, and I feel some of the visual metaphors are more confusing than helpful. Of course, having a programming background helps, despite the premise of the book.
This book can also be found online for free on this website.
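To give a flavor of the kind of task the book targets, here is a hedged sketch (our own, not an example from the book) that batch-renames files in a folder to a consistent lowercase, underscore-separated style:

```python
from pathlib import Path

def normalize_names(folder, suffix=".txt"):
    """Rename matching files in `folder` to lowercase, underscore-separated names."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != suffix:
            continue  # leave files with other extensions alone
        new_name = path.stem.lower().replace(" ", "_") + suffix
        if new_name != path.name:
            path.rename(path.with_name(new_name))
        renamed.append(new_name)
    return renamed
```

Run against a folder of files like "FINAL Draft.txt" and "My Notes.TXT", this produces "final_draft.txt" and "my_notes.txt": a ten-line fix for the naming chaos described in our file-naming post.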
Although focused on Python 2, Learn Python the Hard Way is a book about teaching programming skills to newbie coders. Although the author does not specifically use the term, the book is based on what is known in psychology as deliberate practice, or “the hard way,” as described in Cal Newport’s blog post “The Grandmaster in the Corner Office” (Newport, 2010). And Learn Python the Hard Way certainly lives up to the title: even the basic command line instructions prove difficult. But based on my own learning experiences with deliberate practice, if you follow the instructions, I imagine you will come away with a solid understanding of Python and programming, and, from what I’ve read in the book, you’ll finally get some of your more techie friends’ programming jokes.
If the command line makes you scared or if you want to get started right away, definitely check out PythonAnywhere, which offers a basic plan that allows users to create and run Python programs in their browser. If PythonAnywhere isn’t your speed, check out this article, which lists the 45 best places to learn to code online.
Interested in joining an online Python learning group this summer?
Definitely check out Advent of Python, an online Python co-learning group through The Digital Humanities Slack. It started Tuesday, May 30 with introductions, and every week there will be Python puzzles to help you develop your skills. IT IS NOT TOO LATE TO JOIN! The first check-in and puzzle solutions will be June 6. The solutions and check-ins will be every Tuesday, except the week of the Fourth of July, when the meeting will be on Wednesday, July 5. There is a Slack, a Google Doc, and subreddits.
Living in Champaign-Urbana?
Be sure to check out Py-CU, a maker/hacker group in Urbana welcoming coders of all levels of experience, with the next meeting on June 3rd. And obligatory heads up: the Urbana Makerspace is pretty much located in Narnia.
Question for the comments, how did you learn to code? What websites, books and resources do you recommend for the newbie coder?
When one thinks of Pinterest, they tend to associate it with weeknight crock pot recipes and lifehacks that may or may not always work. But Pinterest can also be a great place to store and share links and information relating to your academic discipline that is widely accessible and free. In this post, we’ll look at how three groups use Pinterest in different ways to help their missions, then go through some pros and cons of using Pinterest for academic endeavors.
A Digital Tool Box for Historians is exactly what it says on the tin. On the date this post was written, A Digital Tool Box for Historians boasts 124 pins, each a link to a digital resource that can help historians. Resources range from free-to-use websites to pay-to-use software and everything in between. It is an easy-to-follow board made for easy browsing.
Europeana is a website dedicated to collecting and sharing cultural artifacts and art from around the world. Their Pinterest page serves as a virtual museum with pins grouped into thematic boards, as if they were galleries. With over a hundred and fifty boards, their subject matter ranges from broad themes (such as their Birds and Symbolism board) to artistic media (such as their Posters board) to specific artistic movements or artists (such as their Henri Verstijnen – Satirical Drawings board). Pinterest users can then subscribe to favorite boards and share pieces that they find moving, thus increasing the dissemination of pieces that could remain static if only kept on the Europeana website.
Sponsored by — you guessed it — Love Your Data Week, the Love Your Data Week Pinterest board serves as a community place to help institutions prepare for Love Your Data Week. Resources shared on the Love Your Data Week board can either be saved to an institution’s own Love Your Data board, or used on their other social media channels to spark discussion.
Whether it’s a gallery, tool kit, or resource aggregation, Pinterest shows potential for growth in academic and research circles. Have you used Pinterest for academics before? How’d it go? Any tips you’d like to give? Let us know in the comments!
The Inter-university Consortium for Political and Social Research (ICPSR) is once again offering its summer workshops for researchers! Workshops range from Rational Choice Theories of Politics and Society to Survival Analysis, Event History Modeling, and Duration Analysis. There are so many fantastic choices across the country that we can hardly decide which we’d want to go to the most!
Here is how the ICPSR website describes the workshops:
Since 1963, the Inter-university Consortium for Political and Social Research (ICPSR) has offered the ICPSR Summer Program in Quantitative Methods of Social Research as a complement to its data services. The ICPSR Summer Program provides rigorous, hands-on training in statistical techniques, research methodologies, and data analysis. ICPSR Summer Program courses emphasize the integration of methodological strategies with the theoretical and practical concerns that arise in research on substantive issues. The Summer Program’s broad curriculum is designed to fulfill the needs of researchers throughout their careers. Participants in each year’s Summer Program generally represent about 30 different disciplines from more than 350 colleges, universities, and organizations around the world. Because of the premier quality of instruction and unparalleled opportunities for networking, the ICPSR Summer Program is internationally recognized as the leader for training in research methodologies and technologies used across the social, behavioral, and medical sciences.
Courses are available in 4-week sessions (June 26 – July 21, 2017 and July 24 – August 18, 2017) as well as shorter workshops lasting 3-to-5 days (beginning May 8). More details about the courses can be found here.
Details about registration deadlines, fees, and other important information can be found here.
If you want some help figuring out which workshops are most appropriate for you or just want to chat about the exciting offerings, come on over to the Scholarly Commons, where our social science experts can give you a hand!