DIY Data Science

Data science is a special blend of statistics and programming with a focus on making complex statistical analyses more understandable and usable, typically through visualization. In 2012, the Harvard Business Review published the article “Data Scientist: The Sexiest Job of the 21st Century” (Davenport & Patil, 2012), which captured society’s perception of data science at the time. While some of the excitement of 2012 has died down, data science continues on, with data scientists earning a median base salary over $100,000 (Noyes, 2016).

Here at the Scholarly Commons, we believe that a better understanding of statistics makes you less likely to be fooled when they are deployed improperly, and helps you understand the inner workings of data visualization and digital humanities software applications and techniques. We might not be able to make you a data scientist (though please let us know if this post inspires you to enroll in formal coursework), but we can share some resources that let you try before you buy and incorporate methods from this growing field into your own research.

As we have discussed again and again on this blog, whether you want to improve your coding, statistics, or data visualization skills, our collection has some great reads to get you started.

In particular, take a look at:

The Human Face of Big Data created by Rick Smolan and Jennifer Erwitt

  • This is a great coffee table book of data visualizations and a great flip through if you are here in the space. You will learn a little bit more about the world around you and will be inspired with creative ways to communicate your ideas in your next project.

Data Points: Visualization That Means Something by Nathan Yau

  • Nathan Yau is best known for being the man behind Flowing Data, an extensive blog of data visualizations that also offers tutorials on how to create visualizations. In this book he explains the basics of statistics and visualization.

Storytelling with Data by Cole Nussbaumer Knaflic

LibGuides to Get You Started:

And more!

There are also a lot of resources on the web to help you:

The Open Source Data Science Masters

  • This is not an accredited master's program but rather a curated collection of suggested free and low-cost print and online resources for learning the various skills needed to become a data scientist. This list was created and is maintained by Clare Corthell of Luminant Data Science Consulting.
  • This list suggests many MOOCs from universities across the country, some of them available for free

Dataquest

  • This is a project-based data science course created by Vik Paruchuri, a former Foreign Service Officer turned data scientist
  • It mostly consists of a beginner Python tutorial, though it is only one of many that are out there
  • Twenty-two quests and portfolio projects are available for free, though the two premium versions offer unlimited quests, more feedback, a Slack community, and opportunities for one-on-one tutoring

David Venturi’s Data Science Masters

  • A DIY data science course that includes a resource list and, perhaps most importantly, links to reviews of online data science courses with up-to-date information. If you are interested in taking an online course or participating in a MOOC, this is a great place to get started.

Mitch Crowe’s Learn Data Science the Hard Way

  • Another curated list of data science learning resources, this time based on Zed Shaw’s Learn Code the Hard Way series. This list comes from Mitch Crowe, a Canadian data scientist.

So, is data science still sexy? Let us know what you think and what resources you have used to learn data science skills in the comments!

Works Cited:

Davenport, T. H., & Patil, D. J. (2012, October 1). Data Scientist: The Sexiest Job of the 21st Century. Retrieved June 1, 2017, from https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Noyes, K. (2016, January 21). Why “data scientist” is this year’s hottest job. Retrieved June 1, 2017, from http://www.pcworld.com/article/3025502/why-data-scientist-is-this-years-hottest-job.html

Finding Digital Humanities Tools in 2017

Here at the Scholarly Commons we want to make sure our patrons know what options are out there for conducting and presenting their research. The digital humanities are becoming increasingly accepted and expected. In fact, you can even play an online game about creating a digital humanities center at a university. After a year of exploring a variety of digital humanities tools, one theme has emerged throughout: taking advantage of the capabilities of new technology to truly revolutionize scholarly communications is actually a really hard thing to do.  Please don’t lose sight of this.

Finding digital humanities tools can be quite challenging. To start, many of your options will be open source tools that require a server and IT skills to run ($500+ per machine, or a cloud service at a slightly lower or comparable cost over the long term). Even when they aren’t expensive, be prepared to find yourself in the command line or writing code, even when a tool is advertised as beginner-friendly.

Mukurtu Help Page Screen Shot

I think this has been taken down because even they aren’t kidding themselves anymore.

There is also the issue of maintenance. While free and open source projects are where young computer nerds go to make a name for themselves, not every project is going to have the paid staff or the organized and dedicated community needed to keep it maintained over the years. What’s more, many digital humanities tool-building projects are initiatives from humanists who don’t know what’s possible or what they are doing, with wildly vacillating amounts of grant money available at any given time. This is exacerbated by rapid technological changes and by the fact that many projects were created without sustainability or digital preservation in mind from the get-go. And finally, for digital humanists, failure is not considered a rite of passage to the extent it is in Silicon Valley, which is part of why you sometimes find projects that no longer work still listed as viable resources.

Finding Digital Humanities Tools Part 1: DiRT and TAPoR

Yes, we have talked about DiRT here on Commons Knowledge. Although the Digital Research Tools directory is an extensive resource full of useful reviews, over time it has increasingly become a graveyard of failed digital humanities projects (and sometimes randomly switches to Spanish). The DiRT directory itself comes from Project Bamboo, “… a humanities cyber-infrastructure initiative funded by the Andrew W. Mellon Foundation between 2008 and 2012, in order to enhance arts and humanities research through the development of infrastructure and support for shared technology services” (Dombrowski, 2014). If you are confused about what that means, it’s okay, a lot of people were too, which led to many problems.

TAPoR 3, the Text Analysis Portal for Research, is DiRT’s Canadian counterpart. Despite keeping text analysis in the name, it also contains reviews of a variety of digital humanities tools. Like DiRT, it lists outdated sources.

Part 2: Data Journalism, digital versions of your favorite disciplines, digital pedagogy, and other related fields.

A lot of data journalism tools cross over with digital humanities; in fact, there are even joint Digital Humanities and Data Journalism conferences! You may have even noticed how The Knight Foundation is to data journalism what the Mellon Foundation is to digital humanities. Journalism Tools (and its list version on Medium) from the Tow-Knight Center for Entrepreneurial Journalism at the CUNY Graduate School of Journalism, and the Resources page from Data Driven Journalism, an initiative from the European Journalism Centre partially funded by the Dutch government, are both good places to look for resources. As with DiRT and TAPoR, staying up to date is an issue. Data journalism resources also tend to list more proprietary tools.

Also, be sure to check out resources for “digital” + [insert humanities/social science discipline], such as digital archeology and digital history. And of course, another subset of digital humanities is digital pedagogy, which focuses on using technology to augment the educational experiences of both K-12 and university students. A lot of tools and techniques developed for digital pedagogy can also be used outside the classroom for research and presentation purposes. Even digital science resources can have a lot of useful tools if you are willing to scroll past an occasional plasmid sharing platform. Just remember to be creative and think about which other disciplines are tackling issues in their research similar to what you are trying to do!

Part 3: There is a lot of out-of-date advice out there.

There are librarians who write overviews of digital humanities tools and don’t bother to test whether they still work or are still updated. I am very aware of how hard things are to use and how quickly things change, and I’m not at all talking about the people who couldn’t keep their websites and curated lists updated. Rather, I’m talking about how the “Top Tools for Digital Humanities Research” in the January/February 2017 issue of “Computers in Libraries” mentions Sophie, an interactive eBook creator (Herther, 2017). However, Sophie has not been updated since 2011, and the link for the fully open source version goes to “Watch King Kong 2 for Free”.

Screenshot of announcement for 2010 Sophie workshop at Scholarly Commons

Looks like we all missed the Scholarly Commons Sophie workshop by only 7 years.

The fact that no one caught that error shows either how slowly magazines edit or that no one else bothered to check. If no one seems to have created any projects with the software in the past three years, it’s probably best to assume it’s no longer happening, though the best route is always to check for yourself.

Long term solutions:

Save your work in other formats for long term storage. Take your data management and digital preservation seriously. We have resources that can help you find the best options for saving your research.

If you are serious about digital humanities, you should really consider learning to code. We have a lot of resources for teaching yourself these skills here at the Scholarly Commons, as well as a wide range of workshops during the school year. As far as coding languages go, HTML/CSS, JavaScript, and Python are probably the most widely used in the digital humanities, and the most helpful. Depending on how much time you put into this, learning to code can help you troubleshoot and customize your tools, as well as allow you to contribute to and help maintain the open source projects that you care about.

Works Cited:

100 tools for investigative journalists. (2016). Retrieved May 18, 2017, from https://medium.com/@Journalism2ls/75-tools-for-investigative-journalists-7df8b151db35

Center for Digital Scholarship Portal Mukurtu CMS.  (2017). Support. Retrieved May 11, 2017 from http://support.mukurtu.org/?b_id=633

DiRT Directory. (2015). Retrieved May 18, 2017 from http://dirtdirectory.org/

Digital tools for researchers. (2012, November 18). Retrieved May 31, 2017, from http://connectedresearchers.com/online-tools-for-researchers/

Dombrowski, Q. (2014). What Ever Happened to Project Bamboo? Literary and Linguistic Computing. https://doi.org/10.1093/llc/fqu026

Herther, N.K. (2017). Top Tools for Digital Humanities Research. Retrieved May 18, 2017, from http://www.infotoday.com/cilmag/jan17/Herther--Top-Tools-for-Digital-Humanities-Research.shtml

Journalism Tools. (2016). Retrieved May 18, 2017 from http://journalismtools.io/

Lord, G., Nieves, A.D., and Simons, J. (2015). dhQuest. http://dhquest.com/

Resources Data Driven Journalism. (2017). Retrieved May 18, 2017, from http://datadrivenjournalism.net/resources
TAPoR 3. (2015). Retrieved May 18, 2017 from http://tapor.ca/home

Visel, D. (2010). Upcoming Sophie Workshops. Retrieved May 18, 2017, from http://sophie2.org/trac/blog/upcomingsophieworkshops

Learn Python Summer 2017

Are you sitting around thinking to yourself, golly, the bloggers at Commons Knowledge have not tried to convince me to learn Python in a few weeks, what’s going on over there? Well, no worries! We’re back with another post going over the reasons why you should learn Python. And to answer your next question: no, the constant Python promotion isn’t us taking orders from some sinister serpentine society. We just really like playing with Python and coding here at the Scholarly Commons.

Why should I learn Python?

Python is a coding language with many applications for data science, bioinformatics, digital humanities, GIS, and even video games! Python is a great way to get started with coding and beef up your resume. It’s also considered one of the easier coding languages to learn and whether or not you are a student in LIS 452, we have resources here for you! And if you need help you can always email the Scholarly Commons with questions!

Where can I get started at Scholarly Commons?

We have a small section of great books aimed at new coders and those working on specific projects here in the space and online through the library catalog. Along with the classic Think Python book, some highlights include:

Python Crash Course: A Hands-On, Project-Based Introduction to Programming

Python Crash Course is an introductory textbook for Python that goes over programming concepts and is full of examples and practice exercises. One unique feature of this book is that it also includes three longer multi-step projects (a game, a data visualization, and a web app), which you can follow for further practice. With these instructions available, you have something to base your own long-term Python projects on, whether for your research or a course. Don’t forget to check out the updates to the book at its website.

Automate the Boring Stuff with Python: Practical Programming for Total Beginners

Automate the Boring Stuff with Python is a solid introduction to Python with lots of examples. The target audience is non-programmers who plan to stay non-programmers; the author aims to provide the minimum amount of information necessary so that users can ultimately use Python for useful tasks, such as batch organizing files. It is still a lot of information, and I feel some of the visual metaphors are more confusing than helpful. Of course, having a programming background helps, despite the premise of the book.

This book can also be found online for free on this website.
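To give a sense of what "batch organizing files" can look like in practice, here is a minimal sketch. It is not taken from the book; the function name `organize_by_extension` and the folder layout are assumptions made up for illustration. It uses only Python's standard library (`pathlib` and `shutil`) to sort the files in one folder into subfolders named after their file extensions.

```python
from pathlib import Path
import shutil


def organize_by_extension(folder):
    """Move each file in `folder` into a subfolder named after its extension.

    Hypothetical helper for illustration; returns a dict mapping each
    subfolder name to the list of file names moved into it.
    """
    folder = Path(folder)
    moved = {}
    # sorted() takes a snapshot of the listing before any subfolder is created.
    for item in sorted(folder.iterdir()):
        if not item.is_file():
            continue  # leave existing subfolders alone
        # "notes.TXT" -> "txt"; files without an extension go to "no_extension"
        ext = item.suffix.lower().lstrip(".") or "no_extension"
        target_dir = folder / ext
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(item), str(target_dir / item.name))
        moved.setdefault(ext, []).append(item.name)
    return moved
```

You could try it on a copy of a messy folder, for example `organize_by_extension("downloads_copy")`, where `downloads_copy` is a placeholder name. Working on a copy is a good habit, because the function moves files rather than copying them and does not handle name clashes.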

Learn Python the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code

Although focused on Python 2, this is a book about teaching programming skills to newbie coders. Although the author does not specifically use this term, the book is based on what is known in psychology as deliberate practice or “the hard way,” which is described in Cal Newport’s blog post “The Grandmaster in the Corner Office” (Newport, 2010). And Learn Python the Hard Way certainly lives up to the title. Even the basic command line instructions prove difficult. But based on my own learning experiences with deliberate practice, if you follow the instructions I imagine you will have a solid understanding of Python, programming, and, from what I’ve read in the book, definitely some of your more techie friends’ programming jokes.

Online Resources

If the command line makes you scared or if you want to get started right away, definitely check out PythonAnywhere, which offers a basic plan that allows users to create and run Python programs in their browser. If PythonAnywhere isn’t your speed, check out this article, which lists the 45 best places to learn to code online.

Interested in joining an online Python learning group this summer?

Definitely check out Advent of Python, an online Python co-learning group through The Digital Humanities Slack. It started Tuesday, May 30 with introductions, and every week there will be Python puzzles to help you develop your skills. IT IS NOT TOO LATE TO JOIN! The first check-in and puzzle solutions will be June 6. The solutions and check-ins are going to be every Tuesday, except the Fourth of July; that meeting will be on Wednesday, July 5. There is a Slack, a Google Doc, and subreddits.

Living in Champaign-Urbana?

Be sure to check out Py-CU, a maker/hacker group in Urbana that welcomes coders of all levels of experience. The next meeting is on June 3rd. And obligatory heads up, the Urbana Makerspace is pretty much located in Narnia.

Question for the comments: how did you learn to code? What websites, books, and resources do you recommend for the newbie coder?

Works Cited:

Newport, C. (2010, January 6). The Grandmaster in the Corner Office: What the Study of Chess Experts Teaches Us about Building a Remarkable Life. Retrieved May 30, 2017, from http://calnewport.com/blog/2010/01/06/the-grandmaster-in-the-corner-office-what-the-study-of-chess-experts-teaches-us-about-building-a-remarkable-life/