Love and Big Data

Can big data help you find true love?

It’s Love Your Data Week, but did you know that people have been using big data and the power of data science to optimize their search for a soul mate? Wired Magazine profiled mathematician and data scientist Chris McKinlay in “How to Hack OkCupid”. There’s even a book spin-off, “Optimal Cupid”, which unfortunately is not available at any nearby libraries.

But really, we know you’re all wondering, where can I learn the data science techniques needed to find “The One”, especially if I’m not a math genius?

ETHICS NOTE: WE DO NOT ENDORSE OR RECOMMEND TRYING TO CREATE SPYWARE, ESPECIALLY NOT ON COMPUTERS IN THE SPACE. WE ALSO DON’T GUARANTEE USING BIG DATA WILL HELP YOU FIND LOVE.

What did Chris McKinlay do?

Methods used:

  • Automating tasks, such as writing a Python script to answer questions on OkCupid
  • Scraping data from dating websites
  • Surveying
  • Statistical analysis
  • Machine learning to figure out how to rank the importance of answers to questions
  • Bots to visit people’s pages
  • Actually talking to people in the real world!

Things we can help you with at Scholarly Commons:

Selected workshops and resources, come by the space to find more!

Whether you reach out to us by email, by phone, or in person, our experts are ready to answer your questions and help you make the most of your data! You might not find “The One” with our software tools, but we can definitely help you have a better relationship with your data!

Love Your Data Week 2017

The Scholarly Commons is excited to announce our participation in Love Your Data Week 2017. Taking place from February 13-17th, Love Your Data is an annual event that aims to “build a community to engage on topics related to research data management, sharing, preservation, reuse, and library-based research data services.” The 2017 theme is data quality.

Love Your Data Week takes place online, and you’ll find us posting content both on this blog (look out for our post on February 16th) and at our Twitter, @ScholCommons. We’ll be posting new content for each day of Love Your Data Week, so stay tuned! You can follow the wider conversation by looking at the hashtags #LYD17 and #loveyourdata on Twitter and elsewhere. You can also check out the University of Illinois Research Data Service’s Twitter @ILresearchdata for their Love Your Data Week content!

Each day of Love Your Data Week has a different theme. This year the themes are as follows:

  • Monday: Defining Data Quality
  • Tuesday: Documenting, Describing, Defining
  • Wednesday: Good Data Examples
  • Thursday: Finding the Right Data
  • Friday: Rescuing Unloved Data

Got something to say about data? Or just want to be a part of the action? Tweet @scholcommons or comment on this article!

Getting Started With Paperpile

Did the Paperpile Review leave you interested in learning more?

To use Paperpile you need an Internet connection, Google Chrome, and a Google account. Since student/personal use accounts do not require a .edu email, I recommend using your Google Apps @ Illinois account for this, because you can take full advantage of unlimited free storage from Google to store your PDFs. Paperpile offers one month free; afterwards, it’s $36 for the year. You can download the extension for Chrome here. If you already use Mendeley or Zotero, you can import all of your files and information from these programs to Paperpile. In order to use Paperpile, you will need the extension on each copy of Chrome you use. It should sync as part of your Chrome extensions, and you can install it on Chrome on University Library computers as well.

You can import PDFs and metadata by clicking on the Paperpile logo on Chrome.

Paperpile import tool located just right of the search bar in Chrome

On your main page you can create folders, tag items, and more! You can also search for new articles in the app itself.

Paperpile Main Menu

If you didn’t import enough information about a source or it didn’t import the correct information you can easily add more details by clicking the check mark next to the document in the menu and clicking edit on the top menu next to the search box for your papers.

Paperpile

Plus, from the main page, when you click “View PDF” you can also use the beta annotations feature by clicking the pen icon. This feature lets you highlight and comment on your PDF, and it saves the highlighted text and comments in order by page in notes. The notes can then be exported as plain text or as very pretty printouts. The highlighting is rectangle-based and can be a little bit annoying, especially when the highlight doesn’t always cover the text that was copied. Like a highlighter in real life, you cannot continue to highlight onto the next page.

Highlighted and copied sentence split by page boundary

When you leave the app, the highlighting is saved on the PDF in your Google Drive, and you can see your highlights on the PDF wherever you use Google Drive. The copied text and comments can be exported into a very pretty printout or a variety of plaintext file formats.

Print screen of exported annotated notes on Paperpile

Not the prettiest example but you get the idea.

Once you get to actually writing your paper, you can add citations in Google Docs by clicking the Paperpile tab in your Google Doc. You can search your library or the web for a specific article. Click format citations and follow the instructions for how to download the add-on for Google Docs.

Paperpile cite while you write in Google Docs

I didn’t try it but there’s a Google Docs sidebar so that anyone can add references, regardless of whether or not they are a Paperpile user, to a Google Doc. I imagine this is great for those group projects where the “group” is not just the person who cares the most.


Troubleshooting

Paperpile includes a support chat box, which is located on your main page and is very useful for troubleshooting. For example, one problem I ran into with Paperpile is that, in the notes feature, the page number is based on the PDF file and you cannot change it to match the actual page number in the article. I messaged support and got a response with a professional tone within twenty-four hours. It turns out they are working on this problem and eventually PDFs will be numbered by actual page number, but they can’t say when they will have it fixed.

For other problems, there is an official help page with a lot of instructions about using the software and answers to frequently asked questions. There is also a blog and a forum, which is particularly nice because you can see if other people are experiencing the same problem and what the company plans to do about it.

Scholarly Commons runs a variety of Savvy Researcher workshops throughout the year including personal information management and citation managers. And let us know in the comments about your favorite citation/reference management software and your way of keeping your research organized!

And for the curious, the examples in this post are drawn from the undergraduate research collection in IDEALS. Specifically:

Kountz, Erik. 2013. “Cascades of Cacophony.” Equinox Literary and Arts Magazine. http://hdl.handle.net/2142/89474.

Liao, Ethel. 2013. “Nutella, Dear Nutella.” Equinox Literary and Arts Magazine. http://hdl.handle.net/2142/89476.

Montesinos, Gary. 2015. “The Invisible (S)elf: Identity in House Elves and Harry Potter.” Re:Search: The Undergraduate Literary Criticism Journal 2 (1). http://hdl.handle.net/2142/78004.

Spotlight: Postach.io Blogging Platform

Many people use Evernote to keep their research (and life) organized. This notebook-based note-taking platform has grown so much in popularity that the creators of Evernote created Postach.io, a blogging platform that connects with Evernote and uses Evernote notes as the content of blog posts. Basically, you can take the notes you’ve created in Evernote and directly publish them for anyone to see!

If you’re already familiar with and using Evernote, Postach.io may be a great, free platform for you to get your research out there. While it doesn’t have the same kind of customization options that you can have on WordPress or Tumblr, nor the built-in audiences of those sites, its simplified style and integration with Evernote make it a useful tool, especially since Postach.io is free and only requires that you have or create an Evernote account.

To start, you must link up your Evernote account with Postach.io. After submitting your contact information, the site will automatically transfer you to Evernote.

The first step to creating a Postach.io site is to give your name, email address, and password.

The Evernote page that Postach.io links you to.

Evernote will then ask whether you’d like to create a new notebook for your Postach.io site, or link to a notebook already in use. Note that linking to an already-created notebook does not automatically make your notes public. Each note must have a ‘published’ tag attached to it in order to be public. I’ll have more on that in a little bit.

You can also choose the length of time Postach.io will have access to your notebook. Lengths range from a minimum of one day to a maximum of one year. After that period, Postach.io will either lose access to that notebook, or you will have to reauthorize it.

After you authorize your account, you will have the opportunity to create an Evernote note that will serve as your initial Postach.io post. The most important part of this process is tagging the post as “Published.” A note that lacks this tag will not be put on your Postach.io site, even if it’s in your authorized notebook.

Me adding my “published” tag to ensure that my post is added to my Postach.io site.
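The tagging rule can be pictured as a simple filter over the notes in the linked notebook. The sketch below is only an illustration of that rule: the note dictionaries and the `published_notes` helper are hypothetical, not part of any Evernote or Postach.io API, and treating the tag as case-insensitive is an assumption.

```python
# Hypothetical notes from an authorized notebook; titles and tags are made up.
notes = [
    {"title": "Welcome post", "tags": ["Published", "intro"]},
    {"title": "Rough draft", "tags": ["ideas"]},
    {"title": "Untagged note", "tags": []},
]


def published_notes(all_notes):
    """Keep only notes carrying a 'published' tag.

    Assumption: the tag is matched case-insensitively.
    """
    return [
        note for note in all_notes
        if any(tag.lower() == "published" for tag in note["tags"])
    ]


for note in published_notes(notes):
    print(note["title"])
```

Only the first note would appear on the site under this rule; the other two stay private even though they sit in the same notebook.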

Once you finish and tag your post, your Postach.io account is officially up and running.

As far as the site itself, your options are somewhat limited. This is what your site will look like immediately after you publish your first post:

A very generic theme.

You do have the option to change your avatar and background image, as well as choose from a little over a dozen themes to work with. These themes, however, are all incredibly basic, with few customization options outside of the basic appearance. In order to access the source code for your site or to create a custom theme, you will need to upgrade your account to a paid account.

A paid account will let you access that source code, as stated above, as well as create multiple sites. With a free account, you can only have one site at a time. $5/month gets you five sites, $15/month will get you twenty sites, and $25/month will give you fifty. If you pay for an entire year in advance, you’ll get two months out of the year free. In my opinion, you’re better off using a free platform like Tumblr or WordPress and transferring your Evernote data than opting for a paid account.
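To make the pricing easier to compare, here is a rough sketch of the arithmetic using the prices quoted above. It assumes that “two months free” means you are charged for ten months when you prepay for a year; the plan labels and the `annual_cost` helper are made up for illustration.

```python
# Prices as quoted in this post: (USD per month, number of sites allowed).
# The plan labels are hypothetical names used only for this example.
PLANS = {
    "5 sites": (5, 5),
    "20 sites": (15, 20),
    "50 sites": (25, 50),
}


def annual_cost(monthly_price, months_free=2):
    """Cost of prepaying a year, assuming `months_free` months are not charged."""
    return monthly_price * (12 - months_free)


for label, (monthly, sites) in PLANS.items():
    prepaid = annual_cost(monthly)
    print(
        f"{label}: ${monthly * 12} per year billed monthly, "
        f"${prepaid} per year prepaid, "
        f"${prepaid / sites:.2f} per site per year"
    )
```

Under that assumption, the cheapest paid tier works out to $50 for a prepaid year.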

Overall, Postach.io is a simple way to get work that you’ve already started in Evernote published and readable by the world.

Do you think you’ll use Postach.io? What blogging platforms do you use? Let us know in the comments, or Tweet us at @scholcommons!

Review: Paperpile Citation Manager

Are you addicted to Google Docs and looking for a citation manager, PDF reader, or research workflow system? Do you wish you could just cite while you write in Google Docs like you do with Zotero or Mendeley in Word? Do you have an extra $36 a year to spare?

Then you might want to try Paperpile!

Paperpile App Main Menu

Paperpile is a simplified reference management system and research workflow program for Google Chrome created by three computational biologists based in Vienna.

Pros:

  • Easy to use
  • Can organize your sources when you’re trying to write a paper or doing readings
  • A lot of explanatory text in the app
  • Allows you to import metadata and PDFs from your browser (similar to Zotero’s one click import) and asks you if you want to add the item (PDF and details) to Paperpile
  • The annotations feature makes readings and notes for classes a lot of fun with very pretty colors
  • When the PDF is not encrypted, if you highlight the text it will copy the highlighted text into notes with your annotations that you can then copy and paste when writing a paper
  • Wide range of document types and citation styles
  • You can cite while you write in Google Docs
  • Provides look up to find similar journal articles to what you are researching, which allows you to do research through the app, especially if you’re doing research from science databases
  • Keyboard shortcuts
  • 15 GB of free space through Google
  • Good customer service
  • Thorough explanatory material

Highlighted text with annotations in the Paperpile app

Excerpt from Montesinos, Gary. 2015. “The Invisible (S)elf: Identity in House Elves and Harry Potter.” Re:Search: The Undergraduate Literary Criticism Journal 2 (1). https://www.ideals.illinois.edu/handle/2142/78004.
And check out Re:Search: The Undergraduate Literary Criticism Journal and more great undergraduate research in IDEALS!

Cons:

  • High cost ($36), especially compared to solid free options like Mendeley and Zotero
  • Requires Internet access
  • Although the company is in the process of developing a plugin for MS Word, currently, Paperpile is heavily reliant on Google and Google Drive
  • Paperpile is proprietary software from a startup, so there is a risk that the company will go out of business or be bought by a larger company
    • Should the worst happen, though, Paperpile uses open standards that will let you get your PDFs and citations out (even if they are in an ugly format), as well as the highlighted text saved in your PDFs, which can be downloaded through Google Drive
  • Paperpile is a very new product and there are still a lot of features to be worked out
    • I will say however that it is a lot less buggy than a lot of comparable reference management / PDF annotation software that have been around longer and aren’t classified as in beta, like Readcube and Highlights

Paperpile is comparable to: Mendeley, iLibrarian, colwiz, Highlights.

Learn more about personal information management through our PIM Libguide, various Savvy Researcher workshops and more! Let us know about your strategies for keeping everything organized in the comments!

 

Meet Elizabeth Wickes, Data Curation Specialist

Elizabeth with a rebuilt and functional Colossus computer at the British National Museum of Computing.

This post is the third in our series profiling the expertise housed in the Scholarly Commons and our affiliate units in the University Library. Today we are featuring Elizabeth Wickes, Data Curation Specialist.


What is your background education and work experience?

I started in psychology and then moved to sociology. I also have a secretarial certificate and I use that training a lot! I worked at Wolfram Research as a Project Manager and then Curation Manager before I started library school.

What led you to this field?

Data curation just finds you. It’s a path that people with certain interests find themselves on.

What is your research agenda?

I’m exploring new and innovative ways to teach data management skills, especially computational research skills that normalize and practice defensive data management skills.

Do you have any favorite work-related duties?

My favorite thing to do is leading workshops and teaching. I really love listening to people’s research and helping them do it better. It’s great hearing about lots of different fields of research. It’s really important to me that I’m not stuck in a single college or field, that we’re a resource for the whole university.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think consultation services in the library are underutilized, including consultation for personalized data management.

If you could recommend only one book to beginning researchers in your field, what would you recommend?

Where Wizards Stay Up Late by Katie Hafner and Matthew Lyon. It’s a book all librarians should read, and it would be great for undergraduate reading, too. It’s the history of how the internet was born, explained through biographies of the key players. The book also covers the social and political situation at the time which was really interesting. It’s fascinating that this part of the world (the internet, data curation, etc.) was developed by people who were in college before this was a major or a field of study.

There are a lot of statistics out there about how much data we are producing now. For example: “Data production will be 44 times greater in 2020 than it was in 2009” and “More data has been created in the past two years than in the entire previous history of the human race”… How do you feel about the increase in big data?

Excited. When people ask me “What is big data?” I tell them that there’s a technological answer and a philosophical answer. The philosophical answer is that we no longer have to have a sampling strategy because we can get it all. We can just look at everything. From a data curation and organizational perspective it’s terrifying because there’s so much of it, but exciting.


To learn more about Research Data Service, you can visit their website. Elizabeth also holds Data Help Desk Drop-In Hours in the Scholarly Commons, every Tuesday from about 3:15-5 pm. To get in touch with Elizabeth, you can reach her by email.

Running low on Zotero storage? Sync your files through a cloud storage service

I’ve recently returned to using Zotero for collecting, organizing, and citing references after not having used the software for a couple of years. While I was a bit rusty, it only took a couple of days for me to get up and running at my previous level of Zotero expertise (which really wasn’t that high to begin with). But despite feeling comfortable with the program, it wasn’t long before I found myself running out of storage space.

Zotero’s sync feature allows you to keep your citation data up to date across as many devices as you’d like. And while this is a great feature, I’ve found that it isn’t of much use without also being able to access my PDFs on all these devices as well.

The good news is that Zotero allows you to attach PDFs to items (i.e. citations) in your library. The bad news is that it only gives you 300 MB of free storage (with an option to pay for more). While PDF files generally aren’t that big, 300 MB can get eaten up pretty quickly if you have a lot of documents.

In the past I generally didn’t store my PDFs within Zotero, but I quickly fell in love with this feature upon my recent return to the software. And since I’ve yet to be willing to pay for cloud storage, I was afraid I’d have to resign myself to storing PDF files in one of the many free cloud storage services I use, rather than having them attached to my Zotero data. But, I thought, wouldn’t it be great if there was a way to both store my PDFs via a third party cloud storage service, and have these PDFs linked up to Zotero? Well it turns out there is!

In order to accomplish this feat, you’ll use something called WebDAV (Web Distributed Authoring and Versioning). While I still don’t completely understand what this is, for our purposes a WebDAV service is the third party cloud storage service that you can use to store your Zotero PDFs and other attached files. Zotero provides a list of services that offer free plans and that are known to work with Zotero (I use Box).

Once you’ve decided on a WebDAV service, setting up Zotero to work with it is fairly simple. First open your preferences by clicking the icon that looks like a gear.

In the File Syncing section of the preferences menu, select WebDAV in the dropdown menu next to “Sync attachment files in My Library using.”

Next, enter the URL for the WebDAV service that you’ve decided to use, along with your user name and password associated with that service.

If you’ve chosen one of the services on Zotero’s list, you can find the URL there. Note that the URL field in the preferences menu already includes the “https”, “://” and “/zotero/”, so make sure you don’t enter these into the field as well. After entering your information, click on “Verify Server” underneath the password field. If everything has worked correctly, you should get a message that says file sync has been set up!
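If you are unsure which part of a service’s WebDAV address belongs in the field, the following sketch shows the idea. It assumes the preferences dialog supplies the scheme and a trailing “/zotero/” on its own; the helper name `webdav_field_value` and the `dav.example.com` address are placeholders invented for this example, not a real service.

```python
def webdav_field_value(url: str) -> str:
    """Return the part of a WebDAV URL to type into the URL field.

    Assumes the dialog already shows the scheme prefix and appends
    "/zotero/", so both are removed here if present.
    """
    value = url.strip()
    # Drop the scheme, which the dialog already displays.
    for scheme in ("https://", "http://"):
        if value.lower().startswith(scheme):
            value = value[len(scheme):]
            break
    # Drop a trailing "/zotero" or "/zotero/", which the dialog appends.
    value = value.rstrip("/")
    if value.lower().endswith("/zotero"):
        value = value[: -len("/zotero")]
    return value.rstrip("/")


# Placeholder address, not a real WebDAV service.
print(webdav_field_value("https://dav.example.com/dav/zotero/"))
```

For the placeholder address, the value to type would be `dav.example.com/dav`.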

You can continue attaching PDFs and other files to items in your Zotero library as before. The only difference is that now these files will be stored through your WebDAV rather than through Zotero’s own storage system.

For more information you can consult Zotero’s syncing documentation. If you would like more general information about Zotero, you can consult the Library’s Zotero Libguide or attend a Savvy Researcher Workshop. And as always, send us an email if you have any questions.

Have your own tip for getting the most out of Zotero? Let us know in the comments below!

Note that WebDAV only works with personal, not group, libraries.

Utilizing EverNote to Keep Your Research Organized

Sick of juggling Word documents and notebooks? Trying to find a way to keep your research organized? EverNote may be the tool you need!

EverNote is a popular program that can be accessed from the web, but also downloaded as software on your computer, or as an app on your mobile device or tablet. It is, at its core, built for note-taking and storing information. The free plan allows up to 60 MB of uploads per month (which is typically more than enough for most people), or you can buy their “Plus” package for $34.99/year, or “Premium” for $69.99/year, which give increased storage options, as well as special features.

Academically, EverNote is a great tool if you’re taking lots of notes on various sources. You can store groups of notes in “notebooks,” tag notes with key ideas, as well as upload photos or documents from elsewhere. EverNote syncs up between devices, which can be helpful if you don’t want to lug your laptop from place to place and want to use your tablet to take notes instead.

Now, I’ll walk you through the EverNote interface, and explain how I used EverNote to organize research I did on nineteenth-century cookbooks and food at the Massachusetts Historical Society last summer.

When you log into EverNote, you’ll be taken to a page that includes all of the Notes you’ve taken.

Here’s my homepage.

Now, if you’re working on multiple projects, dealing with all of these at once can be kind of complicated. Thankfully, you have two ways to narrow down what you’re looking for. The first is to go to your Notebooks. When you’re doing research in EverNote, it’s helpful to organize similar notes into a Notebook, so that they’re grouped together. So for my research project, I grouped my notes into a Notebook called “Boston.”

From there, I have a list of each individual Note that I took while at the MHS. You can sort the way the list appears – I just happen to have them sorted by the Date Updated. From there you can scroll around and find what you’re looking for. But if you want to narrow down your results even more, you can use the search tool to look for keywords, either in a specific notebook or in all of your notes, or you can look for tags that you add to your notes. When you press the Tags button, a list of all the tags you’ve used for your Notes pops up. In this case, I want to look at everything I tagged with “Desserts.”

Tags are only useful if you implement them in the first place, so remember to tag your research as you go along!

A list of the Notes I took that I tagged with “Desserts.”

As you can see, that narrowed my results down to six notes, as opposed to the forty-seven notes I had in my Boston Notebook.

Now, academic notetaking is just one way to use EverNote. EverNote prides itself on having many uses – from being a place of collaboration for offices, to keeping your various to-do lists in one place. It’s up to the user to decide how they would like to use EverNote.

Now, it’s not a perfect program. If a user wants to use some of the fancier aspects of the program, some of the controls are confusing and difficult to figure out at first. Further, I have had issues in the past with the app running slow on my tablet, or crashing in the middle of a note-taking session. (Of course, the notes save automatically and frequently, but it’s frustrating when you’re ten minutes from an archive closing and you’re trying to boot your app up again.) My biggest issue with Evernote, however, is the image-taking system.

At its core, the image-taking system is not a bad idea. However, by trying to make certain images text-searchable, it can ruin the integrity of the image itself. For example, I tried to capture an image of some of the handwritten notes in the Massachusetts Historical Society’s copy of The Young Housekeeper’s Friend, and the Evernote system bleached the pages out, and made the marginalia difficult to read.

Mary Hooker Cornelius, The Young Housekeeper’s Friend: or, a Guide to Domestic Economy and Comfort, 1850. Collection of the Massachusetts Historical Society.

All in all, EverNote can be a useful tool for a researcher on the go who is trying to stay organized while syncing across various platforms, and it can also serve as an organizational tool for everyday life!

DataCite – Find, Identify, and Cite Datasets

DataCite is a non-profit organization created to establish easier access to research data, increase acceptance of research data as legitimate, citable contributions to the scholarly record, and support data archiving. This organization seeks to bring institutions, researchers and other interested groups together to address the challenges of making research data accessible and visible. Through collaboration, researchers find support in locating, identifying, and citing research datasets with confidence.

Data centers are provided with persistent identifiers for datasets, plus workflows and standards for data publication. Journal publishers receive support to enable research articles to be linked with data. DataCite works with organizations, data centers, and libraries that host data to assign persistent identifiers to datasets.

Data citation is important for data re-use, verification and tracking. Citable datasets become legitimate contributions to scholarly communication, paving the way for new metrics and publication models that recognize and reward data sharing. More information on DataCite services, resources and events can be found at https://www.datacite.org/.

 

Want Your Data to Last? Pay Attention to File Formats

When managing your own digital research data, do not take the file formats you use for granted. A file format is a standardized way to structure the data stored in a computer file, and is most easily recognizable by the dot and “extension,” typically of two to four letters, at the end of its name (for example, birthdayParty.jpg indicates that this is a JPEG image file). The long-term usability of your data often hinges on it being stored using a well-chosen archival file format. As mentioned on the UIUC Scholarly Commons’ Data Management webpage on File Formats and Organization:

“Our ability to preserve digital objects is dependent, among other things, on whether the file format used:

  • Is openly documented (more preservable) or proprietary (less preservable);
  • Is supported by a range of software platforms (more preservable) or by only one (less preservable);
  • Is widely adopted (more preservable) or has low use (less preservable);
  • Is lossless data compression (more preservable) or lossy data compression (less preservable); and
  • Contains embedded files or embedded programs/scripts, like macros (less preservable).”
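One way to apply the criteria quoted above is as a simple checklist. The sketch below is only an illustration of that idea: the `FormatProfile` class, the `preservation_concerns` helper, and the example values are hypothetical, and the example profile is not an assessment of any real file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormatProfile:
    """Answers to the five quoted questions for one file format."""
    openly_documented: bool
    multi_platform: bool
    widely_adopted: bool
    lossless: bool
    has_embedded_content: bool  # embedded files, programs, or macros


def preservation_concerns(profile: FormatProfile) -> List[str]:
    """List which of the five criteria count against a format."""
    concerns = []
    if not profile.openly_documented:
        concerns.append("proprietary rather than openly documented")
    if not profile.multi_platform:
        concerns.append("supported by only one software platform")
    if not profile.widely_adopted:
        concerns.append("low use")
    if not profile.lossless:
        concerns.append("lossy compression")
    if profile.has_embedded_content:
        concerns.append("contains embedded files or scripts")
    return concerns


# Illustrative values only; fill these in for the format you actually use.
example = FormatProfile(
    openly_documented=True,
    multi_platform=True,
    widely_adopted=True,
    lossless=False,
    has_embedded_content=False,
)
print(preservation_concerns(example))
```

A format with an empty list of concerns is not automatically safe, since the quoted criteria are introduced with “among other things,” but a long list is a signal to look for an alternative.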

Confidence in particular file formats may differ between domains. If interested in learning more, seek out best practices documentation in your field, or request a consultation in the Scholarly Commons.

This post was originally published on the Research Data blog by rimkus@illinois.edu.