Got Bad Data? Check Out The Quartz Guide

Photograph of a man working on a computer.

If you’re working with data, chances are, there will be at least a few times when you encounter the “nightmare scenario”. Things go awry — values are missing, your sample is biased, there are inexplicable outliers, or the sample wasn’t as random as you thought. Some issues you can solve; other issues are less clear. But before you tear your hair out — or, before you tear all of your hair out — check out The Quartz guide to bad data. Hosted on GitHub, The Quartz guide lists possible problems with data and how to solve them, so that researchers have an idea of what their next steps can be when their data doesn’t work as planned.

With translations into six languages and a Creative Commons 4.0 license, The Quartz guide divides problems into four categories: issues that your source should solve, issues that you should solve, issues a third-party expert should help you solve, and issues a programmer should help you solve. From there, the guide lists specific issues and explains how they can be solved, or whether they can be solved at all.

One of the greatest things about The Quartz guide is the language. Rather than pontificating and making an already frustrating problem more confusing, the guide lays out options in plain terms. While you may not get everything you need for fixing your specific problem, chances are you will at least figure out how you can start moving forward after this setback.

The Quartz guide does not mince words. In the “Data were entered by humans” section, for example, it shows a sample of messy data entry and then says, “Even with the best tools available, data this messy can’t be saved. They are effectively meaningless… Beware human-entered data.” Even if it’s probably not what a researcher wants to hear, sometimes the cold, hard truth can lead someone to a new step in their research.

So if you’ve hit a block with your data, check out The Quartz guide. It may be the thing that will help you move forward with your data! And if you’re working with data, feel free to contact the Scholarly Commons or Research Data Service with your questions!

Announcing Topic Modeling – Theory & Practice Workshops

An example of text from a topic modeling project.

We’re happy to announce that Scholarly Commons intern Matt Pitchford is teaching a series of two Savvy Researcher Workshops on Topic Modeling. You may be following Matt’s posts on Studying Rhetorical Responses to Terrorism on Twitter or Preparing Your Data for Topic Modeling on Commons Knowledge, and now is your chance to learn the basics from the master! The workshops will be held on Wednesday, December 6th and Friday, December 8th. See below for more details!

Topic Modeling, Part 1: Theory

  • Wednesday, December 6th, 11am-12pm
  • 314 Main Library
  • Topic models are a computational method of identifying and grouping interrelated words in any set of texts. In this workshop we will focus on how topic models work, what kinds of academic questions topic models can help answer, what they allow researchers to see, and what they can obfuscate. This will be a conversation about topic models as a tool and method for digital humanities research. In part 2, we will actually construct some topic models using MALLET.
  • To sign up for the class, see the Savvy Researcher calendar

Topic Modeling, Part 2: Practice

  • Friday, December 8th, 11am-12pm
  • 314 Main Library
  • In this workshop, we will use MALLET, a Java-based package, to construct and analyze a topic model. Topic models are a computational method of identifying and grouping interrelated words in any set of texts. This workshop will focus on how to set up the code correctly, how to understand the output of the model, and how to refine the code for best results. No experience necessary. You do not need to have attended Part 1 in order to attend this workshop. (For a rough idea of what the MALLET commands look like, see the sketch after this list.)
  • To sign up for this class, see the Savvy Researcher calendar
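
Curious what Part 2 will actually involve? Here is a rough sketch of the kind of MALLET run the workshop will walk through. This is not the official workshop code: it assumes MALLET is already installed with the mallet command on your PATH, that your documents are plain-text files in a texts/ folder, and that you are comfortable calling command-line tools from Python.

```python
# A minimal sketch of driving MALLET from Python (assumes the "mallet"
# command is on your PATH and plain-text documents live in "texts/").
import subprocess

# Step 1: convert the documents into MALLET's binary format, keeping word
# order and dropping common English stopwords.
subprocess.run([
    "mallet", "import-dir",
    "--input", "texts",
    "--output", "texts.mallet",
    "--keep-sequence",
    "--remove-stopwords",
], check=True)

# Step 2: train a 20-topic model and write out the top words for each topic
# plus the topic mixture for every document.
subprocess.run([
    "mallet", "train-topics",
    "--input", "texts.mallet",
    "--num-topics", "20",
    "--num-iterations", "1000",
    "--optimize-interval", "10",
    "--output-topic-keys", "topic_keys.txt",
    "--output-doc-topics", "doc_topics.txt",
], check=True)
```

The topic_keys.txt file is usually the first thing to read: each line lists one topic’s most probable words, which is where the interpretive work discussed in Part 1 begins.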

Open Source Tools for Social Media Analysis

Photograph of a person holding an iPhone with various social media icons.

This post was guest authored by Kayla Abner.


Interested in social media analytics, but don’t want to shell out the bucks to get started? There are a few open source tools you can use to dabble in this field, and some even integrate data visualization. Recently, we at the Scholarly Commons tested a few of these tools, and as expected, each one has strengths and weaknesses. For our exploration, we exclusively analyzed Twitter data.

NodeXL

NodeXL’s graph for #halloween (2,000 tweets)

tl;dr: Light system footprint and provides some interesting data visualization options. Useful if you don’t have a pre-existing data set, but the one generated here is fairly small.

NodeXL is essentially a complex Excel template (it’s classified as a Microsoft Office customization), which means it doesn’t take up a lot of space on your hard drive. It does have advantages; it’s easy to use, only requiring a simple search to retrieve tweets for you to analyze. However, its capabilities for large-scale analysis are limited; the user is restricted to retrieving the most recent 2,000 tweets. For example, searching Twitter for #halloween imported 2,000 tweets, every single one from the date of this writing. It is worth mentioning that there is a fancy, paid version that will expand your limit to 18,000, the maximum allowed by Twitter’s API, or 7 to 8 days ago, whichever comes first. Even then, you cannot restrict your data retrieval by date. NodeXL is a tool that would be most successful at pulling recent social media data. In addition, if you want to study something besides Twitter, you will have to pay to get any other type of dataset, e.g., Facebook, YouTube, Flickr.

Strengths: Good for a beginner, differentiates between Mentions/Retweets and original Tweets, provides a dataset, some light data visualization tools, offers Help hints on hover

Weaknesses: 2,000 Tweet limit, free version restricted to Twitter Search Network

TAGS

TAGSExplorer’s data graph (2,902 tweets). It must mean something…

tl;dr: Add-on for Google Sheets, giving it a light system footprint as well. Higher tweet retrieval limit than NodeXL. TAGS has the added benefit of automated data retrieval, so you can track trends over time. Its data visualization tool is in beta and needs more development.

TAGS is another complex spreadsheet template, this time created for use with Google Sheets. TAGS does not have a paid version with more social media options; it can only be used for Twitter analysis. However, it does not have the same tweet retrieval limit as NodeXL. The only limit is 18,000 or seven days ago, which is dictated by Twitter’s Terms of Service, not the creators of this tool. My same search for #halloween with a limit set at 10,000 retrieved 9,902 tweets within the past seven days.

TAGS also offers a data visualization tool, TAGSExplorer, that is promising but still needs work to realize its potential. As it stands now in beta mode, even a dataset of 2,000 records puts so much strain on the program that it cannot keep up with the user. It can be used with smaller datasets, but still needs work. It does offer a few interesting additional analysis parameters that NodeXL lacks, such as the ability to see Top Tweeters and Top Hashtags, which work better than the graph itself.

These graphs have meaning!

Strengths: More data fields, such as the user’s follower and friend count, location, and language (if available), better advanced search (Boolean capabilities, restrict by date or follower count), automated data retrieval

Weaknesses: data visualization tool needs work

Hydrator

Simple interface for Documenting the Now’s Hydrator

tl;dr: A tool used for “re-hydrating” tweet IDs into full tweets, to comply with Twitter’s Terms of Service. Not used for data analysis; useful for retrieving large datasets. Limited to datasets already available.

Documenting the Now, a group focused on collecting and preserving digital content, created the Hydrator tool to comply with Twitter’s Terms of Service. Downloading and distributing full tweets to third parties is not allowed, but distributing tweet IDs is. The organization manages a Tweet Catalog with files that can be downloaded and run through the Hydrator to view the full tweets. Researchers are also invited to submit their own datasets of tweet IDs, but this requires other software to collect them in the first place. This tool does not offer any data visualization, but is useful for studying and sharing large datasets (the file for the 115th US Congress contains 1,430,133 tweets!). Researchers are limited to what has already been collected, but multiple organizations provide publicly downloadable tweet ID datasets, such as Harvard’s Dataverse. Note that the rate of hydration is also limited by Twitter’s API, and the Hydrator tool manages that for you. Some of these datasets contain millions of tweet IDs and will take days to be transformed into full tweets.

Strengths: Provides full tweets for analysis, straightforward interface

Weaknesses: No data analysis tools
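
If you would rather script the hydration step than click through the desktop app, Documenting the Now also maintains twarc, a Python library that does the same job. The following is a minimal sketch, not an official DocNow recipe: it assumes twarc (version 1.x) is installed, that you have your own Twitter API keys, and that your tweet IDs sit one per line in a file called ids.txt.

```python
# A rough sketch of hydrating tweet IDs with DocNow's twarc library
# (an alternative route to the desktop Hydrator; the file names and the
# placeholder credentials below are assumptions for illustration).
import json
from twarc import Twarc

twarc_client = Twarc(
    "CONSUMER_KEY", "CONSUMER_SECRET",
    "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET",
)

with open("ids.txt") as id_file, open("tweets.jsonl", "w") as out_file:
    tweet_ids = (line.strip() for line in id_file if line.strip())
    # twarc handles Twitter's rate limits for you, so a huge ID list simply
    # takes a long time rather than failing outright.
    for tweet in twarc_client.hydrate(tweet_ids):
        out_file.write(json.dumps(tweet) + "\n")
```

As with the Hydrator, only tweets that are still public come back; deleted or protected tweets are quietly skipped.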

Crimson Hexagon

If you’re looking for more robust analytics tools, Crimson Hexagon is a data analytics platform that specializes in social media. Not limited to Twitter, it can retrieve data from Facebook, Instagram, YouTube, and basically any other online source, like blogs or forums. The company has a partnership with Twitter and pays for greater access to their data, giving the researcher higher download limits and a longer time range than they would receive from either NodeXL or TAGS. One can access tweets starting from Twitter’s inception, but these features cost money! The University of Illinois at Urbana-Champaign is one such entity paying for this platform, so researchers affiliated with our university can request access. One of the Scholarly Commons interns, Matt Pitchford, uses this tool in his research on Twitter response to terrorism.

Whether you’re an experienced text analyst or just want to play around, these open source tools are worth considering for different uses, all without you spending a dime.

If you’d like to know more, researcher Rebekah K. Tromble recently gave a lecture at the Data Scientist Training for Librarians (DST4L) conference regarding how different (paid) platforms influence or bias analyses of social media data. As you start a real project analyzing social media, you’ll want to know how the data you’ve gathered may be limited so you can adjust your analysis accordingly.

Creating Quick and Dirty Web Maps to Visualize Your Data – Part 2

Welcome to part two of our two-part series on creating web maps! If you haven’t read part one yet, you can find it here. If you have read part one, we’re going to pick up right where we left off.

Now that we’ve imported our CSV into a web map, we can begin to play around with how the data is represented. You should be brought to the “Change Style” screen after importing your data, which presents you with a drop-down menu and three drawing styles to choose from:

Map Viewer Change Style Screen

Hover over each drawing style for more information, and click each one to see how they visualize your data. Don’t worry if you mess up — you can always return to this screen later. We’re going to use “Types (Unique symbols)” for this exercise because it gives us more options to fiddle with, but feel free to dive into the options for each of the other two drawing styles if you like how they represent your data. Click “select” under “Types (Unique symbols)” to apply the style, then select a few different attributes in the “Choose an attribute to show” dropdown menu to see how they each visualize your data. I’m choosing “Country” as my attribute to show simply because it gives us an even distribution of colors, but for your research data you will want to select this attribute carefully. Next, click “Options” on our drawing style and you can play with the color, shape, name, transparency, and visible range for all of your symbols. Click the three-color bar (pictured below) to change visual settings for all of your symbols at once. When you’re happy with the way your symbols look, click OK and then DONE.

Now is also a good time to select your basemap, so click “Basemap” on the toolbar and select one of the options provided — I’m using “Light Gray Canvas” in my examples here.

Change all symbols icon

Click the three-color bar to change visual settings for all of your symbols at once

Now that our data is visualized the way we want, we can do a lot of interesting things depending on what we want to communicate. As an example, let’s pretend that our IP addresses represent online access points for a survey we conducted on incarceration spending in the United States. We can add some visual insight to our data by using “Add → Search for layers” to find a relevant layer from the web and overlay it on our map. I searched for “inmate spending” and found a tile layer created by someone on the Esri team that shows the ratio of education spending to incarceration spending per state in the US:

"Search for Layers" screen

The “Search for Layers” screen

 

 

 

 

 

 

 

 

 

 

 

 

 

You might notice in the screenshot above that there are a lot of similar search results; I’m picking the “EducationVersusIncarceration” tile layer (circled) because it loads faster than the feature layer. If you want to learn why this happens, check out Esri’s documentation on hosted feature layers.

We can add this layer to our map by clicking “Add” then “Done Adding Layers,” and voilà, our data is enriched! There are many public layers created by Esri and the ArcGIS Online community that you can search through, and even more GIS data hosted elsewhere on the web. You can use the Scholarly Commons geospatial data page if you want to search for public geographic information to supplement your research.

Now that we’re done visualizing our data, it’s time to export it for presentation. There are a few different ways that we can do this: by sharing/embedding a link, printing to a pdf/image file, or creating a presentation. If we want to create a public link so people can access our map online, click “Share” in the toolbar to generate a link (note: you have to check the “Everyone (public)” box for this link to work). If we want to download our map as a pdf or image, click “Print” and then select whether or not we want to include a legend, and we’ll be brought to a printer-friendly page showing the current extent of our map. Creating an ArcGIS Online Presentation is a third option that allows you to create something akin to a PowerPoint, but I won’t get into the details here. Go to Esri’s Creating Presentations help page for more information.

Click to enlarge the GIFs below and see how to export your map as a link and as an image/pdf:

Share web map via public link

Note: you can also embed your map in a webpage by selecting “Embed In Website” in the Share menu.

Save your map as an image/pdf using the “Print” button in the toolbar. Note: if you save your map as an image using “save image as…” you will only save the map, NOT the legend.

While there are a lot more tools that we can play with using our free ArcGIS Online accounts – clustering, pop-ups, bookmarks, labels, drawing styles, distance measuring – and even more tools with an organizational account – 25 different built-in analyses, directions, Living Atlas Layers – this is all that we have time for right now. Keep an eye out for future Commons Knowledge blog posts on GIS, and visit our GIS page for even more resources!

Creating Quick and Dirty Web Maps to Visualize Your Data – Part 1

Do you have a dataset that you want visualized on a map, but don’t have the time or resources to learn GIS or consult with a GIS Specialist? Don’t worry, because ArcGIS Online allows anybody to create simple web maps for free! In part one of this series you’ll learn how to prepare and import your data into a Web Map, and in part two you’ll learn how to geographically visualize that data in a few different ways. Let’s get started!

The Data

First things first, we need data to work with. Before we can start fiddling around with ArcGIS Online and web maps, we need to ensure that our data can be visualized on a map in the first place. Of course, the best candidates for geographic visualization are datasets that include location data (latitude/longitude, geographic coordinates, addresses, etc.), but in reality, most projects don’t record this information. In order to provide an example of how a dataset that doesn’t include location information can still be mapped, we’re going to work with this sample dataset that I downloaded from FigShare. It contains 1,000 rows of IP addresses, names, and emails. If you already have a dataset that contains location information, you can skip this section and go straight to “The Web Map.”

In order to turn this data into something that’s mappable, we need to read the IP addresses and output their corresponding location information. IP addresses only provide basic city-level information, but that’s not a concern for the sample map that we’ll be creating here. There are loads of free online tools that derive latitude/longitude data from a list of IP addresses, so you can use any tool that you like – I’m using one called Bulk IP Location Lookup because it allows me to run 500 lines at a time, and I like the descriptiveness of the information it returns. I only converted 600 of the IP addresses in my dataset because the tool is pretty sluggish, and then I used the “Export to CSV” function to create a new spreadsheet. If you’re performing this exercise along with me, you’ll notice that the exported spreadsheet is missing quite a bit of information. I’m assuming that these are either fake IP addresses from our sample dataset, or that the bulk lookup tool isn’t working 100% properly. Either way, we now have more than enough data to play around with in a web map.
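
If you would rather script this step than paste addresses into a web form, the same lookup can be done in a few lines of Python. The sketch below is one possible approach, not the Bulk IP Location Lookup tool itself: it assumes the free ip-api.com batch endpoint (which caps each request at roughly 100 addresses), the requests and pandas libraries, and a column named ip_address, which is a guess at how your spreadsheet is laid out.

```python
# A rough sketch of bulk IP-to-location lookup. The endpoint, file names,
# and column names here are assumptions; adjust them to match your data.
import requests
import pandas as pd

ips = pd.read_csv("sample_dataset.csv")["ip_address"].dropna().tolist()

rows = []
for start in range(0, len(ips), 100):               # send at most 100 IPs per request
    batch = ips[start:start + 100]
    response = requests.post("http://ip-api.com/batch", json=batch, timeout=30)
    response.raise_for_status()
    for result in response.json():
        if result.get("status") == "success":        # fake or unresolvable addresses are skipped
            rows.append({
                "ip": result["query"],
                "latitude": result["lat"],
                "longitude": result["lon"],
                "city": result.get("city"),
                "country": result.get("country"),
            })

pd.DataFrame(rows).to_csv("ip_locations.csv", index=False)
```

The resulting ip_locations.csv has plain latitude and longitude columns, which is exactly the kind of location information ArcGIS Online can detect when you import the file below.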

IP Address Lookup Screencap

Bulk IP Location Lookup Tool

The Web Map

Now that our data contains location information, we’re ready to import it into a web map. In order to do this, we first need to create a free ArcGIS Online account. After you’ve done that, log in and head over to your “Content” page and click “Create → Map” to build a blank web map. You are now brought to the Map Viewer, which is where you’ll be doing most of your work. The Map Viewer is a deceptively powerful tool that lets you perform many of the common functions that you would perform on ArcGIS for Desktop. Despite its name, the Map Viewer does much more than let you view maps.

Map Viewer (No Data)

The Map Viewer

Let’s begin by importing our CSV into the Web Map: select “Add → Add Layer From File.” The pop-up lets you know that you can upload Shapefile, CSV, TXT, or GPX files, and includes some useful information about each format. Note the 1,000 item limit on CSV and TXT files – if you’re trying to upload research data that contains more than 1,000 items, you’ll want to create a Tile Layer instead. After you’ve located your CSV file, click “Import Layer” and you should see the map populate. If you get a “Warning: This file contains invalid characters…” pop-up, that’s due to the missing rows in our sample dataset – these rows are automatically excluded. Now is a good time to note that your location data can come in a variety of formats, not just latitude and longitude data. For a full list of supported formats, read Esri’s help article on CSV, TXT, and GPX files. If you have a spreadsheet that contains any of the location information formats listed in that article, you can place your data on a map!

That’s it for part one! In part two we’re going to visualize our data in a few different ways and export our map for presentation.

Meet Carissa Phillips, Data Discovery and Business Librarian

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Carissa Phillips, Data Discovery and Business Librarian.


What is your background education and work experience?

I earned a BS in physics and astronomy and an MBA with concentrations in finance and statistics, both from the University of Iowa. After receiving my MBA, I worked for the State of Iowa for 2.5 years in the newly-created position of “performance auditor” within the Office of the Auditor of State. After that, I moved to Chicago and worked for Ernst & Young (now EY). For my first 1.5 years there, I was in a newly-created auditing position within Internal Audit Services; for my last 3.5 years, I was an analyst in the Mergers and Acquisitions Due Diligence group (later renamed Transaction Advisory Services).

I joined GSLIS (now the iSchool) in 2002, and worked as a graduate assistant in the Physics Library (since closed) until I graduated in May 2004. I worked as an academic professional in the Library until 2005, when I was hired as the Business and Finance Information Librarian in the Business and Economics Library (since closed). I earned tenure in 2012, and moved to the Scholarly Commons (SC) that same year. I moved again in 2015, this time to Research and Information Services (RIS), and was the interim unit head for a year. This past January, my title changed to Data Discovery and Business Librarian, and I now split my time between RIS and SC.

What led you to this field?

Events were converging that made me stop and reassess my career and life direction. I started to think about what I loved to do. There was a common thread of research and investigation running through every job I had ever held, and I realized that was the part I enjoyed most. So I started to look around at professions and careers that would let me develop and formalize my skills in that area, and I stumbled across librarianship. Once I learned that the top-ranked program was in my home state, it was an easy decision.

What is your research agenda, if you have one?

When I was working toward tenure, I studied the approaches students took in gathering information during experiential consulting projects and their perceptions of the research process. Now, as part of my transition into my new role of Data Discovery and Business Librarian, I’m exploring research areas that will inform my activities.

Do you have any favorite work-related duties?

I love working with researchers to help them identify resources from which they can acquire or derive the data they need. My favorite situations are the ones that seem impossible, when it’s so unlikely that anyone ever collected that data… and then I find it.

What are some of your favorite underutilized resources that you would recommend to researchers?

It really depends on the context, but I love any opportunity to recommend HathiTrust for social science data. This is one of the best places to hunt for that “impossible” data I mentioned earlier, the data you can’t believe anyone ever collected.

Meet Helenmary Sheridan, Repository Services Coordinator

Picture of Helenmary Sheridan

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Helenmary Sheridan, the Repository Services Coordinator at the University of Illinois at Urbana-Champaign Library. Helenmary manages the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS), a digital archive of scholarship produced by researchers, students, and staff at Illinois. She also conducts outreach with scholars interested in using Illinois’ other public repository, the Illinois Data Bank.


What is your background and work experience?

I graduated with a Master’s Degree in Library and Information Science from the iSchool at Illinois in 2015. I earned my degree through the LEEP program and worked at Northwestern University as a metadata and digital curation assistant while I was in school, which was a wonderful experience. Before that, I worked in visual resources, primarily with the digital collections at Northwestern and prior to that at the University of Chicago where I did my undergrad. At U Chicago, I majored in art history and took significant coursework in geophysics, which was originally my major.

What led you to this field?

I came into this role primarily from a strong interest in metadata. I was creating metadata for digital objects at Northwestern. I had been working with an art historian, and the role developed into project management, working with software developers to build a repository. So I got into working with software developers, and my interest in metadata led me to being a sort of translator between librarians and developers. This led to my being interested in technical infrastructure, without being a programmer myself. But I do have some programming experience, which allows me to communicate more easily about what I’m doing.

What is your research agenda?

In general I’m interested in service management. I’m presenting at DLF (Digital Library Federation) in a couple of months on what it means to be a service manager in a library, museum, or archive setting when a lot of management systems are built for an IT environment. We often have people coming into service manager roles from something else, and I’m interested in seeing how this gets done practically.

I’m also interested in interfaces and how designers of technical systems conceptualize our users and how, through technology, it’s really easy to abuse users.

Do you have any favorite work-related duties?

I do! I love communicating with people and patrons outside of the university. At many academic libraries, you think of your patrons as being just part of the university. Running IDEALS, I communicate with lots of people all over the world, which is really satisfying. That is, both helping people here, and communicating with all sorts of people to spread Illinois scholarship worldwide.

What are some of your favorite underutilized resources that you would recommend to researchers?

I think that a lot of people don’t look outside of their disciplines, which makes a lot of sense. As a researcher, you develop your most efficient ways to find information. But as a student, it can be really productive to go to sources outside of your own discipline. When I was an art history major as an undergrad, I wrote my thesis on scientific illustration and scientific representation through art. Can you trust an artist who has no scientific knowledge to represent what they see? I was consulting lots of scientific work and lots of technology studies stuff, as well as lots of art image databases.

The way these resources are organized is totally different. It broadened my horizons to see what a wealth of resources is out there. Stuff that isn’t necessarily in the libguide for art history, or science and technology studies.

That’s another satisfying part of my work. A diversity of stuff comes into IDEALS, so when I can’t help a patron directly, I can help them find a related resource that might be useful to them.

If you could recommend one book to beginning researchers in your field, what would you recommend?

Something I was thinking about the other day is Clifford Lynch’s 2003-2004 papers and talks on institutional repositories, about how they were going to help solve the crisis of scholarly communication. He suggested that they would become tools to provide researchers with alternative sources for dissemination of their work, or even a platform for new forms of scholarly communication, and he imagined a future where there’s a robust system of interconnected repositories that can all communicate with one another.

Contrast those with his 2016 updates, in which he addresses a trend of saying that the institutional repository has failed. He thinks it’s true that institutional repositories and the places that run them haven’t fulfilled all of these promises and that it might not be worth an institution’s time to develop a repository. But you can use repositories in different ways, and different ways of using them have emerged. He rejects the claim that IRs have proven to be a failure. So instead of seeing institutional repositories and other repositories as a solution that failed to solve a problem, Lynch’s work helped me think of them as solutions to problems that weren’t foreseen.

For instance, you’ll have family members who are looking up their great aunt’s thesis to have something to remember her by. This problem falls outside the traditional scope of academia, but institutional repositories prove very beneficial for people in these sorts of ways. This helps me think about digital libraries in general. We’re not just trying to solve a problem, but to help people. We should be user focused, rather than problem focused.

Helenmary Sheridan can be reached at hsherid2@illinois.edu.

The Importance of File Names

We’ve all been there. You’ve been searching for a file for an hour, sure that you named it ‘draft 2.docx’ or ‘essay.docx’ or ‘FINAL DRAFT I SWEAR.docx’. There’s an hour until your deadline and the print queue is pretty backed up and you cannot find the file.

Again, we’ve all been there. But we don’t have to be.

Creating a naming convention for your files can save you the hassle of searching through a folder full of ‘essay’s and ‘draft’s. Instead, you’ll be able to find files with ease. While everyone should create a system that works for them, here are a few suggestions to think about before choosing a system to name your files.

Think About How You’ll Search For Your Files

Naming conventions are only useful if they actually help you find what you’re looking for. So, create a naming convention that works for how you think about your files! For example, if you’re working with lab data that you save daily, create a system based on the date so your files will be in chronological order.
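
For example, if a scripting language like Python is already part of your workflow, a date-first pattern is easy to automate. A tiny sketch (the project and sample names are made up for illustration):

```python
# ISO 8601 dates (YYYY-MM-DD) sort chronologically even in a plain
# alphabetical file listing, which keeps daily lab data in order.
from datetime import date

def data_filename(project: str, sample: str, ext: str = "csv") -> str:
    """Build a name like '2017-12-04_soil-survey_site-03.csv'."""
    return f"{date.today().isoformat()}_{project}_{sample}.{ext}"

print(data_filename("soil-survey", "site-03"))
```

Whatever convention you settle on, put the part you search by most often at the front of the name.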

Keep It Simple!

If you know that you’re not going to want to type out long file names, then don’t choose long file names. Or, if you know that a format will be more difficult for you in the long run, don’t use it in the short run! There are few things more irritating than having to go through and change things because you’ve created a system that’s too complicated.

Change It Up

This is something that I’ve had trouble with — if your system stops working, don’t be afraid to change it up to make things work for you. If your file names are getting too long, or you’re finding that you have trouble differentiating between dates, save yourself a headache by investing some time in creating another style sooner rather than later. That’s not to say that you should go changing all your file names willy-nilly whenever the mood strikes you, but it’s important that you find a system you can commit to long term.

Resources

If you’re inspired and want to create a new system for naming your files, here are a few resources that you should check out:

Spotlight: PastPin

The PastPin logo.

Who? What? Where? When? and Why? While these make up a catchy song from Spy Kids, they’re also questions that can get lost when looking at digital images, especially when metadata is missing. PastPin wants to help answer these questions, by tagging the location and time of vintage images on Flickr Commons, with the hope that one day they will be searchable through the Where? and When? of the images themselves. By doing this, PastPin wants to create new ways to do research using public domain images online.

Created by Geopast — a genealogy service — PastPin uses 6,806,043 images from 115 cultural institutions hosted on Flickr. When you bring up the PastPin website, you’ll be prompted with images that PastPin believes come from your geographic area. When you click on an image, you can then search a map for its specific location and enter a date, which is then saved. The image then becomes searchable by PastPin users through the entered information. The hope is that all of these images will be identified, so that all users can search by location or date.

Some images are easier to geolocate and date than others. PastPin pulls in metadata and written descriptions from Flickr, so images published by an institution — such as the several University Laboratory High School images I encountered — may already have this information readily available, making it easy to type it into the map and save it. Other images are more difficult to locate or date because they lack that information, and take more outside knowledge to suss out. PastPin also lacks adequate guidelines, for locations in particular. Since many of the images that come from the University of Illinois are from digitized books, are they looking for the location where the book was printed? Or the library it resides in? It’s unclear.

PastPin faces what would seem like a Herculean feat. As I’m writing this, only 1.79% of the nearly seven million images have been located so far, and 2.13% have been dated. Today, there have been 18 updates, including two that I made, so the work moves slowly.

Still, PastPin is an awesome example of the power of crowd-sourced projects, and the potential of new thinking to change the way that we do research. The Internet creates so many opportunities for new kinds of research, and the ability to search through public domain images in new ways is just one of them.

Do you know of other websites that are trying to crowdsource data? How about websites that are trying to push research in new directions? Let us know in the comments!

CITL Workshops and Statistical Consulting Fall 2017

CITL is back at it again with the statistics, survey, and data consulting services! They have a busy fall 2017, with a full schedule of workshops on the way, as well as their daily consulting hours in the Scholarly Commons.

Their workshops are as follows:

  • 9/19: R I: Getting Started with R
  • 10/17: R I: Getting Started with R
  • 9/26: R II: Inferential Statistics
  • 10/24: R II: Inferential Statistics
  • 10/3: SAS I: Getting Started with SAS
  • 10/10: SAS II: Inferential Statistics with SAS
  • 10/4: STATA I: Getting Started with Stata
  • 9/20: SPSS I: Getting Started with SPSS
  • 9/27: SPSS II: Inferential Statistics with SPSS
  • 10/11: ATLAS.ti I: Qualitative Data Analysis
  • 10/12: ATLAS.ti II: Data Exploration and Analysis

Workshops are free, but participants must register beforehand. For more information about each workshop, and to register, head to the CITL Workshop Details and Resources page.

And remember that CITL is at the Scholarly Commons Monday – Friday, 10 AM – 4 PM. You can always request a consultation, or just walk in.