Introducing Megan Ozeran, Data Analytics and Visualization Resident Librarian

Photograph of Megan Ozeran

This latest installment of our series of interviews with Scholarly Commons experts and affiliates features Megan Ozeran, the Data Analytics and Visualization Resident Librarian at the Scholarly Commons. Welcome, Megan!


What is your background and work experience?

I received a BA in Media Studies with an English minor from Pomona College (in sunny southern California). After graduating I couldn’t justify going to grad school for Cultural Studies, as much as the subject area fascinated me. The obvious career path with that degree is to become a professor, which I didn’t want to do. After some time unemployed I started a job as a worker’s compensation claims adjuster, which taught me a lot about our broken healthcare system and was generally dissatisfying. My father, a former surgeon and active in health policy, started a health information technology company so I quit insurance and started working for him.

This job is where I learned about computer programming, user interface design, business intelligence, strategic planning, and attending industry conferences. After a couple of years I decided to go to library school. I enrolled at San Jose State University and started volunteering for a local independent LGBTQ library to gain real-world experience. (Check out California’s wonderful Lavender Library). After a semester I started a part time job at a small community college library and quit the health IT business. The community college library ended up being too small for me to gain as much experience as I hoped, so I took a summer internship at California State University Northridge where I explored three different aspects of digital services: the institutional repository, digitization of special collections, and electronic resources management. After receiving my MLIS this past May, I applied for a dozen jobs and eventually moved 2000 miles to be Illinois’ Data Analytics and Visualization Resident Librarian.

What led you to this field?

When I was struggling to decide on a career path, I stumbled across the library sector and dug deeper. I saw that there were so many different kinds of jobs working in libraries, in large part because of social and technological shifts, and many of these jobs intrigued me. Around the time of my internship I created a personal career mission: to use current and emerging technologies to enhance access to information and resources. It’s all about harnessing the power of technology to empower people.

What is your research agenda?

I’m exploring the ethics of data analysis and data visualization. We have tools to analyze an astonishing amount and variety of data, but how many people critically evaluate their assumptions and decisions when performing these analyses? How many people are taught to consider ethical principles when they are taught software and algorithms? How many people consider ethical principles when they design data visualizations? Algorithms and analytics are increasingly running people’s lives, so we need to ensure that we deploy them ethically.

Do you have any favorite work-related duties?

I’m still very new so I’m constantly learning, which is both challenging and exciting. My favorite part has been connecting with researchers (whether students, faculty or staff) to learn about the great research projects they are doing on campus.

What are some of your favorite underutilized resources that you would recommend to researchers?

I’m not sure how many researchers know that the Scholarly Commons lab is a great place to come and explore your data if you’re not set on a specific analysis process. Our computers have an extensive collection of software that you can use to analyze either quantitative or qualitative data. Importantly, you can come try out software that might otherwise be very expensive.

Also, I am an underutilized resource! I’m still learning, but if you have data analytics or visualization questions, stop by Scholarly Commons or shoot me an email and we can set up a time to chat.

If you could recommend only one book to beginning researchers in your field, what would you recommend?

Doing Data Science by Rachel Schutt and Cathy O’Neil is a great primer to all things data science (we have the ebook version in the catalog). I’m still learning myself, so I’m open to recommendations, too!


(Baseball) Bibliometrics: Calculating the Scoreboard

A small stack of baseballs, a helmet, and a baseball bat resting in the sand near a base.

This post was guest authored by Scholarly Communication and Publishing Graduate Assistant Paige Kuester. This is the second part of a three-part series. Read Part 1 here.


In our last post, we discussed what makes a journal the best team for a scholarly player (sort of). Today, we are looking at scores that are used to directly measure the impact of scholarly articles and the authors themselves.

H-index

Now this score is a bit trickier to calculate. But first, it’s probably best to explain what it is and what it does. The h-index focuses on a specific researcher’s own output, combining the number of papers they have published with the number of times other researchers have cited those papers. Yeah, this is a curve ball.

Now if we were really going to spend an afternoon at the ballpark learning about scholarly measurements, then we would go into the nitty gritty of how to figure out the most cited papers, and also how to actually figure out an h-index. But in simple terms, you list a researcher’s publications by citation count in descending order. Next, you go down the list until a publication’s citation count is no longer greater than or equal to its position in the list. The last position where the citation count is still greater than or equal to the position number is the h-index. Check out this Waterloo library guide for an example.
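If you’d rather let a computer do the counting, here is a minimal Python sketch of that same procedure (the citation counts below are made up purely for illustration):

def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Rank citation counts in descending order, then walk down the list
    # until a paper's citation count drops below its (1-based) position.
    h = 0
    for position, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= position:
            h = position
        else:
            break
    return h

# Five papers with made-up citation counts: four of them have at least 4 citations each.
print(h_index([10, 8, 5, 4, 3]))  # prints 4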

Otherwise, you can also just look it up. The scores might vary between websites because of the differences in their content, but Google Scholar, Web of Science, and Scopus all give an h-index.

If none of this made sense, here’s a plug for the Wikipedia page that informed my basic understanding.

There is not a metric in baseball that’s quite like this. Maybe if our baseball team had a starting lineup where the players were listed in descending order of home runs, cut off at the last player whose home run count was greater than or equal to his position in the order? There is more strategy than that to a batting order, so that is clearly not how it works, but you knew coming into this that this was going to be a stretched metaphor, anyway.

So what’s next?

G-index and i10-index

Neither of these indices is as widely used as the h-index.

The g-index is supposed to be an updated version of the h-index that places more value on highly cited articles.

The i10-index is used only by Google Scholar, and you can remember it by its name: it is the number of articles an author has published that have 10 or more citations each.

Okay, I think we’ve lost focus on the game, but we will come back to it in the next post.

Don’t worry, we’re in the seventh inning stretch. The game is about to get a whole lot more exciting, but I promise we won’t go into extra innings.


Announcing Topic Modeling – Theory & Practice Workshops

An example of text from a topic modeling project.

We’re happy to announce that Scholarly Commons intern Matt Pitchford is teaching a series of two Savvy Researcher Workshops on Topic Modeling. You may be following Matt’s posts on Studying Rhetorical Responses to Terrorism on Twitter or Preparing Your Data for Topic Modeling on Commons Knowledge, and now is your chance to learn the basics from the master! The workshops will be held on Wednesday, December 6th and Friday, December 8th. See below for more details!

Topic Modeling, Part 1: Theory

  • Wednesday, December 6th, 11am-12pm
  • 314 Main Library
  • Topic models are a computational method of identifying and grouping interrelated words in any set of texts. In this workshop we will focus on how topic models work, what kinds of academic questions topic models can help answer, what they allow researchers to see, and what they can obfuscate. This will be a conversation about topic models as a tool and method for digital humanities research. In part 2, we will actually construct some topic models using MALLET.
  • To sign up for the class, see the Savvy Researcher calendar

Topic Modeling, Part 2: Practice

  • Friday, December 8th, 11am-12pm
  • 314 Main Library
  • In this workshop, we will use MALLET, a Java-based package, to construct and analyze a topic model. Topic models are a computational method of identifying and grouping interrelated words in any set of texts. This workshop will focus on how to correctly set up the code, how to understand the output of the model, and how to refine the code for best results. No experience necessary. You do not need to have attended Part 1 in order to attend this workshop.
  • To sign up for this class, see the Savvy Researcher calendar

Save the Date: Edward Ayers Talk

Photograph of Edward Ayers

We are so excited to be hosting a talk by Edward Ayers this coming March! Save the date on your calendars:

March 29, 2018 | 220 Main Library | 4-6 pm

Edward Ayers has been named National Professor of the Year, received the National Humanities Medal from President Obama at the White House, won the Bancroft Prize and Beveridge Prize in American history, and was a finalist for the National Book Award and the Pulitzer Prize. He has collaborated on major digital history projects including the Valley of the Shadow, American Panorama, and Bunk, and is one of the co-hosts for BackStory, a popular podcast about American history. He is Tucker-Boatwright Professor of the Humanities and president emeritus at the University of Richmond as well as former Dean of Arts and Sciences at the University of Virginia. His most recent book is The Thin Light of Freedom: The Civil War and Emancipation in the Heart of America, published in 2017 by W. W. Norton.

His talk will be on “Twenty-Five Years in Digital History and Counting”.

Edward Ayers began a digital project just before the World Wide Web emerged and has been pursuing one project or another ever since. His current work focuses on the two poles of possibility in the medium: advanced projects in visualizing processes of history at the Digital Scholarship Lab at the University of Richmond, and a public-facing project, Bunk, curating representations of the American past for a popular audience.

We hope you’ll be able to join us at his public talk in March!


(Baseball) Bibliometrics Broken Down: A Series

A box of baseballs.

This post was guest authored by Scholarly Communication and Publishing Graduate Assistant Paige Kuester. This is the first part of a three-part series.


No matter what game, everyone wants to be the best. Play for the best team, have the highest score, whatever. The game of research is no different. Now, I don’t mean to suggest that research and publishing should not be taken seriously by calling it a game, but there are still high scores involved that may be the deciding factor in the end result, which could be tenure or a higher paycheck or just negotiating power. You have probably heard of some of these scores, like the h-index or altmetrics. Even if you know what they mean, you might not know their significance or how they are calculated. And if you do know all of that, your time might be better spent elsewhere, unless you enjoy a super-stretched sports metaphor.

Yes, to further extend this game metaphor, we’re going to spend an afternoon at the ballpark. I’m visualizing Wrigley, but we can go wherever your favorite team plays, as long as it’s a Major League team. I know that I might be losing you at this point, and I might get lost in this imperfect metaphor myself, but if we make it through, there’s sure to be a win at the end.

In this game, our scholarly authors (professors) are our players (professionals). This could be humorous, but don’t laugh yet, because these scholars are playing a serious game. Even though making the starting lineup does not guarantee a spot later in the season, I am going to equate it with gaining tenure for professors, as both are goals that take hard work and dedication to achieve.

Journal Impact Factor

In order to have a good career, being on a highly ranked team is an automatic boost. They’re usually good for a reason, and fans will think that you must be good if you started off on such a prestigious team.

Picking a journal to publish in is a similar process, at least for the sake of this argument. While journals don’t go out and recruit, they are ranked in different ways, just like baseball teams. One way is the journal impact factor, which ranks journals by the average number of citations their recent articles have received.

The formula works like this: take the number of citations received this year by the articles the journal in question published in the previous two years. Next, find out how many citable articles the journal published during that same two-year period. Divide the first number by the second, and you’ve got the journal impact factor. This formula is actually easier math than figuring out the top-ranked baseball teams, but if you are really up for a challenge, you can try that, too.
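For anyone who prefers code to prose, here is the same calculation as a tiny Python sketch, with completely made-up numbers:

def impact_factor(citations_to_prior_two_years, citable_items_prior_two_years):
    # Citations received this year by articles from the previous two years,
    # divided by the number of citable items published in those two years.
    return citations_to_prior_two_years / citable_items_prior_two_years

# Made-up example: a journal's 2015-2016 articles were cited 250 times in 2017,
# and the journal published 100 citable items in 2015 and 2016 combined.
print(impact_factor(250, 100))  # 2.5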

If you didn’t get that math, that’s just fine, because there are websites that do it for you. Journal Citation Reports puts out the scores every year, and, as in most sports, the higher the better.

Originally, the impact factor was not supposed to be used to judge how good an author or an article was, but this is one way that many judge authors now. If you can play for a good team, if you can get your article published in a highly ranked journal, you must be good, right?

Well, not everyone thinks that this is a representative way to measure academic impact, so there are other specific measures for the players and their articles, which will be discussed in the next post. Don’t worry, we’re just getting started.


Open Source Tools for Social Media Analysis

Photograph of a person holding an iPhone with various social media icons.

This post was guest authored by Kayla Abner.


Interested in social media analytics, but don’t want to shell out the bucks to get started? There are a few open source tools you can use to dabble in this field, and some even integrate data visualization. Recently, we at the Scholarly Commons tested a few of these tools, and as expected, each one has strengths and weaknesses. For our exploration, we exclusively analyzed Twitter data.

NodeXL

NodeXL’s graph for #halloween (2,000 tweets)

tl;dr: Light system footprint and provides some interesting data visualization options. Useful if you don’t have a pre-existing data set, but the one generated here is fairly small.

NodeXL is essentially a complex Excel template (it’s classified as a Microsoft Office customization), which means it doesn’t take up a lot of space on your hard drive. It does have advantages: it’s easy to use, requiring only a simple search to retrieve tweets for you to analyze. However, its capabilities for large-scale analysis are limited; the user is restricted to retrieving the most recent 2,000 tweets. For example, searching Twitter for #halloween imported 2,000 tweets, every single one from the date of this writing. It is worth mentioning that there is a fancy, paid version that will expand your limit to 18,000 tweets (the maximum allowed by Twitter’s API) or roughly the last 7 to 8 days, whichever comes first. Even then, you cannot restrict your data retrieval by date, so NodeXL is mostly useful for pulling recent social media data. In addition, if you want to study something besides Twitter, you will have to pay to get any other type of dataset (e.g., Facebook, YouTube, Flickr).

Strengths: Good for a beginner, differentiates between Mentions/Retweets and original Tweets, provides a dataset, some light data visualization tools, offers Help hints on hover

Weaknesses: 2,000 Tweet limit, free version restricted to Twitter Search Network

TAGS

TAGSExplorer’s data graph (2,902 tweets). It must mean something…

tl;dr: Add-on for Google Sheets, giving it a light system footprint as well. Higher restriction for number of tweets. TAGS has the added benefit of automated data retrieval, so you can track trends over time. Data visualization tool in beta, needs more development.

TAGS is another complex spreadsheet template, this time created for use with Google Sheets. TAGS does not have a paid version with more social media options; it can only be used for Twitter analysis. However, it does not have the same tweet retrieval limit as NodeXL. The only limit is 18,000 or seven days ago, which is dictated by Twitter’s Terms of Service, not the creators of this tool. My same search for #halloween with a limit set at 10,000 retrieved 9,902 tweets within the past seven days.

TAGS also offers a data visualization tool, TAGSExplorer, that is promising but has yet to realize its potential. As it stands now in beta, even a dataset of 2,000 records puts so much strain on the program that it cannot keep up with the user, so it is only really practical for smaller datasets. It does offer a few interesting analysis parameters that NodeXL lacks, such as the ability to see Top Tweeters and Top Hashtags, and these work better than the graph itself.

These graphs have meaning!

Strengths: More data fields, such as the user’s follower and friend count, location, and language (if available), better advanced search (Boolean capabilities, restrict by date or follower count), automated data retrieval

Weaknesses: data visualization tool needs work

Hydrator

Simple interface for Documenting the Now’s Hydrator

tl;dr: A tool used for “re-hydrating” tweet IDs into full tweets, to comply with Twitter’s Terms of Service. Not used for data analysis; useful for retrieving large datasets. Limited to datasets already available.

Documenting the Now, a group focused on collecting and preserving digital content, created the Hydrator tool to comply with Twitter’s Terms of Service. Download and distribution of full tweets to third parties is not allowed, but distribution of tweet IDs is allowed. The organization manages a Tweet Catalog with files that can be downloaded and run through the Hydrator to view the full Tweet. Researchers are also invited to submit their own dataset of Tweet IDs, but this requires use of other software to download them. This tool does not offer any data visualization, but is useful for studying and sharing large datasets (the file for the 115th US Congress contains 1,430,133 tweets!). Researchers are limited to what has already been collected, but multiple organizations provide publicly downloadable tweet ID datasets, such as Harvard’s Dataverse. Note that the rate of hydration is also limited by Twitter’s API, and the Hydrator tool manages that for you. Some of these datasets contain millions of tweet IDs, and will take days to be transformed into full tweets.

Strengths: Provides full tweets for analysis, straightforward interface

Weaknesses: No data analysis tools

Crimson Hexagon

If you’re looking for more robust analytics tools, Crimson Hexagon is a data analytics platform that specializes in social media. Not limited to Twitter, it can retrieve data from Facebook, Instagram, YouTube, and basically any other online source, like blogs or forums. The company has a partnership with Twitter and pays for greater access to their data, giving the researcher higher download limits and a longer time range than they would receive from either NodeXL or TAGS. One can access tweets starting from Twitter’s inception, but these features cost money! The University of Illinois at Urbana-Champaign is one such entity paying for this platform, so researchers affiliated with our university can request access. One of the Scholarly Commons interns, Matt Pitchford, uses this tool in his research on Twitter response to terrorism.

Whether you’re an experienced text analyst or just want to play around, these open source tools are worth considering for different uses, all without you spending a dime.

If you’d like to know more, researcher Rebekah K. Tromble recently gave a lecture at the Data Scientist Training for Librarians (DST4L) conference regarding how different (paid) platforms influence or bias analyses of social media data. As you start a real project analyzing social media, you’ll want to know how the data you have gathered may be limited to adjust your analysis accordingly.


Preparing Your Data for Topic Modeling

In keeping with my series of blog posts on my research project, this post is about how to prepare your data for input into a topic modeling package. I used Twitter data in my project, which is relatively sparse at only 140 characters per tweet, but the principles can be applied to any document or set of documents that you want to analyze.

Topic Models:

Topic models work by identifying and grouping words that co-occur into “topics.” As David Blei writes, Latent Dirichlet allocation (LDA) topic modeling makes two fundamental assumptions: “(1) There are a fixed number of patterns of word use, groups of terms that tend to occur together in documents. Call them topics. (2) Each document in the corpus exhibits the topics to varying degree. For example, suppose two of the topics are politics and film. LDA will represent a book like James E. Combs and Sara T. Combs’ Film Propaganda and American Politics: An Analysis and Filmography as partly about politics and partly about film.”

Topic models do not have any actual semantic knowledge of the words, and so do not “read” the sentence. Instead, topic models use math. The tokens/words that tend to co-occur are statistically likely to be related to one another. However, that also means that the model is susceptible to “noise,” or falsely identifying patterns of co-occurrence when unimportant but highly repeated terms appear throughout the text. As with most computational methods, “garbage in, garbage out.”

In order to make sure that the topic model is identifying interesting or important patterns instead of noise, I had to accomplish the following pre-processing or “cleaning” steps.

  • First, I removed the punctuation marks, like “,.;:?!”. Without this step, commas started showing up in all of my results. Since they didn’t add to the meaning of the text, they were not necessary to analyze.
  • Second, I removed the stop-words, like “I,” “and,” and “the,” because those words are so common in any English sentence that they tend to be over-represented in the results. Many of my tweets were emotional responses, so many authors wrote in the first person. This tended to skew my results, although you should be careful about what stop words you remove. Simply removing stop-words without checking them first means that you can accidentally filter out important data.
  • Finally, I removed overly common words that were unique to my data. For example, many of my tweets were retweets and therefore contained the word “rt.” I also ended up removing mentions of other authors, because highly retweeted texts meant that Twitter user handles were showing up as significant words in my results.

Cleaning the Data:

My original data set was 10 Excel files of 10,000 tweets each. In order to clean and standardize all these data points, as well as combine the files into one single document, I used OpenRefine. OpenRefine is a powerful tool, and it makes it easy to work with all your data at once, even when there are a large number of entries. I uploaded all of my datasets, then performed some quick cleaning available under the “Common Transformations” option under the triangle dropdown at the head of each column: I changed everything to lowercase, unescaped HTML characters (to make sure that I didn’t get errors when trying to run it in Python), and removed extra white spaces between words.

OpenRefine also lets you use regular expressions, which is a kind of search tool for finding specific strings of characters inside other text. This allowed me to remove punctuation, hashtags, and author mentions by running a find and replace command.

  • Remove punctuation: grel:value.replace(/(\p{P}(?<!')(?<!-))/, "")
    • Any punctuation character is removed (apostrophes and hyphens are left alone).
  • Remove users: grel:value.replace(/(@\S*)/, "")
    • Any string that begins with an @ is removed. It ends at the space following the word.
  • Remove hashtags: grel:value.replace(/(#\S*)/, "")
    • Any string that begins with a # is removed. It ends at the space following the word.

Regular expressions, commonly abbreviated as “regex,” can take a little getting used to in order to understand how they work. Fortunately, OpenRefine itself has some solid documentation on the subject, and I also found this cheatsheet valuable as I was trying to get it work. If you want to create your own regex search strings, regex101.com has a tool that lets you test your expression before you actually deploy it in OpenRefine.
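If you would rather do this kind of cleaning in a script instead of OpenRefine, here is a rough Python equivalent of those three substitutions. This is a sketch of the same idea rather than the exact workflow I used, and it relies on the third-party regex module because the standard library’s re does not understand \p{P}:

import regex  # pip install regex; supports Unicode classes like \p{P}, as GREL does

def clean_tweet(text):
    # Remove @mentions and #hashtags first, since @ and # are themselves
    # punctuation characters and would otherwise be stripped too early.
    text = regex.sub(r"@\S*", "", text)
    text = regex.sub(r"#\S*", "", text)
    # Remove remaining punctuation, keeping apostrophes and hyphens.
    text = regex.sub(r"\p{P}(?<!')(?<!-)", "", text)
    # Collapse any leftover runs of whitespace.
    return regex.sub(r"\s+", " ", text).strip()

print(clean_tweet("rt @drlawyercop gun control is a steep climb, how about #guncontrolnow?"))
# "rt gun control is a steep climb how about"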

After downloading the entire data set as a Comma Separated Value (.csv) file, I then used the Natural Language ToolKit (NLTK) for Python to remove stop-words. The code itself can be found here, but I first saved the content of the tweets as a single text file, and then I told NLTK to go over every line of the document and remove words that are in its common stop word dictionary. The output is then saved in another text file, which is ready to be fed into a topic modeling package, such as MALLET.
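The gist of that script looks something like the sketch below. It is a simplified stand-in for the code linked above, and the file names are placeholders:

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # only needed the first time you run this
stop_words = set(stopwords.words("english"))

# Placeholder file names: one tweet per line in, stop-word-free lines out.
with open("tweets.txt", encoding="utf-8") as infile, \
     open("tweets_no_stopwords.txt", "w", encoding="utf-8") as outfile:
    for line in infile:
        kept = [word for word in line.split() if word not in stop_words]
        outfile.write(" ".join(kept) + "\n")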

At the end of all these cleaning steps, my resulting data is essentially composed of unique nouns and verbs, so, for example, @Phoenix_Rises13’s tweet “rt @drlawyercop since sensible, national gun control is a steep climb, how about we just start with orlando? #guncontrolnow” becomes instead “since sensible national gun control steep climb start orlando.” This means that the topic modeling will be more focused on the particular words present in each tweet, rather than commonalities of the English language.

Now my data is cleaned from any additional noise, and it is ready to be input into a topic modeling program.

Interested in working with topic models? There are two Savvy Researcher topic modeling workshops, on December 6 and December 8, that focus on the theory and practice of using topic models to answer questions in the humanities. I hope to see you there!


Creating Quick and Dirty Web Maps to Visualize Your Data – Part 2

Welcome to part two of our two-part series on creating web maps! If you haven’t read part one yet, you can find it here. If you have read part one, we’re going to pick up right where we left off.

Now that we’ve imported our CSV into a web map, we can begin to play around with how the data is represented. You should be brought to the “Change Style” screen after importing your data, which presents you with a drop-down menu and three drawing styles to choose from:

Map Viewer Change Style Screen

Hover over each drawing style for more information, and click each one to see how they visualize your data. Don’t worry if you mess up — you can always return to this screen later. We’re going to use “Types (Unique symbols)” for this exercise because it gives us more options to fiddle with, but feel free to dive into the options for each of the other two drawing styles if you like how they represent your data. Click “select” under “Types (Unique symbols)” to apply the style, then select a few different attributes in the “Choose an attribute to show” dropdown menu to see how they each visualize your data. I’m choosing “Country” as my attribute to show simply because it gives us an even distribution of colors, but for your research data you will want to select this attribute carefully. Next, click “Options” on our drawing style and you can play with the color, shape, name, transparency, and visible range for all of your symbols. Click the three-color bar (pictured below) to change visual settings for all of your symbols at once. When you’re happy with the way your symbols look, click OK and then DONE.

Now is also a good time to select your basemap, so click “Basemap” on the toolbar and select one of the options provided — I’m using “Light Gray Canvas” in my examples here.

Change all symbols icon

Click the three-color bar to change visual settings for all of your symbols at once

Now that our data is visualized the way we want, we can do a lot of interesting things depending on what we want to communicate. As an example, let’s pretend that our IP addresses represent online access points for a survey we conducted on incarceration spending in the United States. We can add some visual insight to our data by inserting a layer from the web using “Add → Search for layers” and overlaying a relevant layer. I searched for “inmate spending” and found a tile layer created by someone at the Esri team that shows the ratio of education spending to incarceration spending per state in the US:

"Search for Layers" screen

The “Search for Layers” screen

 

 

 

 

 

 

 

 

 

 

 

 

 

You might notice in the screenshot above that there are a lot of similar search results; I’m picking the “EducationVersusIncarceration” tile layer (circled) because it loads faster than the feature layer. If you want to learn why this happens, check out Esri’s documentation on hosted feature layers.

We can add this layer to our map by clicking “Add” then “Done Adding Layers,” and voilà, our data is enriched! There are many public layers created by Esri and the ArcGIS Online community that you can search through, and even more GIS data hosted elsewhere on the web. You can use the Scholarly Commons geospatial data page if you want to search for public geographic information to supplement your research.

Now that we’re done visualizing our data, it’s time to export it for presentation. There are a few different ways that we can do this: by sharing/embedding a link, printing to a pdf/image file, or creating a presentation. If we want to create a public link so people can access our map online, click “Share” in the toolbar to generate a link (note: you have to check the “Everyone (public)” box for this link to work). If we want to download our map as a pdf or image, click “Print” and then select whether or not we want to include a legend, and we’ll be brought to a printer-friendly page showing the current extent of our map. Creating an ArcGIS Online Presentation is a third option that allows you to create something akin to a PowerPoint, but I won’t get into the details here. Go to Esri’s Creating Presentations help page for more information.

Click to enlarge the GIFs below and see how to export your map as a link and as an image/pdf:

Share web map via public link

Note: you can also embed your map in a webpage by selecting “Embed In Website” in the Share menu.

 

Save your map as an image/pdf using the “Print” button in the toolbar. NOTE: if you save your map as an image using “save image as…” you will only save the map, NOT the legend.

While there are a lot more tools that we can play with using our free ArcGIS Online accounts – clustering, pop-ups, bookmarks, labels, drawing styles, distance measuring – and even more tools with an organizational account – 25 different built-in analyses, directions, Living Atlas Layers – this is all that we have time for right now. Keep an eye out for future Commons Knowledge blog posts on GIS, and visit our GIS page for even more resources!


Open Access Button v. Unpaywall: Is there a Winner?

This post was guest authored by Scholarly Communication and Publishing Graduate Assistant Paige Kuester.


A few months back, the Commons Knowledge blog featured a post about a new feature from Impactstory called “Unpaywall.” Read that article here. This is still a relatively new tool that aims to find open access versions of articles if they are available. You can click on the lock that shows up on an article’s page if it is green or gold, and Unpaywall will take you to an OA version of that article. If only a grey lock shows up, then there is no OA version of that article that this feature can find.

Similarly, the Open Access Button’s goal is to get you past paywalls. This is an older extension than Unpaywall, but is still being updated. This one works by bookmarking the button, and once you happen upon a paywalled article, you click on that bookmark. It also has a feature for when the article is not available: emailing the authors directly. The authors are then encouraged to deposit their articles in a repository, and either send a link to that or send the article directly to OAB so that they can upload it to a repository. Of course, if the author’s rights contract does not allow them to do this, then they can decline. OAB is also working with interlibrary loan departments in order to utilize this tool in those systems, which is supposed to eventually reduce the cost of sending articles between libraries.

I decided to test out the Open Access Button in order to write a fantastic blog post about it and how it compares to Unpaywall, and honestly, I came out a bit disappointed.

Maybe I just picked the wrong articles or topic to search for, or I’m just unskilled, but I had little success in my quest.

My first step was to install OAB, which was easy to do: I just dragged the button to my bookmarks for it to chill there until I needed it.

I used Google Scholar to search for an article that I did not have access to through the University. We do have a lot of articles available, but I did manage to pin one down that I could not get the full text for.

The Google Scholar results.

So I went to the page.

And opened my bookmarks to click on the

Open Access Button.

A screenshot of the bookmark for Open Access Button.

And then it loaded. For quite a while.

A screenshot of the loading screen.

And then…

A screenshot of how to request an article.

The article wasn’t available. But it gave me the option to write a note to the author to request it, like I mentioned above. Awesome. I wrote my note, but when I went to send it off, I arrived at another page asking me to supply the author’s email and the DOI of the article.

Screenshot of the website asking for a DOI.

An unexpected twist.

Okay, fine. So I searched and I searched for the first author but to no avail. I did, however, find the second author’s email, so I put that in the box. Check.

Next, the DOI. I searched and I searched and I looked up how to find an article’s DOI. Well, my article was from 1992 so the reason I couldn’t find one was probably because it didn’t have one. There was no option for that, so what next?

I installed Unpaywall to see if I would have more success that way. First, I had to switch from Safari to Chrome because Unpaywall only works on a couple of browsers. It was also easy to install, but I could not get the lock to show up in any color on the page, which is something that has happened to me many times since, also.

I ended up interlibrary loaning that article.

Additional experiences include OAB saying that I had access to an article, but sending me to an institutional repository that only members of that school could access. Unpaywall was more truthful with this one, showing me a grey lock. For another article, OAB let me send a message to the author, and this time it had thankfully found the author’s email itself, but I never heard back. Unpaywall would not show me any type of lock for that one, not even grey.

Both of these applications are still rather new, and there are still barriers to open access that need to be crossed. I will continue to try to use them when I come across an article that I don’t have access to, because supporting open access is important, but honestly, interlibrary loan was much more helpful to me during this venture.


Creating Quick and Dirty Web Maps to Visualize Your Data – Part 1

Do you have a dataset that you want visualized on a map, but don’t have the time or resources to learn GIS or consult with a GIS Specialist? Don’t worry, because ArcGIS Online allows anybody to create simple web maps for free! In part one of this series you’ll learn how to prepare and import your data into a Web Map, and in part two you’ll learn how to geographically visualize that data in a few different ways. Let’s get started!

The Data

First things first, we need data to work with. Before we can start fiddling around with ArcGIS Online and web maps, we need to ensure that our data can be visualized on a map in the first place. Of course, the best candidates for geographic visualization are datasets that include location data (latitude/longitude, geographic coordinates, addresses, etc.), but in reality, most projects don’t record this information. In order to provide an example of how a dataset that doesn’t include location information can still be mapped, we’re going to work with this sample dataset that I downloaded from FigShare. It contains 1,000 rows of IP addresses, names, and emails. If you already have a dataset that contains location information, you can skip this section and go straight to “The Web Map.”

In order to turn this data into something that’s mappable, we need to read the IP addresses and output their corresponding location information. IP addresses only provide basic city-level information, but that’s not a concern for the sample map that we’ll be creating here. There are loads of free online tools that interpret latitude/longitude data from a list of IP addresses, so you can use any tool that you like – I’m using one called Bulk IP Location Lookup because it allows me to run 500 lines at a time, and I like the descriptiveness of the information it returns. I only converted 600 of the IP addresses in my dataset because the tool is pretty sluggish, and then I used the “Export to CSV” function to create a new spreadsheet. If you’re performing this exercise along with me, you’ll notice that the exported spreadsheet is missing quite a bit of information. I’m assuming that these are either fake IP addresses from our sample dataset, or the bulk lookup tool isn’t working 100% properly. Either way, we now have more than enough data to play around with in a web map.
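If you have more IP addresses than an online form will happily accept, you can also script the lookup. Here is a rough Python sketch that queries ip-api.com’s free JSON endpoint; the column and file names are guesses based on our sample dataset, and you should verify the field names and rate limits of whichever geolocation service you actually use:

import csv
import time
import requests

def lookup(ip):
    # ip-api.com returns JSON with "lat" and "lon" fields on success;
    # other services use different field names, so check before relying on this.
    data = requests.get(f"http://ip-api.com/json/{ip}", timeout=10).json()
    if data.get("status") == "success":
        return data["lat"], data["lon"]
    return None, None

with open("sample_data.csv", newline="") as infile, \
     open("sample_data_located.csv", "w", newline="") as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + ["lat", "lon"])
    writer.writeheader()
    for row in reader:
        row["lat"], row["lon"] = lookup(row["ip_address"])  # column name is an assumption
        writer.writerow(row)
        time.sleep(1.5)  # stay well under the free tier's request limit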

IP Address Lookup Screencap

Bulk IP Location Lookup Tool

The Web Map

Now that our data contains location information, we’re ready to import it into a web map. In order to do this, we first need to create a free ArcGIS Online account. After you’ve done that, log in and head over to your “Content” page and click “Create → Map” to build a blank web map. You are now brought to the Map Viewer, which is where you’ll be doing most of your work. The Map Viewer is a deceptively powerful tool that lets you perform many of the common functions that you would perform on ArcGIS for Desktop. Despite its name, the Map Viewer does much more than let you view maps.

Map Viewer (No Data)

The Map Viewer

Let’s begin by importing our CSV into the Web Map: select “Add → Add Layer From File.” The pop-up lets you know that you can upload Shapefile, CSV, TXT, or GPX files, and includes some useful information about each format. Note the 1,000 item limit on CSV and TXT files – if you’re trying to upload research data that contains more than 1,000 items, you’ll want to create a Tile Layer instead. After you’ve located your CSV file, click “Import Layer” and you should see the map populate. If you get a “Warning: This file contains invalid characters…” pop-up, that’s due to the missing rows in our sample dataset – these rows are automatically excluded. Now is a good time to note that your location data can come in a variety of formats, not just latitude and longitude data. For a full list of supported formats, read Esri’s help article on CSV, TXT, and GPX files. If you have a spreadsheet that contains any of the location information formats listed in that article, you can place your data on a map!

That’s it for part one! In part two we’re going to visualize our data in a few different ways and export our map for presentation.
