As blogs continue to provide a low barrier to entry for authors to distribute content in all avenues from academia to entertainment, it is important to make sure that blog posts are just as easy to access for readers. Here at Illinois, our blogs are run through publish.illinois.edu, a WordPress-based publishing service. As we try to improve our services for all, especially our remotely available services, I wanted to use this week’s Commons Knowledge post to discuss improving accessibility in WordPress. Within the platform, making blog posts more accessible is neither difficult nor time-consuming; however, building these practices into our workflow allows posts to be accessible, not just for some, but for all.
Hello from home to all my fellow (new) work-from-homers!
In light of measures taken to protect public health, it can feel as though our work schedules have been shaken up. However, we are here to help you get back on track and the first thing to do is make sure you have all the tools necessary to be successful at home.
What is OCR? OCR stands for Optical Character Recognition. This is the electronic identification and digital encoding of typed or printed text by means of an optical scanner or a specialized software. Performing OCR allows computers to read static images of text to convert them to readable, editable, and searchable data on a page. There are many applications of OCR including the creation of more accessible documents for the blind and visually-impaired, text/data mining projects, textual comparisons, and large-scale digitization projects.
There are different software options to consider when you are performing OCR on your documents, and it can be challenging to understand which one is best for you. So let’s break it down.
As you do research with larger amounts of data, it becomes necessary to graduate from doing your data analysis in Excel and find more powerful software. It can seem like a really daunting task, especially if you have never attempted to analyze big data before. There are a number of data analysis software systems out there, but it is not always clear which one will work best for your research. The nature of your research data, your technological expertise, and your own personal preferences are all going to play a role in which software will work best for you. In this post I will explain the pros and cons of Stata, R, and SPSS with regard to quantitative data analysis and provide links to additional resources. Every data analysis program I talk about in this post is available for University of Illinois students, faculty, and staff through the Scholarly Commons computers, and you can schedule a consultation with CITL if you have specific questions.
Among researchers, Stata is often credited as the most user-friendly data analysis software. Stata is popular in the social sciences, particularly economics and political science. It is a complete, integrated statistical software package, meaning it can accomplish pretty much any statistical task you need it to, including visualizations. It has both a point-and-click user interface and a command line function with easy-to-learn command syntax. Furthermore, it has a system for version control in place, so you can save syntax from certain jobs into a “do-file” to refer to later. Stata is not free to have on your personal computer. Unlike an open-source program, you cannot program your own functions into Stata, so you are limited to the functions it already supports. Finally, its functions are limited to numeric or categorical data; it cannot analyze spatial data and certain other types.
|Pros||Cons|
|User friendly and easy to learn||An individual license can cost between $125 and $425 annually|
|Version control||Limited to certain types of data|
|Many free online resources for learning||You cannot program new functions into Stata|
- Stata YouTube Channel: A great resource for troubleshooting problems in Stata.
- A Gentle Introduction to Stata by Alan C. Acock: A great reference for getting started with Stata, available through the Scholarly Commons collection.
- Stata.com Resources for learning Stata: Lots of information on how to execute specific functions in Stata.
- The University Library’s Guide on Stata: A great place to find links to additional resources on Stata.
R and its graphical user interface companion RStudio are incredibly popular for a number of reasons. The first and probably most important is that R is free, open-source software that is compatible with any operating system. As such, there is a strong and loyal community of users who share their work and advice online. It has the same features as Stata, such as a point-and-click user interface, a command line, savable files, and strong data analysis and visualization capabilities. It also has some capabilities Stata does not, because users with more technical expertise can program new functions with R to use it for different types of data and projects. The problem a lot of people run into with R is that it is not easy to learn. The programming language it operates on is not intuitive, and it is prone to errors. Despite this steep learning curve, there is an abundance of free online resources for learning R.
|Pros||Cons|
|Free open-source software||Steep learning curve|
|Strong online user community||Can be slow|
|Programmable with more functions for data analysis|| |
- Introduction to R Library Guide: Find valuable overviews and tutorials on this guide published by the University of Illinois Library.
- Quick-R by DataCamp: This website offers tutorials and examples of syntax for a whole host of data analysis functions in R, covering everything from installing the package to advanced data visualizations.
- Learn R on Codecademy: A free self-paced online class for learning to use R for data science and beyond.
- Nabble forum: A forum where individuals can ask specific questions about using R and get answers from the user community.
SPSS is an IBM product that is used for quantitative data analysis. It does not have a command line feature but rather has a user interface that is entirely point-and-click and somewhat resembles Microsoft Excel. Although it looks a lot like Excel, it can handle larger data sets faster and with more ease. One of the main complaints about SPSS is that it is prohibitively expensive to use, with individual packages ranging from $1,290 to $8,540 a year. To make up for how expensive it is, it is incredibly easy to learn. As a non-technical person I learned how to use it in under an hour by following an online tutorial from the University of Illinois Library. However, my take on this software is that unless you really need a more powerful tool just stick to Excel. They are too similar to justify seeking out this specialized software.
|Pros||Cons|
|Quick and easy to learn||By far the most expensive|
|Can handle large amounts of data||Limited functionality|
|Great user interface||Very similar to Excel|
- OpenLearn- Getting Started with SPSS: A free and open online class for learning to use SPSS for data analysis.
- LinkedIn Learning: SPSS Statistics Essentials Training: Free online class for learning the basics of SPSS.
- How to use SPSS: A step-by-step guide to analysis and interpretation by Brian Cronk: This book is a beginner’s guide to using SPSS for data analysis available through the Scholarly Commons collection.
Thanks for reading! Let us know in the comments if you have any thoughts or questions about any of these data analysis software programs. We love hearing from our readers!
This week, geographers around the globe took some time to celebrate the software that allows them to analyze, well, that very same globe. November 13th marked the 20th annual GIS Day, an “international celebration of geographic information systems,” as the official GIS Day website puts it.
But while GIS technology has revolutionized the way we analyze and visualize maps over the past two decades, the high cost of ArcGIS products, long recognized as the gold standard for cartographic analysis tools, is enough to deter many people from using them. At the University of Illinois and other colleges and universities, access to ArcGIS can be taken for granted, but many of us will not remain in the academic world forever. Luckily, there’s a high-quality alternative to ArcGIS for those who want the benefits of mapping software without the price tag!
QGIS is a free, open source mapping software that has most of the same functionality as ArcGIS. While some more advanced features included in ArcGIS do not have analogues in QGIS, developers are continually updating the software and new features are always being added. As it stands now, though, QGIS includes everything that the casual GIS practitioner could want, along with almost everything more advanced users need.
As is often the case with open source software alternatives, QGIS has a large, vibrant community of supporters, and its developers have put together tons of documentation on how to use the program, such as this user guide. Generally speaking, if you have any experience with ArcGIS it’s very easy to learn QGIS—for a picture of the learning curve, think somewhere along the lines of switching from Microsoft Word to Google Docs. And if you don’t have experience, the community is there to help! There are many guides to getting started, including the one listed in the above link, and more forum posts of users working through questions together than anyone could read in a lifetime.
Have you made an interesting map in QGIS? Send us pictures of your creations on Twitter @ScholCommons!
GitHub is a platform mostly used by software developers for collaborative work. You might be thinking “I’m not a software developer, what does this have to do with me?” Don’t go anywhere! In this post I explain what GitHub is and how it can be applied to collaborative writing for non-programmers. Who knows, GitHub might become your new best friend.
Picture this: you and some colleagues have similar research interests and want to collaborate on a paper. You have divided the writing work to allow each of you to work on a different element of the paper. Using a cloud platform like Google Docs or Microsoft Word online you compile your work, but things start to get messy. Edits are made on the document and you are unsure who made them or why. Elements get deleted and you do not know how to retrieve your previous work. You have multiple files saved on your computer with names like “researchpaper1.docx”, “researchpaper1 with edits.docx” and “research paper1 with new edits.docx”. Managing your own work is hard enough, but when collaborators are added to the mix it just becomes unmanageable. After a never-ending reply-all email chain and what felt like the longest meeting of all time, you and your colleagues are finally on the same page about the writing and editing of your paper. It just makes you think, there has got to be a better way to do this. Issues with collaboration are not exclusive to writing; they happen all the time in programming, which is why software developers came up with version control systems like Git and GitHub.
GitHub allows developers to work together through branching and merging. Branching is the process by which the original file or source code is duplicated into clone files. These clones contain all the elements already in the original file and can be worked in independently. Developers use these clones to write and test code before combining it with the original code. Once their version of the code is ready they integrate or “push” it into the source code in a process called merging. Then, other members of the team are alerted to these changes and can “pull” the merged code from the source code into their respective clones. Additionally, every version of the project is saved after changes are made, allowing users to consult previous versions. Every version of your project is saved with a description of what changes were made in that particular version; these saved versions are called commits. Now, this is a simplified explanation of what GitHub does, but my hope is that you now understand GitHub’s applications, because what I am about to say next might blow your mind: GitHub is not just for programmers! You do not need to know any coding to work with GitHub. After all, code and written language are very similar.
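For readers who like to see an idea in action, here is a minimal sketch of the branch, commit, and merge cycle described above, written in Python. It is a toy model, not how Git or GitHub actually store data: the `ToyRepo` class, its method names, and the sample text are all hypothetical, and the merge assumes that nobody changed the main copy while the branch was being edited.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str  # short description of what changed
    text: str     # full snapshot of the document at this point


@dataclass
class ToyRepo:
    """A toy, in-memory stand-in for a repository (hypothetical, not real Git)."""
    branches: dict = field(default_factory=lambda: {"main": []})

    def commit(self, branch, message, text):
        # Save a new snapshot, with its description, on the given branch.
        self.branches[branch].append(Commit(message, text))

    def branch(self, new_name, source="main"):
        # A new branch starts out as a copy of the source branch's history.
        self.branches[new_name] = list(self.branches[source])

    def merge(self, source, target="main"):
        # Naive merge: copy over the commits the target does not have yet.
        # Assumes the target has not changed since the branch was created;
        # real Git compares the actual changes and can report conflicts.
        already_known = len(self.branches[target])
        self.branches[target].extend(self.branches[source][already_known:])

    def latest(self, branch="main"):
        return self.branches[branch][-1].text

    def log(self, branch="main"):
        return [c.message for c in self.branches[branch]]


repo = ToyRepo()
repo.commit("main", "Add introduction", "Intro.")
repo.branch("methods-draft")  # a co-author works on their own copy
repo.commit("methods-draft", "Draft methods section", "Intro. Methods.")
repo.merge("methods-draft")   # fold the co-author's work back into main

print(repo.log())     # every saved version, with its description
print(repo.latest())  # the current text of the shared paper
```

The point of the sketch is only that each saved version keeps both the text and a note about why it changed, which is what makes it possible to look back at earlier edits.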
Even if you cannot write a single line of code, GitHub can be incredibly useful for a variety of reasons:
1. It allows you to electronically back up your work for free.
2. All the different versions of your work are saved separately, allowing you to look back at previous edits.
3. It alerts all collaborators when a change is made and they can merge that change into their own versions of the text.
4. It allows you to write using plain text, something commonly requested by publishers.
Hopefully, if you’ve made it this far into the article you’re thinking, “This sounds great, let’s get started!” For more information on using GitHub you can consult the Library’s guide on GitHub or follow the step by step instructions on GitHub’s Hello-World Guide.
Here are some links to what others have said about using GitHub for non-programmers:
- Top Ten Reason GitHub is a Great Tool for Creative Writers by JJ Merelo
- Git for writers: Write fiction like a (good) programmer by Vanessa Guedes
- How writers can get work done better with Git by Seth Kenlon
- Git for Non-Programmers: How to use Git/GitHub as a non-technical person from Jarboo
Lynda.com had a long history with libraries. The online learning platform offered video courses to help people “learn business, software, technology and creative skills to achieve personal and professional goals.” Lynda.com paired well with other library services and collections, offering library users the chance to learn new skills at their own pace in an accessible and varied medium.
However, in 2015—twenty years after its initial launch—Lynda.com was purchased by LinkedIn. A year later, Microsoft purchased LinkedIn for $26.2 billion. And now, in 2019, Lynda.com content is available through the newly-formed LinkedIn Learning.
The good news is that this change from Lynda.com to LinkedIn Learning includes access to all of the same content previously available. This means that, through the University Library’s subscription, you still have access to courses on software like R, SQL, Tableau, Python, InDesign, Photoshop, and more (many of which are available to use on campus at the Scholarly Commons). There are also courses on broader, related topics like data science, database management, and user experience.
Setting up your own personal account to access LinkedIn Learning is where things get just a little trickier. As a result of the transition from Lynda.com to LinkedIn Learning, users are now strongly encouraged to link their personal LinkedIn accounts with their LinkedIn Learning accounts. Completing courses in LinkedIn Learning will earn you badges that are automatically carried over to your LinkedIn account. However, this additional step of using a personal LinkedIn account to access these courses also makes the information about your LinkedIn Learning activity as public as your LinkedIn profile. Because Lynda.com only required a library card and PIN, this change in privacy has received push-back from libraries and library organizations across the country.
This new policy change doesn’t mean you should avoid LinkedIn Learning; it just means you should use it with care and make an informed decision about your privacy settings. Maybe you want potential employers to see what you’re proactively learning about on the platform; maybe you want to keep that information private. Either way, you can get details on setting up accounts and your privacy settings by consulting this guide created by Technology Services.
LinkedIn Learning can be accessed through the University Library here.
Congratulations! You made it through your first month back of the spring semester. From class work, to pouring rain, to enough snow and ice to make the university look like it’s auditioning for a role as Antarctica, you’re pushing forward!
Take a minute to look over all the awesome resources we have, right here in the Scholarly Commons, to help you keep chugging along with your research.
We are open 8:30 a.m. to 6 p.m., Monday through Friday. Our dual-monitor computers have software ranging from Adobe Photoshop to OCR tools, which can be paired with our scanners to make machine-readable PDFs!
Researchers can book free consultations thanks to our partnerships with CITL Data Analytics and Technology Services! In these meetings, you can learn about R, SAS, and everything else you need to just get started or to get past that tricky problem in your statistical research.
Beyond that, users can make appointments with our GIS specialist, and learn even more through our GIS resources. We have a ton of great books in our non-circulating reference collection that can help you learn about Python, GIS, and more!
And that’s not all: our Data Analytics & Visualization Librarian has put together a plethora of resources to help turn your data into art. Check out the four most common types of charts guide to get started!
And even this doesn’t cover all of our services!
The Scholarly Commons has all the resources you need to succeed, so stop by anytime! We’re always happy to help.
Back in October, we published a blog post introducing you to Google MyMaps, an easy way to display simple information in map form. Today we’re going to revisit that topic and explore some further ways in which MyMaps can help you visualize different kinds of data!
One of the most basic things that students of geography learn is the problem of projections: the earth is a sphere, and there is no perfect way to translate an image from the surface of a sphere to a flat plane. Nevertheless, cartographers over the years have come up with many projection systems which attempt to do just that, with varying degrees of success. Google Maps (and, by extension, Google MyMaps) uses perhaps the most common of these, the Mercator projection. Despite its ubiquity, the Mercator projection has been criticized for not keeping area uniform across the map. This means that shapes far away from the equator appear to be disproportionately larger in comparison with shapes on the equator.
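To put a rough number on that distortion, here is a small Python sketch. It assumes a perfectly spherical earth, on which the Mercator projection stretches lengths by 1 / cos(latitude) in both directions, so small areas are inflated by 1 / cos²(latitude). The function name and the example latitudes are my own choices for illustration, not anything built into MyMaps.

```python
import math


def mercator_area_scale(latitude_deg):
    """Factor by which Mercator inflates small areas at a given latitude.

    Assumes a spherical earth: lengths are stretched by 1 / cos(latitude)
    east-west and north-south, so areas are stretched by that factor squared.
    """
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Illustrative latitudes (assumptions): 0 degrees for the equator and
# 72 degrees north as a stand-in for the middle of Greenland.
for name, lat in [("Equator", 0.0), ("Central Greenland", 72.0)]:
    print(f"{name}: areas appear about {mercator_area_scale(lat):.1f}x larger")
```

Under these assumptions, a small shape at 72 degrees north is drawn roughly ten times larger than the same shape at the equator.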
Luckily, MyMaps provides a method of pulling up the curtain on Mercator’s distortion. The “Draw a line” tool, located just below the search bar at the top of the MyMaps screen, allows users to create a rough outline of any shape on the map, and then drag that outline around the world to compare its size. Here’s how it works: After clicking on “Draw a line,” select “Add line or shape” and begin adding points to the map by clicking. Don’t worry about where you’re adding your points just yet; once you’ve created a shape you can move it anywhere you’d like! Once you have three or four points, complete the polygon by clicking back on top of your first point, and you should have a shape that looks something like this:
Now it’s time to create a more detailed outline. Click and drag your shape over the area you want to outline, and get to work! You can change the size of your shape by dragging on the points at the corners, and you can add more points by clicking and dragging on the transparent circles located midway between each corner. For this example, I made a rough outline of Greenland, as you can see below.
You can get as detailed as you want with the points on your shapes, depending on how much time you want to spend clicking and dragging points around on your computer screen. Obviously I did not perfectly trace the exact coastline of Greenland, but my finished product is at least recognizable enough. Now for the fun part! Click somewhere inside the boundary of your shape, drag it somewhere else on the map, and see Mercator’s distortion come to life before your eyes.
Here you can see the exact same shape as in the previous image, except instead of hovering over Greenland at the north end of the map, it is placed over Africa and the equator. The area of the shape is exactly the same, but the way it is displayed on the map has been adjusted for the relative distortion of the particular position it now occupies on the map. If that hasn’t sufficiently shaken your understanding of our planet, MyMaps has one more tool for illuminating the divide between the map and reality. The “Measure distances and areas” tool draws a “straight” line between any two (or more) points on the map. “Straight” is in quotes there because, as we’re about to see, a straight line on the globe (and therefore in reality) doesn’t typically align with straight lines on the map. For example, if I wanted to see the shortest distance between Chicago and Frankfurt, Germany, I could display that with the Measure tool like so:
The curve in this line represents the curvature of the earth, and demonstrates how the actual shortest distance is not the same as a straight line drawn on the map. This principle is made even more clear through using the Measure tool a little farther north.
The beginning and ending points of this line are roughly directly north of Chicago and Frankfurt, respectively; however, we notice two differences between this and the previous measurement right away. First, this is showing a much shorter distance than Chicago to Frankfurt, and second, the curve in the line is much more distinct. Both of these differences arise, once again, from the difficulty of displaying a sphere on a flat surface. Actual distances get shorter the closer you get to the north (or south) ends of the map, which in turn causes all of the distortions we have seen in this post.
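If you would like to check the two measurements yourself, the shortest path along the surface of a sphere can be estimated with the haversine formula. The Python sketch below is only an approximation: it treats the earth as a perfect sphere with a 6,371 km radius, the city coordinates are rounded values I supplied, and 70 degrees north is an arbitrary latitude chosen to stand in for the second line. I cannot say exactly how MyMaps computes its own figures, so expect small differences.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the earth as a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest surface distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates as (latitude, longitude); assumptions for illustration.
chicago = (41.88, -87.63)
frankfurt = (50.11, 8.68)

city_route = great_circle_km(*chicago, *frankfurt)
# Same two longitudes, but with both endpoints moved up to 70 degrees north.
northern_route = great_circle_km(70.0, chicago[1], 70.0, frankfurt[1])

print(f"Chicago to Frankfurt: about {city_route:,.0f} km")
print(f"Same longitudes at 70 N: about {northern_route:,.0f} km")
```

The second distance comes out at less than half of the first, even though the two lines span the same longitudes on the map, which matches what the Measure tool shows.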
How might a better understanding of projection systems improve your own research? What are some other ways in which the Mercator projection (or any other) has deceived us? Explore for yourself and let us know!
This is a guest blog by the amazing Zachary Maiorana, a GA in Scholarly Communication and Publishing.
Scholars and users have a vested interest in understanding the relative authority of publications they have either written or wish to cite to form the basis of their research. Although the literature search, a common topic in library instruction and research seminars, can take place on a huge variety of discovery tools, researchers often rely on Google Scholar as a supporting or central platform.
The massive popularity of Google Scholar is likely due to its simple interface, which bears the longtime prestige of Google’s search engine; its enormous breadth, with a simple search yielding millions of results; its compatibility and parallels with other Google products such as Chrome and Books; and its citation metrics mechanism.
This last aspect of Google Scholar, which collects and reports data on the number of citations a given publication receives, represents the platform’s apparent ability to precisely calculate the research community’s interest in that publication. But, in the University Library’s work on the Illinois Experts (experts.illinois.edu) research and scholarship portal, we have encountered a number of circumstances in which Google Scholar has misrepresented U of I faculty members’ research.
Recent studies reveal that Google Scholar, despite its popularity and its massive reach, is not only often inaccurate in its reporting of citation metrics and title attribution, but also susceptible to deliberate manipulation. In 2010, Labbé described an experiment using Ike Antkare (AKA “I can’t care”), a fictitious researcher whose bibliography was manufactured with a mountain of self-referencing citations. After the purposely falsified publications went public, Google’s bots, while crawling his 100 generated articles, didn’t differentiate Antkare’s research from that of his real-life peers. As a result, Google Scholar reported Antkare as one of the most cited researchers in the world, with a higher h-index* than Einstein.
In 2014, Spanish researchers conducted an experiment in which they created a fake scholar with several papers making hundreds of references to works written by the experimenters. After the papers were made public on a personal site, Google Scholar scraped the data and the real-life researchers’ profiles increased by 774 citations in total. In the hands of more nefarious users seeking to aggrandize their own careers or alter scientific opinion, such practices could result in large-scale academic fraud.
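To see why a batch of fabricated citations matters, it helps to look at how the h-index is calculated: a researcher has an h-index of h when h of their papers have each been cited at least h times. The short Python sketch below uses made-up citation counts (they are not data from either experiment) to show how a few extra citations per paper can raise the score.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for five papers (made-up numbers).
honest = [10, 8, 5, 2, 1]
# The same papers after each gains 3 citations from fabricated documents.
inflated = [c + 3 for c in honest]

print(h_index(honest), h_index(inflated))  # prints: 3 4
```

Because the metric depends only on raw citation counts, it cannot tell a citation from a genuine article apart from one planted in a generated document.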
For libraries, Google’s kitchen-sink-included data collection methods further result in confusing and inaccurate attributions. In our work to supplement the automated collection of publication data for faculty profiles on Illinois Experts using CVs, publishers’ sites, journal sites, databases, and Google Scholar, we frequently encounter researchers’ names and works mischaracterized by Google’s clumsy aggregation mechanisms. For example, Google Scholar’s bots often read a scholar’s name somewhere within a work that the scholar hasn’t written—perhaps they were mentioned in the acknowledgements or in a citation—and simply attribute the work to them as author.
When it comes to people’s careers and the sway of scientific opinion, such snowballing mistakes can be a recipe for large-scale misdirection. Though much research exists that shows that, in general, Google Scholar currently represents highly cited research well, weaknesses persist. Blind distrust of any dominant proprietary platform is unwise, and using Google Scholar requires particularly careful judgment.
Read more on Google Scholar’s quality and reliability:
Brown, Christopher C. 2017. “Google Scholar.” The Charleston Advisor 19 (2): 31–34. https://doi.org/10.5260/chara.19.2.31.
Halevi, Gali, Henk Moed, and Judit Bar-Ilan. 2017. “Suitability of Google Scholar as a Source of Scientific Information and as a Source of Data for Scientific Evaluation—Review of the Literature.” Journal of Informetrics 11 (3): 823–34. https://doi.org/10.1016/j.joi.2017.06.005.
Labbé, Cyril. 2016. “L’histoire d’Ike Antkare et de Ses Amis Fouille de Textes et Systèmes d’information Scientifique.” Document Numérique 19 (1): 9–37. https://doi.org/10.3166/dn.19.1.9-37.
Lopez-Cozar, Emilio Delgado, Nicolas Robinson-Garcia, and Daniel Torres-Salinas. 2012. “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting.” ArXiv:1212.0638 [Cs], December. http://arxiv.org/abs/1212.0638.
Walker, Lizzy A., and Michelle Armstrong. 2014. “‘I Cannot Tell What the Dickens His Name Is’: Name Disambiguation in Institutional Repositories.” Journal of Librarianship and Scholarly Communication 2 (2). https://doi.org/10.7710/2162-3309.1095.
*Read the library’s LibGuide on bibliometrics for an explanation of the h-index and other standard research metrics: https://guides.library.illinois.edu/c.php?g=621441&p=4328607