Introductions: What is Digital Scholarship, anyways?

This is the beginning of a new series where we introduce you to the various topics that we cover in the Scholarly Commons. Maybe you’re new to the field, or maybe you’ve reached the point where you’re too afraid to ask… Fear not! We are here to take it back to the basics!

What is digital scholarship, anyways?

Digital scholarship is an all-encompassing term and it can be used very broadly. Digital scholarship refers to the use of digital tools, methods, evidence, or any other digital materials to complete a scholarly project. So, if you are using digital means to construct, analyze, or present your research, you’re doing digital scholarship!

It seems really basic to say that digital scholarship is any project that uses digital means because nowadays, isn’t that every project? Yes and no. We use the term digital quite liberally… If you just used Microsoft Word to write an essay about a lab you did during class, that is not digital scholarship. However, if you used specialized software to analyze the results of a survey you conducted and then wrote about it in an essay typed in Microsoft Word, then that is digital scholarship! If you then wanted to get this essay published and hosted in an online repository so that other researchers can find it, then that is digital scholarship too!

Many higher education institutions have digital scholarship centers on their campuses that focus on providing specialized support for these types of projects. The Scholarly Commons is a digital scholarship space in the University Main Library! Digital scholarship centers are often pushing for new and innovative means of discovery. They have access to specialized software and hardware and provide a space for collaboration and consultations with subject experts who can help you achieve your project goals.

At the Scholarly Commons, we support a wide array of topics related to digital and data-driven scholarship, which this series will cover in the future. We have established partners throughout the library and across the wider University campus to support students, staff, and faculty in their digital scholarship endeavors.

Here is a list of the digital scholarship service points we support:

You can find a list of all the software the Scholarly Commons has to support digital scholarship here and a list of the Scholarly Commons hardware here. If you’re interested in learning more about the foundations of digital scholarship, follow along with our Introductions series as we go back to the basics.

As always, if you’re interested in learning more about digital scholarship and how to support your own projects, you can fill out a consultation request form, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Simple NetInt: A New Data Visualization Tool from Illinois Assistant Professor, Juan Salamanca

Juan Salamanca, Ph.D., Assistant Professor in the School of Art and Design at the University of Illinois Urbana-Champaign, recently created a new data visualization tool called Simple NetInt. Though developed from a tool he created a few years ago, this tool brings entirely new opportunities to digital scholarship! This week we had the chance to talk to Juan about this new tool in data visualization. Here’s what he said…

Simple NetInt is a JavaScript version of NetInt, a Java-based node-link visualization prototype designed to support the visual discovery of patterns across large datasets by displaying disjoint clusters of vertices that can be filtered, zoomed in on, or drilled down into interactively. The visualization strategy used in Simple NetInt is to place clustered nodes in independent 3D spaces and draw links between nodes across multiple spaces. The result is a simple graphical user interface that enables visual depth as an intuitive dimension for data exploration.

Simple NetInt Interface

Check out the Simple NetInt tool here!

In collaboration with Professor Eric Benson, Salamanca tested a prototype of Simple NetInt with a dataset about academic publications, episodes, and story locations of the Sci-Fi TV series Firefly. The tool shows a network of research relationships between these three sets of entities, similar to a citation map but on a timeline following the episodes’ chronology.

What inspired you to create this new tool?

This tool is an extension of a prototype I built five years ago for the visualization of financial transactions between bank clients. It is software to visualize networks based on the representation of entities and their relationships as nodes and edges. This new version is used for the visualization of a totally different dataset: scholarly work published in papers, episodes of a TV series, and the narrative of the series itself. So, the network representation portrays relationships between journal articles, episode scripts, and fictional characters. I am also using it to design a large mural for the Siebel Center for Design.

What are your hopes for the future use of this project?

The final goal of this project is to develop an augmented reality visualization of networks to be used in the field of digital humanities. This proof of concept shows that scholars in the humanities come across datasets with different dimensional systems that might not be compatible with one another. For instance, a timeline of scholarly publications may encompass 10 or 15 years, but the content of what is being discussed in that body of work may encompass centuries of history. Therefore, these two different temporal dimensions need to be represented in such a way that helps scholars in their interpretations. I believe that an immersive visualization may drive new questions for researchers or convey new findings to the public.

What were the major challenges that came with creating this tool?

The major challenge was to find a way to represent three different systems of coordinates in the same space. The tool has a universal space that contains relative subspaces for each dataset loaded. So, the nodes instantiated from each dataset are positioned in their own coordinate system, which could be a timeline, a position relative to a map, or just clusters by proximities. But the edges that connect nodes jump from one coordinate system to the other. This creates the idea of a system of nested spaces that works well with few subspaces, but I am still figuring out what is the most intuitive way to navigate larger multidimensional spaces.
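To make the idea of nested spaces a little more concrete, here is a minimal sketch of it in Python. This is not Simple NetInt's code (which is JavaScript); the class names, the `origin` and `scale` parameters, and the sample values are all assumptions made up for illustration. Each node keeps coordinates in its own subspace, and an edge resolves both of its ends to the shared universal space.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass
class Subspace:
    """A relative coordinate system placed inside the universal space (hypothetical)."""
    name: str
    origin: Point        # where this subspace sits in the universal space
    scale: float = 1.0   # uniform scaling applied to local coordinates

    def to_universal(self, local: Point) -> Point:
        """Map a point from this subspace's coordinates to universal coordinates."""
        return tuple(o + self.scale * c for o, c in zip(self.origin, local))


@dataclass
class Node:
    label: str
    space: Subspace
    local: Point  # position in the node's own coordinate system

    def position(self) -> Point:
        return self.space.to_universal(self.local)


def edge_endpoints(a: Node, b: Node) -> Tuple[Point, Point]:
    """An edge may join nodes from different subspaces; both ends are
    resolved to universal coordinates so the link can be drawn."""
    return a.position(), b.position()


# Made-up example: one subspace for papers, one for episodes.
papers = Subspace("papers", origin=(0.0, 0.0, 0.0))
episodes = Subspace("episodes", origin=(0.0, 0.0, 10.0), scale=2.0)

paper = Node("article A", papers, (3.0, 1.0, 0.0))
episode = Node("episode 1", episodes, (1.0, 0.0, 0.0))

start, end = edge_endpoints(paper, episode)
print(start, end)
```

With only two or three subspaces this stays easy to follow, which matches the point above that the hard part is navigating many of them at once.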

What are your own research interests and how does this project support those?

My research focuses on understanding how designed artifacts affect the viscosity of social action. What I do is investigate how the design of artifacts facilitates or hinders cooperation or collaboration between people. I use visual analytics methods to conduct my research, so the analysis of networks is an essential tool. I have built several custom-made tools for the observation of the interaction between people and things, and this is one of them.

If you would like to learn more about Simple NetInt you can find contact information for Professor Juan Salamanca here and more information on his research!

If you’re interested in learning more about data visualizations for your own projects, check out our guide on visualizing your data, attend a Savvy Researcher Workshop, Live Chat with us on Ask a Librarian, or send us an email. We are always happy to help!

Holiday Data Visualizations

The fall 2020 semester is almost over, which means that it is the holiday season again! We would especially like to wish everyone in the Jewish community a happy first night of Hanukkah tonight.

To celebrate the end of this semester, here are some fun Christmas and Hanukkah-related data visualizations to explore.

Popular Christmas Songs

First up, in 2018 data journalist Jon Keegan analyzed a dataset of 122 hours of airtime from a New York radio station in early December. He was particularly interested in discovering if there was a particular “golden age” of Christmas music, since nowadays it seems that most artists who release Christmas albums simply cover the same popular songs instead of writing new ones. This is a graph of what he discovered:

Based on this dataset, 65% of popular Christmas songs were originally released in the 1940s, 50s, and 60s. Despite the notable exception of Mariah Carey’s “All I Want for Christmas is You” from the 90s, most of the beloved “Holiday Hits” come from the mid-20th century.

As for why this is the case, the popular webcomic XKCD claims that every year American culture tries to “carefully recreate the Christmases of Baby Boomers’ childhoods.” Regardless of whether Christmas music reflects the enduring impact of the postwar generation on America, Keegan’s dataset is available online to download for further exploration.
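If you download the dataset, a tally like the 65% figure could be worked out along these lines. This is only a sketch: it assumes the share is counted per song (rather than per minute of airtime), and the release years below are made-up placeholders, not Keegan's data.

```python
from collections import Counter


def decade_shares(release_years):
    """Return {decade: share of songs} for a list of original release years."""
    counts = Counter((year // 10) * 10 for year in release_years)
    total = len(release_years)
    return {decade: count / total for decade, count in sorted(counts.items())}


# Placeholder years for illustration only; not taken from Keegan's dataset.
sample_years = [1942, 1947, 1951, 1957, 1958, 1963, 1964, 1970, 1984, 1994]

shares = decade_shares(sample_years)
mid_century = sum(shares.get(decade, 0) for decade in (1940, 1950, 1960))
print(f"Share from the 1940s-60s: {mid_century:.0%}")
```

Swapping the placeholder list for the real release-year column would be the only change needed, though the column's name and format in the actual file are not something this sketch assumes.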

Christmas Trees

Last year, Washington Post reporters Tim Meko and Lauren Tierney wrote an article about where Americans get their live Christmas trees from. The article includes this map:

The green areas are forests primarily composed of evergreen Christmas trees, and purple dots represent choose-and-cut Christmas tree farms. 98% of Christmas trees in America are grown on farms, whether it’s a choose-and-cut farm where Americans come to select a tree themselves or a farm that ships trees to stores and lots.

This next map shows which counties produce the most Christmas trees:

As you can see, the biggest Christmas tree producing areas are New England, the Appalachians, the Upper Midwest, and the Pacific Northwest, though there are farms throughout the country.

The First Night of Hanukkah

This year, Hanukkah starts tonight, December 10, but its start date varies every year. However, this is not the case on the primarily lunar-based Hebrew Calendar, in which Hanukkah starts on the 25th night of the month of Kislev. As a result, the days of Hanukkah vary year-to-year on other calendars, particularly the solar-based Gregorian calendar. It can occur as early as November 28 and as late as December 26.

In 2016, Hanukkah began on December 24, Christmas Eve, so Vox author Zachary Crockett created this graphic to show the varying dates on which the first night of Hanukkah has taken place from 1900 to 2016:

The Spelling of Hanukkah

Hanukkah is a Hebrew word, so there is no definitive spelling of the word in the Latin alphabet I am using to write this blog post. In Hebrew it is written as חנוכה and pronounced hɑːnəkə in the phonetic alphabet.

According to Encyclopædia Britannica, when transliterating the pronounced word into English writing, the first letter ח, for example, is pronounced like the ch in loch. As a result, 17th century transliterations spell the holiday as Chanukah. However, ח does not sound the way ch does when it’s at the start of an English word, such as in chew, so in the 18th century the spelling Hanukkah became common. However, the H on its own is not quite correct either. More than twenty other spelling variations have been recorded due to various other transliteration issues.

It’s become pretty common to use Google Trends to discover which spellings are most common, and various journalists have explored this in past years. Here is the most recent Google search data comparing the two most common spellings, Hanukkah and Chanukah, going back to 2004:

You can also click this link if you are reading this article after December 2020 and want even more recent data.

As you would expect, the terms are more common every December. It warrants further analysis, but it appears that Chanukah is becoming less common in favor of Hanukkah, possibly reflecting some standardization going on. At some point, the latter may be considered the standard term.

You can also use Google Trends to see what the data looks like for Google searches in Israel:

Again, here is a link to see the most recent version of this data.

In Israel, it appears as though the Hanukkah spelling is also becoming increasingly common, though early on there were years in which Chanukah was the more popular spelling.


I hope you’ve enjoyed seeing these brief explorations into data analysis related to Christmas and Hanukkah and the quick discoveries we made with them. But more importantly, I hope you have a happy and relaxing holiday season!

Exploring Data Visualization #18

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Painting the World with Water

Creating weather predictions is a complex task that requires global collaboration and advanced scientific technologies. Most people know very little about how a weather prediction is put together and what is required to make it possible. NASA gives us a little glimpse into the complexities of finding out just how we know if it’s going to rain or snow anywhere in the world.

Exploring Data Visualization #17

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

The unspoken rules of visualization

Title header of essay "The unspoken rules of data visualization" by Kaiser Fung. White text on a black background with green and red patches.

Stata vs. R vs. SPSS for Data Analysis

As you do research with larger amounts of data, it becomes necessary to graduate from doing your data analysis in Excel and find more powerful software. It can seem like a really daunting task, especially if you have never attempted to analyze big data before. There are a number of data analysis software systems out there, but it is not always clear which one will work best for your research. The nature of your research data, your technological expertise, and your own personal preferences are all going to play a role in which software will work best for you. In this post I will explain the pros and cons of Stata, R, and SPSS with regard to quantitative data analysis and provide links to additional resources. Every data analysis program I talk about in this post is available to University of Illinois students, faculty, and staff through the Scholarly Commons computers, and you can schedule a consultation with CITL if you have specific questions.

Short video loop of a kid sitting at a computer and putting on sun glasses

Rock your research with the right tools!


Stata

Stata logo. Blue block lettering spelling out Stata.

Among researchers, Stata is often credited as the most user-friendly data analysis software. Stata is popular in the social sciences, particularly economics and political science. It is a complete, integrated statistical software package, meaning it can accomplish pretty much any statistical task you need it to, including visualizations. It has both a point-and-click user interface and a command line function with easy-to-learn command syntax. Furthermore, it has a system for version control in place, so you can save syntax from certain jobs into a “do-file” to refer to later. However, Stata is not free to have on your personal computer. Unlike an open-source program, you cannot program your own functions into Stata, so you are limited to the functions it already supports. Finally, its functions are limited to numeric or categorical data; it cannot analyze spatial data and certain other types.

 

Pros

  • User friendly and easy to learn
  • Version control
  • Many free online resources for learning

Cons

  • An individual license can cost between $125 and $425 annually
  • Limited to certain types of data
  • You cannot program new functions into Stata

Additional resources:


R

R logo. Blue capital letter R wrapped with a gray oval.

R and its graphical user interface companion RStudio are incredibly popular for a number of reasons. The first and probably most important is that R is free, open-source software that is compatible with any operating system. As such, there is a strong and loyal community of users who share their work and advice online. It has the same features as Stata, such as a point-and-click user interface, a command line, savable files, and strong data analysis and visualization capabilities. It also has some capabilities Stata does not, because users with more technical expertise can program new functions with R to use it for different types of data and projects. The problem a lot of people run into with R is that it is not easy to learn. The programming language it operates on is not intuitive, and it is prone to errors. Despite this steep learning curve, there is an abundance of free online resources for learning R.

Pros

  • Free open-source software
  • Strong online user community
  • Programmable with more functions for data analysis

Cons

  • Steep learning curve
  • Can be slow

Additional Resources:

  • Introduction to R Library Guide: Find valuable overviews and tutorials on this guide published by the University of Illinois Library.
  • Quick-R by DataCamp: This website offers tutorials and examples of syntax for a whole host of data analysis functions in R, covering everything from installing the package to advanced data visualizations.
  • Learn R on Codecademy: A free self-paced online class for learning to use R for data science and beyond.
  • Nabble forum: A forum where individuals can ask specific questions about using R and get answers from the user community.

SPSS

SPSS logo. Red background with white block lettering spelling SPSS.

SPSS is an IBM product that is used for quantitative data analysis. It does not have a command line feature but rather has a user interface that is entirely point-and-click and somewhat resembles Microsoft Excel. Although it looks a lot like Excel, it can handle larger data sets faster and with more ease. One of the main complaints about SPSS is that it is prohibitively expensive to use, with individual packages ranging from $1,290 to $8,540 a year. To make up for how expensive it is, it is incredibly easy to learn. As a non-technical person, I learned how to use it in under an hour by following an online tutorial from the University of Illinois Library. However, my take on this software is that unless you really need a more powerful tool, just stick to Excel. They are too similar to justify seeking out this specialized software.

Pros

  • Quick and easy to learn
  • Can handle large amounts of data
  • Great user interface

Cons

  • By far the most expensive
  • Limited functionality
  • Very similar to Excel

Additional Resources:

Gif of Kermit the frog dancing and flailing his arms with the words "Yay Statistics" in block letters above

Thanks for reading! Let us know in the comments if you have any thoughts or questions about any of these data analysis software programs. We love hearing from our readers!

 

Exploring Data Visualization #16

Daylight Saving Time Gripe Assistant Tool

Clocks fell back this weekend, which means the internet returns once again to the debate of whether or not we still need Daylight Saving Time. Andy Woodruff, a cartographer for Axis Maps, created a handy tool for determining how much you can complain about the time change. You input your ideal sunset and sunrise times, select whether the sunset or sunrise time you chose is more important, and the tool generates a map that shows whether DST should be gotten rid of, used year-round, or if no changes need to be made based on where you live. The difference a half hour makes is surprising for some of the maps, making this a fun data viz to play around with and examine your own gripes with DST.

A map of the United States with different regions shaded in different colors to represent if they should keep (gray) or get rid of (gold) changing the clocks for Daylight Saving Time. Blue represents areas that should always use Daylight Saving Time.

This shows an ideal sunrise of 7:00 am and an ideal sunset of 6:00 pm.

Laughing Online

Conveying tone through text can be stressful: finding the right balance of friendly and assertive in a text is a delicate operation that involves word choice and punctuation equally. Often, we make our text more friendly through exclamation points! Or by adding a quick laugh, haha. The Pudding took note of how varied our use of text-based laughs can be and put together a visual essay on how often we use different laughs and whether all of them actually mean we are “laughing out loud.” The most common laugh on Reddit is “lol,” while “hehe,” “jaja,” and “i’m laughing” are much less popular expressions of mirth.

A proportional area chart showing which text laughs are most used on Reddit.

“ha” is the expression most likely to be used to indicate fake laughter or hostility

how to do it in Excel: a shaded range

Here’s a quick tip for making more complex graphs using Excel! Storytelling with Data’s Elizabeth Ricks put together a great how-to article on making Excel show a shaded range on a graph. This method involves some “brute force” to make Excel’s functions work in your favor, but results in a clean chart that shows a shaded range rather than a cluster of multiple lines.

A shaded area chart in Excel

Pixelation to represent endangered species counts

On Imgur, user JJSmooth44 created a photo series to demonstrate the current status of endangered species using pixelation. The number of squares in each image represents the approximate number of individuals of that species remaining in the world. The more pixelated the image, the fewer there are left.

A pixelated image of an African Wild Dog. The pixelation represents approximately how many of this endangered species remain in the wild (estimated between 3000 and 5500). The Wild Dog is still distinguishable, but is not clearly visible due to the pixelation.

The African Wild Dog is one of the images in which the animal is still mostly recognizable.
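If you want to try this kind of pixelation yourself, the core step is picking a square size so that the image breaks into roughly as many squares as there are animals left. Here is a minimal sketch of that arithmetic. It is not JJSmooth44's method; the function names, the 800 by 600 image size, and the population value of 4,000 (an arbitrary number inside the 3000 to 5500 estimate mentioned above) are all assumptions for illustration.

```python
import math


def pixel_block_size(width, height, population):
    """Side length (in pixels) of square blocks that divide a width x height
    image into roughly `population` blocks."""
    if population <= 0:
        raise ValueError("population must be positive")
    # width * height / block**2 ~= population  =>  block ~= sqrt(area / population)
    return max(1, round(math.sqrt(width * height / population)))


def block_count(width, height, block):
    """Number of blocks actually produced when the image is tiled,
    counting partial blocks at the right and bottom edges."""
    return math.ceil(width / block) * math.ceil(height / block)


# Hypothetical image size and population value, for illustration only.
block = pixel_block_size(800, 600, 4000)
print(block, block_count(800, 600, block))
```

Because the block size is rounded to whole pixels, the final count only approximates the population, which fits the "approximate number" framing of the original series.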

Scary Research to Share in the Dark: A Halloween-Themed Roundup

If you’re anything like us here in the Scholarly Commons, the day you’ve been waiting for is finally here. It’s time to put on a costume, eat too much candy, and celebrate all things spooky. That’s right, folks. It’s Halloween and we couldn’t be happier!

Man in all black with a jack o' lantern mask dancing in front of a green screen cemetery

If you’ve been keeping up with our Twitter (@ScholCommons) this month, you’ve noticed we’ve been sharing some ghoulish graphs and other scary scholarship. To keep the holiday spirit(s) high, I wanted to use this week’s blog post to gather up all our favorites.

First up, check out the most haunted cities in the US on The Next Web, which includes some graphs but also a heat map of the most haunted areas in the country. Which region do you think has the most ghosts?

If you’re more interested in what’s happening across the pond, we’ve got you covered. Click on this project to see just how scary ArcGIS story maps can be.

https://twitter.com/ScholCommons/status/1187058855282462721

And while ghosts may be cool, we all know the best Halloween characters are all witches. Check out this fascinating project from The University of Edinburgh that explores real, historic witch hunts in Scotland.

The next project we want to show you might be one of the scariest. I was absolutely horrified to find out that Illinois’ most popular Halloween candy is Jolly Ranchers. If you’re expecting trick-or-treaters tonight, please think of the children and reconsider your candy offerings.

Now that we’ve shared the most macabre maps around, let’s shift our focus to the future. Nathan Yau uses data to predict when your death will occur. And if this isn’t enough to terrify you, try his tool to predict how you’ll die.

Finally, if you’re looking for some cooking help from an AI or a Great Old One, check out this neural network dubbed “Cooking with Cthulhu.”

Do you have any favorite Halloween-themed research projects? If so, please share them with us here or on Twitter. And if you’re interested in doing your own deadly digital scholarship, feel free to reach out to the Scholarly Commons to learn how to get started or get help on your current work. Remember, in the words of everyone’s favorite two-faced mayor…

A clip of the Mayor from Nightmare Before Christmas saying There's only 365 days left until next Halloween

Exploring Data Visualization #15

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

Which milk has the smallest impact on the planet?

Climate change impacts each of us in direct and indirect ways. Mitigating your personal carbon footprint is an important way to address climate change for many people, but often people are unsure how to make choices that benefit the climate. Daniela Haake from Datawrapper took a close look at how her choice to have milk in her coffee was damaging or benefiting the planet and it turns out things aren’t looking great for café con leche. The chart Haake published, created using data from Dr. Joseph Poore, compares the carbon emissions, land use, and water use of milk and the top 4 milk alternatives.

A chart comparing the carbon emissions, land use, and water use of milk and the top 4 milk alternatives

Soy milk has the lowest overall impact on carbon emissions, land use, and water use.

Here’s Who Owns the Most Land in America

“The 100 largest owners of private property in the U.S., newcomers and old-timers together, have 40 million acres, or approximately 2% of the country’s land mass,” Bloomberg News reports. The people who own this land are the richest people in the country, and their wealth has grown significantly over the last 10 years. Bloomberg created a map that demonstrates where the land these people own is located. Compared to the rest of the country, the amount of land owned by these people looks relatively small—could Bloomberg have presented more information about why it is significant that these people own land in these areas? And about why they own so much land?

A map of the continental United States with the land owned by the 10 largest owners of private property highlighted

This image shows only the land owned by the top 10 landowners.

How to Get Better at Embracing Unknowns

Representing our uncertainty in data can be difficult to do clearly and well. In Scientific American this month, Jessica Hullman analyzed different methods of representing uncertainty for their clarity and effectiveness. While there may be no perfect way to represent uncertainty in your data viz, Hullman argues that “the least effective way to present uncertainty is to not show it at all.” Take a look at Hullman’s different ways to represent uncertainty and see if any might work for your next project!

A gif showing two methods of demonstrating uncertainty in data visualizations through animation

Animated charts make uncertainty impossible to ignore

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.

Exploring Data Visualization #14

In this monthly series, I share a combination of cool data visualizations, useful tools and resources, and other visualization miscellany. The field of data visualization is full of experts who publish insights in books and on blogs, and I’ll be using this series to introduce you to a few of them. You can find previous posts by looking at the Exploring Data Visualization tag.

A day in the life of Americans: a data comic

A comic demonstrating the amount of time Americans spend sleeping, at work, free, or doing other activities from 4 a.m. to 3 p.m.

By illustrating the activity most Americans are doing at a given hour, Hong highlights what the average day looks like for an American worker.

Happy May, researchers! With the semester winding down and summer plans on the horizon, a lot of us are reflecting on what we’ve done in the past year. Sometimes it can be hard to determine what your daily routine looks like when you are the one doing it every day. Matt Hong created a cute and informative data comic about how we spend our time during the day, based on data from the Census Bureau. Check out Hong’s Medium page for more data comics.

What Qualifies as Middle-Income in Each State

A bar chart that shows the range of incomes that qualify as "middle-income" for households made up of four people, organized by state.

The distribution of middle-income for households made up of four people.

Nathan Yau at Flowing Data created an interesting chart that shows the range of income that is considered “middle-income” in each state and the District of Columbia in the United States. The design of the chart itself is smooth and watching the transitions between income ranges based on number of people in the household is very enjoyable. It is also enlightening to see where states fall on the spectrum of what “middle-income” means, and this visualization could be a useful tool for researchers working on wage disparity.

When People Find a New Job

A frequency trail chart that shows peaks based on the age when people change jobs.

The bottom of the chart shows jobs that people transition into later in life.

The end of the semester also means a wave of new graduates entering the workforce. While we extend our congratulations to those people, we often inquire about what their upcoming plans are and where they will be working in the future. For some, that question is straightforward; for others, a change of pace may be on the horizon. Nathan Yau of Flowing Data also created a frequency trail chart that shows at what age many people change career paths. As Yau demonstrates in a bar chart that accompanies the frequency trail chart, the majority of job switches happen early and late in life, a phenomenon which he offers some suggestions for.

A bar chart showing the distribution of the age at which people switch jobs. 15-19 is the highest percent (above 30%) and 55-64 is the lowest (around 10%)

The peak at the “older” end of the chart indicates some changes post-retirement, but also makes you wonder why people are still finding new jobs at age 85 to 89.

I hope you enjoyed this data visualization news! If you have any data visualization questions, please feel free to email the Scholarly Commons.