Pingjing Yang: Understanding Data Analysis Workflows on Spreadsheets

Title: Understanding Data Analysis Workflows on Spreadsheets
Session Lead: Pingjing Yang
Time: 10 am – 11 am, Thursday, 2022-02-17
Location: Zoom


Spreadsheets are widely used for data management and analysis by individuals and teams with varying degrees of programming expertise across a spectrum of domains. While several papers have studied the prevalence of errors in spreadsheets and reported ethnographic studies of spreadsheet use, little is known about how spreadsheet users approach and address computational tasks on spreadsheets, especially on relatively large datasets. To understand how users analyze data on spreadsheets, we conducted a study in which thirty-two participants worked through eight common analytical tasks. Participants developed an execution strategy for each task and then attempted to operationalize this strategy within the spreadsheet system. By examining the study results and transcripts, we identified the successful and unsuccessful strategies participants adopted in addressing the tasks. In general, we find that unsuccessful spreadsheet users had difficulty mapping spreadsheet models to their predetermined execution strategies, comprehending online help documents when trying to learn how to use new formulae, and identifying workarounds when confronted with roadblocks. We identify opportunities to reduce barriers to computational task completion, including improvements to the spreadsheet interface and better training/educational methodologies and tools.

Readings: [Box-Folder]