Allen Renear: Toward an Intensional Approach to Transformation Classification

Title: Toward an Intensional Approach to Transformation Classification
Session Lead: Allen Renear
Time: 11 am – noon, Wednesday, 2021-02-10
Location: Zoom

Details:

This week, Prof. Renear will lead a discussion on their 2018 ASIST paper ‘Toward an Intensional Approach to Transformation Classification’.

ABSTRACT: Generating one dataset from another is a fundamental activity in data science: data curators convert datasets to different file formats, create data subsets, generate metadata, integrate data from multiple sources, and so on; data analysts generate summaries and classifications, create visualizations, and derive data about one sort of thing from data about another sort of thing. Although such transformations have been studied from a variety of perspectives, there has been little effort to develop a general classification based on intrinsic (rather than functional) characteristics, apart from computational complexity. With this paper we hope to motivate a classification of transformations based on the relationships between the intensional features of the input and output datasets, that is, their propositional and conceptual content. Intensional entities are the fundamental components of scientific reasoning and explanation and consequently deserve a uniquely central role in the analysis of information work. We believe such a classification would be a valuable contribution to the data curation curriculum. This paper is an introduction to that project.

INTRODUCTION

Transforming one dataset into another is routinely carried out in businesses, laboratories, government offices, and many other venues throughout our society. Moreover, although not always visible as such, it is often an immediate consequence of many of the familiar actions performed in the ordinary lives of people throughout the world.

Transformations are particularly central in those areas of science, commerce, and the professions where digital data and digital tools are common and data management tasks essential. For example, to ensure preservation, support integration with other datasets, and support the use of specialized software tools, data curators routinely convert datasets to different file formats or encodings. They also update information, make corrections, extract subsets, generate metadata, and so on.

Transformations are also central to the work of scientists and data analysts; datasets of particular observations are routinely transformed into datasets containing statistical summaries, classifications, and predictions. Of particular interest are transformations that derive data about one sort of thing from data about an entirely different sort of thing: data about climate in the distant past might be derived from data about tree rings, data about temperature on the surface of the earth from electromagnetic radiation collected by geosynchronous satellites, and data about altitude from data about air pressure. Sometimes these transformations occur in sporadic single episodes, sometimes as part of an extended structured process or scientific workflow (Ludäscher, Marciano, & Moore, 2001).

Although dataset transformations are important and much studied, there have been only a few efforts to classify them generally as to kind. For such a classification, one might use computational features such as the expressiveness and efficiency of the underlying data description language, data structures, and algorithms. Alternatively, a classification can be made according to the purpose or function of different transformations within a system or organizational context.

In this paper we propose a different but complementary approach, one based on the intensional features of the input and output datasets, that is, on their propositional and conceptual content — the natural cognitive components of scientific explanation and reasoning. We believe that this approach can provide additional needed support for understanding information work and for preparing information professionals, particularly data curators. These are some first thoughts towards such a project.

Related Materials: link