Allen Renear: Data Identity: Did they use the same data?

Title: Data Identity: Did they use the same data?
Session Lead: Dr. Allen Renear
Time: 1 pm – 2 pm, Tuesday, 2021-11-02
Location: Zoom


It is sometimes important to know whether two research groups have used the same data, or whether you are using the same data on Thursday that you used on Monday. A checksum will often not do the trick here: it is too quick to say “no”, because any byte-level difference, however irrelevant scientifically, changes it. What you want is a way to determine “scientific equivalence” even when many sorts of differences are present. More generally, questions of the form “… is (has) the same data as …?” are common whenever we are designing, managing, or analyzing information systems. So, not surprisingly, data identity is a topic some of us have been thinking about here at the iSchool for quite some time. As a way of possibly restarting that exploration, I will review some of the work that has been done and suggest that there is still more to do. As at the last CFG, the SAM/BRM framework of data concepts will be deployed to good effect. No readings in advance, but I’ll put together a list of relevant works for future reference.
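
To make the checksum point concrete, here is a minimal Python sketch. It is an illustration only, not the session's method and not part of the SAM/BRM framework. The two CSV snippets, the column names, and the function names (`checksum`, `canonical_rows`, `same_data`) are all hypothetical, and the notion of equivalence used (ignore row order, column order, line endings, and numeric formatting) is just one assumed stand-in for “scientific equivalence”.

```python
import csv
import hashlib
import io


def checksum(text: str) -> str:
    """SHA-256 of the exact bytes: sensitive to every formatting detail."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _normalize(cell: str):
    """Assumed rule: cells that parse as numbers are compared as floats."""
    cell = cell.strip()
    try:
        return float(cell)
    except ValueError:
        return cell


def canonical_rows(text: str):
    """Parse CSV into an order-independent list of records.

    Each record becomes a tuple of (column, value) pairs sorted by column
    name, and the records themselves are sorted, so neither column order
    nor row order affects the result.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        tuple(sorted((key.strip(), _normalize(value)) for key, value in row.items()))
        for row in reader
    ]
    return sorted(rows, key=repr)


def same_data(a: str, b: str) -> bool:
    """True if the two CSV texts hold the same records under the assumed rules."""
    return canonical_rows(a) == canonical_rows(b)


# Hypothetical example: the same two measurements, serialized differently.
monday = "site,temp_c\nA,21.50\nB,19.0\n"
thursday = "temp_c,site\r\n19,B\r\n21.5,A\r\n"

print(checksum(monday) == checksum(thursday))  # byte-level comparison
print(same_data(monday, thursday))             # record-level comparison
```

Here the checksums differ while the record-level comparison reports a match. Which differences should count as irrelevant is exactly the open question: the rules in `_normalize` and `canonical_rows` are choices made for this sketch, and a different scientific context could reasonably demand others.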