On the Digital Perimeter: September 2010

For a while I also subscribed to the view that digital curation and data curation may be the same thing, a view reaffirmed upon reading Alex Ball’s detailed and informative survey of the digital curation/preservation field. Recently, however, I had to re-examine my position. I realized that such an approach makes the work of curating the actual data sets much more difficult: it diverts the curator’s attention from the point where the data are actually created, accessed, and worked with, and focuses it instead on the preservation-oriented stages that come much later in the data life-cycle. The shibboleth of OAIS reinforces this tendency. The OAIS life-cycle does not even consider the creation of data; the data are simply an input, a SIP-to-be of which we know nothing, the product of an unspecified activity on the fringe. Only after a data object has been supplemented with representation information and properly packaged does it enter the OAIS picture. The file stream itself can be in a known and valid format, and we can know who created the data object, when, and even which application generated it, yet the data themselves can have minimal information value because they were not collected or annotated properly.

In this sense, the DCC data model appears to be more useful, as it depicts the whole trajectory of the data. It would benefit everyone involved in the data life-cycle to be aware of all those phases. Elizabeth Yakel, in her overview of the digital curation field*, rightly stresses the “active and potentially interactive process” of curation. Although I have rarely seen the term “records creators” in the curation literature, the terms “data producers” and “data creators” are used more frequently.

I am also not opposed to the term curation as a pendant to preservation. From the perspective of an archivist or librarian, collection may seem more appropriate, but to me that implies possession of, or some kind of control over, the data, and in the research data lifecycle this does not have to be the case. The data are often in active use in the Re-use and Transform phase: they can be transferred to different units and used for plotting and further analysis. The data files in this phase are very transient and may not be preserved in that state at all.

The active involvement of data curators in the process of data production, often from the very beginning, is probably the greatest difference between curators and archivists, who usually deal with material that is past its “active life.” I do not think that the major thinkers behind archival theory argued for “original order” out of professional shyness, but because they thought that the original creators of records kept their records in an effective and functional order. That may have been more or less true in their time. However, business practices and the means of personal communication have changed dramatically in the last two or three decades. Nowadays even archivists try to be proactive and advise lay audiences on how to take care of their digital files and personal communications through initiatives like How to Preserve Your Own Digital Materials.

It seems to me that many of the articles that try to define digital curation and describe its current state emphasize the preservation aspects of the lifecycle. This may be because the mechanics and infrastructure of preservation are more general and transferable from one domain to another, whereas the treatment of access to and transformation of the actual research data files has not always been adequate, perhaps because the subject expertise required is an effective barrier to a general treatment of that issue. Many texts elaborate on preservation extensively, but data curation is usually mentioned only as a term replaced or superseded by digital curation, as if we knew everything we need to know about how to create and manage research data. That is just not the case. Not yet.

Tuesday, 21 September 2010

Reclaiming Data in Digital Curation
