Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
1.Importance: What professional norms and practices, both of individuals and of institutions or organizations, support or undermine the idea that data are legitimate and citable products of research? How?
Studies indicate that although scholarly journals endorse citation in general terms, they tend not to cite data sets effectively: a citation may give the title in a note field, and sometimes the author, but rarely a location for electronic retrieval. Persistent identifiers for data sets remain very rare.
2.Credit and attribution: Credit and attribution of more traditional types of research products is an established norm and practice; is extending this practice to include data a simple and natural thing to do? Why or why not?
Evidently not, based on the sparse examples in the field. It may take some consciousness raising among the academic disciplines. Also, whether to offer co-authorship to data set creators is an emerging question, as open science and the web make it easier to detach data sets from the papers they support and re-use them as independent resources, as Duke and Porter indicate.
3.Evidence: Citing literature to support claims is also an established practice; is extending this practice to include data a simple and natural thing to do? Why or why not?
Although things are changing, some data sets are hard to access and thus hard to cite, for example because they are embedded in PDFs. A tradition of data citation has yet to be broadly established, and use of persistent identifiers remains rare.
4.Unique identification: Is it always possible for a data creator to obtain a persistent identifier for their data set? Why or why not?
Not always. A couple of dozen services will “mint” a new and unique DOI for one’s data set (EZID, in the California Digital Library system, is one), though they aren’t free. The data owner must also keep the DOI’s target up to date if the resource moves from server to server over time, as organizations go under or otherwise change how they provide access.
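The maintenance burden described above comes from the way a persistent identifier works: the identifier in the citation never changes, while the location it points to can be updated. The following is a minimal sketch of that idea only. The `ToyResolver` class, the DOI `10.1234/example-dataset`, and the `example.org` URLs are all hypothetical illustrations, not the interface of EZID or any real registration service.

```python
class ToyResolver:
    """Hypothetical in-memory stand-in for a DOI registration service."""

    def __init__(self):
        # Maps each identifier to the URL where the data currently lives.
        self._targets = {}

    def register(self, doi, url):
        """Mint a new identifier; an identifier can only be minted once."""
        if doi in self._targets:
            raise ValueError(f"{doi} is already registered")
        self._targets[doi] = url

    def update_target(self, doi, new_url):
        """Point an existing identifier at a new location."""
        if doi not in self._targets:
            raise KeyError(doi)
        self._targets[doi] = new_url

    def resolve(self, doi):
        """Return the current location for an identifier."""
        return self._targets[doi]


resolver = ToyResolver()
doi = "10.1234/example-dataset"  # hypothetical identifier

# The data set is first published on one server...
resolver.register(doi, "https://old-host.example.org/data/42")

# ...and later moves. The owner updates the target; the citation is unchanged.
resolver.update_target(doi, "https://new-host.example.org/archive/42")

current_url = resolver.resolve(doi)
```

If the owner never calls something like `update_target` after a move, the identifier keeps pointing at the old location, which is why persistence is not automatic.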
5.Access: In practice, do data citations always provide direct access to the dataset? Why or why not?
Meta-repositories like Dryad function as union catalogs for data sets, providing only metadata. Thus, the data set itself may not be accessible to every comer.
6.Persistence: In practice, do data citations (and metadata) persist beyond the lifespan of the data set? Should they? Why or why not?
It’s not an automatic process; data sets held by organizations must still be transferred if the hosting organization terminates. Also, not all data sets stand the test of time: some may be considered not worth the effort of preservation, and some may be superseded by newer, superior versions.
7.Specificity and verifiability: Why is it important to be able to create and maintain specific and verifiable references to data sets, portions of data sets, or versions of data sets? What are some potential challenges to doing so?
Maintaining specific references to data sets makes the results built on them more likely to be reproducible (perhaps mitigating a major embarrassment in academic disciplines of late, with scientific studies that could not be replicated). One challenge is that current technology often doesn’t allow sufficient “granularity” in the formal citation of subsets of larger data sets.
8.Interoperability and flexibility: What are some of the different stakeholder groups whose practices may influence the ability to support interoperability across citation standards and styles?
Publishers, universities, repositories, and the writers/researchers themselves all have an interest.
What are three factors when considering whether acknowledgement, formal citation, or co-authorship is the most appropriate way to provide attribution to the creator of a data set used in a publication?
Does the data set creator want to be acknowledged in the paper? The creator may not, for example, if they disagree with the conclusions of the paper’s author.
If the creator does want acknowledgement, will the journal provide it? Many journals don’t have a formalized rule for citing the authors of data sets.
Is the data set vital to the paper, so that the paper would be unpublishable if the data set were removed? Is it a unique interpretation? Is it the sole source of the paper’s data, or is it one of many similar data sets used by the paper? There may also be obstacles to citing the data, like the lack of a persistent identifier or a lack of accessibility.
Data Citation
Ellison, Aaron; Bennett, Katherine (2009): Sarracenia Purpurea Prey Capture at Harvard Forest 2008. Long Term Ecological Research Network. http://dx.doi.org/10.6073/pasta/9a6105374adb15486b75cf621a2702dd
Authors (last name/first name); Publication date; Publication title (which in this case includes geographic location of project, description of purpose, and date); Organization; Digital Object Identifier.
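As a rough illustration, the elements identified in this first citation can be assembled mechanically. The sketch below assumes a hypothetical `DataCitation` class and field names chosen for this example; the punctuation simply mirrors the Ellison and Bennett citation above and is not an official style rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the elements of a data citation."""
    authors: List[str]   # each author as "Last, First"
    year: int            # publication date
    title: str           # publication title
    organization: str    # archive or network that holds the data
    identifier: str      # persistent identifier, e.g. a DOI URL

    def format(self) -> str:
        # Authors are separated by semicolons, then "(year): title. org. id"
        author_part = "; ".join(self.authors)
        return (f"{author_part} ({self.year}): {self.title}. "
                f"{self.organization}. {self.identifier}")


citation = DataCitation(
    authors=["Ellison, Aaron", "Bennett, Katherine"],
    year=2009,
    title="Sarracenia Purpurea Prey Capture at Harvard Forest 2008",
    organization="Long Term Ecological Research Network",
    identifier="http://dx.doi.org/10.6073/pasta/9a6105374adb15486b75cf621a2702dd",
)

formatted = citation.format()
print(formatted)
```

Keeping the elements in separate fields makes it easier to re-render the same data set in another journal's citation style without retyping it.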
Nepstad, D.C., E.A. Davidson, D. Markewitz, E.J.M. Carvalho, J.Q. Chambers, D. Ray, J.B. Guerrero, P. Lefebvre, L. Sternberg, M. Moreira, L. Barros, F.Y. Ishida, I. Tohlver, E.L. Belk, K. Kalif, and K. Schwalbe. 2012. LBA-ECO ND-30 Water Chemistry, Rainfall Exclusion, km 67, Tapajos National Forest. Data set. Available on-line [http://daac.ornl.gov] from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. http://dx.doi.org/10.3334/ORNLDAAC/1131
Authors (last name/first and middle name initials for first one, with the order reversed for the rest, perhaps listed in descending order of contribution?); temporal data on project; description of project; scope limiters; geographic location of project; mention of “data set”; digital location for data access; geographical location for data access; Digital Object Identifier.
Creating Data Citations
Data Set #1
Chaneton, E.J., Tognetti, P.M. (2014): Community disassembly and invasion of remnant native grasslands under fluctuating resource supply. Inland Pampa, Buenos Aires. Dryad Digital Repository. doi:10.5061/dryad.46181
Data Set #2
Bret-Harte, M.S., Laundre, J., Mack, M.C., Shaver, G. (2011): Soil properties and nutrient concentrations by depth from the Anaktuvuk River Fire site in 2011. Latitude 68.99 Longitude -150.28, Alaska. Advanced Cooperative Arctic Data and Information Service. https://www.aoncadis.org/dataset/2011ARF_SoilCN_byDepth.2.html