Why Publish Data?

Data are first-class research objects. Publishing them gives authors an immediate benefit by elevating their scholarly profile, and gives the scientific institution a longer-term benefit through more effective and trustworthy knowledge acquisition. EDI works with the informatics community to realize these objectives.

Receive attribution

Publishing data in a repository clarifies who should receive credit, in the form of citation or co-authorship, when others use the data. Prior publication of data in a repository can also help protect data authors from having their data misattributed. Researchers who publish data are more likely to be invited as co-authors on publications that use the data, and data sharing is associated with greater research impact and a higher citation rate [2, 9, 10, 11].

Articles citing data published in the EDI Repository are automatically cross-referenced to research profiles (e.g. ORCID, Google Scholar) and are displayed on the data package landing page under the Journal Citations section.

Meet requirements

Journals increasingly require submission of supporting data with article publication. Likewise, funding agencies increasingly require evidence of data publication in the prior-support sections of proposals [8, 13]. These requirements build confidence in research findings and provide a higher return on investment. Surveys have shown that public trust in scientific research is enhanced when the supporting data are made available [3, 7].

EDI tools simplify reporting:

  • Package Tracker - List and plot downloads of a published data package.
  • Site Report - List references to all data packages published by a research site. Currently only available to LTER sites.
  • Upload Report - List references to all data packages published by a research site within a specified time frame. Currently only available to LTER sites.
  • Retrieve Citation Metrics - Get journal citations referencing a data publication.

Improve data management

Published data undergo extra consistency checks, are immutable, and are versioned. Without such steps, data corruption (from computer errors or inadvertent edits) can go undetected. Moreover, it is easy to unknowingly base analyses on different versions of a dataset. Formal versioning and quality indicators, such as checksums, avoid these problems [8].
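As an illustration of how a checksum exposes silent changes, here is a minimal Python sketch. It is not a description of how the EDI Repository computes or stores checksums; the helper names (file_checksum, is_unchanged), the SHA-256 algorithm choice, and the sample file name are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """True if the file still matches the checksum recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical data file, created here only for the demonstration.
        data_file = Path(tmp) / "site_temperature.csv"
        data_file.write_text("date,temp_c\n2020-01-01,3.5\n")

        # Record the checksum at "publication" time.
        recorded = file_checksum(data_file)
        print("untouched file matches:", is_unchanged(data_file, recorded))

        # Simulate an inadvertent edit in which a single digit changes.
        data_file.write_text("date,temp_c\n2020-01-01,8.5\n")
        print("edited file matches:", is_unchanged(data_file, recorded))
```

Recording the checksum alongside each version means that a later comparison can show whether two analyses used byte-identical copies of the data.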

Researchers commonly return to datasets they collected in the past in order to apply them to new theories. However, if the data are not published with suitable metadata, they can quickly become unreadable or uninterpretable (because of file format changes, computer glitches, or forgotten details), even to the researcher who originally collected them.

The most frequent users of data are subsequent graduate students in the same research group. If the data are not formally published, they are usually lost when the student who collected them graduates. It saves time and effort to publish the data once, while the details are still fresh.

Enable new scientific insight

Without data, science is just philosophy. Many theories require more data than any single individual can collect, so publishing data advances science as a whole by helping to provide the needed data resources [1, 4, 5, 6]. Additionally, the demand for reproducible research and open collaborative science depends on findable and accessible data provided by data repositories.

EDI tools enable reuse:

  • Code Generation - Read data into common computational languages (e.g. R, Python, MATLAB). This service is available on data package landing pages under the "Code Generation" section.
  • EDIutils - An R package for accessing data in the EDI repository.
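To give a sense of what reading a published data table into Python involves, the following is a minimal sketch. It is not the script that the Code Generation service produces; the table contents, the column names, the "NA" missing-value code, and the read_table helper are all assumptions made for this example. In practice, the column types and missing-value codes would come from the data package metadata.

```python
import csv
import io

# Hypothetical table contents; a real script would open a downloaded file.
RAW_TABLE = """site,date,temp_c
A,2020-01-01,3.5
A,2020-01-02,NA
B,2020-01-01,4.1
"""


def read_table(text, numeric_columns, missing_codes=("NA", "")):
    """Parse CSV text into dicts, converting numeric columns to float or None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in numeric_columns:
            value = record[column]
            # Declared missing-value codes become None instead of raising.
            record[column] = None if value in missing_codes else float(value)
        rows.append(record)
    return rows


rows = read_table(RAW_TABLE, numeric_columns=["temp_c"])
valid = [r["temp_c"] for r in rows if r["temp_c"] is not None]
print(len(rows), "rows read;", len(valid), "with a temperature value")
```

Treating missing-value codes explicitly, rather than letting them fail numeric conversion, is one reason metadata matters for reuse: the reader needs to know which strings mean "no observation".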


  1. Data's shameful neglect. Nature 461, 145 (2009). https://doi.org/10.1038/461145a
  2. Duke, C.S., Porter, J.H., 2013. The ethics of data sharing and reuse in biology. BioScience 63, 483-489. https://doi.org/10.1525/bio.2013.63.6.10
  3. Funk, C., Hefferon, M., Kennedy, B., and Johnson, C. 2019. Trust and Mistrust in Americans' Views of Scientific Experts: More Americans have confidence in scientists, but there are political divides over the role of scientific experts in policy issues. https://www.pewresearch.org/science/2019/08/02/trust-and-mistrust-in-americans-views-of-scientific-experts/
  4. Hampton, S.E., Strasser, C.A., Tewksbury, J.J., Gram, W.K., Budden, A.E., Batcheller, A.L., Duke, C.S. and Porter, J.H. (2013), Big data and the future of ecology. Frontiers in Ecology and the Environment, 11: 156-162. https://doi.org/10.1890/120103
  5. Magnuson, J. J. (1990). Long-Term Ecological Research and the Invisible Present. BioScience, 40(7), 495–501. https://doi.org/10.2307/1311317
  6. Nelson, B. Data sharing: Empty archives. Nature 461, 160–163 (2009). https://doi.org/10.1038/461160a
  7. Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S., Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D. P., Hesse, B., Humphreys, M., Ishiyama, J., … Yarkoni, T. (2015). SCIENTIFIC STANDARDS. Promoting an open research culture. Science (New York, N.Y.), 348(6242), 1422–1425. https://doi.org/10.1126/science.aab2374
  8. NSF: "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing." https://www.nsf.gov/pubs/policydocs/pappg19_1/pappg_11.jsp#XID4
  9. Parsons, M. A., R. Duerr, and J.-B. Minster (2010), Data citation and peer review, Eos Trans. AGU, 91(34), 297–298, https://doi.org/10.1029/2010EO340001
  10. Piwowar, H. A., & Chapman, W. W. (2010). Public sharing of research datasets: a pilot study of associations. Journal of informetrics, 4(2), 148–156. https://doi.org/10.1016/j.joi.2009.11.010
  11. Piwowar, H. A., Day, R. S., Fridsma, D. B. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. https://doi.org/10.1371/journal.pone.0000308
  12. Ryan Raub. 2013. Data Integrity, The Phrase I Don't Hear Enough. https://lternet.edu/wp-content/uploads/2017/12/2013-fall-lter-databits.pdf
  13. Whitlock, M.C., McPeek, M.A., Rausher, M.D., Rieseberg, L., Moore, A.J., 2010. Data archiving. The American Naturalist 175, 145-146. https://doi.org/10.1086/650340