Licensing Data

Managing intellectual property rights is an important part of the EDI Data Repository data management process.

The basics of licensing

Datasets deposited in a repository can be thought of as having two levels: content and organization. The content level may include raw observational data, photographs, recordings of bird calls, or model code. Some of these data products, such as photographs whose creation involved discretionary decisions made by the author, may be copyrighted. This means they are subject to a form of intellectual property law that protects original works of authorship so that the author has exclusive rights to reproduce, publicly display, and publicly disseminate their work. Most raw data and metadata are "facts", however, and facts cannot be copyrighted. Raw observational data have no copyright protections under U.S. law. Thus, the data objects in the content level of a dataset in the EDI repository may or may not be protected by copyright.

The second level of a dataset pertains to its organization and may offer a level of copyright protection. If the data depositor has created a novel database model, or organized, arranged, or annotated the data in a custom way that involved the depositor's expressive choice, then the material is subject to copyright. Copyright is owned initially by the author(s) of a copyrighted work. If, however, the copyrighted work is created by an employee in the course of their regular work, the employer may be treated as the author, at least in the U.S.

If intellectual property rights apply to a research dataset, the owner of those rights can use a license or waiver to grant permission for reuse of the data. A license is a legal instrument used by a rights holder to specify what a data user may do with a dataset without infringing on the rights held. Licenses usually grant permissions on condition that certain terms are met. A waiver is a legal instrument by which the rights holder gives up rights to a resource, so that infringement is no longer an issue. It can be difficult for a data user to determine what intellectual property rights apply to a dataset in a repository, so it is important for the data depositor to signal the terms of data reuse with a license or waiver.

Creative Commons is a non-profit global entity that gives organizations and people the ability to license or waive rights to their intellectual property for creative and academic works. Others may then make use of those works as specified by the license while also providing proper attribution.

Creative Commons offers six copyright licenses and one copyright waiver.

CC0: This waiver indicates that a dataset is free from copyright and dedicated to the public domain. CC0 allows reusers to distribute, adapt, and build upon the dataset with no restrictions.

CC-By: This is the most open license from Creative Commons. It allows a dataset user to distribute, adapt, and build upon the dataset as long as the original creator receives attribution.

CC-By is preferred by many in academia because it encourages data users to give credit to the data creators, even if there is no copyrightable content in the dataset. CC0 is the better choice, however, because it avoids the problem of "attribution stacking", in which a derivative work must acknowledge every contributor to every source it draws on. A dataset that combines data from 100 other datasets would need to reference all 100 if each requires acknowledgement, which is untenable in some cases. Using CC-By rather than CC0 may therefore place an unnecessary burden on downstream data users.
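
The stacking problem can be made concrete with a small sketch. The Python below is illustrative only: the SourceDataset record, its field names, and the "example.N" identifiers are hypothetical and do not correspond to any EDI interface. It assumes that every CC-By source must be credited in a derived dataset and that CC0 sources carry no such requirement.

```python
from dataclasses import dataclass

# Licenses assumed (for this sketch) to require attribution in derived works.
ATTRIBUTION_REQUIRED = {"CC-By"}


@dataclass(frozen=True)
class SourceDataset:
    """Hypothetical record describing one dataset reused in a derived product."""
    identifier: str  # made-up identifier, not a real package ID
    creator: str
    license: str     # "CC0" or "CC-By" in this sketch


def required_attributions(sources):
    """List the credits a derived dataset would have to carry."""
    return [
        f"{s.creator} ({s.identifier})"
        for s in sources
        if s.license in ATTRIBUTION_REQUIRED
    ]


# One hundred CC-By sources plus one CC0 source.
sources = [
    SourceDataset(f"example.{i}", f"Creator {i}", "CC-By") for i in range(1, 101)
]
sources.append(SourceDataset("example.101", "Creator 101", "CC0"))

credits = required_attributions(sources)
print(len(credits))  # 100
```

Under these assumptions the derived dataset carries 100 mandatory credits, and the number grows again for anyone who later builds on it. Had all the sources been released under CC0, the list would be empty, although citing them would still be good scholarly practice.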

Primary data

If you publish raw observational or experimental data in EDI, you are encouraged to select the CC0 waiver. You may, however, use any other license that you wish, including one that you author yourself.

Secondary data

As data reuse and the combining of new and secondary (or 'reused') data become more common, reusers increasingly need to apply licenses to the resulting 'derived' datasets. The type of license you choose for such a dataset will depend on (a) the amount or proportion of secondary data in the 'new' dataset, and (b) the terms of the original license under which those secondary data were acquired, for example whether the original license required that new versions of the data be licensed under the same terms.
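
As a rough aid, the two factors above can be gathered before a license is chosen. The sketch below is a hypothetical helper, not an EDI tool: the SecondarySource record, its field names, and the use of row counts to measure the proportion of secondary data are all assumptions made for illustration. It only collects the inputs; choosing the license remains a judgment for the depositor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondarySource:
    """Hypothetical description of one reused dataset."""
    name: str
    rows: int                  # records taken from this source (assumed measure)
    same_terms_required: bool  # original license requires the same terms downstream


def licensing_summary(new_rows, sources):
    """Summarize factors (a) and (b) for a derived dataset."""
    secondary_rows = sum(s.rows for s in sources)
    total_rows = new_rows + secondary_rows
    return {
        # (a) proportion of the derived dataset that is secondary data
        "secondary_fraction": secondary_rows / total_rows if total_rows else 0.0,
        # (b) sources whose original license constrains the new license
        "same_terms_sources": [s.name for s in sources if s.same_terms_required],
    }


summary = licensing_summary(
    new_rows=600,
    sources=[
        SecondarySource("source_a", rows=300, same_terms_required=False),
        SecondarySource("source_b", rows=100, same_terms_required=True),
    ],
)
print(summary)
```

In this example 40% of the derived dataset is secondary data, and one source (source_b) was obtained under terms that carry over to new versions, so its original license would need to be consulted before selecting a license for the derived dataset.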


EDI does not enforce licenses. Enforcement falls more within the domain of journals, which are beginning to take on this role.


Carroll MW (2015) Sharing Research Data and Intellectual Property Law: A Primer. PLoS Biol 13(8): e1002235.