EDI Policy

Data policy

Version 0.0, Adopted 25 November 2019

The Environmental Data Initiative (herein EDI) provides data publication and archive services for science data to communities worldwide. In accordance with the EDI scope and mission, the following Data Policy statement declares an agreed-upon understanding between EDI and the individual or individuals responsible for any and all data submitted to the EDI data repository for the purposes of publication and archive.

Definitions

The following definitions are used throughout this Data Policy:

EDI data contributor. An individual or individuals who are responsible for submitting a data package to the EDI data repository for the purpose of data publication and archive.

EDI customer. An individual or individuals who utilize EDI's data publication and archive service.

EDI data repository. An Internet-based scientific data repository service for scientific data publication and archive.

EDI data publication and archive. The process by which scientific data and metadata are made discoverable and available through EDI computational infrastructure, including the long-term curation and management of such data.

EDI website. The official EDI Internet website (https://edirepository.org/) where general information about EDI, including policies, news, events, and featured scientific data may be accessed.

Science Data. Data collected by external parties that is published and archived by EDI.

Science Data Package. The aggregate product produced by combining science data with science metadata.

Science Metadata. Textual metadata describing scientific data that are published and archived by EDI.

Data package accessibility

EDI strives to make environmental research data open and accessible to the general public without undue restrictions or barriers. Although EDI strongly recommends that all data be publicly available, we recognize that some data may require limited access while they are under review during manuscript preparation. In these cases, the EDI Data Repository supports access control to data when justified by the data provider, thereby limiting exposure of the data resource to users with appropriate permission [1]. Such access control must be clearly specified in the data package metadata. EDI will also accept data that require a permanent embargo due to issues of sensitivity (e.g., the locations of endangered species or antiquities). Although EDI will enforce access control of data as specified in the data package metadata, EDI does not guarantee the privacy of such information. If data are submitted to the EDI Data Repository with restricted access, we request that an explanation of the data embargo, including if and when the data will be made available to the general public, be provided in the data entity description field of the data package metadata. Only in extreme circumstances will EDI allow both the metadata and the data to be restricted [2]. EDI reserves the right to periodically review restricted data to determine whether embargoes continue to be justified.
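The access control described above can be pictured with a minimal Python sketch. It assumes a simplified rule model (a principal such as "public" or a customer identifier, a permission such as "read", and an allow/deny flag) and applies the default-deny behavior stated in note 1: access is granted only when some rule explicitly allows it. The rule structure, the identifier string, and the choice that a matching deny overrides any allow are illustrative assumptions, not the PASTA+ implementation or the exact semantics of the metadata schema.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Hypothetical customer identifier, for illustration only.
JDOE = "uid=jdoe,o=EXAMPLE"


@dataclass(frozen=True)
class AccessRule:
    """A simplified allow/deny rule as it might be declared in package metadata."""
    principal: str   # "public" or a customer/group identifier
    permission: str  # e.g. "read"
    allow: bool      # True for an allow rule, False for a deny rule


def is_permitted(rules: Iterable[AccessRule], principals: Set[str], permission: str) -> bool:
    """Default-deny check: grant access only if a matching rule explicitly allows it."""
    allowed = False
    for rule in rules:
        if rule.principal in principals and rule.permission == permission:
            if not rule.allow:
                return False  # assumption: a matching deny overrides any allow
            allowed = True
    return allowed  # no matching allow rule means access is denied by default


# A package readable only by one contributor.
rules = [AccessRule(JDOE, "read", True)]
anonymous = {"public"}
contributor = {"public", JDOE}
print(is_permitted(rules, anonymous, "read"))    # False
print(is_permitted(rules, contributor, "read"))  # True
```

Under these assumptions, a package with no matching allow rule is unavailable to everyone, which is why access must be granted explicitly in the metadata.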

Sensitive data

Under no circumstance will EDI knowingly accept data that are protected by Federal, State, or local laws and policies (e.g., FERPA, HIPAA, or IRB restrictions on human subject data). In addition, science metadata often contain personal data of individuals involved in scientific research. These personal data may be available to other EDI customers and the general public through an EDI website. EDI requires that the individual(s) responsible for submitting science data packages to EDI acknowledge that such science data and metadata are not restricted by any governing laws and policies and that personal data within science metadata are included with the explicit knowledge and permission of the individual or individuals they affect.

For archiving and working with sensitive data, see:

  • ICPSR - The Inter-university Consortium for Political and Social Research (ICPSR), an international consortium of more than 750 academic institutions and research organizations, provides leadership and training in data access, curation, and methods of analysis for the social science research community.
  • Qualitative Data Repository - A dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.
  • Databrary - Supports data sharing among researchers in the behavioral, social, educational, developmental, neural, and computer sciences.
  • OpenDP - A community effort to build trustworthy, open-source software tools for statistical analysis of sensitive private data.

Offline data

There is an option to declare data "offline" within the distribution field of the data package metadata. Doing so removes the requirement that the data be network-accessible to the EDI Data Repository during the data package upload process. Offline data are recommended only for data too voluminous for network transfer or storage capacity, such as the output from numerical models. Offline data must be provided to the EDI Data Repository through alternative means (e.g., an SSD shipped via a common carrier) prior to uploading the data package metadata. EDI requests that an explanation of the offline status be provided in the data entity description field of the data package metadata, including the preferred method of data distribution if required to satisfy a data request.

Intellectual rights of the data contributor

EDI makes every effort to ensure that all data are curated with intellectual rights defined by the data provider as found in the data package metadata. Although EDI advocates for open and unfettered access to data packages without use restrictions, we do not forbid data providers from declaring more restrictive licensing agreements for use of their data packages. Data providers should include a statement of Intellectual Rights in the metadata of their submissions. If they do not, EDI reserves the right to add a default declaration of intellectual rights to the data package metadata. The default declaration of intellectual rights used by EDI is based on the Creative Commons CC0 "No Rights Reserved" waiver. See below for the full default statement:

This data package is released to the "public domain" under Creative Commons CC0 1.0 "No Rights Reserved". It is considered professional etiquette to provide attribution of the original work if this data package is shared in whole or by individual components. A generic citation is provided for this data package on the website (herein "website") in the summary metadata page. Communication (and collaboration) with the creators of this data package is recommended to prevent duplicate research or publication. This data package (and its components) is made available "as is" and with no warranty of accuracy or fitness for use. The creators of this data package and the website shall not be liable for any damages resulting from misinterpretation or misuse of the data package or its components. Periodic updates of this data package may be available from the website. Thank you.

Privacy policy

Version 2.0, Adopted 10 October 2019; Updated 27 February 2024

The Environmental Data Initiative (herein EDI) publishes this Privacy Policy to inform you, our customers, of the collection, use, and disclosure ("Processing") of personal data by the EDI project, its computational infrastructure, and scientific partners during the operation of data publication and archive (collectively, our "services"). This Privacy Policy is effective 2 October 2019 and may be amended in the future.

Definitions

The following definitions are used throughout this Privacy Policy:

Authenticated customer. A customer whose identity has been verified through a means of challenge, such as providing a password or other item of information that only the customer would know.

EDI authentication token. A custom web browser cookie that contains authentication information about the customer to enable PASTA+ software access control mechanisms.

EDI customer. An individual or organization that utilizes one or more services provided by EDI.

EDI data repository. An Internet-based scientific data repository service for scientific data publication and archive.

EDI data publication and archive. The process by which scientific data and metadata are made discoverable and available through EDI computational infrastructure, including the long-term curation and management of such data.

EDI website. The official EDI Internet website where general information about EDI, including policies, news, events, and featured scientific data may be accessed.

EDI workshop. An organized and scheduled effort by EDI (or scientific partner) personnel to disseminate educational materials related to scientific data publication and archive to EDI customers.

Identity provider (IdP). A registered service that performs customer identity verification and authentication. In some cases, an IdP may provide information (e.g., email address) about the customer in addition to performing identity verification.

PASTA+ software. The software developed, maintained, and used by EDI to provide its data publication and archive service.

Personal data. Data relating to an identified or identifiable natural person, which may include common name, surname, given name, email address, organizational associations (name, address, phone), and/or unique identifiers (such as an ORCID or GitHub identity).

Science Data. Data collected by external parties that is published and archived by EDI.

Science Metadata. Textual metadata describing scientific data that are published and archived by EDI.

Why and how we collect personal data

  1. Dissemination of EDI news and updates. EDI sends news items and updates about our project, operation, and services to EDI customers who subscribe to such information. Customers must actively submit personal data, including: email (required), surname (optional), given name (optional), organization (optional), and organizational role (optional), to EDI's MailChimp account. These personal data are not shared with any third party or partner.
  2. Customer identity information for authorization to EDI data repository services and scientific data and metadata. EDI restricts access to some data repository services (e.g., publishing and archiving scientific data) to a subset of customers who have agreed to our data publication policy. In addition, customers who contribute science data and metadata have the option to apply access control to their data and metadata to limit distribution of their products. Customers who identify through an EDI-accepted authentication protocol can be filtered against one or more rules used to allow or deny access to EDI data repository services or scientific data and metadata. Customers who require the ability to publish and archive science data and metadata must request an EDI LDAP account through an EDI representative. An EDI LDAP account requires a unique customer identifier (composed into an LDAP distinguished name), a given name, a surname, and a valid email address. Customers who only require identification to access controlled science data or metadata, or to use EDI's ezEML metadata editor, may use a third-party identity provider (IdP) service to verify their identity. See below for information about the third-party identity providers EDI uses and the personal data they release to EDI.
  3. Customer email or other contact information. EDI customers may register contact information with EDI in order to be notified when science data and metadata curated by EDI are created, added, or modified within the EDI data repository. Notifications of this type inform customers when new or updated science data are added to the system or alert them when science data are found to be suspect or erroneous post-publication. Registering contact information is an option available to EDI customers during an authenticated web browser session. Customer contact information includes only an email address.
  4. Web browser session cookies and authentication tokens. EDI websites utilize web browser session cookies and authentication tokens to maintain an authenticated state between the customer's web browser and EDI's website services. Session cookies are generated by the EDI website, and authentication tokens are generated by the EDI authentication service at the point a customer self-identifies. EDI authentication tokens include the customer's unique identifier, a token time-to-live, and any membership in recognized roles or groups.
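The three items carried by an EDI authentication token (a unique identifier, a time-to-live, and role or group memberships) can be modeled as in the following Python sketch. The field names, the use of an absolute expiry time in epoch seconds, the automatic "public" principal, and the identifier string are hypothetical; the sketch does not reproduce EDI's actual token format, and a real token would also need tamper protection, such as a cryptographic signature, which is omitted here.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class AuthToken:
    """Hypothetical model of the items an authentication token carries."""
    uid: str                # the customer's unique identifier
    expires_at: float       # absolute expiry (epoch seconds) derived from the time-to-live
    groups: FrozenSet[str]  # recognized roles or groups the customer belongs to

    def is_valid(self, now: Optional[float] = None) -> bool:
        """A token is usable only until its time-to-live has elapsed."""
        now = time.time() if now is None else now
        return now < self.expires_at

    def principals(self) -> FrozenSet[str]:
        """Identities an access-control check could match ("public" is an assumed default)."""
        return frozenset({"public", self.uid, *self.groups})


def issue_token(uid: str, ttl_seconds: float, groups: Iterable[str] = (),
                now: Optional[float] = None) -> AuthToken:
    """Create a token when a customer self-identifies (no signing in this sketch)."""
    now = time.time() if now is None else now
    return AuthToken(uid=uid, expires_at=now + ttl_seconds, groups=frozenset(groups))


# Hypothetical identifier and group name, with a one-hour time-to-live.
token = issue_token("uid=jdoe,o=EXAMPLE", ttl_seconds=3600, groups=["authenticated"], now=0.0)
print(token.is_valid(now=1800.0))  # True
print(token.is_valid(now=7200.0))  # False
```

Passing `now` explicitly keeps the example deterministic; in normal use the current clock time is taken.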

Third-party identity provider customer information

The Environmental Data Initiative utilizes four third-party identity providers (GitHub, Google, Microsoft, and ORCID) to authenticate and uniquely identify customers who (1) require access to authentication-controlled science data and metadata; (2) use EDI’s “ezEML” metadata editor web application, which requires a unique customer identifier for retaining application history; or (3) wish to create a unique customer profile within the suite of EDI web applications. This form of identity authentication relies on the OAuth 2.0/OpenID Connect protocols to communicate between your client browser, EDI, and the identity provider (IdP). EDI does not store customer authentication (“sign-on”) credentials on any EDI host server. However, in addition to securely verifying your identity, the IdP gives EDI access to a minimal set of information it maintains about you: a string value that uniquely identifies you within its system (e.g., an email address) and your common name (if available). EDI may store this information in a web-based session cookie for customer identification on websites, in database applications maintained by EDI that relate to customer profiles or match customer interactions with an EDI data product, or within an EDI authentication token that conveys customer identity information to any EDI web service that implements access control. By selecting authentication through a third-party IdP, you consent to releasing this information for the above purposes. The following sections list the information that each IdP releases to EDI:
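To make the OAuth 2.0/OpenID Connect exchange more concrete, the following Python sketch builds the first request of the authorization code flow: the browser is redirected to the IdP with the relying party's client identifier, a return address, the requested scopes, and a random state value that guards against cross-site request forgery. The endpoint, client identifier, and redirect URI are placeholders, and "openid email profile" are the standard OpenID Connect scopes; this policy does not state which endpoints or scopes EDI actually uses, and some providers (GitHub's OAuth apps, for example) use their own scope names instead.

```python
import secrets
from typing import Tuple
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str) -> Tuple[str, str]:
    """Build an OpenID Connect authorization-code request URL and return it with its state."""
    state = secrets.token_urlsafe(16)  # random value the IdP echoes back (CSRF protection)
    params = {
        "response_type": "code",          # request an authorization code
        "client_id": client_id,           # identifies the relying party (placeholder here)
        "redirect_uri": redirect_uri,     # where the IdP sends the browser afterwards
        "scope": "openid email profile",  # standard OIDC scopes: identifier, email, name
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state


# Placeholder values; not EDI's real endpoint or client registration.
url, state = build_authorization_url(
    "https://idp.example.org/authorize",
    "example-client-id",
    "https://portal.example.org/callback",
)
print(url)
```

After the customer signs in, the IdP redirects back to the return address with an authorization code and the same state value, which the client must compare with the one it generated before exchanging the code for the identity information listed in the sections that follow.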

GitHub

The information released about you from GitHub includes:

  1. the GitHub URL used to identify your personal GitHub repository,
  2. your given name, and
  3. your surname.

Google

The information released about you from Google includes:

  1. the email address provided to Google when you signed up for Google services,
  2. your given name, and
  3. your surname.

Microsoft

The information released about you from Microsoft includes:

  1. the email address provided to Microsoft when you signed up for Microsoft services,
  2. your given name, and
  3. your surname.

ORCID

The information released about you from ORCID includes:

  1. your fully qualified ORCID identifier,
  2. your given name, and
  3. your surname.

Security of collected personal data

All collected personal data are transmitted using HTTPS (SSL/TLS) encryption over the open Internet and are kept behind EDI system firewalls when processed within the EDI data repository's service-oriented architecture.

Personal data found within science metadata

Personal data may be found within science metadata in the form of contact information pertaining to the origin of the science data and metadata. EDI does not actively collect such personal data; it is provided by EDI customers who wish to publish and archive science data and metadata. EDI does, however, require customers to acknowledge that the owners of this personal data have agreed to its release as part of the publication process. This type of personal data (i.e., contact information) is critical for consumers of science data and metadata to determine the nature and origin of the science data and metadata when assessing fitness for use. In addition, science metadata may contain customer unique identifiers to enable the processing of access control.

Transparency and sharing of personal data

EDI records customer identity information, if available, within EDI's activity audit to better understand which published science data and metadata are accessed within the EDI data repository, and when. This information is coupled with the date and time of access and the science data or metadata accessed. This information may be summarized and provided to our funding agencies to justify continued operations. In addition, EDI may share the same detailed audit information with customers who contribute science data and metadata so that they may better understand the reuse and efficacy of their science data and metadata publication.

Retention, access, and removal of personal data

EDI retains the aforementioned personal data within EDI's computational infrastructure for an indefinite period of time. Upon written request to support@edirepository.org and with proper identification, EDI will provide the requester with a report of all recorded instances of personal data in digital format and/or remove all instances of personal data.

Language Internationalization Policy

Introduction

At the Environmental Data Initiative, we are committed to advancing environmental research and facilitating global collaboration through our data repository. To ensure the broadest accessibility and usability of the data we curate, this policy outlines our approach to internationalizing language within the repository.

Base Language and Metadata

Our recommended data curation practice entails publishing datasets within our repository accompanied by metadata written in U.S. English. By establishing a standardized language for metadata, we enhance the discoverability and accessibility of environmental data for users worldwide. While we accept data in any language, we do not have the resources to be fully multilingual. English as a base language enables efficient indexing while still allowing cross-lingual search (see below).

Multilingual Access

Full understanding of data requires accessibility in multiple languages. To bridge language barriers we recommend widely available translation tools, such as Google Chrome Translate (or an extension for your browser). Users seeking to interact with the repository in their preferred language can follow these steps:

  1. Browser Settings: Configure the browser to translate content from English into the user's preferred language. This step ensures that the interface elements are displayed in the desired language.

  2. Search Interface: Access the repository's search interface while the browser is set to the preferred language. This will enable users to navigate the interface comfortably.

  3. Translation of Queries: Use a translation service (e.g., Google Translate) to convert search queries from the user's preferred language into English. This step allows the repository's search engine to match the query against the English-language metadata.

  4. Executing Searches: Paste the translated English query into the repository's search engine and initiate the search. With browser translation enabled, the results will be displayed in the user's preferred language.

These practices empower researchers and stakeholders from anywhere in the world to access and engage with valuable data resources in their preferred languages.

Flexibility and Ongoing Improvement

We acknowledge that language preferences and technology evolve over time. We also acknowledge that translation services don’t cover all languages and remain open to refining our language internationalization approach to accommodate specific and/or changing user needs as well as advancements in language technology. As part of our commitment to excellence, we will continually assess the effectiveness of our practices, explore opportunities to further enhance language accessibility within the repository, and welcome user feedback.

Conclusion

At the Environmental Data Initiative, our environmental data repository stands as a gateway to global environmental insights. By upholding our language internationalization policy, we ensure that language barriers do not hinder the exchange of knowledge and collaboration across borders. We invite all users to embrace these practices and contribute to our shared mission of addressing environmental challenges on a global scale.

Code of conduct

The Environmental Data Initiative (EDI) is an NSF-funded project helping to accelerate the curation and archiving of environmental data. We operate and maintain a reliable, registered, and certified trustworthy data repository for ecological research data. EDI provides training on the data archiving process and on data management best practices through webinars, workshops, and fellowships, as well as individual support from EDI's information managers.

EDI is committed to providing a safe, productive, and welcoming environment for all participants and staff at any EDI-sponsored event or venue.

All participants (including, but not limited to, attendees, speakers, instructors, fellows and their hosts, volunteers, contractors, EDI staff and guests) are expected to abide by this Code of Conduct. This Code of Conduct applies in all venues, including ancillary events and social gatherings, whether officially sponsored by EDI or not.

Expected behavior

  • Treat all participants, attendees, fellows, staff, and vendors with kindness, respect and consideration, valuing a diversity of views and opinions (including those you may not share).
  • Communicate openly with respect for other participants, critiquing ideas rather than individuals.
  • Refrain from demeaning, discriminatory, or harassing behavior and speech directed toward others, whether in person, in print, or online.
  • Be mindful of your surroundings and of your fellow participants.
  • Respect the rules and policies of the meeting venue.
  • Abide by principles of academic integrity and ethical professional conduct.

Unacceptable behavior

  • Harassment, intimidation or discrimination in any form is unacceptable. Harassment includes speech or behavior that is not welcome or is personally offensive. Behavior that is acceptable to one person may not be acceptable to another, so use discretion to be sure respect is communicated.
    • Verbal harassment includes comments, epithets, slurs, threats, and negative stereotyping that are offensive, hostile, disrespectful, or unwelcome.
    • Non-verbal harassment includes actions or distribution, display, or discussion of any written or graphic material that ridicules, denigrates, insults, belittles, or shows hostility, aversion, or disrespect toward a group or individual. The use of sexual and/or discriminatory images in public spaces or in presentations is also considered harassment.
  • Examples of unacceptable behavior include—but are not limited to—unwelcome or offensive verbal comments related to age, appearance or body size, employment or military status, ethnicity, gender identity and expression, individual lifestyle, marital status, national origin, physical or cognitive ability, political affiliation, sexual orientation, race, or religion.
  • Retaliation and reporting an incident in bad faith both undermine the safe, productive and welcoming environment we are striving to create and will also be subject to consequences.

Consequences

EDI reserves the right to enforce this Code of Conduct in any manner deemed appropriate. Except in the most egregious cases, anyone violating the Code of Conduct will first be asked to cease these behaviors. Failure to comply with requests can result in escalating consequences which may include expulsion from the event or prohibition from future events.

Reporting

If you are the subject of unacceptable behavior or have witnessed any such behavior, please immediately notify a member of the EDI staff, preferably one of the principal investigators, Corinna Gries or Mark Servilla.

Anyone experiencing or witnessing behavior that constitutes an immediate or serious threat to individual or public safety at any of the events organized by EDI is advised to contact venue security or local law enforcement.


  1. Access to science metadata and data must be granted explicitly within the data package metadata (access is denied by default). Data packages containing restricted data (i.e., not publicly accessible) will not be shared with DataONE.
  2. The restriction of an entire data package (both metadata and data) should be arranged with EDI prior to submitting the data package to the data repository. Data packages that do not allow public access to both science metadata and data will not receive a Digital Object Identifier.