EDI Services

Information-rich metadata

We emphasize the value of information-rich metadata for both human and machine understanding. Our metadata editing tools minimize the effort required while maximizing descriptive content, using algorithms to automatically extract as much information as possible from the data themselves. This lets you focus on communicating the important aspects of your data that cannot be inferred automatically. Our quality checker helps ensure that metadata accurately describe the data.

Quality data for reuse

Our open-access data repository provides valuable, publication-quality data for future scientific inquiry. We hold a thematically diverse collection of data with temporal extents ranging from days to decades and a global spatial extent, though most data come from within the United States. Published data can be revised and updated, and previous versions remain available.

Tools to support research

We support research with tools to revise a dataset, create a personalized data catalog, report to funders, find data, explore data, access data, or inject automation into scientific workflows. If you have other needs, tell us and we will work to incorporate them into our offerings.

Secure & persistent archive

We assign Digital Object Identifiers (DOIs) to all data and guarantee their immutability to support long-term access, transparency, and reuse. The EDI data repository satisfies DataCite standards for the accurate and consistent identification of digital resources for citation and retrieval, and it is a CoreTrustSeal-certified repository. We participate in the development of data standards and principles (e.g., FAIR, TRUST, CARE) and adopt their recommendations.

Timely one-on-one support

Science doesn't operate on banking hours, and neither do we. Our team can be reached via email, Slack, or Zoom to address any questions or issues. We offer support and advice on a range of topics, from data curation to software design. If we don't have an answer, we can refer you to someone who does.

Streamlined citation & attribution

All data publications are first-class research objects that, when cited, are fully attributed to the authors' personal research profiles. We work with ORCID, ROR, DataCite, and Crossref to link data packages to journal manuscripts where possible, and we apply schema.org recommendations to our metadata to enable automatic updates of Google Scholar and ORCID profiles.

Enhanced discovery

Data in the EDI Repository are findable in the EDI Data Portal, DataONE, and Google Dataset Search. For our advanced search interface, we index the metadata features most commonly used in searches. As a DataONE member, our data are discoverable alongside those of other repositories in the DataONE Network. Finally, we mark up data package landing pages with schema.org metadata, giving users of Google Dataset Search an additional path to discovery.
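As an illustration of what schema.org markup on a landing page can look like, the sketch below builds a minimal JSON-LD record of type `Dataset` and wraps it in the `<script type="application/ld+json">` tag that crawlers such as Google Dataset Search read. The exact fields EDI emits are not described here, so the field selection is an assumption, and the function name `dataset_jsonld`, the DOI, the ORCID iD, and all other values are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, doi, creators, keywords):
    """Build a minimal schema.org Dataset record as a JSON-LD dict (illustrative only)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # The DOI is expressed as a resolvable URL so it works as a persistent identifier.
        "identifier": f"https://doi.org/{doi}",
        "keywords": keywords,
        "creator": [
            {
                "@type": "Person",
                "name": person_name,
                # An ORCID iD links the creator to a personal research profile.
                "identifier": f"https://orcid.org/{orcid}",
            }
            for person_name, orcid in creators
        ],
    }


# Hypothetical values, used only to show the shape of the markup.
record = dataset_jsonld(
    name="Example lake temperature profiles",
    description="Hypothetical dataset used only to illustrate the markup.",
    doi="10.0000/example",
    creators=[("Jane Doe", "0000-0000-0000-0000")],
    keywords=["limnology", "temperature"],
)

# The JSON-LD is embedded in the landing page's HTML inside a script tag.
html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(html_snippet)
```

Keeping the record as a plain dictionary and serializing it only at the end makes it easy to add further schema.org properties (for example, temporal or spatial coverage) without touching the HTML wrapper.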

Data exploration & analysis

Three data exploration tools provide a quick, interactive view into a dataset, deepening understanding of a data package beyond what the metadata alone convey. First, our data exploration tool DeX is an interface for exploring and subsetting tabular data directly from our data portal. Second, the datapie R package provides a similar interface but broadens access to any data published in the DataONE network, as well as to data stored on a local computer. Finally, data download scripts are automatically generated for each data package in common languages and formats (MATLAB, Python, R, SAS, SPSS, tidyr), so the data can be loaded immediately for manipulation and analysis.

Analysis ready data

Original data vary greatly in format and structure, which makes reformatting a large component of any synthesis workflow. Our thematic standardization process accelerates the integration and synthesis of data within our repository and across collaborating repositories. Two projects, ecocomDP and hymetDP, are now underway and involve collaborations with LTER sites, NEON, CUAHSI, and GBIF.

Workflow automation

Automated workflows achieve a larger impact with less effort. Our repository REST API, the EDIutils R package, and our data download scripts provide programmatic access to data and services for automating common data curation tasks and reporting to stakeholders, and they enable fully open and reproducible scientific analysis workflows.
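As a small example of the kind of scripting the REST API makes possible, the sketch below parses an EDI package identifier and builds request URLs for it. EDI package identifiers conventionally take the form `scope.identifier.revision` (for example, `edi.100.1`); the base URL `https://pasta.lternet.edu/package` and the two endpoint patterns are assumptions that should be checked against the PASTA+ REST API documentation, and the class name `PackageId`, its methods, and the package ID used are hypothetical.

```python
from dataclasses import dataclass

# Assumed base URL of EDI's repository (PASTA+) REST API; verify against the API docs.
BASE_URL = "https://pasta.lternet.edu/package"


@dataclass(frozen=True)
class PackageId:
    """Hypothetical helper for a package ID of the form 'scope.identifier.revision'."""

    scope: str
    identifier: int
    revision: int

    @classmethod
    def parse(cls, package_id: str) -> "PackageId":
        # Split from the right so scopes such as 'knb-lter-ntl' stay intact.
        scope, identifier, revision = package_id.rsplit(".", 2)
        return cls(scope, int(identifier), int(revision))

    def metadata_url(self) -> str:
        # Assumed endpoint pattern for reading the EML metadata of one revision.
        return f"{BASE_URL}/metadata/eml/{self.scope}/{self.identifier}/{self.revision}"

    def revisions_url(self) -> str:
        # Assumed endpoint pattern for listing all revisions of a package.
        return f"{BASE_URL}/eml/{self.scope}/{self.identifier}"


pid = PackageId.parse("edi.100.1")  # hypothetical package ID
print(pid.metadata_url())
print(pid.revisions_url())
```

The sketch stops at URL construction so it runs offline; a scheduled curation or reporting job could then fetch these URLs with `urllib.request` or a similar HTTP client and act on the responses.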

Training & skill building

Our information management resources draw on more than 40 years of information management expertise, including that of the U.S. LTER Network, and contribute to improving data management across the environmental science community. We have begun migrating our training and skill building to an entirely online, self-directed format to broaden community access while retaining the one-on-one support that makes a lasting impact.

Personalized data catalogs

Personalized data catalogs are simple to create and maintain. This approach uses the data and metadata archived in our repository to create a searchable index within a personal or project website, reducing overhead while allowing customized branding and providing an additional avenue of discovery.

Linked data & AI applications (Coming Soon)

Semantic annotation enables linked data for better human understanding, machine actionability, and future AI-driven scientific applications. Our data curators are now annotating data packages with semantic markup, and plans for integrating search and use technologies with our repository are under development.

Computational environments (Coming Soon)

Connecting EDI data to computational environments will decrease compute times and enable effective collaboration within and across research groups. We are working with CyVerse to bring their services to EDI users and are designing a plan to integrate Jupyter Notebooks and Binder with hosted data.