News

We provide regular updates on new EDI features, services, and opportunities. Below are items from our bi-monthly newsletter (subscribe to our newsletter). We also use Twitter to highlight and engage with the EDI community and beyond (@EDIgotdata).

Webinar "EDI Summer Fellows present", 16 August 2022

August 11, 2022

Join us for 5-minute eLightning talks by our Summer Fellows that highlight their data publishing experience for specific host sites, spanning diverse ecosystems across the country, from Maine’s Mount Desert Island to Puerto Rico’s Luquillo Mountains to the Palmyra Atoll in the central Pacific.

EDI Data Repository scheduled maintenance

August 9, 2022

The EDI Data Repository will undergo scheduled maintenance on Wednesday, 17 August and Thursday, 18 August that will result in our systems being unavailable. The University of New Mexico Center for Advanced Research Computing, where our production infrastructure is managed, has deemed this maintenance critical and necessary to continue with uninterrupted service into the future. We will do our best to maintain access to data in a "read-only" state (no data package uploads, no data package evaluations, and no archive downloads during this period). We cannot guarantee seamless access to data until we are back on our production systems on Friday, 19 August. We plan to perform regular patching on the evening of Tuesday, 16 August and transition into a "read-only" state by Wednesday morning. If all goes as expected, EDI will be back to using our production infrastructure by Friday morning. We will keep you posted on updates and changes to this schedule as we learn more. We apologize for any inconvenience this event will cause.

DeX in production

July 6, 2022

Our new data exploration tool called “DeX” for use with published data is in production now. It provides a text and graphical summary of each variable/column in a data table within a dataset, a data sub-setting tool, and simple graphing capabilities. To access the tool, go to https://edirepository.org “Find Data” and “EDI Dataset Search.” Once you have located a data package, hitting the “Explore Data” button next to each of the data tables will show you the profile/summary for that table. The menu on the top of the page provides access to the subset and plot functions. For more information see also "A Quick Overview of EDI’s Data Explorer (DeX)".

New version of Data Package Audit Report generator released

July 5, 2022

The EDI software development team has released a new version of the Data Package Audit Report generator that includes a function to download results as a Comma-Separated Values (CSV) file. The downloaded file contains more detailed records than what is displayed in the website view. The CSV file download begins immediately instead of being delayed while the complete result set is generated on the server. This change helps prevent timeouts from the server if the result set is large and takes excessive time to create. Download times may still be on the order of minutes, but we are confident that the CSV file will be complete. However, one change to be aware of is that the set of audit records in the CSV file is no longer guaranteed to be ordered by date and time. This modification to the search query now ensures that the download stream will begin quickly - a key factor in keeping the download connection active and viable. On another note regarding the audit records, erroneous or misleading records that were generated by known web-crawler "robots" will no longer be included in the audit report display or download file. We will, however, continue to record such events until we decide whether all "robot" records should be removed from the EDI audit system.
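Because the downloaded records are no longer guaranteed to be in date and time order, a reader who needs chronological order can sort the file locally. The following is a minimal sketch, not part of the EDI tooling. The function name and the `time_column` parameter are hypothetical, the column name must be taken from the header of the actual CSV file, and the timestamps are assumed to be in ISO 8601 format.

```python
import csv
import io
from datetime import datetime


def sort_audit_rows(csv_text, time_column):
    """Return the CSV rows as dicts, sorted by the given timestamp column.

    Assumes (hypothetically) that `time_column` holds ISO 8601 timestamps
    such as "2022-07-05T10:00:00"; adjust the parsing if the file differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Parse each timestamp so that ordering is chronological, not textual.
    rows.sort(key=lambda row: datetime.fromisoformat(row[time_column]))
    return rows
```

For a file on disk, the text can be read with `open(path, newline="", encoding="utf-8").read()` and passed to the function together with the name of the timestamp column.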

A Proposed Enhancement to Data Package Provenance

June 28, 2022

Create more descriptive and machine-actionable provenance linkages between data packages by leveraging the EML schema's support for semantic annotations in conjunction with the currently used methods for documenting provenance.

Workshop at ESA2022: Explore and work with biodiversity data from LTER and NEON

June 13, 2022

Join EDI and NEON staff for a workshop at the ESA/CSEE 2022 meeting: Explore and work with harmonized continental-scale biodiversity data from the US LTER and NEON

EDIutils is on CRAN

June 3, 2022

We are happy to announce the EDIutils R package (an API Client for the Environmental Data Initiative Repository) is now on CRAN. Install from your R Console with install.packages("EDIutils").

New Resource for Information Managers: Adding Physical Metadata

June 3, 2022

Physical metadata, such as file size, MD5 checksum, and number of rows in a table, are important pieces of information for verifying the integrity of files after uploads and downloads. When a resource is uploaded to ezEML or processed by EMLassemblyline, this information is automatically calculated. However, neither application can obtain this information from a file that is not accessible. The responsibility is instead placed on the data provider to determine and manually enter this physical metadata.
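For a data provider who has to work out these values by hand, the three quantities named above can be computed with a few lines of Python. This is a minimal sketch under stated assumptions, not the method used by ezEML or EMLassemblyline: the function name and the dictionary keys are illustrative, the file is assumed to be a UTF-8 CSV, and the first row is assumed to be a header unless stated otherwise.

```python
import csv
import hashlib
import os


def physical_metadata(path, has_header=True):
    """Compute file size, MD5 checksum, and record count for a CSV file."""
    # Hash the raw bytes in chunks so large files do not need to fit in memory.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)

    # Count rows with the csv module so quoted line breaks are handled.
    with open(path, newline="", encoding="utf-8") as f:
        rows = sum(1 for _ in csv.reader(f))

    # Assumption: the header line is not counted as a data record.
    if has_header and rows > 0:
        rows -= 1

    return {
        "size_bytes": os.path.getsize(path),
        "md5": md5.hexdigest(),
        "number_of_records": rows,
    }
```

The returned values can then be entered manually into the metadata for the inaccessible file.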

Audit Service Regression Causes EDI Data Portal System Errors

May 18, 2022

Starting on Tuesday 10 May 2022, the PASTA Audit Service began experiencing unexpected system shutdowns that resulted in errors for users attempting to access data packages from the EDI Data Portal. These events required the Audit Service to be restarted multiple times during the remainder of the week and through the weekend. The cause of these shutdowns was determined to be the introduction of a new software feature into the Audit Service, which caused a very large amount of XML data to be generated. This resulted in the Audit Service running out of system memory and ultimately the loss of the service. EDI software developers diagnosed the issue on Monday 16 May and deployed a patch on Monday evening. Unfortunately, regression testing did not catch this problem because of differences in the volume of the Audit Service database between our production system and development system. We regret any inconvenience this may have caused and thank all of our users for their patience in this matter.

2022 Data Management Fellowship Program

May 11, 2022

We have awarded 15 fellowships for our ecological data management training program for this summer. The Fellows will receive training in ecological data management and gain hands-on experience through participation in data preparation and publishing with scientists and information managers from specific host research projects. See below for a list of host projects and mentors.

New EDI Website

May 4, 2022

We are excited to announce the release of our new website at https://edirepository.org. This website comes with improved support for EDI users, along with an abundance of documentation for the three pillars of our community:

Webinar "EDI reporting tools", 9 May 2022 @ 3:00 pm ET

May 3, 2022

This month’s webinar will be presented at the LTER Information Managers' monthly Water Cooler event.

EDIutils has moved to rOpenSci

April 23, 2022

We are happy to announce the EDIutils R package (an API Client for the Environmental Data Initiative Repository) has been accepted by rOpenSci. As a result, the project GitHub has moved and installation has changed to remotes::install_github("ropensci/EDIutils"). We will be submitting to CRAN in the coming weeks. Stay tuned!

Data Repositories Enriching the Global Research Infrastructure

April 20, 2022

Domain repositories have long provided a suite of services to the communities built around them including metadata management and data discovery tools, data access and preservation, and identification of resources created by and of interest to the community. As a result, domain repositories are central to important and thriving research communities. Recently a global research infrastructure for identifying and tracking many kinds of research objects has emerged. This includes Crossref, originally for journal articles and books, DataCite, originally for datasets but expanding to other research objects, as well as ORCID for identification of people and ROR for organizations. Crossref and DataCite initially focused on the creation of research object identifiers (DOIs). As researchers start to use these identifiers, it is clear that the connections enabled by these identifiers add considerable value. In order to enable these connections the growing, global research infrastructure needs content beyond minimal identification and citation metadata. Domain repositories are uniquely situated, with the deep knowledge of their communities, to extend their services by providing connections and content to deliver additional metadata to enrich the global research infrastructure. This talk by Ted Habermann, of Metadata Game Changers, shares several recent examples that demonstrate the power of enriching the global research infrastructure.

ezEML Templates

March 10, 2022

Research sites (e.g., LTER sites) and teams of researchers who are using ezEML to capture metadata may find that certain content is used repeatedly across a number of documents. Examples of such repeated content can include Creators, Contacts, Keywords, Intellectual Rights, Geographic Coverage, Project, etc. Users can avoid the tedious task of re-entering this information for each new dataset by creating and publishing one or more “templates” that are prepopulated with this standard content. Since templates exist outside of any individual user’s ezEML account, they are accessible to everyone. Everyone who uses a template will get the current version, which helps alleviate problems arising from different versions residing in different users’ accounts.

A Quick Overview of EDI’s Data Explorer (DeX)

January 28, 2022

The EDI software team is excited to announce DeX, a tool for exploring and subsetting tabular data, which is now in beta testing on the EDI staging Data Portal (https://portal-s.edirepository.org/nis). DeX provides three views into tabular data found in the EDI Data Repository: 1) a statistical profiler that analyzes the data table and displays detailed information about each attribute; 2) a filter and subsetting application that allows you to download the subsetted data, along with a new EML metadata document describing the subset; and 3) a simple-to-use scatter and line plotting application that gives you a visual glimpse into data trends. DeX is currently available on both our development and staging Data Portals and works with CSV-based data tables (soon to work with a wider set of tabular formats). To see DeX in action, look for a data package in the staging Data Portal containing a CSV data file and click on the “Data Explorer – experimental” link at the end of the data entity record information (see below):

EDIutils R Package Update

January 26, 2022

EDIutils is a client for the Environmental Data Initiative repository REST API and includes functions to search and access existing data, evaluate and upload new data, and assist with related data management tasks (https://github.com/EDIorg/EDIutils).

Normalization of Creator Names in EDI’s Data Portal

November 2, 2021

The Advanced Search feature of EDI’s Data Portal lets you select a dataset Creator name from a drop-down list of all dataset creators in our repository. The search then displays all the datasets that have that name as one of their creators.

Updates to User-contributed Journal Citation Interface on the EDI Data Portal

November 2, 2021

The Environmental Data Initiative (EDI) has recently updated its user-contributed journal citation interface on its Data Portal to include more granular information regarding the type of citation being submitted. The addition of the Relation Type form field allows you to select the relationship between the data package and the journal manuscript where the data package is mentioned using one of three relationship types: “IsCitedBy” – this data package is formally cited in the manuscript, “IsDescribedBy” – this data package is explicitly described within the manuscript, or “IsReferencedBy” – this data package is implicitly described within the manuscript. This information is conveyed to DataCite through an update of the Digital Object Identifier (DOI) metadata and provides greater exposure to the data package through DataCite’s event data and Crossref, an official DOI registrar of the International DOI Foundation for academic journals. The EDI Data Portal allows any user with an EDI-provisioned account to add a journal citation to any data package, regardless of data package ownership, thereby greatly increasing related information about the data package – a win-win for the entire community!

Harmonizing Ecological Community Survey Data for Reuse: An Update

September 6, 2021

The idea of harmonizing data is not new, and for some research domains it has been successful. Our body of long-term observations of organisms in ecological communities is growing, and many datasets have been used already in synthesis and meta-analyses – but only after considerable effort to bring them into alignment. A goal of EDI has been to develop recommendations for data harmonization, and to convert “raw data” in specific domains into a common data model to prepare them for analysis and accelerate synthesis or meta-analyses.

Integrating Long-Tail Data: How Far Are We?

September 5, 2021

EDI’s Kristin Vanderbilt and Corinna Gries co-edited a Special Issue of Ecological Informatics “Integrating Long-Tail Data: How Far Are We?” that explores how far the informatics community has come toward lessening the time researchers must spend integrating small, heterogeneous datasets prior to analyzing them.

Updating schema.org Metadata for Data Packages in the EDI Data Portal to Provide Rich Semantic Information That can be Utilized by Search Engines and Google Scholar

May 3, 2021

The EDI technical team is now updating the schema.org metadata that accompanies every data package landing page on the EDI Data Portal with new recommendations from the ESIP SOSO project (https://github.com/ESIPFed/science-on-schema.org). EDI initially released schema.org metadata for each data package in Fall 2018. The dataset schema.org metadata is encoded as a JSON-LD data structure that is embedded within script tags on the data package metadata landing page. Along with the sitemaps.org metadata that acts as an SEO table of contents, the schema.org metadata provides rich semantic information about the data package that can be utilized by search engines (e.g., Google, Microsoft, Yandex, and even domain-specific tools like EarthCube’s Gleaner and DataONE schema.org indexers) and associated applications. For example, data packages that are archived in the EDI data repository are discoverable through Google’s Dataset Search interface (https://bit.ly/3nDhT8j) because of the detailed information provided to Google’s search engine indexer via the schema.org metadata:
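To illustrate the general mechanism of a JSON-LD structure embedded within script tags, here is a minimal sketch in Python. It is not EDI's actual output: the function name is hypothetical, and the four properties shown are a small illustrative subset of the schema.org Dataset type, far fewer than a real landing page would carry.

```python
import json


def dataset_jsonld_script(name, description, url, identifier):
    """Build a <script> element holding schema.org Dataset metadata as JSON-LD."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
    }
    body = json.dumps(doc, indent=2)
    # Escape "</" so text inside the metadata cannot close the script element
    # early; "\/" is a valid JSON escape and decodes back to "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

A search engine indexer that reads the page can extract the content of the script element and parse it as ordinary JSON.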

Rendering of Markdown and LaTeX equations in EML

April 30, 2021

The EDI Data Portal now supports the provisional rendering of Markdown and LaTeX equations in most TextType elements of the Ecological Metadata Language (e.g., “abstract”, “intellectualRights”, and the method step “description”). EDI recently updated these two features on the Data Portal’s Data Package Metadata web page through the use of “showdown.js” (https://showdownjs.com/) for Markdown and “MathJax.js” (https://www.mathjax.org/) for LaTeX-formatted math equations. Markdown provides a convenient way to add structural highlights to text elements, including the use of different heading styles, bold and italicized text, bulleted and numbered lists, and much more. The “showdown.js” package supports most of the commonly used GitHub Flavored Markdown (https://github.github.com/gfm/) syntax and is processed by the client’s web browser. For example, the following snippet from the EDI Data Portal shows both a Markdown heading style and a bulleted list from a rendered EML document:

EDI Supports Temporary Data Embargoes Upon Request

March 1, 2021

You may not know that the Environmental Data Initiative provides an embargo service to temporarily block access to data tables (and other types of data) in your EDI uploaded data packages, but it does. We provide this service to satisfy the requirements of many journals that request that data accompanying a manuscript be archived in a recognized data repository and assigned a valid Digital Object Identifier (DOI) before the manuscript is reviewed. It is often the case that the journal or manuscript author prefers that the data be off limits to the general public until the manuscript is fully published. For this reason, EDI will apply a temporary embargo on the data elements of your data package (metadata, however, are not permitted to be embargoed) at your request – all you have to do is ask through our support email address or directly on our Slack channel. We will remove the embargo once you let us know that the manuscript has been published. The best part is that the DOI remains the same before and after the embargo. Because EDI is a strong proponent of publicly accessible data, we will periodically reach out to owners of embargoed data to confirm the continued need for the embargo. Lastly, please let us know if and how we can improve this service.

Request for Applications to Host an EDI 2021 Summer Fellow

October 27, 2020

We are delighted to announce our fourth Data Management Fellowship Program for the Summer of 2021 and are now accepting applications from research projects and field stations interested in hosting a fellow.

Release of ezEML

September 17, 2020

We recently released ezEML, a form-based online application designed to streamline the creation of metadata in the Ecological Metadata Language (EML).

Journal Citations Associated with EDI Data Packages

May 19, 2020

The EDI technical team recently modified the view of user-contributed journal citations associated with a specific version of a data package so that the citation is now displayed on all versions of the data package, not just the version for which the citation is relevant. This enhancement allows users who browse the data package metadata landing page to see all citations related to the data package series regardless of what version of the data package they are viewing. User-contributed journal citations provide an easy way for the author of a data paper (or others) to increase the impact factor of the data package by directly linking the published manuscript to the supporting data package found in the EDI data repository. In fact, older manuscripts that utilize an EDI data package may be added to the list of journal citations even if the data package DOI was not available at the time of publication. We are currently exploring options for updating the DataCite DOI metadata of the data package with these citations so that downstream services may take advantage of this crowd-sourced information.

The Summer Fellowship Program of the Environmental Data Initiative

March 25, 2020

The Environmental Data Initiative (EDI) assists researchers from field stations, individual laboratories, and research projects of all sizes to archive and publish their environmental data. EDI’s very successful Summer Fellowship Program for Data Management Training is one component of our Outreach and Training program. For the third consecutive year, EDI is reviewing applications from interested undergraduate and graduate students to become an EDI summer fellow. This year we are seeking nine fellows to be trained in the data publishing process and to support nine research sites in their efforts to manage their data. EDI’s aim is to ensure that these young professionals learn state-of-the-art data stewardship practices.

Operations of the EDI Data Repository will Continue as Normal During this COVID-19 Health Crisis

March 25, 2020

Dear EDI user,

RFC: Annotating EML with the EMLassemblyline R Package

March 25, 2020

We’d like your feedback on a new EMLassemblyline feature that supports annotation of both new and existing EML documents.

Cite: A Lightweight Citation Service for Data Packages in the EDI Data Repository

February 12, 2020

The EDI technical team has released Cite, a lightweight web-service that generates citations for data packages archived in the EDI data repository. Cite is simple to use and requires only the EDI data package identifier appended to the end of the Cite URL: “https://cite.edirepository.org/cite/”. For example, the URL “https://cite.edirepository.org/cite/edi.460.1”, when entered into a web-browser query field, returns the following ESIP-stylized citation:
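The URL pattern described above, the Cite base URL followed by the data package identifier, can be sketched as a small Python helper. The function name and the validation rule are assumptions for illustration only: the identifier is assumed to have the three-part shape seen in the example “edi.460.1” (a scope, a number, and a revision number), and the sketch only builds the URL without contacting the service.

```python
import re

# Base URL as given in the announcement.
CITE_BASE_URL = "https://cite.edirepository.org/cite/"

# Assumed identifier shape: scope.identifier.revision, e.g. "edi.460.1".
_PACKAGE_ID = re.compile(r"^[A-Za-z][A-Za-z0-9-]*\.\d+\.\d+$")


def cite_url(package_id):
    """Return the Cite URL for a data package identifier."""
    if not _PACKAGE_ID.match(package_id):
        raise ValueError(
            "expected an identifier like 'edi.460.1', got %r" % (package_id,)
        )
    # The identifier is appended directly to the end of the Cite URL.
    return CITE_BASE_URL + package_id
```

The resulting string can be pasted into a web browser or passed to an HTTP client such as `urllib.request.urlopen`.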

Google Scholar Highlights EDI Data Packages as First-order Citations in User Profiles and in Scholarly Articles

February 10, 2020

Data are becoming increasingly citable as first-order objects, including data archived in the EDI repository. One indication is that data package publications are indexed in personal Google Scholar user profiles, along with other scholarly articles, for example in the profile of Paul Hanson (Research Professor at the Center for Limnology, University of Wisconsin-Madison).

EDI is 40th DataONE Member Node

April 18, 2017

The contents of the EDI Data Repository are now discoverable through DataONE. EDI became the 40th DataONE Member Node when it registered its version of the DataONE Generic Member Node (GMN) software stack to synchronize EDI data content through the DataONE Federation.