Audit Service Regression Causes EDI Data Portal System Errors
May 18, 2022
Starting on Tuesday 10 May 2022, the PASTA Audit Service began experiencing unexpected system shutdowns that resulted in errors for users attempting to access data packages from the EDI Data Portal. These events required the Audit Service to be restarted multiple times during the remainder of the week and through the weekend. The cause of these shutdowns was determined to be a new software feature introduced into the Audit Service, which caused a very large amount of XML data to be generated. As a result, the Audit Service ran out of system memory and ultimately shut down. EDI software developers diagnosed the issue on Monday 16 May and deployed a patch that evening. Unfortunately, regression testing did not catch this problem because of differences in the volume of the Audit Service database between our production and development systems. We regret any inconvenience this may have caused and thank all of our users for their patience in this matter.
2022 Data Management Fellowship Program
May 11, 2022
We have awarded 15 fellowships for our ecological data management training program for this summer. The Fellows will receive training in ecological data management and gain hands-on experience through participation in data preparation and publishing with scientists and information managers from specific host research projects. See below for a list of host projects and mentors.
EDIutils has moved to rOpenSci
April 23, 2022
We are happy to announce the EDIutils R package (an API Client for the Environmental Data Initiative Repository) has been accepted by rOpenSci. As a result, the project GitHub has moved and installation has changed to remotes::install_github("ropensci/EDIutils"). We will be submitting to CRAN in the coming weeks. Stay tuned!
Data Repositories Enriching the Global Research Infrastructure
April 20, 2022
Domain repositories have long provided a suite of services to the communities built around them, including metadata management and data discovery tools, data access and preservation, and identification of resources created by and of interest to the community. As a result, domain repositories are central to important and thriving research communities. Recently, a global research infrastructure for identifying and tracking many kinds of research objects has emerged. This includes Crossref, originally for journal articles and books; DataCite, originally for datasets but expanding to other research objects; ORCID for identification of people; and ROR for organizations. Crossref and DataCite initially focused on the creation of research object identifiers (DOIs). As researchers start to use these identifiers, it is clear that the connections they enable add considerable value. To enable these connections, the growing global research infrastructure needs content beyond minimal identification and citation metadata. Domain repositories, with their deep knowledge of their communities, are uniquely situated to extend their services by providing connections and additional metadata that enrich the global research infrastructure. This talk by Ted Habermann, of Metadata Game Changers, shares several recent examples that demonstrate the power of enriching the global research infrastructure.
March 10, 2022
Research sites (e.g., LTER sites) and teams of researchers who are using ezEML to capture metadata may find that certain content is used repeatedly across a number of documents. Examples of such repeated content can include Creators, Contacts, Keywords, Intellectual Rights, Geographic Coverage, Project, etc. Users can avoid the tedious task of re-entering this information for each new dataset by creating and publishing one or more “templates” that are prepopulated with this standard content. Since templates exist outside of any individual user’s ezEML account, they are accessible to everyone. Everyone who uses a template will get the current version, which helps alleviate problems arising from different versions residing in different users’ accounts.
A Quick Overview of EDI’s Data Explorer (DeX)
January 28, 2022
The EDI software team is excited to announce DeX, a tool for exploring and subsetting tabular data, which is now in beta testing on the EDI staging Data Portal (https://portal-s.edirepository.org/nis). DeX provides three views into tabular data found in the EDI Data Repository: 1) a statistical profiler that analyzes the data table and displays detailed information about each attribute; 2) a filter and subsetting application that allows you to download the subsetted data, along with a new EML metadata document describing the subset; and 3) a simple-to-use scatter and line plotting application that gives you a visual glimpse into data trends. DeX is currently available on both our development and staging Data Portals and works with CSV-based data tables (support for a wider set of tabular formats is planned). To see DeX in action, look for a data package in the staging Data Portal containing a CSV data file and click on the “Data Explorer – experimental” link at the end of the data entity record information (see below):
Normalization of Creator Names in EDI’s Data Portal
November 2, 2021
The Advanced Search feature of EDI’s Data Portal lets you select a dataset Creator name from a drop-down list of all dataset creators in our repository. The search then displays all the datasets that have that name as one of their creators.
Updates to User-contributed Journal Citation Interface on the EDI Data Portal
November 2, 2021
The Environmental Data Initiative (EDI) has recently updated the user-contributed journal citation interface on its Data Portal to capture more granular information about the type of citation being submitted. The new Relation Type form field lets you select the relationship between the data package and the journal manuscript where the data package is mentioned, using one of three relationship types: “IsCitedBy” – this data package is formally cited in the manuscript, “IsDescribedBy” – this data package is explicitly described within the manuscript, or “IsReferencedBy” – this data package is implicitly described within the manuscript. This information is conveyed to DataCite through an update of the Digital Object Identifier (DOI) metadata and gives the data package greater exposure through DataCite’s event data and Crossref, an official DOI registrar of the International DOI Foundation for academic journals. The EDI Data Portal allows any user with an EDI-provisioned account to add a journal citation to any data package, regardless of data package ownership, thereby greatly increasing related information about the data package – a win-win for the entire community!
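As a rough illustration of how one of these relationships might be expressed in DOI metadata, the sketch below builds a single related-identifier entry. The three relation types come from the announcement, and the field names follow the DataCite Metadata Schema's relatedIdentifier property. The function name, the enum, and the example DOI are hypothetical, and this is not a description of how EDI's own update code works.

```python
from enum import Enum


class RelationType(Enum):
    """The three relation types offered by the citation form."""
    IS_CITED_BY = "IsCitedBy"            # formally cited in the manuscript
    IS_DESCRIBED_BY = "IsDescribedBy"    # explicitly described in the manuscript
    IS_REFERENCED_BY = "IsReferencedBy"  # implicitly described in the manuscript


def related_identifier(article_doi: str, relation: RelationType) -> dict:
    """Build one DataCite-style related-identifier entry (illustrative only)."""
    return {
        "relatedIdentifier": article_doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation.value,
    }


if __name__ == "__main__":
    # "10.1234/example" is a placeholder DOI, not a real article.
    print(related_identifier("10.1234/example", RelationType.IS_CITED_BY))
```

Using an enum rather than free-form strings keeps the relation type restricted to the three values the form accepts.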
Harmonizing Ecological Community Survey Data for Reuse: An Update
September 6, 2021
The idea of harmonizing data is not new, and for some research domains it has been successful. Our body of long-term observations of organisms in ecological communities is growing, and many datasets have already been used in synthesis and meta-analyses, but only after considerable effort to bring them into alignment. A goal of EDI has been to develop recommendations for data harmonization, and to convert “raw data” in specific domains into a common data model to prepare them for analysis and accelerate synthesis or meta-analyses.
Integrating Long-Tail Data: How Far Are We?
September 5, 2021
EDI’s Kristin Vanderbilt and Corinna Gries co-edited a Special Issue of Ecological Informatics “Integrating Long-Tail Data: How Far Are We?” that explores how far the informatics community has come toward lessening the time researchers must spend integrating small, heterogeneous datasets prior to analyzing them.
Updating schema.org Metadata for Data Packages in the EDI Data Portal to Provide Rich Semantic Information That can be Utilized by Search Engines and Google Scholar
May 3, 2021
The EDI technical team is now updating the schema.org metadata that accompanies every data package landing page on the EDI Data Portal with new recommendations from the ESIP SOSO project (https://github.com/ESIPFed/science-on-schema.org). EDI initially released schema.org metadata for each data package in Fall 2018. The dataset schema.org metadata is encoded as a JSON-LD data structure that is embedded within script tags on the data package metadata landing page. Along with the sitemaps.org metadata, which acts as an index of site content for search engines, the schema.org metadata provides rich semantic information about the data package that can be utilized by search engines (e.g., Google, Microsoft, Yandex, and even domain-specific tools like EarthCube’s Gleaner and DataONE schema.org indexers) and associated applications. For example, data packages that are archived in the EDI data repository are discoverable through Google’s Dataset Search interface (https://bit.ly/3nDhT8j) because of the detailed information provided to Google’s search engine indexer via the schema.org metadata.
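To make the embedding concrete, here is a minimal sketch of how an indexer-like client might pull JSON-LD out of a landing page using only the Python standard library. The sample page and its field values are invented for illustration and are not copied from an actual EDI landing page. The script type "application/ld+json" is the standard way to embed JSON-LD in HTML; that the Data Portal uses exactly this attribute is an assumption.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical landing page, heavily simplified.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example data package"}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for doc in extract_jsonld(SAMPLE_PAGE):
        print(doc["@type"], "-", doc["name"])
```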
Rendering of Markdown and LaTeX equations in EML
April 30, 2021
The EDI Data Portal now supports the provisional rendering of Markdown and LaTeX equations in most TextType elements of the Ecological Metadata Language (e.g., “abstract”, “intellectualRights”, and the method step “description”). EDI recently added these two features to the Data Portal’s Data Package Metadata web page through the use of “showdown.js” (https://showdownjs.com/) for Markdown and “MathJax.js” (https://www.mathjax.org/) for LaTeX-formatted math equations. Markdown provides a convenient way to add structural highlights to text elements, including the use of different heading styles, bold and italicized text, bulleted and numbered lists, and much more. The “showdown.js” package supports most of the commonly used GitHub Flavored Markdown (https://github.github.com/gfm/) syntax and is processed by the client’s web browser. For example, the following snippet from the EDI Data Portal shows both a Markdown heading style and a bulleted list from a rendered EML document:
EDI Supports Temporary Data Embargoes Upon Request
March 1, 2021
You may not know that the Environmental Data Initiative provides an embargo service to temporarily block access to data tables (and other types of data) in your EDI-uploaded data packages, but it does. We provide this service to satisfy the requirements of many journals that request that data accompanying a manuscript be archived in a recognized data repository and assigned a valid Digital Object Identifier (DOI) before the manuscript is reviewed. It is often the case that the journal or manuscript author prefers that the data be off limits to the general public until the manuscript is fully published. For this reason, EDI will apply a temporary embargo on the data elements of your data package (metadata, however, are not permitted to be embargoed) at your request – all you have to do is ask through our support email address or directly on our Slack channel. We will remove the embargo once you let us know that the manuscript has been published. The best part is that the DOI remains the same before and after the embargo. Because EDI is a strong proponent of publicly accessible data, we will periodically reach out to owners of embargoed data to confirm the continued need for the embargo. Lastly, please let us know if and how we can improve this service.
Journal Citations Associated with EDI Data Packages
May 19, 2020
The EDI technical team recently modified the view of user-contributed journal citations associated with a specific version of a data package so that the citation is now displayed on all versions of the data package, not just the version for which the citation is relevant. This enhancement allows users who browse the data package metadata landing page to see all citations related to the data package series regardless of which version of the data package they are viewing. User-contributed journal citations provide an easy way for the author of a data paper (or others) to increase the impact of the data package by directly linking the published manuscript to the supporting data package found in the EDI data repository. In fact, older manuscripts that utilize an EDI data package may be added to the list of journal citations even if the data package DOI was not available at the time of publication. We are currently exploring options for updating the DataCite DOI metadata of the data package with these citations so that downstream services may take advantage of this crowd-sourced information.
The Summer Fellowship Program of the Environmental Data Initiative
March 25, 2020
The Environmental Data Initiative (EDI) assists researchers from field stations, individual laboratories, and research projects of all sizes to archive and publish their environmental data. EDI’s very successful Summer Fellowship Program for Data Management Training is one component of our Outreach and Training program. For the third consecutive year, EDI is reviewing applications from interested undergraduate and graduate students to become EDI summer fellows. This year we are seeking nine fellows to be trained in the data publishing process and to support nine research sites in their efforts to manage their data. EDI’s aim is to ensure that these young professionals learn state-of-the-art data stewardship practices.
Cite: A Lightweight Citation Service for Data Packages in the EDI Data Repository
February 12, 2020
The EDI technical team has released Cite, a lightweight web service that generates citations for data packages archived in the EDI data repository. Cite is simple to use and requires only the EDI data package identifier appended to the end of the Cite URL: “https://cite.edirepository.org/cite/”. For example, the URL “https://cite.edirepository.org/cite/edi.460.1”, when entered into a web browser’s address bar, returns an ESIP-stylized citation for that data package.
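The URL pattern described above can be sketched as a small helper. The base URL and the example identifier come from the announcement. The function name and the validation pattern (scope, identifier, and revision separated by dots) are assumptions made for illustration, not part of the Cite service itself.

```python
import re

# Base URL as given in the announcement.
CITE_BASE_URL = "https://cite.edirepository.org/cite/"

# Assumed identifier shape: scope.identifier.revision, e.g. "edi.460.1".
_PACKAGE_ID = re.compile(r"[A-Za-z][\w-]*\.\d+\.\d+")


def cite_url(package_id: str) -> str:
    """Return the Cite URL for a data package identifier (hypothetical helper)."""
    package_id = package_id.strip()
    if not _PACKAGE_ID.fullmatch(package_id):
        raise ValueError(f"unexpected package identifier: {package_id!r}")
    return CITE_BASE_URL + package_id


if __name__ == "__main__":
    print(cite_url("edi.460.1"))
```

The resulting URL can then be opened in a browser or fetched with any HTTP client; the sketch deliberately stops short of making a network request.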
Google Scholar Highlights EDI Data Packages as First-order Citations in User Profiles and in Scholarly Articles
February 10, 2020
Data are increasingly citable as first-order objects, including data archived in the EDI repository. One indication is that data package publications are indexed in personal Google Scholar user profiles alongside scholarly articles, as in the profile of Paul Hanson (Research Professor at the Center for Limnology, University of Wisconsin-Madison).
EDI is 40th DataONE Member Node
April 18, 2017
The contents of the EDI Data Repository are now discoverable through DataONE. EDI became the 40th DataONE Member Node when it registered its version of the DataONE Generic Member Node (GMN) software stack to synchronize EDI data content through the DataONE Federation.