Webinars
We host our webinars on the 3rd Tuesday of each month at 2pm Eastern Time (11am Pacific Time). We present both one-time and recurring topics on data publishing best practices, tools, and new developments, and you can join us with your questions on everything EDI during our regular “Town Halls.” Many of the webinars are recorded and available on our YouTube channel.
Request a webinar topic at info@edirepository.org.
An in-depth look at EDI's Data Explorer (DEX)
October 18, 2022
EDI DEX is a new tool that allows you to explore tabular data directly from the EDI Data Repository. DEX currently supports data profiling, subsetting, and plotting. DEX is available on all three PASTA tiers (development, staging, and production). It is accessed from the Data Package landing page using the “Explore Data” button and is available for files that are recognized as containing tabular data. This webinar will present an overview of the DEX architecture and demonstrate how DEX works on actual data.
EDI Summer Fellows presentations
August 16, 2022
Our EDI Summer Fellows presented 5-minute eLightning talks on their data publishing experience at their host sites, which span diverse ecosystems across the country, from Maine’s Mount Desert Island to Puerto Rico’s Luquillo Mountains to the Palmyra Atoll in the central Pacific. You can find a list of all host projects here: https://edirepository.org/news/news-20220511.00
Domain repositories enriching the global research infrastructure
April 19, 2022
Domain repositories have long provided a suite of services to the communities built around them, including metadata management and data discovery tools, data access and preservation, and identification of resources created by and of interest to the community. As a result, domain repositories are central to important and thriving research communities. Recently, a global research infrastructure for identifying and tracking many kinds of research objects has emerged. It includes Crossref, originally for journal articles and books; DataCite, originally for datasets but expanding to other research objects; ORCID for identifying people; and ROR for identifying organizations. Crossref and DataCite initially focused on creating identifiers for research objects (DOIs). As researchers start to use these identifiers, it is clear that the connections they enable add considerable value. To enable these connections, the growing global research infrastructure needs content beyond minimal identification and citation metadata. With their deep knowledge of their communities, domain repositories are uniquely situated to extend their services by providing connections and content that enrich the global research infrastructure with additional metadata. This talk will share several recent examples that demonstrate the power of enriching the global research infrastructure.
New features of ezEML
March 15, 2022
ezEML is EDI’s web-based tool that lets even EML newbies create and edit EML metadata for their data packages. When ezEML was first introduced, we demonstrated its Release 1.0 features in a webinar https://youtu.be/LVRoFmTwvtU. But a lot has changed since then. A number of significant new features have been added, and another ezEML webinar seems called for. We demonstrated what’s new, answered questions, and welcomed feedback. A few of the most significant additions:
Informational webinar on 2022 Fellowship Program
March 8, 2022
We have 15 fellowships available in our ecological data management training program for this summer. We are now requesting applications from undergraduate, graduate and recently graduated students. In this webinar we will give an overview of the training program and answer interested students' questions.
How FAIR are our data?
February 15, 2022
FAIR (Findable, Accessible, Interoperable, Reusable) is a high-level framework for communicating the quality of metadata in terms of how usable the data are for someone who was not directly involved in collecting them. More recently, the Research Data Alliance (RDA) has developed guidelines for evaluating FAIRness, the FAIR Data Maturity Model. This framework is still fairly general, and research communities need to extend it with more specific, community-level criteria. DataONE has taken the initiative to develop such criteria for data in the DataONE community.
Keep your science up-to-date with EDI repository event notifications
November 16, 2021
Science that is based wholly or in part on data archived in the EDI repository can use our event notification service to be notified whenever input data change. Ultimately, this accelerates science, enables projects to live beyond their completion date, encourages reproducibility, and allows you to do more by reducing manual tasks. Join us for an in-depth look at EDI repository event notifications: what they are, how they work, and how to get started using them.
Informational webinar on EDI’s 2022 Summer Fellowship Program
October 19, 2021
This webinar was intended for researchers, projects, and field stations interested in hosting an EDI fellow for two months during summer 2022. We answered questions about our 2022 Data Management Fellowship Program. The purpose of the paid fellowship program is to train the Fellows in ecological data management and enable them to support host projects in developing sustainable workflows for publishing their research data.
A workflow for automating the update of continuous data in EDI using GitHub Actions
September 21, 2021
An autonomous data processing and publishing workflow driven by GitHub Actions is presented. The workflow downloads data from the Southern California and the Central & Northern California Ocean Observing Systems (SCCOOS and CeNCOOS, respectively), transforms the data, and then publishes them to the EDI Data Repository and the Global Biodiversity Information Facility (GBIF). This workflow can be generalized for other data processing and publishing use cases.
CZ Manager: software for sensor and sample based earth observations
August 17, 2021
CZ Manager is a graphical user interface (GUI) for Observations Data Model 2 (ODM2). This software was originally developed for managing data collected in the field in Luquillo, Puerto Rico, for the NSF-funded Critical Zone Observatory program and continues to be used primarily for sensor-based data collected by the Luquillo LTER. CZ Manager is also used for sensor- and sample-based data from the waterways of the New Hampshire Great Bay region as part of the Piscataqua Region Estuaries Partnership (PREP). CZ Manager includes a suite of tools for mapping sites, plotting data, managing data QA/QC, and WaterOneFlow web services (WaterML) for integration with data.cuahsi.org.
Updating schema.org metadata for data packages in the EDI Data Portal to provide rich semantic information that can be utilized by search engines and Google Scholar
June 15, 2021
This webinar gave an overview of how the EDI technical team is now updating the schema.org metadata that accompanies every data package landing page on the EDI Data Portal with new recommendations from the ESIP SOSO project (https://github.com/ESIPFed/science-on-schema.org). EDI initially released schema.org metadata for each data package in Fall 2018. Along with the sitemaps.org metadata, the schema.org metadata provides rich semantic information about the data package that can be utilized by search engines (e.g., Google, Microsoft, Yandex, and even domain specific tools like EarthCube’s Gleaner and DataONE schema.org indexers) and associated applications. For example, data packages that are archived in the EDI data repository are discoverable through Google’s Dataset Search interface because of the detailed information provided to Google’s search engine indexer via the schema.org metadata.
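As a rough illustration of what such markup looks like, the Python sketch below assembles a schema.org Dataset description as JSON-LD and wraps it in the script element (type application/ld+json) that crawlers read from a landing page. The title, creator, DOI, and URL are hypothetical placeholders, the helper functions are illustrative only, and the properties shown are a small subset; this is not the markup EDI actually generates, and the ESIP SOSO guidelines recommend considerably more detail.

```python
import json


def build_dataset_jsonld(name, description, doi, landing_url, creators, keywords):
    """Return a minimal schema.org Dataset description as a dict (JSON-LD).

    Hypothetical helper for illustration; only a few Dataset properties are used.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Express the DOI as a resolvable URL identifier.
        "identifier": f"https://doi.org/{doi}",
        "url": landing_url,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "keywords": keywords,
    }


def to_script_tag(jsonld):
    """Wrap the JSON-LD in the <script> element that search engines parse."""
    body = json.dumps(jsonld, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    record = build_dataset_jsonld(
        name="Example stream temperature time series",  # hypothetical title
        description="Hypothetical dataset used only to illustrate the markup.",
        doi="10.0000/example",  # placeholder, not a real DOI
        landing_url="https://example.org/dataset/1",  # placeholder landing page
        creators=["Jane Doe"],
        keywords=["stream temperature", "time series"],
    )
    print(to_script_tag(record))
```

Embedding the JSON-LD in the landing page HTML, rather than serving it separately, is what lets a crawler such as Google's pick it up while indexing the page itself.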
How to publish data in the EDI data repository – an overview
May 18, 2021
The Environmental Data Initiative (EDI) assists researchers from field stations, individual laboratories, and research projects of all sizes to publish their environmental data and meet obligations to funding agencies such as NSF and to publishers. EDI is committed to enabling data that are Findable, Accessible, Interoperable, and Reusable (FAIR). In this webinar we highlighted the benefits of data publishing and gave an overview of the workflow for publishing data in the EDI Data Repository. We touched on the following components of the data publication process: (1) what a data repository is and how to choose a suitable repository for your data; (2) cleaning and organizing datasets in preparation for publishing them; (3) describing data with metadata so others can understand and reuse them; (4) publishing the data: what happens behind the curtain at EDI; and (5) easily displaying and sharing your data resources on your own website.
Updates on ezEML – a form-based online application for creating metadata in EML
April 20, 2021
EDI recently released ezEML, a form-based, online, do-it-yourself tool for creating metadata in the Ecological Metadata Language (EML). You don’t need to know EML to create metadata using ezEML. In a webinar back in October, we presented an overview and demo of the initial release of ezEML. A lot has changed since then. New features of ezEML were demonstrated, highlighting, among other things, on-the-fly metadata checking, missing value code detection, and the ability to re-upload data tables and other entities, to clone selected column attributes from one data table to another, to create metadata for related projects, and to submit metadata and data files to EDI with a single click.
Informational webinar on 2021 Fellowship Program
March 16, 2021
We have 15 fellowships available in our ecological data management training program for this summer. We are now requesting applications from undergraduate, graduate and recently graduated students. In this webinar we will give an overview of the training program and answer interested students' questions.
Update on the EMLassemblyline – an R package for creating metadata in EML
January 19, 2021
EMLassemblyline (EAL) is an R library for programmatically creating EML metadata. It is optimal for recurring data publications (time series or data derived from time series sources) but also works well for “one-off” publications. EAL prioritizes automated metadata extraction from data objects to minimize the required human effort, and it encourages EML best practices to make data publications Findable, Accessible, Interoperable, and Reusable.
Update on harmonizing meteorological and hydrological data in the EDI data repository
December 15, 2020
The LTER (Long Term Ecological Research) ClimDB/HydroDB database and user interface for meteorological and hydrological observations will soon be retired. In 2019, a working group of LTER, US Forest Service, and EDI Information Managers created a roadmap for preserving the database in the EDI data repository, work that is now in progress. The working group adopted the Observations Data Model (ODM) 1.1 of the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI) as the most useful format for meteorological and hydrological data, which will give research sites the opportunity to submit data to CUAHSI and make use of its visualization tools. Therefore, ClimDB/HydroDB content is being exported as ODM 1.1 tables with EML metadata. The CUAHSI ODM model is also recommended for harmonizing all suitable meteorological and hydrological data, to be added to the EDI data repository as “analysis ready” data that support reuse or contribution to CUAHSI.
10 features of the EDI data repository you may not know about
November 24, 2020
The Environmental Data Initiative (EDI) supports a robust data repository that is accessible through the EDI Data Portal. Users of the Data Portal are generally familiar with publishing and searching for data, but many other features of the Data Portal (and of the underlying services provided by EDI) are less well known, such as adding provenance metadata to a data package. This webinar is a compendium of 10 little-known features of the EDI data repository and related services. Learn some cool tips and tricks of EDI.
Informational webinar on EDI’s 2021 Summer Fellowship Program
October 20, 2020
This webinar was intended for researchers, projects, and field stations interested in hosting an EDI fellow for two months during summer 2021. We answered questions about our 2021 Data Management Fellowship Program. The purpose of the paid fellowship program is to train the Fellows in ecological data management and enable them to support host projects in developing sustainable workflows for publishing their research data.
Overview of ezEML
October 06, 2020
EDI recently released ezEML, a form-based online application designed to streamline the creation of metadata in the Ecological Metadata Language (EML). ezEML is aimed at scientists and others who want to prepare their dataset for submission to a data repository but are not themselves proficient in the details of EML editing. ezEML is designed to give such users a do-it-yourself tool that can handle the great majority of typical datasets. It incorporates a User Guide and extensive Help to guide users through its use. Join us for an overview and a demonstration of how to use ezEML!
Informational webinar on EDI’s 2020 Summer Fellowship Program
October 29, 2019
This webinar was intended for researchers, projects, and field stations interested in hosting an EDI fellow for two months during summer 2020. We answered questions about our 2020 Data Management Fellowship Program. The purpose of the paid fellowship program is to train the Fellows in ecological data management and enable them to support host projects in developing sustainable workflows for publishing their research data.
Ecological Metadata Language EML 2.2: Overview of new features
October 8, 2019
The EML development group has released EML 2.2 and thanks all contributors for their work, discussion, feedback, and community stewardship. In this webinar, the new features of EML 2.2 are presented. You can read the specification of EML 2.2 and download the distribution at https://eml.ecoinformatics.org. A summary of the changes is given in “What’s New in EML 2.2.0” at https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html.
CUAHSI tools for data management
June 18, 2019
Martin Seul presented CUAHSI tools (hydroclient and uploader) and answered questions about CUAHSI and CUAHSI tools. CUAHSI tools enable users to discover, preview and download time series data such as stream gauge measurements and meteorological station measurements from federal agencies, university researchers, and watershed organizations, all in the same format, the Community Observation Data Model.
Open science framework for reproducible research
April 2, 2019
Open Science Framework (OSF) is a free and open-source platform where researchers can collaborate, share data, show their research process, and document their research. OSF is developed by the Center for Open Science (COS; http://cos.io/), a non-profit technology and culture change organization whose mission is to improve the openness, integrity, and reproducibility of scientific research. OSF also offers a command-line client, storage add-ons (integrations with Google Drive, Box, Dropbox, GitHub, and more), and preprint servers such as EarthArXiv, a geoscience preprint server that complements AGU’s Earth and Space Science Open Archive (ESSOAr).
Enabling FAIR Data: Helping our researchers share their data and the reward of attribution and credit
February 19, 2019
The data that underpins our research is a valuable part of the research process. The role scientific repositories have in helping researchers share their data and make it understandable is critical to the research lifecycle. Research data is best preserved in a trusted digital repository with robust descriptive information.
Using the GCE Data Toolbox to automate environmental data processing and produce EML-described data packages for EDI
January 8, 2019
This webinar gives an overview of the GCE Data Toolbox and its history, and shows how it is used to automate environmental data processing and produce EML-described data packages for archiving in the EDI repository.
Introduction to some tools for creating taxonomic coverages in metadata
December 18, 2018
One common way to search for suitable research data is to locate data about a specific organism or group of organisms. Modern taxonomies capture both the names of individual species and the relationships between them, along with hierarchies representing estimated phylogenies. To facilitate searching at a variety of taxonomic levels (species, genus, family, order, etc.), metadata need to include the relevant taxonomic terms. This webinar will discuss how taxonomic information can be incorporated into standardized metadata. Additionally, it will introduce packages for the R statistical software that can be used to query existing taxonomic hierarchies and produce the needed metadata elements.
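To make the target structure concrete, here is a minimal Python sketch (not the R software mentioned above) that nests an ordered rank hierarchy into EML taxonomicCoverage, taxonomicClassification, taxonRankName, and taxonRankValue elements. It assumes the hierarchy has already been retrieved from a taxonomic authority; the brook trout classification is hard-coded for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def taxonomic_coverage(hierarchy):
    """Build an EML <taxonomicCoverage> element from (rank, value) pairs.

    Hypothetical helper. The hierarchy is ordered from the highest rank
    (e.g. Kingdom) down to the lowest (e.g. Species); each lower rank is
    nested inside the taxonomicClassification of the rank above it.
    """
    coverage = ET.Element("taxonomicCoverage")
    parent = coverage
    for rank, value in hierarchy:
        node = ET.SubElement(parent, "taxonomicClassification")
        ET.SubElement(node, "taxonRankName").text = rank
        ET.SubElement(node, "taxonRankValue").text = value
        parent = node  # the next (lower) rank becomes a child of this one
    return coverage


if __name__ == "__main__":
    # Hard-coded for illustration; in practice this would be looked up
    # from a taxonomic authority such as ITIS.
    brook_trout = [
        ("Kingdom", "Animalia"),
        ("Phylum", "Chordata"),
        ("Class", "Actinopterygii"),
        ("Order", "Salmoniformes"),
        ("Family", "Salmonidae"),
        ("Genus", "Salvelinus"),
        ("Species", "Salvelinus fontinalis"),
    ]
    print(ET.tostring(taxonomic_coverage(brook_trout), encoding="unicode"))
```

Because every rank appears as its own element, a search for "Salmonidae" can match this dataset even though the data themselves only name the species.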
Associating EDI data packages with the journal articles that use them
November 13, 2018
PASTA+ now provides a set of web services that enable users to document and manage associations between data packages in the Environmental Data Initiative (EDI) data repository and journal articles or other manuscripts that are known to use the data objects contained within them. The EDI Data Portal has been enhanced to take advantage of these new “journal citation web services” with a convenient form-based user interface. Journal citations documented in PASTA+ through these web services are also registered in DataCite by storing the association in the data package’s DOI metadata.
Finding and contributing software solutions for data management needs
October 30, 2018
The Information Management Code Registry (IMCR) is a hub for sharing software solutions to common data management tasks encountered in the environmental sciences (e.g., data collection, quality assurance, documentation, archiving, discovery, and synthesis). The IMCR helps distribute data management expertise, promote convergence on best practices, and support citable scientific workflows. In this webinar, you will receive an overview of the IMCR and learn how to use it to find and contribute data management software.
Informational webinar on EDI’s 2019 Summer Fellowship Program
October 9, 2018
This webinar was intended for researchers, projects, and field stations interested in hosting an EDI fellow for two months during summer 2019. We answered questions about our 2019 Data Management Fellowship Program. The purpose of the paid fellowship program is to train the Fellows in ecological data management and enable them to support host projects in developing sustainable workflows for publishing their research data.
Experimental use of Sitemaps.org and Schema.org metadata for Search Engine Optimization
September 25, 2018
The Environmental Data Initiative (EDI) has just released an experimental implementation of sitemaps.org and schema.org metadata to support search engine discovery and indexing (often called Search Engine Optimization). Sitemaps metadata serve as a table of contents for high-value information found on websites so that search engines can more easily discover relevant web pages to index. For EDI, the sitemaps metadata point to the most recent data package landing pages accessible through the EDI Data Portal and are refreshed hourly. This webinar gives an overview of how EDI enhances the content indexed by search engines. The schema.org dataset metadata provide a recognized and consistent vocabulary of dataset attribute information for search engines to index, thereby improving the overall user experience when searching for data packages on the Internet. For more information, please check out Google’s recent announcement of its beta Dataset Search tool.
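The sitemaps.org format itself is simple XML: a urlset element containing one url entry per page, each with a loc and, optionally, a lastmod date. The Python sketch below builds such a file for a list of landing pages. The page URLs and dates are hypothetical, the function name is illustrative, and this shows the protocol rather than EDI's hourly sitemap generator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for an iterable of (url, lastmod) pairs.

    Hypothetical helper; lastmod uses the W3C date format YYYY-MM-DD.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    landing_pages = [  # placeholder landing pages
        ("https://example.org/dataset?id=1", "2018-09-25"),
        ("https://example.org/dataset?id=2", "2018-09-24"),
    ]
    print(build_sitemap(landing_pages))
```

Building the file with ElementTree rather than string concatenation means characters such as the ampersand in query-string URLs are escaped automatically, which the sitemap protocol requires.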
Organizing data into publishable units
July 31, 2018
Organizing data into publishable units can be challenging because it’s often influenced by multiple factors including: data theme and volume, the unique needs of providers and users, the level of processing required, and the general purpose the data serve. To inform a set of guidelines for organizing data into publishable units we had an open discussion to share experiences, issues, and recommendations.
Going Static with LTER Websites
May 29, 2018
Nearly all Long Term Ecological Research (LTER) websites employ dynamic servers that facilitate authoring content via Content Management Systems (CMS) like WordPress or Drupal, or that generate content on the fly as end users request a given page in their browser. While powerful, such systems also have costs related to security updates for the CMS and operation of the server itself. In this webinar we introduce an alternative called static websites, in which web pages are created ahead of time and served exactly as they are stored on the server. Server costs tend to be much lower for static websites, and CMS security updates become a thing of the past. We will explore the static website of the Beaufort Lagoon Ecosystems (BLE) LTER and demonstrate how to use external servers to support a searchable bibliography via Zotero or a data catalog via PASTA+. The bibliography and data catalog components (as well as the entire BLE website) are available on GitHub for you to see how they’re implemented, collaborate on improving them, or simply download them for your own site.
Postgres, EML and R in a data management workflow
May 8, 2018
Storing metadata and creating EML have always been a challenge for people and organizations who want to archive their data. A workflow was developed that combines efficient EML record generation, using the package developed by the R community, with the advantages of centrally controlled metadata in a relational database (metabase). In this webinar we will focus on two components of the workflow: (1) metadata storage in a relational database; and (2) an example of EML file generation using predefined R functions.
Create a local data catalog on your website
April 24, 2018
Now that your datasets are in a public repository, there are two basic ways to create a catalog of them, which we will introduce in this webinar. The first is simple: for a few datasets, you can list them on your own webpage, with a static link to the dataset in the EDI portal. Second, if you have many datasets, or want to customize the display, we will introduce you to the PASTA+ search API so you can capitalize on EDI repository web services. With this method, you can customize and style locally, without having to store data or metadata.
An Overview of the EDI Data Repository and Data Portal (and how to use it for publishing data)
April 10, 2018
The Environmental Data Initiative data repository is a metadata-driven archive for environmental and ecological research data described by the Ecological Metadata Language. This webinar will provide an overview of the PASTA software used by the repository and demonstrate the essentials of uploading a data package to the repository through the EDI Data Portal.
Make metadata with the EML assembly line
March 27, 2018
High quality structured metadata is essential to the persistence and reuse of ecological data; however, creating such metadata requires substantial technical expertise and effort. To accelerate the production of metadata in the Ecological Metadata Language (EML), we’ve created the EMLassemblyline R code package. Assembly line operators supply the data and information about the data, then the machinery auto-extracts additional content and translates it all to EML. In this webinar Colin presented an overview of the assembly line, how to operate it, and a brief demonstration of its use on an example dataset.
What are metadata and structured metadata?
March 13, 2018
Metadata are essential to understanding a dataset. In her talk, Kristin discussed how structured metadata are used to document, discover, and analyze ecological datasets, and offered some tips for creating quality metadata content. Margaret gave an introduction to the metadata language used by EDI, the Ecological Metadata Language (EML). EML is written in XML, a general-purpose mechanism for describing hierarchical information, so she also described some general XML features and how they apply to EML.
Explanation of the EDI metadata template
March 6, 2018
This tutorial gives an overview of the metadata template that EDI uses for metadata preparation when publishing a data package in the EDI data repository, and explains the purpose and content of the template. The template can be downloaded from EDI’s GitHub space at: https://github.com/EDIorg/MetadataTemplates.
Creating clean data for archiving
January 30, 2018
Not all data are easy to use, and some are nearly impossible to use effectively. This presentation lays out the principles and some best practices for creating data that will be easy to document and use. It will identify many of the pitfalls in data preparation and formatting that will cause problems further down the line and how to avoid them.
Clean your taxonomy data with the taxonomyCleanr R package
January 16, 2018
Taxonomic data can be messy and challenging to work with. Incorrect spelling and the use of common names, unaccepted names, and synonyms contribute to ambiguity about what a taxon actually is. The taxonomyCleanr R package helps you resolve taxonomic data to a taxonomic authority, get accepted names and taxonomic serial numbers, and create metadata for your taxa in the Ecological Metadata Language (EML) format.
The Environmental Data Initiative (EDI): Supporting curation and archiving of environmental data
January 9, 2018
The Environmental Data Initiative (EDI) was funded by the National Science Foundation (NSF) to accelerate curation, archiving, and publishing of environmental data. EDI provides a secure data repository and data curation support for ecological research projects, with emphasis on NSF-funded programs including Long Term Research in Environmental Biology (LTREB), the Organization of Biological Field Stations (OBFS), Macrosystems Biology (MSB), and Long Term Ecological Research (LTER). The EDI Data Repository is an extension of the Provenance Aware Synthesis Tracking Architecture (PASTA), originally developed to house LTER data. EDI is a DataONE member node (www.dataone.org) and is listed in the Registry of Research Data Repositories (re3data.org). EDI supports and trains members of the environmental sciences community to archive and publish high-quality data and metadata.
Using checksums to speed up data package uploads
November 28, 2017
A new option is now available in the Data Portal for speeding up data package uploads to PASTA. When selected, the “useChecksum” option allows PASTA to use an existing copy of a data entity if its checksum matches the checksum documented in your EML metadata. This tutorial covers the effective use of this potential time-saving option and is intended for Information Managers who use the Data Portal interface to upload data packages to PASTA.
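The idea behind useChecksum can be sketched as a simple comparison: compute a digest of the local file and check it against the value documented in the EML metadata (EML records it in an authentication element under physical, whose method attribute names the algorithm). The Python sketch below shows only that comparison, assuming SHA-1 as the default algorithm; the function names are hypothetical, and how PASTA itself locates and reuses the stored copy is not shown.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks.

    Reading in 1 MiB chunks means large data entities never have to fit
    in memory at once.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def can_reuse_entity(path, documented_checksum, algorithm="sha1"):
    """Hypothetical check: True if the local file matches the checksum in the
    metadata, i.e. an already-stored identical copy could be reused instead
    of uploading the file again. Comparison ignores case and whitespace."""
    return file_checksum(path, algorithm) == documented_checksum.strip().lower()
```

For example, can_reuse_entity("table.csv", value_from_eml) returns True only when the file is byte-for-byte identical to the one whose checksum was documented; any edit to the file changes the digest and forces a normal upload.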
Transform and visualize data in R using the packages tidyr, dplyr and ggplot2
October 24, 2017
This is the second of two tutorials on how to tidy data in R with the package “tidyr” and transform data with the package “dplyr”. These transformations support data visualization with the package “ggplot2” for data analysis and scientific publications, and examples of such visualizations were shown.
Transform and visualize data in R using the packages tidyr, dplyr and ggplot2
October 17, 2017
This is the first of two tutorials on how to tidy data in R with the package “tidyr” and transform data with the package “dplyr”. These transformations support data visualization with the package “ggplot2” for data analysis and scientific publications, and examples of such visualizations were shown.
Data package design, featuring the candidate model for community survey data
September 26, 2017
This session was an informal discussion of progress on EDI’s ecocomDP design pattern (https://github.com/EDIorg/ecocomDP). The session compared ecocomDP with the relational model used by the Popler project, and participants from that project attended the video meeting (Rice et al., https://eco.confex.com/eco/2017/meetingapp.cgi/Paper/62280).
The CCE/PAL LTER information management system
June 19, 2017
The CCE/PAL information management system is presented, specifically the software and workflow for acquiring and ingesting metadata and data into the CCE/PAL information management system, generating EML, and creating the local data catalog.