2022 Data Management Fellowship Program

May 11, 2022

Susanne Grossman-Clarke


We have awarded 15 fellowships for our ecological data management training program for this summer. The Fellows will receive training in ecological data management and gain hands-on experience through participation in data preparation and publishing with scientists and information managers from specific host research projects. See below for a list of host projects and mentors.

Important dates

  • Virtual data publishing training workshop: 13 June – 15 June 2022 (learn about data cleaning, metadata content and how to publish data in the EDI data repository).
  • Engagement with host projects: 20 June – 19 August 2022

Please contact Susanne Grossman-Clarke if you have questions.

Host projects and mentors for training program

  • Arizona State University, School of Life Sciences. Location: Tempe, AZ/Remote. Mentor: Hinsby Cadillo

    • Project description: The Fellow will synthesize a 4-year dataset of greenhouse gas emissions, tree surveys and growth, hydrological monitoring and primary productivity estimates from Amazon peatlands monitoring (monitoring is funded by NSF DEB and the National Academy of Sciences-USAID). Furthermore, the Fellow will prepare a data publication in EDI and integrate the data into the RAINFOR network and GEM database.
  • Bates College, Northeastern Coastal Stations Alliance (NeCSA). Location: Phippsburg, ME. Mentor: Caitlin Cleaver

    • Project description: NeCSA is a collaboration of coastal field stations in the Gulf of Maine working together to understand climate change impacts in this rapidly changing system. The Fellow will collate and clean five years of data from 10 member stations (water temperature data, biological community data such as species occurrence and species counts, and more). The Fellow will develop and publish a workflow for future NeCSA data collection efforts. In addition, the Fellow will work with intertidal crab survey data collected by Manomet, an organization that uses science to address real-world challenges.
  • Charleston Community Research to Action Board (CCRAB). Location: Charleston, SC/Remote. Mentors: Omar Muhammad and Chloe Stuber

    • Project description: CCRAB conducts citizen science projects that lead and fund sampling of air and water quality data. The data are used to address environmental justice challenges and health disparities. The Fellow will clean and publish the data in EDI as well as build a CCRAB data catalog through which community scientists can easily and readily access the data and information from ongoing research related to physical/social environmental hazards and associated health disparities.
  • Colorado State University, Natural Resources Lab. Location: Fort Collins, CO. Mentor: Chris Dorich

    • Project description: The Global N2O Database contains data on nitrous oxide (N2O) emissions as well as covariate data such as climate data, soil moisture and temperature, soil inorganic nitrogen (N), crop yields and other greenhouse gas emissions (methane, carbon dioxide) from experimental agricultural sites. The Fellow will work on publishing new data from the database in EDI as well as data from the Global Nitrogen Database (nitrogen-related covariates, crop N yield, soil N, and more). The published data will be essential for a workflow for agricultural model testing and calibration. Skills in R programming are preferred.
  • Florida International University, Department of Biological Sciences. Location: Miami, FL/Remote. Mentor: Dr. Sparkle Malone

    • Project description: The subtropical Everglades landscape is characterized by a unique network of freshwater and coastal wetland ecosystems. As in other coastal wetland ecosystems, primary productivity changes in response to variable inundation regimes and salinity. Several long-term eddy covariance sites maintained by FCE-LTER and the Everglades Flux Tower Network are placed along a hydrological gradient (marl prairie, freshwater marsh, mangrove scrub, tall riverine mangrove forests) to collect data on CO2 and CH4 fluxes. The Fellow will publish all tower data with the Ameriflux network and all derived data products with EDI, making raw and derived data available to the wider ecology community.
  • Hutchings Museum Institute Utah Lake Field Station. Location: Lehi, UT. Mentor: Daniela Larsen

    • Project description: The Hutchings Museum Institute Oology data collection project is an open-access data collection on eggs and birds around Utah Lake, dating back to the 1930s. The Fellow will document, clean and publish the collected data in EDI. The data collection will be used by Utah community stakeholders in planning future developments of Utah Lake and in determining how much land needs to be protected and mitigated, thereby ensuring environmental considerations are factored into local and regional planning and development decisions.
  • Iowa Lakeside Laboratory Regents Resource Center. Location: Milford, IA. Mentor: Mary Skopec

    • Project description: Lakeside Lab is located on a chain of natural glacial lakes in the northwest part of Iowa. Data are collected by two water quality buoys that measure water quality parameters (dissolved oxygen, pH, CO2, and more) every 15 minutes from the lake surface to the lake bottom. Other data come from a wave sensor and meteorological sensors. The Fellow will clean, synthesize and publish those data in EDI and develop QA/QC protocols and review procedures for the sensor data. The goal is to make the data easily and publicly accessible to the community, researchers, water utilities and watershed management agencies.
  • Montana State University. Location: Bozeman, MT/Remote. Mentor: Venice Bayrd

    • Project description: Montana researchers have been monitoring ecological restoration of the mine-waste-contaminated Upper Clark Fork River (UCFR) floodplain in Montana for multiple decades. Biomass, dissolved organic carbon, metals, nutrients, and many more observables have recently been collected for the time period 2017 – 2021. The Fellow will join our data curation team in a continuing effort to develop workflows to preserve and disseminate data from the interdisciplinary UCFR project, with a focus on cleaning and depositing a selection of the data in the EDI data repository. In the process, the Fellow will gain experience in enhancing our automated EML metadata generation process in alignment with content standards designed for curation of both data and other research objects from the scientific workflow. Opportunities to visit field sites during data collection can be made available to interested Fellows.
  • Mount Desert Island Biological Laboratory (MDIBL) and Anecdata. Location: Bar Harbor, ME. Mentor: Alexis Garretson

    • Project description: The MDIBL phytoplankton monitoring project has monitored harmful phytoplankton blooms in the Frenchman Bay ecosystem surrounding Acadia National Park since 1997. The Fellow will collate and synthesize phytoplankton data stored in different systems into a single data product, curate the data and subsequently publish the dataset in the EDI data repository. If the Fellow is interested, an internal blog post, data paper, report to volunteers, or other materials announcing the publication of the dataset can be prepared. Depending on the time remaining, the Fellow might process and publish other data. As the Gulf of Maine is the fastest warming body of water on the planet, the dynamics of phytoplankton blooms are of extreme interest and importance to researchers across the country in understanding marine ecology and the potential impacts of anthropogenic climate change.
  • New Mexico State University, Jornada Basin LTER. Location: Las Cruces, NM/Remote. Mentor: Dr. Gregory Maurer

    • Project description: The Jornada Basin LTER has a rich history of meteorological observations collected across the basin from the 1980s to the present. These observations span multiple research projects, instrument networks, spatial scales, and time periods, and there is considerable demand for synthesizing these disparate observations into research-ready climate datasets that can support research across the Jornada's spatial and temporal domains. The Fellow will compile metadata about several of the key meteorology datasets, develop a workflow to quality control and quality assure these long-term datasets, and publish harmonized versions of the data in EDI. Programming skills in R will be helpful, but JRN can offer some training in R if needed.
  • The Learning Partnership and Luquillo LTER. Location: Remote. Mentor: Steven McGee

    • Project description: The Fellow will support the Luquillo Schoolyard Data Jam initiative. The essence of Data Jam is supporting middle- and high-school students in exploring, analyzing, and summarizing long-term data about the environment and then creatively communicating their discoveries to non-scientific audiences. One important element of Data Jam is providing students with appropriate datasets. The datasets need to be structured so that they are manageable, yet robust enough to support independent investigations. The Fellow will assist in creating new derived datasets by processing and combining LTER datasets. Data Jam’s next teacher workshop will be held in June 2022. The Fellow will attend the workshop to meet teachers and observe how teachers interact with the existing datasets. The Fellow will use teacher suggestions to process datasets in preparation for the beginning of the school year. The derived datasets will be published in the EDI data repository.
  • University of California Riverside, Natural Reserve System. Location: Riverside, CA/Remote. Mentor: Marko Spasojevic

    • Project description: The Fellow will have the opportunity to archive two datasets collected in southern California, each featuring multiple years of climate data recorded either by sensors (James Reserve) or by hand (Deep Canyon). Core tasks for the Deep Canyon dataset will be creating a pipeline to ingest existing and future Excel sheets and concatenating records into non-proprietary tabular formats (e.g., .csv) to archive in the EDI repository. For the James Reserve data, core tasks will include trimming records to sensor in-situ dates, checking for erroneous records, applying existing R code to model minimum and maximum daily temperatures from semi-hourly data, and concatenating records into non-proprietary tabular formats (e.g., .csv) to archive in the EDI repository. The Fellow will also document each step as a workflow for ongoing projects. If time and the Fellow’s skillset allow, there are also opportunities to archive derived products of these data.
  • UC Santa Barbara, Jenn Caselle Lab/Palmyra Atoll Research Consortium (PARC). Location: Remote. Mentors: Dr. Jennifer Caselle and Camila Vargas Poulsen

    • Project description: The Palmyra Atoll Data Library (PADL) is a joint effort between The Nature Conservancy (TNC), PARC and the Caselle lab that aims to publish previous data from Palmyra in one data library. PARC and TNC have been conducting ecological, oceanographic, and behavioral research at Palmyra since 2002, collecting incredible data throughout the years. Unfortunately, much of this data is scattered across university websites, several data repositories, and personal servers in many formats. The Fellow will work closely with PADL’s data manager, learning about PADL’s workflow for data mobilization from Palmyra's researchers. The Fellow will have the opportunity to clean and structure data, assemble the appropriate metadata, and publish data packages to EDI.
  • University of Wisconsin-Madison Arboretum/Journey North. Location: Madison, WI/Remote. Mentor: Nancy Sheehan

    • Project description: Journey North (JN) is a citizen science project that encourages people from across North America to track wildlife migration and seasonal change. JN has datasets on Barn Swallows, American Robins, Red-winged Blackbirds, Bald Eagles, Baltimore and Bullock’s Orioles, Common Loons and more. The Fellow will be tasked with cleaning and preparing the data, writing metadata documentation, and publishing these datasets. The 2022 EDI Fellow will be greatly aided by the efforts of the previous Fellow, who created a workflow and documented all data cleaning and preparation work (using Python and Jupyter Notebook). Depending on time and interest, the Fellow could integrate JN datasets into online data visualizations using Tableau.
  • USDA Forest Service, Northern Research Station. Location: Grand Rapids, MN/Remote. Mentor: Nina Lany

    • Project description: The Marcell Experimental Forest (MEF) in northern Minnesota is operated by the USDA Forest Service and was established in 1962 to study the ecology and hydrology of peatlands. The Fellow will work with streamflow data gathered at 5-minute resolution by environmental sensors in six catchments instrumented for hydrologic monitoring within the 1100-hectare experimental forest. The Fellow will develop scripted workflows in R to organize raw data, document the transition from mechanical to electronic sensors, and optionally write Python code to display the data within a visually appealing online data dashboard. The Fellow will publish two new data packages in EDI. On-site accommodations include high-speed internet, laundry, and kitchen, as well as nearby hiking trails and lakes.