Updating Metadata for Data Packages in the EDI Data Portal to Provide Rich Semantic Information That Can Be Utilized by Search Engines and Google Scholar

May 3, 2021

Susanne Grossman-Clarke


The EDI technical team is now updating the metadata that accompanies every data package landing page on the EDI Data Portal to follow new recommendations from the ESIP Science-on-Schema.org (SOSO) project. EDI initially released this metadata for each data package in Fall 2018. The dataset metadata is encoded as a JSON-LD data structure embedded within script tags on the data package's landing page.

Besides serving as an SEO index of the page's content, the metadata provides rich semantic information about the data package that can be used by search engines (e.g., Google, Microsoft, and Yandex), by domain-specific tools such as EarthCube's Gleaner and the DataONE indexers, and by associated applications. For example, data packages archived in the EDI data repository are discoverable through Google's Dataset Search interface because of the detailed information the metadata provides to Google's search engine indexer.
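To illustrate the mechanism described above, here is a minimal Python sketch, using only the standard library, of how a schema.org `Dataset` record might be serialized into a `<script type="application/ld+json">` element and then recovered by a crawler-style parser. The field values, the `embed_json_ld` helper, and the `JsonLdExtractor` class are hypothetical illustrations; they are not EDI's actual metadata or any search engine's real indexer.

```python
import json
from html.parser import HTMLParser

# Hypothetical Dataset metadata loosely following schema.org conventions.
# All values are illustrative placeholders, not a real EDI record.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example stream chemistry measurements",
    "description": "Illustrative data package used to show embedded JSON-LD.",
    "url": "https://example.org/dataset/1",
    "keywords": ["stream chemistry", "example"],
    "creator": {"@type": "Person", "name": "Jane Doe"},
}


def embed_json_ld(metadata: dict) -> str:
    """Serialize metadata into a <script type="application/ld+json"> element."""
    payload = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.blocks: list[dict] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look for the JSON-LD type.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


# A stand-in landing page with the JSON-LD block in its <head>.
page = f"<html><head>{embed_json_ld(dataset)}</head><body>...</body></html>"
extractor = JsonLdExtractor()
extractor.feed(page)
print(extractor.blocks[0]["@type"], "-", extractor.blocks[0]["name"])
```

Because the structured record is machine-readable and separate from the visible page text, an indexer can recover the same fields the page author supplied without scraping the human-readable layout.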

The same information indexed by the search engine is also made available to Google Scholar, where the data packages appear in the citation lists displayed on individual user profiles (see, for example, Mark Servilla's profile).

Current updates include more specific data package metadata in the dataset JSON-LD block and a new research data repository JSON-LD block that better describes the EDI data repository as the host archive. EDI will continue to improve the metadata content for both data packages and the data repository as new recommendations are released.
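A second sketch, again hypothetical, shows one way a landing page could carry two JSON-LD blocks, one describing the repository and one describing the dataset, with the dataset referring to the repository by `@id` instead of repeating its details. The `["Service", "Organization"]` typing reflects our reading of the SOSO repository guidance and should be checked against the current recommendations; `REPO_ID`, the names, and the URLs are placeholders, not EDI's actual values.

```python
import json

# Hypothetical identifier for the repository node; EDI's real @id may differ.
REPO_ID = "https://example.org/#repository"

# Repository block. The multi-type value is an assumption based on SOSO guidance.
repository = {
    "@context": "https://schema.org/",
    "@type": ["Service", "Organization"],
    "@id": REPO_ID,
    "name": "Example Data Repository",
    "url": "https://example.org/",
}

# Dataset block that links to the repository by @id rather than copying its fields.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example data package",
    "publisher": {"@id": REPO_ID},
    "provider": {"@id": REPO_ID},
}


def script_block(doc: dict) -> str:
    """Wrap one JSON-LD document in a script element for the page <head>."""
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"


# Both blocks are emitted on the same landing page.
landing_page_head = "\n".join(script_block(d) for d in (repository, dataset))
print(landing_page_head)
```

Linking by `@id` keeps the repository description in one place, so a consumer that reads both blocks can resolve which archive hosts the data package.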