
Updating schema.org Metadata for Data Packages in the EDI Data Portal to Provide Rich Semantic Information That Can Be Utilized by Search Engines and Google Scholar

May 3, 2021

Susanne Grossman-Clarke


The EDI technical team is now updating the schema.org metadata that accompanies every data package landing page on the EDI Data Portal to follow new recommendations from the ESIP Science-on-Schema.org (SOSO) project (https://github.com/ESIPFed/science-on-schema.org). EDI first released schema.org metadata for each data package in Fall 2018.

Each data package's schema.org metadata is encoded as a JSON-LD data structure embedded within script tags on the data package's landing page. Together with the sitemaps.org metadata, which acts as a table of contents that tells search engines which pages to index, the schema.org metadata provides rich semantic information about each data package. Search engines (e.g., Google, Microsoft, and Yandex), domain-specific tools (e.g., EarthCube's Gleaner and DataONE's schema.org indexers), and their associated applications can all use this information. For example, data packages archived in the EDI data repository are discoverable through Google's Dataset Search interface (https://bit.ly/3nDhT8j) because of the detailed information that the schema.org metadata provides to Google's search engine indexer.
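To illustrate what a JSON-LD block embedded in a landing page looks like, here is a minimal sketch in Python. The property names (name, description, url, identifier, creator, keywords, license, datePublished) are standard schema.org Dataset properties, but this particular selection, the helper function names, and all field values are hypothetical and do not reproduce EDI's actual markup.

```python
import json


def dataset_jsonld(name, description, url, doi, creators, keywords, license_url, date_published):
    """Build a minimal schema.org Dataset description as a Python dict.

    The selection of properties is illustrative (hypothetical), not EDI's actual block.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Express the DOI as a resolvable URL identifier.
        "identifier": f"https://doi.org/{doi}",
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "keywords": keywords,
        "license": license_url,
        "datePublished": date_published,
    }


def embed_in_script_tag(doc):
    """Serialize a JSON-LD dict and wrap it in a script element for a landing page."""
    payload = json.dumps(doc, indent=2, ensure_ascii=False)
    # "<\/" is a valid JSON escape for "</", so a string such as "</script>"
    # inside a value cannot terminate the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Usage with placeholder values (all hypothetical):
block = dataset_jsonld(
    name="Example lake temperature time series",
    description="Hypothetical dataset used only to illustrate the markup.",
    url="https://example.org/dataset/example-1",
    doi="10.0000/example-1",
    creators=["A. Researcher"],
    keywords=["limnology", "water temperature"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
    date_published="2021-05-03",
)
print(embed_in_script_tag(block))
```

Serializing with json.dumps rather than assembling the string by hand keeps quoting and escaping correct, and escaping "</" keeps text inside a value from closing the script element prematurely.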

The same information indexed by the search engine is also made available to Google Scholar, where data packages appear in the citation lists displayed on individual author profiles (see, for example, Mark Servilla's profile: https://scholar.google.com/citations?user=XkKnDhEAAAAJ&hl=en).

Current updates include adding more specific data package metadata to the dataset JSON-LD block and a new research data repository JSON-LD block that better describes the EDI data repository as the host archive. EDI will continue to improve the schema.org metadata for both data packages and the data repository as new recommendations are released.
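A landing page that carries both a dataset JSON-LD block and a repository JSON-LD block can be read back by a crawler that collects every script element of type "application/ld+json". The sketch below uses only Python's standard html.parser module and is a simplified illustration, not how Google, Gleaner, or DataONE actually implement their indexers. The page content is hypothetical, and typing the repository block as a schema.org Organization is an assumption for illustration rather than the type EDI's repository block necessarily uses.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect and parse every <script type="application/ld+json"> element on a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return the parsed JSON-LD blocks found in an HTML string, in document order."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# A hypothetical landing page carrying a dataset block and a repository block.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Organization", "name": "Example repository"}
</script>
</head><body></body></html>"""

print([block["@type"] for block in extract_jsonld(page)])  # ['Dataset', 'Organization']
```

Because each block lives in its own script element, a repository description can be added to a page without altering the dataset block, and a consumer can pick out whichever types it indexes.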