Evaluating a Data Package

Evaluating a data package is key to ensuring that it contains a valid, well-formed EML file that accurately describes the data. Evaluations can be performed manually via the EDI Data Portal, or programmatically using the EDIutils R package or the REST API.

Evaluating a data package requires an EDI account. Other methods of authentication (e.g. Google, ORCID, GitHub) do not allow evaluation.

To learn more about evaluation and to request new checks, see the EML Congruence Checker GitHub repository.

EDI Data Portal

To evaluate a data package via the EDI Data Portal:

Login

From the Data Portal, click the Login button and enter EDI account credentials.

Evaluating the data package

  1. Navigate to the Evaluate/Upload Data Packages page.

  2. Choose File - Browse and select the EML file to be evaluated.

  3. Unless every data object in the EML file is associated with static data links, select the checkbox next to Manually upload data… to allow manual upload.

  4. From the manual uploads page, choose the files to upload for the data package and click the Evaluate button.

Be aware that the length of the evaluation process increases with the size of the data being evaluated. Once the process has begun, the browser window can be closed without interrupting the evaluation. Use the "PASTA is Working On" feature of the EDI Dashboard to see when the evaluation has completed. The evaluation report summary and full report can be viewed from the View Evaluate/Upload Results page.

Viewing the evaluation report summary

After the evaluation runs, the View Evaluate/Upload Results page will be displayed. This page provides the Evaluation Report Summary, the first view of the data package quality.

Viewing the evaluation report

This summary table also provides a link to the full Evaluation Report, which documents the outcome of each evaluation test performed, including the specific areas for improvement noted by Warn and Error labels.

EDIutils

To evaluate a data package using the EDIutils R package:

Login

Use the login() function with EDI account credentials.
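A minimal sketch of this step, assuming the EDIutils package is installed; the credentials shown are placeholders.

    library(EDIutils)

    # Authenticate with an EDI account; the resulting token is reused by
    # subsequent EDIutils calls in this R session.
    login(userId = "my_username", userPass = "my_password")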

Evaluating the data package

Use the evaluate_data_package() function to begin the evaluation process, providing the full path to the EML file and the repository environment in which to evaluate.

This function returns a "transaction identifier" that is used to reference the evaluation in subsequent function calls. After the evaluation process has begun, pass the transaction identifier to the check_status_evaluate() function to determine if evaluation has completed.
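A sketch of these two steps, assuming an EML file named "my_package.xml" in the working directory and the staging repository environment (both are placeholders), and using the eml and env argument names from the EDIutils documentation.

    # Start the evaluation; the function returns a transaction identifier.
    transaction <- evaluate_data_package(eml = "my_package.xml", env = "staging")

    # Returns TRUE when the evaluation has completed.
    check_status_evaluate(transaction, env = "staging")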

Viewing the evaluation report summary

Pass the transaction identifier to the read_evaluate_report_summary() function to view the evaluation report summary as plain text.
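Continuing the sketch above with the same transaction identifier and environment:

    # Print the evaluation report summary as plain text.
    read_evaluate_report_summary(transaction, env = "staging")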

Viewing the evaluation report

Pass the transaction identifier to the read_evaluate_report() function, which returns raw XML by default; HTML or plain text can be requested via the as parameter.
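Continuing the same sketch; the "char" value passed to the as argument is an assumption based on the package documentation.

    # Full report as XML (the default) ...
    report_xml <- read_evaluate_report(transaction, env = "staging")

    # ... or as plain text via the as argument.
    report_txt <- read_evaluate_report(transaction, as = "char", env = "staging")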

For a language-agnostic solution, see the REST API documentation for Evaluate Data Package, Read Data Package Error, and Read Evaluate Report.

Handling warnings and errors

Any evaluation check that results in a warning or error status should be addressed before moving ahead; errors must be corrected. Fix the problems in the data and metadata and repeat the evaluation process until all errors (and preferably all warnings) are resolved.

Interpreting the evaluation report summary

The evaluation report summary is displayed as part of the evaluation process via the EDI Data Portal or can be generated from the EDIutils read_evaluate_report_summary() function.

Purpose

Provide a quick look at the data package evaluation results: specifically, the total number of checks run and how many of these checks resulted in a status of Valid, Info, Warn, or Error. The meaning of these status messages is as follows:

Valid: The result of the quality check matches the expectation. Success!
Info: The result of the quality check may or may not match the expectation, but since the expectation is not required, information is returned instead of a Warn or Error.
Warn: The result of the quality check does not match the expectation. A match is not explicitly required to publish the data package, but strongly recommended.
Error: The result of the quality check does not match the expectation. A match is required before the data package can be published.

Interpreting the evaluation report

A Data Package Quality Report, otherwise known as an Evaluation Report, is generated whenever a data package is evaluated or published.

Structure

The Evaluation Report is typically broken into multiple parts, always starting with the Dataset Report, followed by an entity report for each data object included in the data package. These are differentiated by header lines containing the Entity Name and Identifier.

The Dataset and Entity Reports share the same layout:

#: The number of the quality check
Identifier: The identifier of the quality check
Status: The status of the result of the quality check
Quality Check: Describes the type of the quality check (data, metadata, or congruency), the system (knb, lter), and the status that results on failure
Name: The name of the quality check
Description: Brief description of the quality check
Expected: The result that the quality check is expecting
Found: The actual result of the quality check
Explanation: Additional information describing the rationale of the quality check
Suggestion: Potential data package improvements to implement to pass the quality check
Reference: Source of the rationale for the quality check or where to find more information

Interpretation

Work through the report and address any errors or warnings (denoted by the Error and Warn labels). To understand why a quality check failed, first read the Name and Description of the quality check to determine what was being tested and how the test was conducted. Then, compare the Expected result to what was Found. If the cause of the failure is still not clear, look for additional insight in the Explanation, Suggestion, and Reference fields, or contact EDI for clarification.