News

A Quick Overview of EDI’s Data Explorer (DeX)

January 28, 2022

Susanne Grossman-Clarke

Description

The EDI software team is excited to announce DeX, a tool for exploring and subsetting tabular data, which is now in beta testing on the EDI staging Data Portal (https://portal-s.edirepository.org/nis). DeX provides three views into tabular data found in the EDI Data Repository: 1) a statistical profiler that analyzes the data table and displays detailed information about each attribute; 2) a filter and subsetting application that lets you download the subsetted data, along with a new EML metadata document describing the subset; and 3) a simple-to-use scatter and line plotting application that gives you a visual glimpse into data trends. DeX is currently available on both our development and staging Data Portals and works with CSV-based data tables (support for a wider set of tabular formats is planned). To see DeX in action, look for a data package in the staging Data Portal that contains a CSV data file and click on the “Data Explorer – experimental” link at the end of the data entity record information (see below):

DeX is written in Python. It uses the Pandas and Pandas Profiling packages for analysis and subsetting, and the Bokeh package for plotting. The DeX “Profile” page is a Swiss Army knife of statistical analyses and gives you a good understanding of the data (to the best of Pandas Profiling’s ability). Note that the Profile does not use the information within the EML metadata, so it may not align perfectly with attribute-level information from the EML. The profile is cached locally on the DeX server after the first download, which improves performance on subsequent views.
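To give a feel for the kind of per-attribute information a profiler derives from the table alone (without consulting the EML), here is a minimal sketch that uses only pandas. It is not DeX's code and does not call Pandas Profiling; the sample CSV, the `quick_profile` helper, and the chosen statistics are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a data table from a data package.
CSV_TEXT = """site,date,temp_c,depth_m
A,2021-01-01,4.5,1.0
A,2021-01-02,5.1,1.0
B,2021-01-01,,2.5
B,2021-01-02,3.9,2.5
"""


def quick_profile(df):
    """Return one row of summary information per column.

    A rough stand-in for a profiler: types are inferred from the data
    itself, not from any metadata document.
    """
    rows = []
    for name in df.columns:
        col = df[name]
        numeric = pd.api.types.is_numeric_dtype(col)
        rows.append(
            {
                "attribute": name,
                "dtype": str(col.dtype),
                "missing": int(col.isna().sum()),
                "distinct": int(col.nunique()),  # NaN is not counted
                "mean": float(col.mean()) if numeric else None,
            }
        )
    return pd.DataFrame(rows).set_index("attribute")


df = pd.read_csv(io.StringIO(CSV_TEXT))
profile = quick_profile(df)
print(profile)
```

Because the types here come from what pandas infers while reading the CSV, a column declared as a date or a category in the metadata may show up as plain text in such a profile, which is the kind of mismatch the note above refers to.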

The DeX Subset page lets you filter and subset the data in several modes, including a tabular query operation that uses the NumExpr query language for fine-level filtering. In this case, filtering uses attribute-level information from the EML metadata. For this reason, it may show unexpected results if the EML and data table do not match perfectly (which makes it a great way to check your pre-publication data package table when viewing proofs through the staging Data Portal). The other modes are “Filter by time period,” “Filter by row index,” and “Filter by category.” The subset operation combines the results from all filter modes into a single modified data table that you can download to your local computer. The downloaded zip file contains the new data table, a new EML metadata document describing the data table, and a JSON file containing the filter criteria used to create the data table.
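The sketch below shows how the four filter modes could be combined with pandas and how the result could be bundled into a zip file. It is an illustration under stated assumptions, not DeX's implementation: the sample table, the `criteria` layout, the file names inside the archive, and the `subset` and `package` helpers are hypothetical, and the EML document that DeX adds to the download is omitted.

```python
import io
import json
import zipfile

import pandas as pd

# Hypothetical data table.
CSV_TEXT = """site,date,temp_c
A,2021-01-01,4.5
A,2021-01-02,5.1
B,2021-01-03,2.0
B,2021-01-04,3.9
C,2021-01-05,6.2
"""

# Hypothetical filter criteria; the key names are illustrative only and
# are not the schema of the JSON file that DeX produces.
criteria = {
    "query": "temp_c > 3.0",
    "time_period": {"column": "date", "start": "2021-01-02", "end": "2021-01-04"},
    "row_index": {"start": 0, "end": 4},
    "category": {"column": "site", "values": ["A", "B"]},
}


def subset(df, c):
    """Apply every filter mode in turn; a row must pass all of them."""
    # Filter by row index (end is exclusive, as in Python slicing).
    out = df.iloc[c["row_index"]["start"]:c["row_index"]["end"]]
    # Filter by time period (both bounds inclusive).
    t = c["time_period"]
    in_period = out[t["column"]].between(
        pd.Timestamp(t["start"]), pd.Timestamp(t["end"])
    )
    out = out[in_period]
    # Filter by category.
    cat = c["category"]
    out = out[out[cat["column"]].isin(cat["values"])]
    # Tabular query; pandas evaluates it with NumExpr when that package
    # is installed and falls back to plain Python otherwise.
    return out.query(c["query"])


def package(df, c):
    """Return the bytes of a zip archive holding the subset and its criteria."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("subset.csv", df.to_csv(index=False))
        zf.writestr("filter.json", json.dumps(c, indent=2))
    return buf.getvalue()


df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])
result = subset(df, criteria)
archive = package(result, criteria)
print(result)
```

Storing the criteria next to the data means the subset can be reproduced later from the original table, which is presumably why the download includes the JSON file.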

The DeX Plot page is a simple X/Y plotting application that gives you a good perspective on trends within the data. You may select a single independent variable (including datetime values) along with one or more dependent variables. Large data tables (those with more than 10,000 records) are subsampled to provide better viewing performance. The dynamic plots let you zoom in and out or pan around within the plot viewport. You may also save the plot to your local computer as a PNG image.
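The subsampling step can be sketched as follows. The 10,000-record threshold comes from the description above; everything else is an assumption, since the announcement does not say how DeX picks the rows. This version draws a uniform random sample and restores the original row order so that a line plot still runs left to right. The `subsample_for_plot` helper and the synthetic table are hypothetical.

```python
import pandas as pd

MAX_ROWS = 10_000  # threshold mentioned in the text


def subsample_for_plot(df, max_rows=MAX_ROWS, seed=0):
    """Return df unchanged if it is small enough, else a random subset.

    Uniform random sampling is an assumption made for this sketch;
    other strategies (for example, taking every n-th row) would also work.
    """
    if len(df) <= max_rows:
        return df
    # A fixed seed keeps the plot stable between views of the same table.
    return df.sample(n=max_rows, random_state=seed).sort_index()


# Synthetic table with 25,000 records.
big = pd.DataFrame({"x": range(25_000), "y": [i % 100 for i in range(25_000)]})
small = subsample_for_plot(big)
print(len(big), "->", len(small))
```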

As the hyperlink text states, DeX is still experimental and may not respond as you expect. We are asking for feedback and reports of any issues with it. And please let us know if you have any exciting use case scenarios in which you can apply DeX.