Gauge: Domain-agnostic dataset exploration tool

Gauge is an information analysis and interpretation tool that works on bulk unsorted and unlabeled data to find patterns hidden 'between' data points. Gauge extends analyst capabilities by grouping similar data into clusters in which differences can be highlighted, so that an analyst can develop an understanding of cause and effect within a local area.


Gauge Applications

HPC I/O experts used Gauge to explore the workload of the ALCF Theta supercomputer.

They selected four climate simulation applications and one plasma simulation application. SHAP analysis shows that unaligned file accesses have a strong negative impact on performance. These performance issues can be addressed by refactoring I/O access loops or by using high-level library optimizations to restructure access patterns.

Analysis at scale with explainable local models

Gauge clusters, models, and interprets both labeled and unlabeled data. It is an explainable ML platform for analysis that helps answer questions such as: How can we cluster items together? Given an instance of an item, which existing cluster does it fall into? What are the key characteristics of the cluster itself? Does it match the expected profile? Which parameters influence the instance's placement within the cluster?
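
Two of the questions above (which cluster does a new instance fall into, and how does it differ from that cluster's profile) can be illustrated with a minimal sketch. It assumes a simple nearest-centroid rule, which is a stand-in for illustration and not necessarily how Gauge assigns instances; the cluster names, features, and values are hypothetical.

```python
import math

def centroid(points):
    """Mean of each feature over a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def assign_cluster(instance, clusters):
    """Return the label of the cluster whose centroid is nearest (Euclidean)."""
    return min(clusters,
               key=lambda label: math.dist(instance, centroid(clusters[label])))

def feature_deviation(instance, members):
    """Per-feature difference between an instance and its cluster centroid."""
    return [x - c for x, c in zip(instance, centroid(members))]

# Hypothetical toy clusters; features: (read fraction, log10 of bytes moved)
clusters = {
    "read_heavy": [[0.9, 8.0], [0.8, 9.0], [1.0, 8.5]],
    "write_heavy": [[0.1, 6.0], [0.2, 7.0], [0.0, 6.5]],
}
new_job = [0.7, 8.0]
label = assign_cluster(new_job, clusters)
deviation = feature_deviation(new_job, clusters[label])
```

Large entries in `deviation` point to the features in which the instance departs most from its cluster's typical profile.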

A taxonomy of five categories of modeling errors

We introduce a taxonomy of modeling errors in a system, including poor system modeling, inadequate dataset coverage, inherent system contention, and measurement noise. We develop litmus tests to quantify each error category, allowing scientists to narrow down failure modes, enhance analysis models, and improve system performance along with the associated logging and analysis tools.
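
As one illustration of what a litmus test for the measurement-noise category might look like, the sketch below estimates a noise floor from repeated runs that share identical features. This is an assumed approach for illustration only, not the litmus test described by the authors; the function name, feature tuples, and target values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def noise_floor(runs):
    """Mean within-group standard deviation of the target, taken over groups
    of runs that share an identical feature tuple. Groups with a single run
    carry no information about noise and are skipped."""
    groups = defaultdict(list)
    for features, target in runs:
        groups[tuple(features)].append(target)
    spreads = [pstdev(values) for values in groups.values() if len(values) > 1]
    return mean(spreads) if spreads else None

# Hypothetical runs: ((process count, transfer size), measured throughput)
runs = [
    ((4, 1024), 10.0),
    ((4, 1024), 12.0),
    ((8, 2048), 20.0),
    ((8, 2048), 20.0),
    ((16, 4096), 35.0),  # no duplicate, so it is ignored
]
floor = noise_floor(runs)
```

Under this assumption, a model whose error is already close to `floor` has little left to gain from better features, since the remaining spread is not explained by anything the features record.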

Gauge Usage Model

Discovering Structure from Multi-Modal Data Sources

Analyst time is valuable, and spending it on individual events or objects can be wasteful. With Gauge's hierarchical and interactive view of a whole database, an analyst can work at the right granularity. Gauge creates a hierarchy through (i) domain-expert-driven feature engineering, (ii) machine learning (ML)-based metric engineering, and (iii) carefully chosen hierarchical clustering algorithms.

Gauge uses ML models to extract features of interest, so that an analyst can quickly narrow down the source of a given behavior in the collected data.
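
To make the idea of a cluster hierarchy concrete, here is a minimal sketch of naive single-linkage agglomerative clustering. It is a simplified stand-in chosen for brevity, not Gauge's algorithm, and the sample points are made up. Each recorded merge corresponds to one level at which an analyst could view the data.

```python
import math

def single_linkage(points):
    """Naive single-linkage agglomerative clustering.

    Returns the merge history as (members_a, members_b, distance) tuples,
    from the closest pair upward. Members are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Made-up points forming two well-separated groups
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 2.0]]
history = single_linkage(points)
```

Cutting the history at a chosen distance yields a flat clustering at that granularity: here, any threshold between 2 and 10 separates the two groups.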

Input Data

Gauge works on both labeled and unlabeled data. Its flow consists of data parsing, feature selection, sanitization and normalization, clustering, ML model training, and results visualization.
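
The sanitization and normalization steps of the flow above can be sketched as follows. This is a minimal illustration that assumes sanitization means dropping rows with missing or non-finite values and normalization means per-column z-scoring; Gauge's actual rules are not specified here, and the sample rows are made up.

```python
import math
from statistics import mean, pstdev

def sanitize(rows):
    """Keep only rows in which every value is a finite number."""
    return [r for r in rows
            if all(isinstance(v, (int, float)) and math.isfinite(v) for v in r)]

def normalize(rows):
    """Z-score each column; constant columns map to 0.0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, mus, sigmas)]
            for r in rows]

# Made-up parsed rows, two of which contain unusable values
raw = [
    [1.0, 100.0],
    [2.0, float("nan")],
    [3.0, 300.0],
    [None, 200.0],
    [5.0, 500.0],
]
clean = sanitize(raw)
scaled = normalize(clean)
```

Putting features on a common scale matters for the clustering step, because distance-based algorithms would otherwise be dominated by whichever feature has the largest numeric range.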

Algorithms

Gauge uses the HDBSCAN hierarchy with cluster visualizations, and SHAP, a game-theoretic approach to explaining the output of black-box machine learning models, for model interpretation.
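
The game-theoretic idea behind SHAP is the Shapley value: a feature's attribution is its marginal contribution to the model output, averaged over every order in which features could be added. The sketch below computes exact Shapley values for a toy model by brute force. It illustrates the concept only; the SHAP library relies on faster approximations and model-specific algorithms, and the model, instance, and baseline here are hypothetical.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values, averaging marginal contributions over all
    feature orderings. Features not yet added keep their baseline value.
    Cost grows as n!, so this is feasible only for a handful of features."""
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = instance[i]
            now = model(current)
            phi[i] += now - prev  # marginal contribution of feature i
            prev = now
    return [p / len(orderings) for p in phi]

def toy_model(x):
    # Hypothetical model with an interaction between features 0 and 2
    return 2.0 * x[0] - 3.0 * x[1] + x[0] * x[2]

instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The attributions always sum to the difference between the model output at the instance and at the baseline, and the interaction term is split evenly between the two features that take part in it.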

Analysis

Analysis allows for further refinement of feature engineering and clustering techniques. It highlights the dominant positive and negative correlations.
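
One simple way to surface dominant correlations is to rank features by the magnitude of their Pearson correlation with a target. The sketch below assumes that approach, which may differ from what Gauge computes internally; the feature names and all values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def rank_correlations(features, target):
    """Sort (name, correlation) pairs by |correlation|, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up feature columns and target
features = {
    "unaligned_fraction": [0.0, 0.2, 0.4, 0.6, 0.8],
    "process_count": [4, 1, 5, 2, 3],
    "constant_flag": [1, 1, 1, 1, 1],
}
throughput = [9.0, 8.5, 6.0, 4.5, 2.0]
ranking = rank_correlations(features, throughput)
```

Sorting by absolute value keeps strong negative correlations at the top of the list next to strong positive ones, so neither is hidden by the sign.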

Visualization

Gauge has a web-based, interactive interface that allows for real-time, iterative, domain-expert-driven learning and classification.
