Atelier

active

Agentic classification workbench for column-level data governance.

Atelier deploys as an application on Cloudera AI, fronting a gRPC core with a React canvas where analysts steer classification across thousands of columns at once.

The pipeline combines six evidence sources (embedding similarity, gradient-boosted prediction, regex pattern detection, column-name matching, short-text SVM, and an LLM convergence agent) through Dempster–Shafer belief fusion rather than flat score averaging. Each source contributes a mass function over a restricted frame of discernment derived from the ontology; combination yields a belief interval, bounded below by belief and above by plausibility, at every node of the hierarchy. The width of that interval makes epistemic uncertainty legible: it is the gap between what the evidence commits to and what it merely fails to contradict.
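
To make the fusion step concrete, here is a minimal sketch of Dempster's rule of combination and the resulting belief interval. It is illustrative only and not taken from Atelier's code: the function names (`combine`, `belief`, `plausibility`), the three-label frame, and the mass values are all assumptions, and a hierarchy node is modelled simply as the set of leaf labels beneath it.

```python
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule.

    Each mass function maps a frozenset of labels (a focal element)
    to the mass assigned to it; the masses of one source sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            # Mass that the two sources place on incompatible label sets.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {focal: w / norm for focal, w in fused.items()}

def belief(m, hypothesis):
    """Mass committed to subsets of the hypothesis (lower bound)."""
    return sum(w for focal, w in m.items() if focal <= hypothesis)

def plausibility(m, hypothesis):
    """Mass not contradicting the hypothesis (upper bound)."""
    return sum(w for focal, w in m.items() if focal & hypothesis)

# Hypothetical frame and sources, for illustration only.
FRAME = frozenset({"EMAIL", "PHONE", "NAME"})
regex_source = {frozenset({"EMAIL"}): 0.6, FRAME: 0.4}
name_source = {frozenset({"EMAIL", "PHONE"}): 0.5, FRAME: 0.5}

fused = combine(regex_source, name_source)
email = frozenset({"EMAIL"})
bel, pl = belief(fused, email), plausibility(fused, email)
print(f"EMAIL: belief={bel:.2f}, plausibility={pl:.2f}, width={pl - bel:.2f}")
```

Mass left on the whole frame expresses ignorance rather than evidence against a label, which is why a source that abstains widens the interval instead of dragging an averaged score toward zero.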

The interactive surface uses that uncertainty directly. A convergence agent (built on the Claude Agent SDK) targets the columns where the belief gap is widest, gathers additional context, and proposes a reclassification. Embeddings are explored on a 2-D atlas — a derivative of Apple’s embedding-atlas modified for Atelier’s workflow — so analysts can navigate clusters of mass before committing labels.
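
The targeting step can be pictured as a ranking by interval width. The sketch below is an assumption about the general idea, not the agent's actual selection policy; `widest_gaps`, the column names, and the interval values are hypothetical.

```python
def widest_gaps(intervals, k=3):
    """Return the k columns whose belief interval is widest.

    `intervals` maps a column name to a (belief, plausibility) pair
    for that column's current top label.
    """
    def width(column):
        bel, pl = intervals[column]
        return pl - bel

    return sorted(intervals, key=width, reverse=True)[:k]

# Hypothetical intervals, for illustration only.
intervals = {
    "cust_email": (0.90, 0.95),
    "col_17": (0.20, 0.90),
    "notes": (0.40, 0.70),
}
targets = widest_gaps(intervals, k=2)
print(targets)
```

Ranking on width rather than on low belief alone separates columns the evidence has not yet spoken to from columns the evidence actively argues against.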

Atelier classifies into the shared SKOS vocabulary published by sdg-corpora, and is the short-iteration counterpart to Aegir: Atelier operates per-column with agent-assisted convergence; Aegir learns the cross-table structure that Atelier’s annotations expose.