About
SemanticGaps
Named for the semantic gaps between what institutions publish and what their language actually reveals. Building the Semantic Terrain category for institutional corpus intelligence.
SemanticGaps builds and maintains CAINC, an instrument for taking topological readings of large institutional document corpora. Those readings are rendered as Semantic Terrains: computed, navigable maps that reveal the structural shape of how federal agencies, regulatory bodies, courts, and other institutions communicate.
Our first public reading is the DOJ Semantic Terrain, a free and open map of 115,000+ Department of Justice press releases. It requires no account, and it exists so that anyone can see what a Semantic Terrain is and what corpus intelligence looks like in practice.
We serve government analysts, investigative journalists, white-collar defense attorneys, compliance officers, institutional investors, and M&A due diligence teams. Our focus is on domains where the stakes are high and the document bodies are large: government, law, news, finance, medicine, and defense.
Founded in Austin, TX. Operating worldwide.
What we do
We build topological pipelines that ingest institutional corpora and compute their structural geography. Topics cluster into peaks. Boundaries between them form contours. Entities are located precisely within the terrain. The output is a living reading that updates as the corpus grows.
Access comes through Terrain Subscriptions: continuous access to a named institutional corpus, updated on a defined cadence. We also build Custom Terrains for organizations that need a specific body of documents mapped, and offer Deep Dives for scoped analytical engagements using terrain data.
Each terrain we build compounds in value over time. The corpus grows. The terrain captures its movement. The longer it runs, the richer the structural record becomes. This is not a report that goes stale. It is a reading that gets sharper as the corpus deepens.
Approach
Topology
We compute topological structure over high-dimensional embedding spaces to identify the stable features of how institutional language is organized. Peaks, passes, and contours emerge from the mathematics, not from analyst intuition or manual tagging.
Semantics
Domain corpora are encoded geometrically. Regulatory language becomes a manifold. Enforcement patterns become terrain features. The meaning of the corpus is preserved in its structure, not reduced to keywords or summaries.
Production systems
Semantic Terrains run as production pipelines, with continuous ingestion, automated topological computation, and terrain updates on a defined cadence. Each terrain compounds in value as the live corpus grows.