The Semantic Layer Is the Type System for Data
“If GroupBy plus Aggregate plus Top-K covers eighty percent of data analysis tasks, and LLMs already write SQL fluently, why isn’t there a working end-to-end auto-pipeline yet?”
It’s the right question. The answer reframed how I think about every “AI data analyst” product I’ve evaluated this year. Here it is: LLMs solve only syntax. The semantics still have to be written down by humans somewhere, and that somewhere is the semantic layer. SQL is dynamic typing. The semantic layer is its type annotations. Without them, AI guesses. And silent guesses produce confidently wrong dashboards.
The four-layer ambiguity model
Every “natural language to analytics” task has four sources of vagueness. LLMs solve one of them.
| Layer | Example | Can an LLM solve it? |
|---|---|---|
| Language ambiguity | “find users with biggest growth” → SQL | Yes, roughly seventy to eighty percent |
| Data ambiguity | Does user_id include deleted users? | No, it guesses |
| Metric ambiguity | “growth”: absolute? percent? rate? rate-of-rate? | No, it guesses |
| Trust and audit | is the result actually correct? | No, it self-affirms |
Layers two through four are exactly what a semantic layer encodes. The question carried a hidden assumption: that language is the bottleneck. It isn’t. The bottleneck is organizational consensus on meaning, which language merely transmits.
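To make the metric-ambiguity row concrete, here is a toy sketch in Python. The entity names and counts are invented for illustration, not taken from any real dataset. It shows that two equally defensible readings of “growth” pick different winners from the same data.

```python
# Hypothetical (start, end) counts per entity over an observation window.
counts = {
    "alpha": (1000, 1300),  # +300 absolute, +30 percent
    "beta": (10, 40),       # +30 absolute, +300 percent
    "gamma": (200, 260),    # +60 absolute, +30 percent
}


def absolute_growth(start, end):
    """Growth as a raw difference."""
    return end - start


def percent_growth(start, end):
    """Growth relative to the starting value (start is non-zero here)."""
    return (end - start) / start * 100


def top_by(metric):
    """Return the entity name that maximizes the given growth metric."""
    return max(counts, key=lambda name: metric(*counts[name]))


print(top_by(absolute_growth))  # alpha
print(top_by(percent_growth))   # beta
```

Neither answer is a bug. Which one is right depends on a definition that lives outside the data, and that is exactly the piece an LLM has to guess at when nobody has written it down.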
The same animal under different names
“Make implicit assumptions explicit and machine-executable” is one of the deepest patterns in software engineering. It recurs everywhere good engineering exists, under different names but with the identical cognitive move:
- Type systems in programming languages
- OpenAPI and GraphQL schemas for HTTP APIs
- Domain-Driven Design in software architecture
- Ontologies in knowledge representation
- Data contracts in modern data governance
- Semantic layers in analytics: same animal, different layer of the stack
If you’ve ever felt the visceral relief of porting a sloppy Python codebase to TypeScript and watching the type checker flag five years of latent bugs in a single afternoon, you already know what a semantic layer feels like. You just haven’t seen it applied to data yet.
What’s actually in a semantic layer
It’s more than “definitions.” A working semantic layer encodes five things:
- Business concept definitions: what counts as a “user,” “active,” “revenue”
- Relationships: user → orders → items, with cardinality
- Computation rules: the exact DAU formula, the exact growth formula
- Constraints and access controls: who can query what, at what grain
- Lineage: where this number came from, traceable end-to-end
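As a rough illustration of how those five pieces could sit together in one machine-readable object, here is a minimal Python sketch. It is not the schema of dbt, Cube, or LookML. The class names, the DAU wording, the role names, and the table names are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    parent: str
    child: str
    cardinality: str  # e.g. "one_to_many"


@dataclass(frozen=True)
class Metric:
    name: str
    definition: str           # business concept, in plain language
    expression: str           # computation rule
    allowed_roles: frozenset  # access control: who may query it
    finest_grain: str         # constraint: finest grain it may be queried at
    lineage: tuple            # upstream sources, in order


# Relationships: user -> orders -> items, with cardinality.
RELATIONSHIPS = (
    Relationship("user", "orders", "one_to_many"),
    Relationship("orders", "items", "one_to_many"),
)

# A hypothetical DAU definition; a real one is whatever the organization ratifies.
DAU = Metric(
    name="dau",
    definition="Distinct non-deleted users with at least one session that day",
    expression="COUNT(DISTINCT user_id)",
    allowed_roles=frozenset({"analyst", "finance"}),
    finest_grain="day",
    lineage=("raw.sessions", "staging.sessions_clean"),
)


def can_query(metric, role):
    """Access check: is this role allowed to query the metric?"""
    return role in metric.allowed_roles


print(can_query(DAU, "analyst"))  # True
print(can_query(DAU, "intern"))   # False
```

The point of the sketch is that every field answers a question an LLM would otherwise have to guess: what the number means, how it is computed, who may see it, and where it came from.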
Look at any “AI data analyst” product on the market right now (Hex Magic, Mode AI, ThoughtSpot Sage, Julius, Numbers Station) and they all converge on the same architecture: LLM in front, semantic layer behind. The infrastructure being built underneath is dbt’s metrics layer (now MetricFlow), Cube, and LookML (which Looker pioneered before the Google acquisition). None of these products works well without a pre-built semantic layer for the target domain. That’s not a coincidence. That’s the shape of the problem.
A concrete failure mode
I was working through a hands-on data analysis problem last week. The shape: given a corpus of historical snapshots from an operational system, identify which entity in the system grew the most over the observation window.
Imagine handing this to a naive LLM with the raw data and the prompt. Here’s what would plausibly happen:
- It picks the wrong end of a path-like identifier field as the entity. Domain convention says “rightmost token wins”; string-indexing intuition says “leftmost.” Silent flip.
- It naively splits a composite identifier on an inner delimiter, fragmenting a token that the protocol spec defines as atomic. The decomposition looks reasonable; it’s wrong.
- It uses cumulative counts across the whole window, instead of “first-appearance vs last-appearance” comparison. Both are valid framings; only one is what the problem actually asks.
- It uses an absolute diff instead of a percentage growth. Both reasonable; one specified.
- It forgets the divide-by-zero edge case where the baseline count is zero.
Five silent errors. All from missing domain semantics. The query compiles. The query runs. The query returns a number. The number is wrong, and nothing in the pipeline tells you so.
A semantic layer for this domain would encode the rules upfront: which field is the entity identifier, which composite tokens are atomic, which growth formula applies, and how to treat the zero-baseline case. With those rules captured, an LLM can write correct code on top, because the ambiguity is resolved upstream of code generation, by humans who knew what they were doing.
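Here is a minimal sketch of what “encoding the rules upfront” could look like for a problem of this shape. Everything specific is an assumption made for illustration: the “/” path delimiter, the “:” inner delimiter, the sample paths, and the choice to compare the first and last snapshots of the window as one reading of “first-appearance vs last-appearance.” The real domain’s conventions would replace them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainRules:
    # Assumed separator between path tokens. A ":" inside a token is NOT a
    # separator: "svc:eu" is treated as one atomic token.
    path_delimiter: str = "/"


def entity_of(path: str, rules: DomainRules) -> str:
    """Rightmost path token is the entity; composite tokens stay whole."""
    return path.split(rules.path_delimiter)[-1]


def percent_growth(first: int, last: int) -> Optional[float]:
    """Percent growth from first to last; None when the baseline is zero."""
    if first == 0:
        return None  # undefined, reported explicitly instead of crashing
    return (last - first) / first * 100


def growth_by_entity(snapshots, rules):
    """Compare entity counts in the first and last snapshots (not cumulative)."""
    first = Counter(entity_of(p, rules) for p in snapshots[0])
    last = Counter(entity_of(p, rules) for p in snapshots[-1])
    entities = first.keys() | last.keys()
    return {e: percent_growth(first[e], last[e]) for e in entities}


# Hypothetical snapshots: each is a list of observed path identifiers.
snapshots = [
    ["root/a/svc:eu", "root/b/svc:eu", "root/a/db:us"],
    ["root/a/svc:eu", "root/a/db:us", "root/b/db:us",
     "root/c/db:us", "root/c/cache:us"],
]

result = growth_by_entity(snapshots, DomainRules())
print(result["db:us"])     # 200.0
print(result["svc:eu"])    # -50.0
print(result["cache:us"])  # None
```

Each of the five silent errors maps to one explicit decision in the code: the `[-1]` index, the single delimiter, the first-versus-last comparison, the percent formula, and the `None` return for a zero baseline. None of those decisions can be derived from the data alone.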
The TypeScript moment for analytics
Programming spent fifty years learning a hard lesson: dynamic types are easy in the small and crushing in the large. That’s why TypeScript ate JavaScript (Stack Overflow’s 2024 developer survey ranks TypeScript among the most-admired languages, well ahead of JavaScript). It’s why Python type hints (PEP 484) went from controversial to ubiquitous. It’s why Rust’s borrow checker is a feature, not a bug.
Data is going through the same evolution, about a decade behind. SQL is dynamic. Semantic layers are the type annotations being bolted on now. The push from dbt, Cube, and MetricFlow is the TypeScript moment for analytics: same pain, same compromise, same eventual win.
The counter-arguments I take seriously
“LLMs will get smart enough to infer semantics from raw data plus docs.” No. Semantics aren’t in the data. They’re in the organization’s value judgments. “Active user” is not a property of the schema; it’s a decision made by a product manager. No amount of model scaling will let an LLM read a PM’s mind. The PM has to write it down somewhere. That somewhere is the semantic layer.
“Self-improving agents will build the semantic layer themselves.” Building a semantic layer is mostly organizational alignment work: five teams who define “user” differently, sitting in a room, agreeing on one definition. LLMs don’t have authority to resolve that disagreement. They can draft the YAML; humans must ratify it.
“Vibes-based dashboards are good enough.” True for exploratory analysis, internal demos, blog posts. Not true for anything load-bearing: financial reporting, compliance, A/B test conclusions, board-deck metrics. The eighty percent of automation will land in the low-stakes long tail. The high-stakes head stays human-supervised for the foreseeable future.
What this predicts
| Year | Auto-coverage | Where humans still required |
|---|---|---|
| 2026 (now) | ~30% | Data semantics, metric design, trust audit |
| +3 years | ~60% (with semantic layer) | Cross-domain analysis, new metric design |
| +5-10 years | ~80% (standard domains: e-com, SaaS) | Edge domains (specialized infrastructure, healthcare, legal), strategic value judgment |
| Forever | <100% | “Which metric matters” is value judgment, not calculation |
The remaining twenty percent stays human work because deciding what to measure is responsibility-bearing, not technical. A model can compute any metric you specify. It cannot tell you which metric will keep you out of a courtroom.
The takeaway in one sentence
The reason “AI does my data analysis end-to-end” is still aspirational, even though LLMs are good at SQL, is that the semantic layer is the missing infrastructure, and building it is mostly organizational consensus work that no amount of model scaling will replace.
If this lens is useful, I’m writing more of these as I work through the AI-meets-systems-engineering frontier from a graduate-school vantage point. Subscribe for the next issue: it’s on context degradation in long-running agents, with similar “implicit-becomes-explicit” failure modes.
Further reading: The dbt Semantic Layer documentation is the most concrete starting point if you want to build one. The Cube.js docs have the cleanest mental model of the four-layer split. Benn Stancil’s blog is the most thoughtful ongoing critique of the “metrics layer” promise versus delivery.