Semantic Layer vs dbt Metrics: Choosing the Right Tool for Your Stack

Wei Tan · December 8, 2025

The question comes up constantly in data engineering Slack channels and conference hallways: "We're on dbt — do we also need a semantic layer?" The short answer is: it depends on what you mean by both terms, and the longer answer requires actually understanding what dbt's metrics layer does and doesn't do.

This post is for teams evaluating dbt Semantic Layer (powered by MetricFlow) against standalone semantic layer tools like Cube.dev or AtScale. We'll try to be precise about what each actually provides, where they overlap, and which combination (or single choice) makes sense for different stack configurations.

What the dbt Metrics Layer Actually Does

MetricFlow — the engine behind dbt's Semantic Layer — lets you define metrics in YAML alongside your dbt models. A simple metric definition looks like this:

metrics:
  - name: monthly_active_users
    label: Monthly Active Users
    type: simple
    type_params:
      # Points at a measure declared in a semantic model. The aggregation
      # (count_distinct here) is set on that measure, not on the metric.
      measure:
        name: active_users
    filter: |
      {{ Dimension('user__is_active') }} = true
# The time spine is configured once for the project, not per metric.

Once defined, this metric can be queried via the dbt Semantic Layer API — which means a BI tool connected to dbt (Hex, Mode, Metabase via the SL integration, or the dbt Cloud UI) can query it without writing raw SQL. The metric definition is the source of truth. Change it in the YAML, and every connected tool gets the updated definition.

This is genuinely powerful for teams that are already heavily invested in dbt. Your metric definitions live in the same repo as your models, follow the same git workflow, get the same CI checks. The developer experience is consistent.

Where MetricFlow Has Boundaries

MetricFlow's boundaries become visible when you push outside the dbt-centric workflow. Three areas where teams typically hit friction:

Cross-warehouse federation. MetricFlow is designed to push queries down to a single underlying data platform. If your data is split across Snowflake and BigQuery — a common pattern when different business units have different warehouse contracts — you can't define a single metric that joins data from both. You'd need to centralize the data first (more pipeline work) or define separate metrics per warehouse and reconcile at the BI layer.

Non-dbt sources. If part of your semantic model needs to reference data that isn't managed by dbt — a legacy Redshift schema, a DuckDB file for a specific analytics workload, data from an Airbyte destination that you haven't brought into dbt yet — MetricFlow can't directly include it. Your semantic model is bounded by what's in dbt.

Runtime query-layer control. MetricFlow translates each metric query into SQL that runs against your warehouse. The translation is the whole job: it generates SQL and hands it off, without holding results or enforcing policy in between. Standalone semantic layers like Cube.dev run a server process that can apply query caching, pre-aggregation rules, access control, and row-level security at query time. For teams with heavy BI workloads or strict row-level security requirements, this runtime layer matters.
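To make the translation step concrete, here is a toy sketch of turning a metric request into SQL. It is not MetricFlow's implementation or API: the `Metric` dataclass, the `compile_metric_query` function, and the table and column names are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical, simplified metric definition (not MetricFlow's schema)."""
    name: str
    table: str
    expr: str            # column being aggregated
    agg: str             # "count_distinct", "sum", or "count"
    time_column: str
    filter_sql: Optional[str] = None


_AGG_TEMPLATES = {
    "count_distinct": "COUNT(DISTINCT {expr})",
    "sum": "SUM({expr})",
    "count": "COUNT({expr})",
}


def compile_metric_query(metric: Metric, grain: str) -> str:
    """Translate a metric request into a single SQL string.

    Nothing is cached and no access policy is applied here: the output is
    plain SQL for the warehouse to execute.
    """
    if grain not in {"day", "week", "month"}:
        raise ValueError(f"unsupported grain: {grain}")
    agg_sql = _AGG_TEMPLATES[metric.agg].format(expr=metric.expr)
    bucket = f"DATE_TRUNC('{grain}', {metric.time_column})"
    where = f"\nWHERE {metric.filter_sql}" if metric.filter_sql else ""
    return (
        f"SELECT {bucket} AS metric_time, {agg_sql} AS {metric.name}\n"
        f"FROM {metric.table}{where}\n"
        f"GROUP BY 1\n"
        f"ORDER BY 1"
    )


mau = Metric(
    name="monthly_active_users",
    table="analytics.user_activity",   # hypothetical table
    expr="user_id",
    agg="count_distinct",
    time_column="activity_date",
    filter_sql="is_active = true",
)
sql = compile_metric_query(mau, grain="month")
print(sql)
```

Everything the function knows ends up in the SQL string. Caching, pre-aggregation, and per-user policy would have to live somewhere else, which is the gap a runtime layer fills.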

What a Standalone Semantic Layer Adds

Tools like Cube.dev, AtScale, and (in a different architectural posture) LookML are semantic layers that sit in front of the warehouse as a persistent query service. They receive queries from BI tools and applications, translate them to optimized SQL, and apply caching, pre-aggregations, and access controls before hitting the warehouse.
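The request path described above can be sketched as a small cache-aside wrapper. This is an illustration only, not how Cube.dev or AtScale are implemented: `CachingQueryService`, `fake_warehouse`, and the role names are hypothetical, and real tools typically rely on pre-aggregated tables rather than a plain result cache.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Rows = List[tuple]


class CachingQueryService:
    """Toy query service: access check, then cache lookup, then warehouse."""

    def __init__(self, run_on_warehouse: Callable[[str], Rows],
                 ttl_seconds: float, allowed_roles: Set[str]):
        self._run = run_on_warehouse
        self._ttl = ttl_seconds
        self._allowed_roles = allowed_roles
        self._cache: Dict[str, Tuple[float, Rows]] = {}
        self.warehouse_calls = 0

    def query(self, sql: str, role: str) -> Rows:
        # Access control is enforced at query time, before any data is read.
        if role not in self._allowed_roles:
            raise PermissionError(f"role {role!r} may not query this service")
        now = time.monotonic()
        hit = self._cache.get(sql)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                      # served from cache
        rows = self._run(sql)                  # cache miss: go to the warehouse
        self.warehouse_calls += 1
        self._cache[sql] = (now, rows)
        return rows


def fake_warehouse(sql: str) -> Rows:
    """Stand-in for a real warehouse client; returns a fixed result."""
    return [("2025-11-01", 1234)]


service = CachingQueryService(fake_warehouse, ttl_seconds=300,
                              allowed_roles={"analyst"})
first = service.query("SELECT ...", role="analyst")
second = service.query("SELECT ...", role="analyst")
print(service.warehouse_calls)
```

The second identical query never reaches the warehouse, and a caller with a role outside `allowed_roles` is rejected before any SQL runs.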

The value proposition is clear for high-concurrency BI environments. If you have 50 analysts hammering dashboards simultaneously, a standalone semantic layer with pre-aggregation can serve most queries from cache rather than spinning up warehouse compute for each one. At Snowflake rates of roughly $2-4 per credit depending on tier, this is real cost avoidance at scale.
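A back-of-the-envelope version of that cost argument, with every input an illustrative assumption rather than a measurement: a Medium warehouse at 4 credits per hour, a price of $3 per credit (inside the range quoted above), and a cache that lets the warehouse stay suspended for most of the working day. Snowflake bills for the time a warehouse is running rather than per query, so the saving comes from reduced active hours, not from the cache hit rate directly.

```python
def monthly_warehouse_cost(active_hours_per_day: float,
                           credits_per_hour: float,
                           price_per_credit: float,
                           days: int = 30) -> float:
    """Cost of keeping a warehouse running for the given active hours."""
    return active_hours_per_day * credits_per_hour * price_per_credit * days


# Assumed inputs (hypothetical workload):
CREDITS_PER_HOUR = 4.0      # Medium warehouse
PRICE_PER_CREDIT = 3.0      # USD, within the $2-4 range
HOURS_WITHOUT_CACHE = 10.0  # dashboards keep the warehouse awake all day
HOURS_WITH_CACHE = 3.0      # only cache misses and refreshes wake it

baseline = monthly_warehouse_cost(HOURS_WITHOUT_CACHE, CREDITS_PER_HOUR,
                                  PRICE_PER_CREDIT)
cached = monthly_warehouse_cost(HOURS_WITH_CACHE, CREDITS_PER_HOUR,
                                PRICE_PER_CREDIT)
savings = baseline - cached
print(f"baseline ${baseline:,.0f}, cached ${cached:,.0f}, saved ${savings:,.0f}")
```

Under these assumptions the bill drops from $3,600 to $1,080 a month. The real figure depends on how much the cache actually shortens warehouse uptime, so it is worth plugging in your own numbers before treating the saving as a given.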

The tradeoff is operational overhead and potential lock-in. Running Cube.dev in production means running another service, managing its deployment, and learning its tool-specific data modeling DSL. LookML is tied to Looker (now part of Google Cloud), so it is effectively a component of an enterprise Looker deployment rather than a standalone choice. AtScale is enterprise-oriented, with pricing to match.

The Complementary Pattern: dbt + Semantic Layer

Many teams end up with both: dbt Semantic Layer for metric definition and developer experience, plus a standalone query layer for runtime optimization and cross-warehouse queries. This is especially common at mid-size analytics teams with 10-30 dbt models but heterogeneous data sources and multi-BI-tool environments.

The practical configuration looks like this: dbt MetricFlow owns metric definitions and pushes them to a central registry. Cube.dev (or an equivalent) subscribes to that registry, generates pre-aggregations for high-frequency metrics, and serves cached responses to Hex, Sigma, and direct API consumers. The warehouse (Snowflake or BigQuery) only receives the queries that miss the cache — typically ad-hoc exploration and new metrics that haven't been pre-aggregated yet.

This pattern works but introduces synchronization complexity. If a metric definition changes in dbt, the cache in Cube.dev needs to be invalidated. If the cache is stale, BI tools serve outdated data. You need a refresh trigger — either time-based (rebuild cache every N hours) or event-based (trigger on dbt Cloud job completion). Neither is fully automated out of the box; you wire it together.
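One way to wire up the event-based trigger is to fingerprint each metric definition and invalidate only the metrics whose fingerprint changed since the last refresh. The sketch below assumes definitions are available as plain dictionaries (for example, parsed from the project's YAML). The function names and the dictionary shape are hypothetical, and the call that actually clears the cache in your query layer is left out.

```python
import hashlib
import json
from typing import Dict, Set


def fingerprint(definition: dict) -> str:
    """Stable hash of a metric definition (key order does not matter)."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def metrics_to_invalidate(previous: Dict[str, str],
                          current_defs: Dict[str, dict]) -> Set[str]:
    """Return metric names whose cached results should be dropped.

    `previous` maps metric name -> fingerprint recorded at the last refresh.
    A metric is stale if it is new, changed, or removed.
    """
    current = {name: fingerprint(d) for name, d in current_defs.items()}
    changed = {n for n, fp in current.items() if previous.get(n) != fp}
    removed = set(previous) - set(current)
    return changed | removed


# Fingerprints recorded after the previous dbt job.
old_defs = {
    "mau": {"measure": "active_users", "filter": "is_active = true"},
    "revenue": {"measure": "order_total"},
}
previous = {name: fingerprint(d) for name, d in old_defs.items()}

# Definitions after the latest dbt job: only the mau filter changed.
new_defs = {
    "mau": {"filter": "is_active = true AND is_internal = false",
            "measure": "active_users"},
    "revenue": {"measure": "order_total"},
}
stale = metrics_to_invalidate(previous, new_defs)
print(sorted(stale))
```

Running this at the end of each dbt job keeps invalidation targeted. A time-based rebuild is still a reasonable backstop for data changes that do not touch the definitions.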

A Concrete Decision Framework

Here's a rough decision guide based on the constraints covered above:

Use dbt Semantic Layer (MetricFlow) as your primary semantic layer if: Your entire data stack is dbt-managed, you're on a single warehouse, your BI tools support the dbt SL integration, and your query concurrency is moderate (under ~30 concurrent dashboard queries).

Add a standalone semantic layer if: You have heterogeneous data sources not all in dbt, you need cross-warehouse federation, your BI concurrency is high and warehouse costs are significant, or you need runtime access control (row-level security, column masking) that MetricFlow doesn't natively handle.

Use only a standalone semantic layer if: You're not on dbt, you need an SDK-first integration for application embedding, or your team is building a product that exposes metrics to end users via API.
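The three rules above can be written down as a small function, which makes the precedence explicit. This is a sketch of this post's heuristics, not a definitive rule set: the `StackProfile` fields are hypothetical names for the conditions listed, and the threshold of 30 is the rough concurrency figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class StackProfile:
    uses_dbt: bool
    all_sources_in_dbt: bool
    warehouse_count: int
    bi_tools_support_dbt_sl: bool
    peak_concurrent_dashboard_queries: int
    needs_runtime_access_control: bool
    embeds_metrics_in_product: bool


def recommend(p: StackProfile) -> str:
    # Rule 3: not on dbt, or metrics are exposed to end users via SDK/API.
    if not p.uses_dbt or p.embeds_metrics_in_product:
        return "standalone only"
    # Rule 1: every condition must hold for dbt SL to stand on its own.
    dbt_sl_sufficient = (
        p.all_sources_in_dbt
        and p.warehouse_count == 1
        and p.bi_tools_support_dbt_sl
        and p.peak_concurrent_dashboard_queries < 30
        and not p.needs_runtime_access_control
    )
    if dbt_sl_sufficient:
        return "dbt Semantic Layer"
    # Rule 2: on dbt, but at least one condition above fails.
    return "dbt Semantic Layer + standalone"


single_warehouse_shop = StackProfile(
    uses_dbt=True, all_sources_in_dbt=True, warehouse_count=1,
    bi_tools_support_dbt_sl=True, peak_concurrent_dashboard_queries=12,
    needs_runtime_access_control=False, embeds_metrics_in_product=False,
)
split_warehouses = StackProfile(
    uses_dbt=True, all_sources_in_dbt=False, warehouse_count=2,
    bi_tools_support_dbt_sl=True, peak_concurrent_dashboard_queries=50,
    needs_runtime_access_control=True, embeds_metrics_in_product=False,
)
print(recommend(single_warehouse_shop))
print(recommend(split_warehouses))
```

The ordering is a design choice: the standalone-only case is checked first because it overrides the other two, and the combined pattern is the fallback for any dbt team that fails at least one of the single-tool conditions.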

We're not saying dbt Semantic Layer is insufficient for serious use — for a dbt-centric shop on a single cloud warehouse, MetricFlow is a well-designed solution and the developer experience is genuinely good. We're saying that treating it as equivalent to a full standalone semantic layer leads to architectural surprises when you hit the cross-warehouse or high-concurrency scenarios it wasn't designed for.

Where Agentic Routing Fits In

Neither MetricFlow nor Cube.dev actively monitors your upstream schema and updates metric definitions when the underlying sources change. Both are passive: you define metrics, they serve them. When events.user_id gets renamed to events.account_id in your source, your MetricFlow metric breaks without any warning, and you only find out when the next dbt run or metric query fails.

The agentic layer is orthogonal to both. It watches the schemas that your semantic model depends on, detects structural changes, and either automatically updates the metric definitions or surfaces the required changes for human review. It doesn't replace dbt's metrics layer or a standalone semantic layer — it makes either one more resilient to the upstream schema churn that every real data team deals with.
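The detection half of that idea can be sketched as a diff between two schema snapshots, cross-referenced with the columns each metric depends on. Everything here is a simplified assumption: the snapshot format, the `metric_columns` mapping, and the `find_broken_metrics` name are hypothetical, and a real system would also need to obtain the snapshots and propose the fix.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]   # table name -> column names


def find_broken_metrics(old: Schema, new: Schema,
                        metric_columns: Dict[str, Set[str]]
                        ) -> Dict[str, List[str]]:
    """Map each affected metric to the referenced columns that disappeared.

    Columns are written as "table.column". A rename shows up as one removed
    column plus one added column, so the old name is reported as missing.
    """
    removed: Set[str] = set()
    for table, cols in old.items():
        for col in cols - new.get(table, set()):
            removed.add(f"{table}.{col}")
    broken: Dict[str, List[str]] = {}
    for metric, refs in metric_columns.items():
        missing = sorted(refs & removed)
        if missing:
            broken[metric] = missing
    return broken


# Snapshots before and after the rename used as an example above.
before = {"events": {"user_id", "event_type", "occurred_at"}}
after = {"events": {"account_id", "event_type", "occurred_at"}}

# Hypothetical dependency map from metrics to source columns.
metric_columns = {
    "monthly_active_users": {"events.user_id", "events.occurred_at"},
    "event_count": {"events.event_type", "events.occurred_at"},
}
broken = find_broken_metrics(before, after, metric_columns)
print(broken)
```

Only `monthly_active_users` is flagged, because it is the only metric that references the renamed column. Deciding whether `account_id` is the correct replacement is the step that calls for either automation with guardrails or human review.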

The combination that handles all three concerns — metric governance (dbt SL), query optimization (standalone SL), and upstream drift detection (agentic routing) — is what a mature data platform looks like for teams with heterogeneous stacks. It's not over-engineered if your business depends on metric consistency across multiple BI tools, warehouses, and a source schema that doesn't stay still.