From Fragile SQL Macros to Declarative Pipelines: A Migration Playbook

From Fragile SQL Macros to Declarative Pipelines: A Migration Playbook

There's a particular type of analytics codebase that appears in nearly every company that hired its first analytics engineer before 2022: a collection of SQL macros, Jinja templates, and dbt models that accumulated organically over 2-3 years, are deeply interdependent in ways nobody fully mapped, and cannot be safely modified without extensive manual testing. The engineers who built it have mostly moved on. The engineers who inherited it have learned to work around it rather than through it. It is the analytics equivalent of legacy monolith code, and it needs to be migrated eventually — the question is how to do it without a 6-month big-bang rewrite that breaks production during the migration.

Diagnosing what you actually have

Before starting any migration, you need an honest inventory of the existing pipeline. Not the ideal version in your head — the actual code and the actual dependencies. This requires a three-part audit:

1. Execution dependency graph

Which models depend on which other models? In a well-maintained dbt project, this is visible in dbt docs. In the reality we usually encounter — a mix of dbt models, standalone Python scripts, Airflow tasks calling stored procedures, and cron jobs running SQL files — the dependency graph often doesn't exist as an artifact anywhere. Build it. Use the actual execution logs from your scheduler to trace which jobs must complete before which other jobs can start.

2. Business consumer map

Which downstream business systems, BI tools, and reports depend on which models? A model with zero downstream consumers can be migrated aggressively or deleted. A model that feeds the daily finance dashboard and the product team's growth report is high-risk and requires staged migration with a parallel-run period.

3. Business logic inventory

For each model that has active downstream consumers, document what business logic it encodes. Not a description of the SQL — the business decisions embedded in the SQL. "This model applies a 30-day lookback window to the activation event because the product team decided in 2022 that users who don't return within 30 days of signup should be counted as churned." That decision is not in the SQL comments. It's in someone's head or a forgotten Notion page. Surfacing it before the migration prevents you from inadvertently changing a business rule while refactoring the code.

Why big-bang migrations fail

The appeal of a big-bang rewrite is that it produces a clean, declarative codebase in one pass. The reality is that it takes 3-5x longer than estimated, spends 70% of its time on edge cases discovered mid-migration, and goes into production all at once — meaning every bug surfaces simultaneously in a go-live week that stresses everyone and breaks trust in the new system before it's had a chance to prove itself.

In our experience, teams that run big-bang analytics migrations have roughly a 50% success rate in terms of "the new system works as well as the old one within 3 months of go-live." The other 50% either roll back to the old system or run both systems in parallel indefinitely, which costs more than either system alone.

The correct migration strategy is lane-by-lane, not full-road. Identify one cohesive slice of the pipeline — one subject area, one set of source tables — and migrate it completely to the declarative model before touching the next slice. Each lane is a production system before you open the next one.

The lane-by-lane migration playbook

Here is the sequence we've seen work consistently for analytics pipeline migrations from SQL macros to declarative pipelines:

Phase 1: Freeze and document (1-2 weeks)

Stop adding features to the legacy pipeline. Document every model in scope for migration: business logic, downstream consumers, schema, and expected output for a sample of dates. This documentation becomes your acceptance test criteria. You're migrating correctly when the new pipeline produces the same output as the legacy pipeline for all sample dates.

Phase 2: Extract the first lane (2-3 weeks)

Pick the smallest cohesive unit that has value on its own — typically a single source-to-output chain for one business domain (e.g., subscription events → churn metrics). Write the declarative pipeline for this lane. Run it in parallel with the legacy pipeline, comparing outputs daily for two weeks. Differences require investigation: are they bugs in the new pipeline, improvements over the legacy logic, or cases where the legacy pipeline was wrong?

Phase 3: Promote the first lane and decommission the legacy equivalent (1 week)

When the parallel run shows consistent output agreement (or documented deliberate differences where the new logic is correct), promote the new pipeline to production for that lane. Update BI tool connections to point to the new output tables. Decommission the legacy code for that lane — don't archive it in a "legacy" folder, delete it. Archiving creates maintenance confusion; deletion forces commitment.

Phase 4: Repeat for remaining lanes

Apply the same pattern to each subsequent lane, building on the momentum from the first. Later lanes benefit from the declarative patterns established in the first lane, so migration speed typically increases by the third or fourth lane.

Handling the models that touch everything

Every legacy analytics codebase has a handful of "god models" — typically a dim_accounts or fct_events table that almost every other model joins to. These are the highest-risk migrations because they have the most downstream consumers and the most accumulated business logic.

The pattern that works for god models is: migrate all the leaf-level consumers first, so the god model is the last thing you touch. By the time you get to it, you have experience migrating every type of model it feeds, you understand the edge cases, and you have confidence from successfully migrating the rest of the pipeline. The god model migration still requires a careful parallel run — plan for 3-4 weeks rather than 2.

What you gain at the end of the migration

The value proposition for a lane-by-lane declarative migration is not primarily speed or cost during the migration — it's what you have on the other side. A declarative pipeline where:

  • Every transformation is expressed in YAML or SQL that any engineer can read and modify safely
  • Business logic is encoded in version-controlled, reviewable definitions rather than tribal knowledge
  • Schema changes trigger structured impact reviews rather than silent breakage
  • Every metric has an owner, a definition, and a freshness SLA
  • The compiled SQL is committed to git and auditable — no black-box transformations

That is not just a more maintainable codebase. It's a pipeline that can be maintained by an analytics team of two without requiring deep institutional knowledge from the person who wrote the original macros. At 50-500 employees, with limited data engineering bandwidth, that property is worth more than almost any feature the legacy pipeline had.