Why We Built Loomkindle: The Story Behind the Agentic Semantic Layer

Wei Tan · September 15, 2025

I spent three years at a logistics analytics company before founding Loomkindle. The stack was reasonable by 2021 standards: Snowflake as the warehouse, Fivetran handling ingestion from a dozen operational sources, dbt managing most of the transform layer. We had around 400 dbt models, a handful of Airflow DAGs, and Looker sitting on top for dashboards. On paper, it worked.

In practice, it was held together with duct tape and institutional memory.

The Metric Duplication Problem

Here's a specific story. We had a metric called "active shipper" — the count of unique shipping accounts that completed at least one order in the last 30 days. Sounds simple. We had four separate definitions of it: one in a dbt model, one hard-coded in a Looker LookML view, one in a Python script feeding a Slack report, and one in a legacy Redshift query that someone had copy-pasted into a Jupyter notebook. All four returned slightly different numbers depending on how they handled cancelled orders, test accounts, and timezone normalization.
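To make the divergence concrete, here is a minimal sketch of a single definition in which each of those three decisions is an explicit parameter. The field names (account_id, status, is_test, completed_at) and the choice to normalize to UTC are assumptions for illustration, not the schema we actually had.

```python
from datetime import datetime, timedelta, timezone


def active_shippers(orders, as_of, window_days=30,
                    exclude_cancelled=True, exclude_test=True):
    """Count unique accounts with an order in the trailing window.

    `orders` is an iterable of dicts with hypothetical keys:
    account_id, status, is_test, completed_at (timezone-aware datetime).
    """
    # Normalize everything to UTC so the window boundary is unambiguous.
    as_of_utc = as_of.astimezone(timezone.utc)
    window_start = as_of_utc - timedelta(days=window_days)

    accounts = set()
    for order in orders:
        if exclude_cancelled and order["status"] == "cancelled":
            continue
        if exclude_test and order["is_test"]:
            continue
        completed = order["completed_at"].astimezone(timezone.utc)
        if window_start < completed <= as_of_utc:
            accounts.add(order["account_id"])
    return len(accounts)
```

Flipping either flag, or comparing timestamps in local time instead of UTC, changes the count on the same input. That is roughly how four copies of one metric end up with four different answers.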

Every Monday morning standup, someone would ask why the "active shipper" number in the executive dashboard didn't match the number in the weekly email. We'd spend 20 minutes tracing it back. We fixed it once, thoroughly, in Q2 2022 — new canonical dbt model, deprecated the others. By Q4 it had drifted again because a new hire had built a new Airflow task without knowing the canonical model existed.

This is not a tooling failure so much as a coordination failure, one that tooling can either enable or prevent. The tools we had were good at storing and transforming data. None of them was opinionated about enforcing a single definition at the semantic layer.

Schema Drift as an On-Call Tax

The second pain point was schema drift. Our upstream source — a third-party order management system — would silently rename or restructure columns two or three times a year. No deprecation notice, no migration guide. We'd find out when a dbt model failed at 3am because order_line_items.shipper_ref had been renamed to order_line_items.account_ref in the source schema.

At first, we handled it with dbt source freshness checks and column-level assertions in Great Expectations. That caught about 70% of breakages before they hit production. The other 30% slipped through because the column existed — it just had different semantics now. A NOT NULL check doesn't tell you that the column now contains account identifiers instead of shipper codes.
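A small sketch of why the structural check passes while the meaning has changed. The identifier formats here ("SHP-" followed by four digits for shipper codes, "ACC-" prefixes for account identifiers) are invented for illustration. The point is only that a value-shape check can catch what a null check cannot.

```python
import re


def passes_not_null(values):
    """Structural check: True when no value is missing."""
    return all(v is not None for v in values)


def pattern_match_rate(values, pattern):
    """Fraction of non-null values that fully match `pattern`."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    rx = re.compile(pattern)
    return sum(1 for v in non_null if rx.fullmatch(v)) / len(non_null)


# Hypothetical format for a shipper code.
SHIPPER_CODE = r"SHP-\d{4}"

before = ["SHP-0012", "SHP-0457"]    # column holds shipper codes
after = ["ACC-99812", "ACC-10033"]   # same column, new semantics

# The NOT NULL check is satisfied both times.
assert passes_not_null(before) and passes_not_null(after)

# The shape check is what separates the two.
rate_before = pattern_match_rate(before, SHIPPER_CODE)
rate_after = pattern_match_rate(after, SHIPPER_CODE)
```

A check like this only works when the old and new values differ in shape. If they share a format, you need something stronger, such as a referential check against the table the values are supposed to point to.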

We wrote a Python script to diff the schema against a stored baseline on every Fivetran sync. If a column was renamed or removed, it would post to a Slack channel. That worked until the Fivetran API changed its response format. The script broke silently, and the drift went undetected for six days before someone noticed that the MoM growth chart had flatlined.
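For readers who have not written one of these, a baseline diff can be sketched in a few lines. This is not the original script. It assumes each snapshot has already been reduced to a plain {column_name: data_type} mapping, and the rename detection is a naive heuristic (a removed and an added column with the same type). The validation step is the part the original lacked: an unexpected input shape raises instead of quietly producing an empty diff.

```python
def _validate(snapshot, label):
    # Fail loudly on an unexpected shape rather than reporting "no drift".
    if not isinstance(snapshot, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in snapshot.items()
    ):
        raise ValueError(f"{label} snapshot is not a {{column: type}} mapping")


def diff_schema(baseline, current):
    """Compare two {column_name: data_type} mappings for one table."""
    _validate(baseline, "baseline")
    _validate(current, "current")

    removed = sorted(set(baseline) - set(current))
    added = sorted(set(current) - set(baseline))
    retyped = sorted(
        col for col in set(baseline) & set(current)
        if baseline[col] != current[col]
    )
    # Heuristic only: same type on a removed/added pair suggests a rename.
    possible_renames = [
        (old, new)
        for old in removed
        for new in added
        if baseline[old] == current[new]
    ]
    return {
        "removed": removed,
        "added": added,
        "retyped": retyped,
        "possible_renames": possible_renames,
    }
```

Called with a baseline that contains shipper_ref and a current snapshot that contains account_ref of the same type, the result lists the pair under possible_renames. A human still has to confirm it is a rename and not a coincidence.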

We're not saying Fivetran or Great Expectations is a bad choice — they're good tools that solved the problems they were built for. We're saying that detecting schema drift in the raw ingestion layer doesn't protect you from semantic divergence in the transform layer. Those are different problems at different altitudes.

What We Actually Needed

By mid-2022, I had a clearer picture of what was missing. It wasn't another orchestrator — Airflow and Prefect were doing fine at scheduling. It wasn't another ingestion tool. It was something that lived between the warehouse and the consumer layer, that understood the meaning of your data (not just its structure), and that could actively monitor and adapt when upstream changes broke downstream semantics.

The closest thing at the time was Cube.dev, which had the right instinct about a semantic layer sitting above the warehouse. But it was primarily a query layer — it didn't watch your upstream schema, it didn't route transforms, and it had its own query DSL that added lock-in. LookML in Looker had a similar tradeoff: powerful semantic modeling, but entirely within Looker's ecosystem.

What I wanted was something YAML-first (no proprietary DSL), warehouse-agnostic, capable of detecting upstream changes and rerouting transforms without human intervention, and integrable with the rest of the modern data stack — dbt, Airflow, Hex, whatever you were already using.

Seattle, 2022, Three People and a Whiteboard

I left the logistics company in October 2022 and started building. The initial prototype was embarrassingly simple: a Python library that parsed a YAML metric definition file, compared it against a stored schema snapshot, and emitted a warning if the upstream source had diverged. No UI, no agent, no fancy routing. Just schema comparison and alerts.
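The shape of that prototype can be sketched roughly as follows. This is an illustration, not Loomkindle's code or file format: the metric layout, the key names (metrics, name, source, columns), and the function name are all hypothetical, and a plain dict stands in for the parsed YAML so the example needs nothing outside the standard library.

```python
# Stand-in for a parsed YAML metric file (hypothetical layout), e.g.:
#   metrics:
#     - name: active_shipper
#       source: order_line_items
#       columns: [shipper_ref, completed_at]
metric_file = {
    "metrics": [
        {
            "name": "active_shipper",
            "source": "order_line_items",
            "columns": ["shipper_ref", "completed_at"],
        }
    ]
}


def check_metric_columns(metric_file, snapshot):
    """Return warning strings for metrics whose inputs have diverged.

    `snapshot` maps table name -> set of column names currently upstream.
    """
    warnings = []
    for metric in metric_file["metrics"]:
        table = metric["source"]
        if table not in snapshot:
            warnings.append(f"{metric['name']}: source table '{table}' not found")
            continue
        for col in metric["columns"]:
            if col not in snapshot[table]:
                warnings.append(
                    f"{metric['name']}: column '{table}.{col}' missing upstream"
                )
    return warnings


# Upstream after the rename described earlier in this post.
snapshot = {"order_line_items": {"account_ref", "completed_at"}}
found = check_metric_columns(metric_file, snapshot)
```

Here found holds one warning, naming order_line_items.shipper_ref as missing. Checking from the metric definition downward, rather than from the raw table upward, means the warning names the metric that is about to break.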

It took about six weeks to build a version I was willing to share with other data engineers. The feedback was consistent: the detection part was useful, but what they really wanted was for the tool to do something about the drift — not just alert them, but reroute the affected transforms to a safe fallback or quarantine the bad data so downstream models didn't silently corrupt.

That feedback is what pushed Loomkindle toward the agentic routing model. The agent isn't a chatbot. It's a decision layer that sits between your schema and your transform DAG, monitors for changes, and reroutes based on rules you define. We call it "agentic" because the decisions are made autonomously: you define the policy, and the agent executes it without requiring manual intervention at 3am.
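As a rough mental model of "you define the policy, the agent executes it", consider the sketch below. It is not Loomkindle's API. The Rule type, the change-type strings, and the three action names are assumptions chosen to mirror the fallback and quarantine options described above.

```python
from dataclasses import dataclass

# Hypothetical action names, mirroring the options described in the text.
ALLOWED_ACTIONS = {"reroute_to_fallback", "quarantine", "alert_only"}


@dataclass(frozen=True)
class Rule:
    change_type: str  # e.g. "column_removed", "column_retyped"
    action: str       # one of ALLOWED_ACTIONS

    def __post_init__(self):
        # Reject unknown actions when the policy is defined, not at 3am.
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


def decide(change_type, policy, default="alert_only"):
    """Return the action of the first rule matching `change_type`."""
    for rule in policy:
        if rule.change_type == change_type:
            return rule.action
    return default


def plan(change_type, affected_transforms, policy):
    """Map every affected transform to the action the policy selects."""
    action = decide(change_type, policy)
    return {transform: action for transform in affected_transforms}


policy = [
    Rule("column_removed", "reroute_to_fallback"),
    Rule("column_retyped", "quarantine"),
]
routing = plan("column_removed", ["stg_orders", "fct_active_shippers"], policy)
```

The design choice worth noting is that unmatched changes fall back to alert_only. An autonomous layer should default to the least invasive action when no rule covers the situation.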

The Angel Round and What It Bought

We closed a $275K angel round in June 2025, nearly three years after the first commit. The gap is worth explaining: we spent those years building the product, not pitching. We had paying design partners before we took external money. The angel check went toward infrastructure costs, one additional engineer, and getting the product to a state where we could open early access to a broader set of analytics teams.

The name "Loomkindle" comes from two ideas: a loom weaves threads into structure (how we think about routing semantic relationships through a data graph), and to kindle means to start something from a small spark. The founding thesis is that the modern data stack has great individual tools but a missing coordination layer. Loomkindle is that layer.

We're a small team in Seattle's South Lake Union neighborhood, a few blocks from where AWS has its offices. That proximity isn't accidental. Redshift is an AWS service, Snowflake runs on AWS among other clouds, and the Pacific Northwest data engineering community is one of the most active outside New York and San Francisco. Being here means we run into the practitioners who hit these problems daily.

What We're Building Toward

The early access product has four core capabilities: semantic model definition in YAML, multi-warehouse query pushdown (Snowflake, BigQuery, Redshift, DuckDB), agentic schema drift detection and transform rerouting, and column-level lineage tracking. The agent routing layer is the part we have not seen another tool tackle directly.

We're not trying to replace dbt. We're not trying to replace your orchestrator. The positioning is deliberate: Loomkindle sits at the semantic coordination layer — the place where metric definitions live, where transform routing decisions get made, and where schema changes get caught before they corrupt the data your analysts actually use.

If you're a data engineer who's ever spent a Monday morning figuring out why two dashboards show different numbers for the same metric, this tool was built for that exact morning.