What ‘ARR’ Actually Means in Your Warehouse: A Case for Canonical Metric Definitions
Ask the finance team what ARR means. Then ask the product team. Then ask the data team. If you're lucky, you get three answers that are defensibly consistent. More often, you get three definitions that agree on the word and diverge on almost every implementation detail that determines the number. This is not a communication failure. It's a systems failure. And the symptom — a metric with a shared name but no shared definition — is one of the most expensive silent bugs in a SaaS analytics stack.
Why "ARR" is a description, not a definition
Annual Recurring Revenue is a conceptually simple metric: the annualized value of active subscription contracts at a point in time. The concept is clear. The implementation questions start immediately.
Which contracts count? Only billed subscriptions, or also signed-not-yet-billed contracts? Does a subscription in a free trial count? At what point in the trial — when the trial starts, when billing info is entered, or when the first charge clears? What about contracts with variable pricing — does ARR reflect the committed minimum, the current actuals, or the projected run rate? How do you treat annual contracts that were paid upfront but are recognized monthly — at the full contract value or at the monthly recognized amount multiplied by 12?
Each of these is a legitimate business decision. Different companies answer them differently based on investor expectations, billing system capabilities, and accounting policies. None of these answers is wrong. But every team in your company needs to be using the same answers, applied consistently, to the same underlying data.
The anatomy of ARR divergence in a typical SaaS warehouse
Here is what ARR divergence looks like in a real analytics environment:
Finance's ARR comes from the billing system export to Salesforce to a nightly spreadsheet macro that a finance analyst runs and checks manually against the ERP reconciliation. It includes signed contracts regardless of billing status, uses contract value for variable-pricing accounts, and excludes month-to-month customers because the finance team treats those as non-recurring.
Product's ARR comes from a dbt model that reads the subscription events table in the application database. It counts all accounts with an active subscription flag as of the report date, annualizes the monthly charge, and includes month-to-month accounts because the product team cares about total revenue from active users, not just committed ARR.
The data team's ARR comes from a Looker explore that was built 18 months ago to reconcile the above two, which it does approximately but not exactly because the reconciliation logic was written to match a particular period and has not been updated to reflect two pricing model changes since then.
The result: three ARR numbers that differ by 3-8% from each other on any given day, with the gap varying based on the composition of new signups that month. Finance and product have learned to expect the discrepancy. Neither team trusts the other's number. Neither team trusts the data team's reconciled number because they know the reconciliation logic is stale. All three teams maintain their own spreadsheet-based adjustments for board reporting.
The cost of three ARR definitions
Beyond the obvious audit risk — your board and your investors should be seeing a single, consistently-defined ARR — there are operational costs to metric fragmentation that accumulate quietly:
- Product decisions made on a different ARR base than finance uses for headcount planning lead to resource allocation mismatches that only become visible at annual planning
- Cohort analysis that mixes ARR definitions produces segment-level growth rates that can't be reconciled to the total — a particular problem when the CEO asks "which customer segment is driving growth?"
- Investor updates that use different ARR numbers in different slides create credibility problems that are disproportionately expensive to fix after they're noticed
- Engineering roadmap prioritization based on ARR-per-feature attribution requires a consistent ARR definition or the attribution is meaningless
What a canonical metric definition actually contains
A canonical ARR definition is not a paragraph in a Notion page. It's a machine-readable specification that every team's analytics tools reference at query time, with the following components:
| Component | What it specifies | Example for ARR |
|---|---|---|
| Business definition | Plain-language description of what the metric means | "Annualized value of all active billed subscriptions as of the report date, excluding trial accounts" |
| SQL derivation | The exact query that produces the metric from warehouse tables | SUM of monthly_charge * 12 WHERE subscription_status = 'active' AND trial_end_date IS NOT NULL AND trial_end_date < report_date |
| Grain | The unit of the metric (account-level, contract-level, etc.) | Account-level, point-in-time as of report_date |
| Source lineage | Which upstream tables and columns feed the metric | subscriptions.monthly_charge, subscriptions.status, subscriptions.trial_end_date |
| Owner | The team and person responsible for maintaining the definition | Finance team, CFO sign-off |
| Change history | When the definition changed and why | v2: 2024-09-12, added trial exclusion after pricing model change |
The single ARR node principle
The practical goal is having exactly one place in your analytics stack where ARR is computed. One SQL derivation. One set of filter conditions. One owner. Every BI tool, every dashboard, every ad-hoc query that needs ARR fetches it from that single canonical definition — they don't each re-implement the calculation.
This is the principle behind semantic layer tools and metric catalogs. When every BI query resolves "ARR" to the same SQL fragment, you eliminate the possibility of metric fragmentation at query time. Looker's LookML, dbt metrics, and dedicated metric catalog APIs all offer versions of this pattern.
The goal isn't to make ARR simple — it isn't simple, and pretending otherwise produces wrong numbers. The goal is to make the complexity explicit, owned, and applied consistently. One canonical node that encodes all the business rules, with a version history and a named owner.
Getting from fragmentation to canonicalization
The path from three ARR definitions to one canonical definition requires a structured conversation that is more political than technical. The data team's role is to facilitate the conversation, not to pick the "correct" definition. The correct definition is the one that finance, product, and the CEO agree on — the data team's job is to encode that agreement in a form that can be enforced at query time.
Start with a definitions audit: document every place in your warehouse and BI tools where ARR is computed. For each computation, document the exact filter conditions and business rules it applies. Present the differences to the metric owners (finance lead + product lead) with specific examples of when the definitions diverge and by how much. Let them negotiate the canonical definition. Then encode it once, with an owner and a version, and deprecate the others.
This is unglamorous work. It takes one to three weeks for most mid-market analytics teams. It unblocks months of confusion downstream. And once done, the canonical metric node is self-evidently the right way to maintain any other metric your team cares about — ARR is just the first one that's worth fighting for.