Article | June 9, 2026 | 6 min

Data Lineage Is a Vanity Metric Without Business Context

Most lineage tools produce beautiful diagrams that don't answer the one question that matters: 'What breaks if this data is wrong?' Here's how to move from observability theater to business-critical lineage.

By Andrew Tan


Dashboards that lie

Many companies spend six figures or more on data lineage tools. Their demos are impressive: sprawling visualizations showing every table, pipeline, and dependency across a data warehouse. Colors indicate freshness. Arrows show data flow. It looks like the control room of a nuclear power plant.

All of this looks great, but it leaves one question unanswered: what happens when table X has bad data?

You can click around the diagrams, zoom and pan, locate the table, and inspect the downstream consumers and transformations it feeds into. And then you can tell that twelve dashboards use 'customer address'.

The real question, though, is which business processes break. Does shipping stop? Do invoices go to the wrong place? Do compliance reports fail? You get the idea.

The dashboard knows that data flowed from A to B, but it has no idea what B is actually for.


Lineage theater

This is what I call lineage theater: the practice of building impressive-looking data flow diagrams that satisfy compliance checklists and vendor demos but don't actually help when things break.

The tooling vendors have optimized for the wrong thing. They're selling visualizations. What data teams need is context: the ability to trace a data quality issue to its business impact in under 60 seconds.

You can see this pattern across many companies. They implement lineage tools with great fanfare. The diagrams go up on office TVs (cool), and the data governance team writes documentation about the documentation. Then, six months later, an upstream system changes a column name and the lineage diagram lights up like a Christmas tree while the actual business impact remains a mystery.

The team ends up doing what they'd have done without the tool: paging through Slack, checking with stakeholders, manually tracing which reports matter for which decisions.


The business context gap

Here's the fundamental problem: technical lineage and business lineage are different things, and most tools only do the first one.

Technical lineage answers: Where did this data come from and where does it go?

Business lineage answers: What decisions depend on this data, and what happens if it's wrong?

The gap between them is where data disasters happen. A pipeline can be 100% correct from a technical standpoint (all jobs green, all tests passing) while producing output that's catastrophically wrong for the business.

Let's say you are a fintech company, and your loan approval model is technically perfect. The lineage shows clean data from application through feature engineering to model scoring. What the lineage doesn't capture is that a recent schema change swapped two similarly named fields, "annual_income" and "monthly_income", in a way that the pipeline's validation rules didn't catch.

The model now reads annual figures where it expects monthly ones, and the reverse. An approval threshold meant to require $5,000/month, the equivalent of $60,000/year, is now cleared by an applicant earning $5,000 a year. The lineage diagram shows green arrows. The business outcome is a month of bad loans that take six months to unwind.
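
A cross-field plausibility check is one way a swap like this could be caught. The sketch below is a hypothetical rule, not something taken from a real pipeline: it assumes each record carries both "annual_income" and "monthly_income", and it flags rows where the annual figure is not roughly twelve times the monthly one. The function names and the 10% tolerance are illustrative assumptions.

```python
def income_fields_consistent(annual_income: float, monthly_income: float,
                             tolerance: float = 0.10) -> bool:
    """Return True when annual_income is within `tolerance` of 12 x monthly_income."""
    if annual_income <= 0 or monthly_income <= 0:
        return False  # assumption: non-positive incomes are treated as implausible
    expected_annual = 12 * monthly_income
    return abs(annual_income - expected_annual) <= tolerance * expected_annual


def find_suspect_rows(rows: list[dict]) -> list[dict]:
    """Collect rows whose two income fields disagree with each other."""
    return [
        row for row in rows
        if not income_fields_consistent(row["annual_income"], row["monthly_income"])
    ]


# Hypothetical sample records.
applications = [
    {"id": 1, "annual_income": 60_000, "monthly_income": 5_000},  # consistent
    {"id": 2, "annual_income": 5_000, "monthly_income": 60_000},  # fields swapped
]
suspect_ids = [row["id"] for row in find_suspect_rows(applications)]
print(suspect_ids)
```

Both swapped columns are still numeric and non-null, so type and null checks pass. A rule like this one tests what the fields mean rather than what shape they have.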


What useful lineage actually looks like

The teams that do lineage well have one thing in common: they treat it as a business mapping exercise, not a technical documentation task.

You need to take a different approach. Give every data asset in your warehouse three tags:

  1. Criticality: Is this used for regulatory reporting, operational decisions, or analytics only?
  2. Downstream processes: Which business functions depend on this? (Not which tables, but which functions: billing, clinical decisions, compliance)
  3. Error impact: What happens if this data is wrong? (Delay, financial loss, regulatory issue, patient safety)

The resulting lineage tool is technically simple: just a basic dependency tracker. But combined with those three tags, it tells you exactly what you need to know when something breaks.

When your claims processing table has a data quality issue, you don't need to trace through fifteen downstream tables. You look at the tags, see "Criticality: Regulatory, Downstream: Monthly CMS filing, Error impact: $2M penalty if late," and know immediately to escalate to the CFO and initiate the manual filing backup process.

The entire incident response takes minutes. No diagram navigation required.
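
Here is a minimal sketch of what a basic dependency tracker plus three tags might look like. Everything in it is an assumption for illustration: the "BusinessTags" class, the asset names "claims_raw" and "claims_weekly_rollup", and the "Operations review" process are hypothetical, and only the claims processing tags come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessTags:
    criticality: str                       # "regulatory", "operational", or "analytics"
    downstream_processes: tuple[str, ...]  # business functions, not tables
    error_impact: str                      # what happens if the data is wrong


# Hypothetical catalog: asset name -> business tags.
TAGS = {
    "claims_processing": BusinessTags(
        criticality="regulatory",
        downstream_processes=("Monthly CMS filing",),
        error_impact="$2M penalty if late",
    ),
    "claims_weekly_rollup": BusinessTags(
        criticality="analytics",
        downstream_processes=("Operations review",),
        error_impact="Delay",
    ),
}

# Hypothetical dependency tracker: asset -> assets that read from it.
DOWNSTREAM = {
    "claims_raw": ["claims_processing"],
    "claims_processing": ["claims_weekly_rollup"],
}


def business_impact(asset: str) -> dict[str, BusinessTags]:
    """Return the tags of the asset and of everything downstream of it."""
    impacted: dict[str, BusinessTags] = {}
    stack, seen = [asset], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in TAGS:
            impacted[current] = TAGS[current]
        stack.extend(DOWNSTREAM.get(current, []))
    return impacted


for name, tags in business_impact("claims_raw").items():
    print(f"{name}: {tags.criticality} | "
          f"{', '.join(tags.downstream_processes)} | {tags.error_impact}")
```

The traversal is deliberately plain. The value sits in the tags, which someone has to write down by hand.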


Why we build the wrong thing

So why do teams keep buying visualization-heavy lineage tools that don't solve the real problem?

Part of it is procurement theater. The person buying the tool often isn't the person debugging the 2 AM incident. They're buying something that looks thorough for the compliance audit or the board presentation. Beautiful diagrams check boxes. Business context mapping requires organizational work that doesn't photograph well.

Part of it is the nature of how these tools are sold. Vendors demo with clean, synthetic data environments where the lineage is obvious. Real enterprise data environments are super messy: decades of legacy systems, undocumented transformations, tribal knowledge that's never been written down. Mapping business context requires talking to people, not just scanning code. It doesn't scale as cleanly as automated technical discovery.

And part of it is that technical lineage is easier to build. You can scan query logs, parse SQL, inspect DAGs. Business context requires interviews, documentation, ongoing maintenance as processes change. It's organizational work disguised as technical work.


How to fix your lineage

If you're already invested in a lineage tool (and most companies are at this point), you don't need to rip it out. You need to add business context to it.

Start with your incident history. Look at the last five data quality incidents that caused real business impact. For each one, identify:

  • What data was wrong
  • What business process broke
  • Who needed to know
  • How long it took to figure that out

Now go look at your lineage tool. Does it help with any of those questions? If not, you have your improvement roadmap.

Tag critical assets manually. Don't try to tag everything. Start with your top 20 data assets by business impact. For each one, document: what decisions it feeds, who owns those decisions, and what happens if the data is bad.

This takes time, maybe 30 minutes per asset, maybe more. But it turns your lineage from a pretty diagram into an operational tool.

Build business-aware alerting. Most data quality alerts are technical. "This job failed" or "this column has nulls." Add business-aware alerts: "The daily revenue summary has suspicious values, which feeds the CEO dashboard at 8 AM."

The alert should include not just what's wrong, but what depends on it and who needs to know.
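
As a rough sketch, a business-aware alert can be a technical finding joined with a small context record. The registry, the "BusinessContext" fields, and the "finance-oncall" recipient below are hypothetical names chosen for illustration. A real setup would load this context from wherever your asset tags live.

```python
from dataclasses import dataclass


@dataclass
class BusinessContext:
    feeds: str         # the business artifact that depends on the data
    deadline: str      # when that artifact is consumed
    notify: list[str]  # who needs to know


# Hypothetical context registry keyed by asset name.
CONTEXT = {
    "daily_revenue_summary": BusinessContext(
        feeds="CEO dashboard",
        deadline="8 AM",
        notify=["finance-oncall"],
    ),
}


def build_alert(asset: str, technical_finding: str) -> str:
    """Combine a technical finding with what depends on it and who to tell."""
    context = CONTEXT.get(asset)
    if context is None:
        # No business context recorded: fall back to the purely technical alert.
        return f"{asset}: {technical_finding}"
    return (
        f"{asset}: {technical_finding}. "
        f"Feeds the {context.feeds} at {context.deadline}. "
        f"Notify: {', '.join(context.notify)}."
    )


alert = build_alert("daily_revenue_summary", "suspicious values")
print(alert)
```

The fallback branch is useful in its own right: every alert that comes out without a "Feeds" line points at an asset nobody has tagged yet.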

Practice incident response. Run a tabletop exercise. Simulate a data quality issue in a critical upstream system. Time how long it takes to answer: which business decisions are affected, who needs to be notified, and what the mitigation options are.

If it takes more than five minutes, your lineage needs more business context.


The product I wish existed

I've looked at some of the lineage tools on the market. They're all variations on the same theme: scan your infrastructure, build a graph, show you pretty visualizations.

What I want is different. I want a tool that starts with business processes and works backwards. Map the decisions first, then trace to the data that feeds them. When something breaks, tell me which decisions are at risk, not just which tables are affected.
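
To make the idea concrete, here is a small sketch of the decisions-first direction. It is not a description of any existing product. The decisions, table names, and edges are all hypothetical. The mapping starts from decisions, walks backwards through upstream tables, and then answers the incident question by asking which decisions have the broken table somewhere behind them.

```python
# Hypothetical decisions, each listing the tables it reads directly.
DECISION_SOURCES = {
    "loan approval": ["model_scores"],
    "monthly invoicing": ["billing_summary"],
}

# Hypothetical upstream edges: table -> tables it is built from.
UPSTREAM = {
    "model_scores": ["features"],
    "features": ["applications"],
    "billing_summary": ["orders", "customer_address"],
}


def tables_behind(decision: str) -> set[str]:
    """Walk backwards from a decision to every table that feeds it."""
    found: set[str] = set()
    stack = list(DECISION_SOURCES[decision])
    while stack:
        table = stack.pop()
        if table not in found:
            found.add(table)
            stack.extend(UPSTREAM.get(table, []))
    return found


def decisions_at_risk(broken_table: str) -> list[str]:
    """List the decisions whose inputs include the broken table."""
    return sorted(
        decision for decision in DECISION_SOURCES
        if broken_table in tables_behind(decision)
    )


print(decisions_at_risk("applications"))
print(decisions_at_risk("customer_address"))
```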

But you don't need a new platform to get better lineage. You need to stop treating lineage as a technical problem and start treating it as an organizational one. The diagram isn't the product. The business context is.


The test for your lineage tool

Here's a simple test. Pick a critical data asset in your system, something that would be painful if it were wrong. Now answer these questions without looking at code:

  1. What business decisions depend on this data?
  2. Who makes those decisions, and when?
  3. What's the cost of being wrong?
  4. Who needs to know if there's a quality issue?

If you can't answer those questions in 60 seconds, your lineage tool isn't doing its job, no matter how beautiful the diagram looks.
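
If your answers live in a catalog, the same test can be run as a completeness check. The sketch below assumes a hypothetical catalog with one field per question. The field names and the sample entries are illustrative, and an empty or missing field counts as "can't answer".

```python
# One field per question in the test above (field names are assumptions).
REQUIRED_ANSWERS = ("decisions", "decision_makers", "cost_of_error", "notify")

# Hypothetical catalog entries.
catalog = {
    "claims_processing": {
        "decisions": "Monthly CMS filing",
        "decision_makers": "Compliance lead, monthly",
        "cost_of_error": "$2M penalty if late",
        "notify": "CFO",
    },
    "customer_address": {
        "decisions": "Shipping and invoicing",
        "decision_makers": "",
    },
}


def unanswered(entry: dict) -> list[str]:
    """Return the required questions that have no recorded answer."""
    return [field for field in REQUIRED_ANSWERS if not entry.get(field)]


# Keep only the assets that fail the test, with the questions they fail on.
gaps = {}
for asset, entry in catalog.items():
    missing = unanswered(entry)
    if missing:
        gaps[asset] = missing

print(gaps)
```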

The goal isn't perfect observability. It's usable context. And that's harder to build, but infinitely more valuable.


Andrew Tan is a serial entrepreneur and founder of layline.io, building enterprise data processing infrastructure that handles both batch and real-time workloads at scale.
