Lineage

The Lineage Diff is the main interface to Recce and allows you to quickly see the potential area of impact from your dbt data modeling changes.

Lineage Diff

From the Lineage Diff, you determine which models to investigate further and run the data validation checks that serve as proof of correctness for your work.

Lineage Diff

Node Summary

Models are color-coded to indicate their status:

  • Added models are green.
  • Removed models are red.
  • Modified models are orange.

The two icons at the bottom right of each node indicate if a row count or schema change has been detected. Grayed out icons indicate no change.

Model with Schema Change detected

Note: The row count changed icon is shown only if a row count diff has been executed on this node.

Open the node details panel

Click a model to open the node details panel and perform other data validation checks.

Filter Nodes

In the top control bar, you can change the rules used to filter the nodes:

  1. Mode:
    • Changed Models: Show modified nodes, their downstream nodes, and their first-degree parents.
    • All: Show all nodes.
  2. Package: Filter by dbt package names.
  3. Select: Select nodes by node selection.
  4. Exclude: Exclude nodes by node selection.
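As a rough illustration of what the Changed Models mode describes, the sketch below walks a small, hypothetical DAG. It assumes that "first-degree parents" means the direct parents of the modified nodes only; the function name `changed_models_view` and the sample DAG are illustrative and are not part of Recce.

```python
from collections import deque


def changed_models_view(children, modified):
    """Return the nodes shown in a 'Changed Models'-style view.

    children: dict mapping each node to its direct downstream nodes.
    modified: iterable of modified node names.
    """
    modified = set(modified)

    # Invert the edges so direct parents can be looked up.
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(node)

    # Breadth-first walk to collect every downstream node.
    visible = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for kid in children.get(node, []):
            if kid not in visible:
                visible.add(kid)
                queue.append(kid)

    # Assumption: add only the direct parents of the modified nodes.
    for node in modified:
        visible |= parents.get(node, set())
    return visible


# Hypothetical DAG: raw_orders -> stg_orders -> orders -> customers
dag = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders"],
    "orders": ["customers"],
    "stg_customers": ["customers"],
}
print(sorted(changed_models_view(dag, ["stg_orders"])))
```

Under this reading, `stg_customers` is left out: it is a parent of a downstream node, not of the modified node itself.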

Select Nodes

Click a node to select it, or click the Select nodes button at the top-right corner to select multiple nodes for further operations. For details, see the Multi Node Selection section.

Row Count Diff

A row count diff can be performed on nodes selected using the Select and Exclude options.

After selecting nodes, run the row count diff by:

  1. Clicking the 3 dots (...) button at the top-right corner.
  2. Clicking Row Count Diff by Selector.

Node Details

The node details panel shows information about a node, such as node type, schema and row count changes, and allows you to perform diffs on the node using the options accessed via the Explore Change button.

Schema Diff

Schema Diff shows added, removed, and renamed columns. Click a model in the Lineage Diff to open the node details and view the Schema Diff.

Note

Schema Diff requires catalog.json in both environments.

Schema Diff

Schema Diff showing renamed column
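To make the added and removed categories concrete, here is a minimal sketch that compares two column listings. The plain dictionaries are simplified stand-ins for the column metadata of each environment, and the function name `schema_diff` and the column names are hypothetical. Rename detection is deliberately left out, because this page does not describe how Recce identifies a renamed column.

```python
def schema_diff(base_columns, current_columns):
    """Compare two {column_name: column_type} mappings.

    Returns the columns that exist only in the current environment
    (added) and only in the base environment (removed).
    """
    added = [name for name in current_columns if name not in base_columns]
    removed = [name for name in base_columns if name not in current_columns]
    return {"added": added, "removed": removed}


# Hypothetical column metadata for one model in each environment.
base = {
    "customer_id": "INTEGER",
    "first_name": "VARCHAR",
    "lifetime_value": "DOUBLE",
}
current = {
    "customer_id": "INTEGER",
    "first_name": "VARCHAR",
    "net_lifetime_value": "DOUBLE",
    "segment": "VARCHAR",
}
print(schema_diff(base, current))
```

In this simplified view, a renamed column would appear as one removed column plus one added column.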

Row Count Diff

Row Count Diff shows the difference in row count between the base and current environments.

  1. Click the model in the Lineage DAG.
  2. Click the Explore Change button in the node details panel.
  3. Click Row Count Diff.

Row Count Diff - Single model

Code Diff

Code Diff shows the model code that has changed for a particular model.

Code diff

  1. Click the model in the Lineage DAG.
  2. Click the Explore Change button in the node details panel.
  3. Click Code Diff.

Value Diff

Value Diff shows the matched count and percentage for each column in the table. It uses the primary key(s) to uniquely identify the records between the model in both environments.

The primary key is inferred automatically from the first column that has a unique test. If no primary key is detected, you must specify at least one column as the primary key.

Value Diff
  • Added: Newly added PKs.
  • Removed: Removed PKs.
  • Matched: For a column, the number of matching values among the common PKs.
  • Matched %: For a column, the ratio of matched values to common PKs.
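The four measures above can be sketched in a few lines. This is an illustration of the definitions only, not the query that Recce or audit-helper actually runs; the function name `value_diff`, the row layout (a dictionary keyed by primary key), and the sample data are assumptions made for the example.

```python
def value_diff(base_rows, current_rows, columns):
    """Summarize a value diff between two environments.

    base_rows / current_rows: dict mapping primary key -> {column: value}.
    columns: the non-key columns to compare.
    """
    base_pks, current_pks = set(base_rows), set(current_rows)
    common = base_pks & current_pks

    per_column = {}
    for col in columns:
        # Count the common PKs whose value is equal in both environments.
        matched = sum(
            1 for pk in common if base_rows[pk][col] == current_rows[pk][col]
        )
        ratio = matched / len(common) if common else None
        per_column[col] = {"matched": matched, "matched_pct": ratio}

    return {
        "added": len(current_pks - base_pks),      # PKs only in current
        "removed": len(base_pks - current_pks),    # PKs only in base
        "columns": per_column,
    }


# Hypothetical rows keyed by order_id.
base = {
    1: {"status": "paid", "amount": 10},
    2: {"status": "open", "amount": 20},
    3: {"status": "paid", "amount": 30},
}
current = {
    2: {"status": "open", "amount": 25},
    3: {"status": "paid", "amount": 30},
    4: {"status": "open", "amount": 5},
}
print(value_diff(base, current, ["status", "amount"]))
```

Here PKs 2 and 3 are common, so `status` matches on both of them while `amount` matches on only one of the two.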

Note

Value Diff uses the compare_column_values macro from audit-helper. To use Value Diff, ensure that audit-helper is installed in your project.

packages:
  - package: dbt-labs/audit_helper
    version: <version>

To view mismatched values at the row level, click the show mismatched values option on a column name.

Profile Diff

Profile Diff compares basic statistics (e.g. count, distinct count, min, max, average) for each column of a model between the two environments.

  1. Select the model from the Lineage DAG.
  2. Click the Explore Change button.
  3. Click Profile Diff.

Profile Diff

Please refer to the dbt-profiler documentation for the definitions of profiling stats.

Note

Profile Diff uses the get_profile macro from dbt-profiler. To use Profile Diff, ensure that dbt-profiler is installed in your project.

packages:
  - package: data-mie/dbt_profiler
    version: <version>

Histogram Diff

Histogram Diff compares the distribution of a numeric column in an overlay histogram chart.

Histogram Diff
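For two histograms to be overlaid, both environments need to be counted against the same bin edges. The sketch below shows one simple way to do that with equal-width bins over the combined value range. The function name `overlay_histogram`, the bin count, and the sample values are assumptions for illustration; the binning that Recce actually uses may differ.

```python
def overlay_histogram(base_values, current_values, num_bins=5):
    """Bucket two non-empty numeric samples into shared equal-width bins.

    Returns (edges, base_counts, current_counts).
    """
    combined = list(base_values) + list(current_values)
    low, high = min(combined), max(combined)
    # Fall back to a width of 1 when every value is identical.
    width = (high - low) / num_bins or 1

    def counts(values):
        bins = [0] * num_bins
        for value in values:
            # The maximum value belongs to the last bin, not a new one.
            index = min(int((value - low) / width), num_bins - 1)
            bins[index] += 1
        return bins

    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts(base_values), counts(current_values)


# Hypothetical values of one numeric column in each environment.
base = [0, 1, 2, 3, 4, 10]
current = [0, 5, 6, 7, 8, 10]
edges, base_counts, current_counts = overlay_histogram(base, current)
print(edges)
print(base_counts)
print(current_counts)
```

Because both count lists share the same edges, they can be drawn on top of each other bin by bin.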

A Histogram Diff can be generated in two ways.

Via the Explore Change button menu:

  1. Select the model from the Lineage DAG.
  2. Click the Explore Change button.
  3. Click Histogram Diff.
  4. Select a column to diff.
  5. Click Execute.

Via the column options menu:

  1. Select the model from the Lineage DAG.
  2. Hover over the column in the Node Details panel.
  3. Click the vertical 3 dots ...
  4. Click Histogram Diff.

Generate a Recce Histogram Diff from the column options

Top-K Diff

Top-K Diff compares the distribution of a categorical column. The top 10 elements are shown by default, which can be expanded to the top 50 elements.

Recce Top-K Diff
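A top-K comparison of a categorical column can be sketched with `collections.Counter`. The example assumes that categories are ranked by their count in the current environment; this page does not state which environment Recce ranks by, so treat that choice, the function name `top_k_diff`, and the sample values as illustrative.

```python
from collections import Counter


def top_k_diff(base_values, current_values, k=10):
    """Return (category, base_count, current_count) for the top k categories.

    Assumption: categories are ranked by their count in the current
    environment.
    """
    base_counts = Counter(base_values)
    current_counts = Counter(current_values)
    top = [category for category, _ in current_counts.most_common(k)]
    # Counter returns 0 for categories missing from an environment.
    return [(c, base_counts[c], current_counts[c]) for c in top]


# Hypothetical values of one categorical column in each environment.
base = ["a", "a", "a", "b", "c"]
current = ["a", "a", "b", "b", "b", "d"]
for row in top_k_diff(base, current, k=2):
    print(row)
```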

A Top-K Diff can be generated in two ways.

Via the Explore Change button menu:

  1. Select the model from the Lineage DAG.
  2. Click the Explore Change button.
  3. Click Top-K Diff.
  4. Select a column to diff.
  5. Click Execute.

Via the column options menu:

  1. Select the model from the Lineage DAG.
  2. Hover over the column in the Node Details panel.
  3. Click the vertical 3 dots ...
  4. Click Top-K Diff.

Generate a Recce Top-K Diff

Multi Node Selection

Multiple nodes can be selected to perform diffs on all nodes as part of a single check.

Select Nodes

  1. Click the Select nodes button.
  2. Select one or more nodes.
  3. Alternatively, right-click on a node and click Select parent nodes or Select child nodes.
  4. Click the action to perform in the control bar.

Row Count Diff

An example of a multi-node Row Count Diff showing the row count difference between the base and current environments.

Row Count Diff - Multiple models

Value Diff

An example of a multi-node Value Diff showing the matched percentage difference between the base and current environments.

Recce Value Diff - Multiple models

Screenshot

Each diff result includes a Copy to Clipboard button, which copies the result image to the clipboard so that you can paste it into a PR comment.

Copy a diff result screenshot to the clipboard and paste to GitHub

Note

Firefox does not support copying images to the clipboard, so Recce shows a modal instead. From the modal, you can download the image or right-click it to copy it.

Add to Checklist

On the lineage page, you can run different types of checks. You may want to add a check to the checklist for the following reasons:

  1. To keep the check so that you can rerun it after further code changes.
  2. To record your result and interpretation for review.

Lineage Diff

Lineage diff by selector

  1. Select nodes using Select and Exclude in the top control bar.
  2. Click ... at the top-right corner.
  3. Click Lineage diff.

Lineage diff by multi-node selection

  1. Click the Select nodes button at the top-right corner.
  2. Select nodes.
  3. Click the Add lineage diff check button.

Schema Diff

Schema diff by node selector

  1. Select nodes using Select and Exclude in the top control bar.
  2. Click ... at the top-right corner.
  3. Click the Schema diff button.

Schema diff by multi-node selection

  1. Click the Select nodes button at the top-right corner.
  2. Select nodes.
  3. Click the Add schema check button.

Schema diff for a single node

  1. Select a node to open the node details panel.
  2. Click the Add check button in the node details panel.
  3. Click Schema check.

Row Count Diff

Row count diff by node selector

  1. Select nodes using Select and Exclude in the top control bar.
  2. Click ... at the top-right corner.
  3. Click Row Count Diff by Selector to run the row count diff.
  4. Click Add to checklist on the result page.

Row count diff by multi-node selection

  1. Click the Select nodes button.
  2. Select nodes.
  3. Click Row count diff to run the row count diff.
  4. Select a node to show the run result.
  5. Click Add to checklist.

Other Diffs

  1. Execute the diff.
  2. Click Add to checklist.