Lineage
The Lineage Diff is the main interface to Recce. It lets you quickly see the potential area of impact of your dbt data modeling changes.
From the Lineage Diff, you determine which models to investigate further and perform the data validation checks that serve as proof of correctness for your work.
Node Summary
Models are color-coded to indicate their status:
- Added models are green.
- Removed models are red.
- Modified models are orange.
The two icons at the bottom right of each node indicate whether a row count or schema change has been detected. Grayed-out icons indicate no change.
Note: The row count change icon is shown only after a row count diff has been run on the node.
Click a model to open the node details panel and perform other data validation checks.
Filter Nodes
Use the top control bar to change the rules used to filter nodes:
- Mode:
  - Changed Models: Modified nodes, their downstream nodes, and their first-degree parents.
  - All: Show all nodes.
- Package: Filter by dbt package names.
- Select: Select nodes using dbt node selection syntax.
- Exclude: Exclude nodes using dbt node selection syntax.
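The Changed Models rule can be illustrated as a small graph walk. The sketch below is not Recce's implementation: it assumes the lineage is available as a plain dict mapping each node to its direct parents (the node names are hypothetical), and it reads "first-degree parents" as the direct parents of the modified nodes.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage: each node maps to its direct parents.
PARENTS: dict[str, list[str]] = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders": ["stg_orders", "stg_customers"],
    "revenue": ["orders"],
}


def downstream(parents: dict[str, list[str]], start: set[str]) -> set[str]:
    """Return every node reachable downstream from the start nodes."""
    # Invert parent links into child links.
    children: dict[str, list[str]] = {}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, []).append(node)
    seen: set[str] = set()
    queue = deque(start)
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def changed_models_view(parents: dict[str, list[str]], modified: set[str]) -> set[str]:
    """Modified nodes, everything downstream of them, and their direct parents."""
    view = set(modified) | downstream(parents, modified)
    for node in modified:
        view.update(parents.get(node, []))
    return view


print(sorted(changed_models_view(PARENTS, {"stg_orders"})))
# ['orders', 'raw_orders', 'revenue', 'stg_orders']
```

Note that stg_customers is left out: it is a parent of orders, but not of the modified node.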
Select Nodes
Click a node to select it, or click the Select nodes button at the top-right corner to select multiple nodes for further operations. For details, see the Multi-Node Selection section.
Row Count Diff
A row count diff can be performed on nodes selected using the Select and Exclude options.
After selecting nodes, run the row count diff by:
- Clicking the 3 dots (...) button at the top-right corner.
- Clicking Row Count Diff by Selector.
Node Details
The node details panel shows information about a node, such as node type, schema changes, and row count changes. It also lets you perform diffs on the node using the options in the Explore Change menu.
Schema Diff
Schema Diff shows added, removed, and renamed columns. Click a model in the Lineage Diff to open the node details and view the Schema Diff.
Note
Schema Diff requires catalog.json (generated by dbt docs generate) in both environments.
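To show what "added, removed, and renamed" means in practice, here is a minimal sketch that compares two ordered column-to-type mappings, such as those that could be read from each environment's catalog.json. How Recce detects renames is not described here, so the rename rule below (same position, same data type) is an assumed heuristic, and the column names are hypothetical.

```python
from __future__ import annotations


def schema_diff(base: dict[str, str], current: dict[str, str]) -> dict[str, list]:
    """Compare two ordered column->type mappings (dicts preserve insertion order)."""
    base_cols, current_cols = list(base), list(current)
    removed = [c for c in base_cols if c not in current]
    added = [c for c in current_cols if c not in base]
    renamed: list[tuple[str, str]] = []
    # Assumed heuristic: a removed column and an added column at the same
    # position with the same data type are reported as a rename.
    for old in list(removed):
        pos = base_cols.index(old)
        if pos < len(current_cols):
            new = current_cols[pos]
            if new in added and current[new] == base[old]:
                renamed.append((old, new))
                removed.remove(old)
                added.remove(new)
    return {"added": added, "removed": removed, "renamed": renamed}


# Hypothetical column types for one model in each environment.
base_columns = {"id": "INTEGER", "name": "VARCHAR", "amount": "NUMERIC"}
current_columns = {"id": "INTEGER", "full_name": "VARCHAR", "amount": "NUMERIC", "status": "VARCHAR"}
print(schema_diff(base_columns, current_columns))
# {'added': ['status'], 'removed': [], 'renamed': [('name', 'full_name')]}
```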
Row Count Diff
Row Count Diff shows the difference in row count between the base and current environments.
- Click the model in the Lineage DAG.
- Click the Explore Change button in the node details panel.
- Click Row Count Diff.
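Under the hood, a row count diff amounts to counting rows of the same model in two schemas and comparing the results. The sketch below illustrates that with SQLite, assuming an attached in-memory database named base stands in for the base environment and main for the current one; the orders table and its data are hypothetical, and this is not the query Recce runs against your warehouse.

```python
import sqlite3


def row_count_diff(conn: sqlite3.Connection, table: str) -> dict:
    """Count rows of the same table in the base and current (main) schemas."""
    # The table name is interpolated into SQL; only pass trusted identifiers.
    base = conn.execute(f"SELECT COUNT(*) FROM base.{table}").fetchone()[0]
    current = conn.execute(f"SELECT COUNT(*) FROM main.{table}").fetchone()[0]
    diff = current - base
    pct = round(100.0 * diff / base, 2) if base else None
    return {"base": base, "current": current, "diff": diff, "diff_pct": pct}


conn = sqlite3.connect(":memory:")
# A second in-memory database stands in for the base environment's schema.
conn.execute("ATTACH DATABASE ':memory:' AS base")
conn.execute("CREATE TABLE base.orders (id INTEGER)")
conn.execute("CREATE TABLE main.orders (id INTEGER)")
conn.executemany("INSERT INTO base.orders VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO main.orders VALUES (?)", [(i,) for i in range(95)])
print(row_count_diff(conn, "orders"))
# {'base': 100, 'current': 95, 'diff': -5, 'diff_pct': -5.0}
```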
Code Diff
Code Diff shows the changes made to a model's code.
- Click the model in the Lineage DAG.
- Click the Explore Change button in the node details panel.
- Click Code Diff.
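Conceptually, Code Diff is a line-by-line textual diff of a model's SQL in the two environments. A sketch using Python's standard difflib module is shown below; the model SQL and the base/ and current/ file labels are made up for illustration and do not reflect how Recce renders its diff.

```python
import difflib

# Hypothetical versions of the same model in each environment.
BASE_SQL = """select
    id,
    amount
from {{ ref('stg_orders') }}
"""

CURRENT_SQL = """select
    id,
    amount,
    status
from {{ ref('stg_orders') }}
"""


def code_diff(base: str, current: str, name: str) -> str:
    """Return a unified diff of two versions of a model's SQL."""
    lines = difflib.unified_diff(
        base.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile=f"base/{name}.sql",
        tofile=f"current/{name}.sql",
    )
    return "".join(lines)


print(code_diff(BASE_SQL, CURRENT_SQL, "orders"))
```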
Value Diff
Value Diff shows the matched count and percentage for each column in the table. It uses the primary key(s) to match records of the model between the two environments.
The primary key is inferred automatically as the first column that has a unique test. If no primary key is detected, you must specify at least one column as the primary key.
- Added: PKs that exist only in the current environment.
- Removed: PKs that exist only in the base environment.
- Matched: For a column, the number of common PKs whose values match.
- Matched %: For a column, the percentage of common PKs whose values match.
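The metrics above can be reproduced on in-memory rows to make their definitions concrete. This is only an illustration in plain Python, not how Recce computes them (Recce runs audit-helper's compare_column_values in the warehouse). The helper names, the unique_tested set standing in for the project's unique tests, and the sample rows are all hypothetical.

```python
from __future__ import annotations


def infer_primary_key(columns: list[str], unique_tested: set[str]) -> str | None:
    """Return the first column, in model order, that has a unique test."""
    for column in columns:
        if column in unique_tested:
            return column
    return None


def value_diff(base_rows: list[dict], current_rows: list[dict], pk: list[str]) -> dict:
    """Summarize added/removed PKs and per-column matches over common PKs."""
    def key(row: dict) -> tuple:
        return tuple(row[c] for c in pk)

    base = {key(r): r for r in base_rows}
    current = {key(r): r for r in current_rows}
    common = base.keys() & current.keys()
    columns = [c for c in (base_rows[0] if base_rows else {}) if c not in pk]
    result: dict = {
        "added": len(current.keys() - base.keys()),
        "removed": len(base.keys() - current.keys()),
        "columns": {},
    }
    for column in columns:
        matched = sum(1 for k in common if base[k].get(column) == current[k].get(column))
        pct = round(100.0 * matched / len(common), 2) if common else None
        result["columns"][column] = {"matched": matched, "matched_pct": pct}
    return result


base_rows = [
    {"id": 1, "status": "paid", "amount": 10},
    {"id": 2, "status": "open", "amount": 20},
    {"id": 3, "status": "paid", "amount": 30},
]
current_rows = [
    {"id": 1, "status": "paid", "amount": 10},
    {"id": 2, "status": "paid", "amount": 20},
    {"id": 4, "status": "open", "amount": 40},
]
pk = infer_primary_key(["id", "status", "amount"], unique_tested={"id"})
print(value_diff(base_rows, current_rows, [pk]))
```

Here id 4 counts as added, id 3 as removed, and the status column matches on one of the two common PKs (50%).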
Note
Value Diff uses the compare_column_values macro from audit-helper. To use Value Diff, ensure that audit-helper is installed in your project.
To view mismatched values at the row level, click the show mismatched values option on a column name.
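The row-level view can be thought of as listing, for one column, every common PK whose value differs between the environments. A minimal sketch in plain Python, assuming rows are dicts and a single-column key; the function name and sample data are hypothetical.

```python
from __future__ import annotations


def mismatched_values(base_rows: list[dict], current_rows: list[dict], pk: str, column: str) -> list[dict]:
    """List common PKs whose value in `column` differs between environments."""
    base = {row[pk]: row for row in base_rows}
    current = {row[pk]: row for row in current_rows}
    mismatches = []
    for key in sorted(base.keys() & current.keys()):
        before, after = base[key].get(column), current[key].get(column)
        if before != after:
            mismatches.append({pk: key, "base": before, "current": after})
    return mismatches


base_rows = [{"id": 1, "status": "paid"}, {"id": 2, "status": "open"}, {"id": 3, "status": "paid"}]
current_rows = [{"id": 1, "status": "paid"}, {"id": 2, "status": "paid"}, {"id": 4, "status": "open"}]
print(mismatched_values(base_rows, current_rows, pk="id", column="status"))
# [{'id': 2, 'base': 'open', 'current': 'paid'}]
```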
Profile Diff
Profile Diff compares basic statistics (e.g., count, distinct count, min, max, average) for each column of a model between the two environments.
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Profile Diff.
Refer to the dbt-profiler documentation for definitions of the profiling statistics.
Note
Profile Diff uses the get_profile macro from dbt-profiler. To use Profile Diff, ensure that dbt-profiler is installed in your project.
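As a rough picture of what is being compared, the sketch below computes a few basic statistics for one column in each environment and pairs them up. It is plain Python over lists (None stands in for SQL NULL), not dbt-profiler's get_profile; the statistic names are hypothetical and do not match dbt-profiler's output columns.

```python
from __future__ import annotations


def profile(values: list[float | None]) -> dict:
    """Basic per-column statistics; None stands in for SQL NULL."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "not_null_count": len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": sum(non_null) / len(non_null) if non_null else None,
    }


def profile_diff(base: list, current: list) -> dict:
    """Pair each statistic as (base, current)."""
    b, c = profile(base), profile(current)
    return {stat: (b[stat], c[stat]) for stat in b}


for stat, (before, after) in profile_diff([10, 20, 20, None], [10, 20, 30, 40]).items():
    print(f"{stat:>15}: {before} -> {after}")
```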
Histogram Diff
Histogram Diff compares the distribution of a numeric column in an overlay histogram chart.
A Histogram Diff can be generated in two ways.
Via the Explore Change button menu:
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Histogram Diff.
- Select a column to diff.
- Click Execute.
Via the column options menu:
- Select the model from the Lineage DAG.
- Hover over the column in the Node Details panel.
- Click the vertical 3 dots (...).
- Click Histogram Diff.
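An overlay histogram is only meaningful if both environments are binned on the same edges. The sketch below shows that idea with equal-width bins computed over the combined range of both columns; the equal-width binning and the bin count are assumptions for illustration, not a description of Recce's binning, and the data is made up.

```python
from __future__ import annotations


def histogram_diff(base: list[float], current: list[float], num_bins: int = 5):
    """Bin both environments on shared edges so the two histograms overlay cleanly."""
    # Assumes both lists are non-empty.
    lo = min(min(base), min(current))
    hi = max(max(base), max(current))
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal values
    edges = [lo + i * width for i in range(num_bins + 1)]

    def counts(values: list[float]) -> list[int]:
        bins = [0] * num_bins
        for v in values:
            # The maximum value falls on the last edge; keep it in the last bin.
            bins[min(int((v - lo) // width), num_bins - 1)] += 1
        return bins

    return edges, counts(base), counts(current)


edges, base_counts, current_counts = histogram_diff([1, 2, 2, 3, 8], [1, 2, 5, 6, 10], num_bins=3)
print(edges)           # [1.0, 4.0, 7.0, 10.0]
print(base_counts)     # [4, 0, 1]
print(current_counts)  # [2, 2, 1]
```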
Top-K Diff
Top-K Diff compares the distribution of a categorical column. By default, the top 10 elements are shown; this can be expanded to the top 50 elements.
A Top-K Diff can be generated in two ways.
Via the Explore Change button menu:
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Top-K Diff.
- Select a column to diff.
- Click Execute.
Via the column options menu:
- Select the model from the Lineage DAG.
- Hover over the column in the Node Details panel.
- Click the vertical 3 dots (...).
- Click Top-K Diff.
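A Top-K comparison boils down to counting category frequencies in each environment and lining them up. The sketch below uses collections.Counter; how the two top-k lists are merged is an assumption here (start from the current top-k, then append base top-k values that dropped out so disappearing categories stay visible), and the status values are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def top_k_diff(base: list[str], current: list[str], k: int = 10) -> list[tuple[str, int, int]]:
    """Return (value, base_count, current_count) for the top-k values of each side."""
    base_counts, current_counts = Counter(base), Counter(current)
    # Assumed merge rule: current top-k first, then base top-k values not yet listed.
    values = [v for v, _ in current_counts.most_common(k)]
    values += [v for v, _ in base_counts.most_common(k) if v not in values]
    return [(v, base_counts[v], current_counts[v]) for v in values]


base_status = ["paid"] * 5 + ["open"] * 3 + ["refunded"] * 2
current_status = ["paid"] * 6 + ["open"] * 3 + ["cancelled"]
for row in top_k_diff(base_status, current_status, k=3):
    print(row)
```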
Multi-Node Selection
Multiple nodes can be selected in the Lineage DAG. This lets you perform actions, such as Row Count Diff or Value Diff, on multiple nodes at the same time.
Select Nodes Individually
To select multiple nodes individually, click the checkbox on each node you wish to select.
Select Parent or Child nodes
To select a node and all of its parents or children:
- Click the checkbox on the node.
- Right click the node.
- Click to select either parent or child nodes.
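Selecting a node's parents or children is a reachability walk over the lineage graph in one direction. The sketch below shows the idea, assuming the lineage is a dict of direct parents (node names hypothetical) and that the selection includes the starting node itself; it is not Recce's code.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage: each node maps to its direct parents.
PARENTS: dict[str, list[str]] = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders": ["stg_orders", "stg_customers"],
    "revenue": ["orders"],
}

# Invert parent links to get child links.
CHILDREN: dict[str, list[str]] = {}
for node, node_parents in PARENTS.items():
    for parent in node_parents:
        CHILDREN.setdefault(parent, []).append(node)


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return start plus every node reachable by following graph edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable(PARENTS, "orders")))       # the node and all of its parents
print(sorted(reachable(CHILDREN, "stg_orders")))  # the node and all of its children
```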
Perform actions on multiple nodes
After selecting the desired nodes, use the Actions menu at the top right of the screen to perform diffs or add checks.
Example - Row Count Diff
An example of selecting multiple nodes to perform a multi-node row count diff:
Example - Value Diff
An example of selecting multiple nodes to perform a multi-node Value Diff:
Screenshot
The diff result panel includes a Copy to Clipboard button, which copies the result as an image so you can paste it into your PR comment.
Note
Firefox does not support copying images to the clipboard, so Recce shows a modal instead. From the modal, you can download the image or right-click it to copy it.
Add to Checklist
The Recce Checklist provides a way to record the results of a data check during change exploration. Adding Checks to the Checklist lets you:
- Save Checks with notes on your interpretation of the data.
- Re-run Checks after further data modeling changes.
- Share Checks as part of a PR or stakeholder review.
Schema and Lineage Diff
From the Lineage DAG, click the Actions dropdown menu and click Lineage Diff or Schema Diff from the Add to Checklist section. This will add:
- Lineage Diff: The current Lineage view, dependent on your node selection options.
- Schema Diff: A diff of all nodes if none are selected, or specific selected nodes.
Diffs performed via the Explore Change dropdown menu
Most diffs are performed via the Explore Change dropdown menu. For these, add the Check by clicking the Add to Checklist button in the results panel.
An example of performing a Top-K diff and adding the results to the Checklist: