Lineage
The Lineage Diff is the main interface to Recce. It lets you quickly see the potential area of impact of your dbt data modeling changes.
From the Lineage Diff, you determine which models to investigate further and perform the various data validation checks that serve as proof-of-correctness of your work.
Node Summary
Models are color-coded to indicate their status:
- Added models are green.
- Removed models are red.
- Modified models are orange.
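The status colors come down to comparing the nodes present in each environment. As a rough illustration only (not Recce's actual implementation), the Python sketch below assumes each environment is summarized as a hypothetical mapping from node ID to a content checksum, similar in spirit to the per-node checksum that dbt records in manifest.json.

```python
from typing import Dict


def classify_nodes(base: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Classify each node as added, removed, modified, or unchanged.

    `base` and `current` are hypothetical {node_id: checksum} mappings,
    one per environment.
    """
    status: Dict[str, str] = {}
    for node_id in base.keys() | current.keys():
        if node_id not in base:
            status[node_id] = "added"      # drawn green in the Lineage Diff
        elif node_id not in current:
            status[node_id] = "removed"    # drawn red
        elif base[node_id] != current[node_id]:
            status[node_id] = "modified"   # drawn orange
        else:
            status[node_id] = "unchanged"
    return status
```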
The two icons at the bottom right of each node indicate whether a row count or schema change has been detected. Grayed-out icons indicate no change.
Note: The row count change icon is shown only after a row count diff has been executed on the node.
Click a model to open the node details panel and perform other data validation checks.
Filter Nodes
In the top control bar, you can change the rules used to filter nodes:
- Mode:
  - Changed Models: Modified nodes and their downstream nodes, plus their first-degree parents.
  - All: Show all nodes.
- Package: Filter by dbt package name.
- Select: Select nodes using dbt node selection syntax.
- Exclude: Exclude nodes using dbt node selection syntax.
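The Changed Models mode can be read as a graph traversal. The sketch below is one interpretation, roughly analogous to the dbt selector `1+state:modified+`: it assumes "first-degree parents" means the direct parents of the modified nodes, and it uses hypothetical `children` and `parents` adjacency maps keyed by node ID.

```python
from collections import deque
from typing import Dict, List, Set


def changed_models_view(
    modified: Set[str],
    children: Dict[str, List[str]],
    parents: Dict[str, List[str]],
) -> Set[str]:
    """Modified nodes + all downstream nodes + direct parents of the modified nodes."""
    selected = set(modified)
    queue = deque(modified)
    # Breadth-first walk over child edges collects everything downstream.
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    # Add only the direct parents of the modified nodes (one hop upstream).
    for node in modified:
        selected.update(parents.get(node, []))
    return selected
```

Under this reading, a parent of a downstream node (such as `d` feeding `c` in `a -> b -> c`, `d -> c` with `b` modified) is not shown.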
Select Nodes
Click a node to select it, or click the Select nodes button at the top-right corner to select multiple nodes for further operations. For details, see the Multi Node Selection section.
Row Count Diff
A row count diff can be performed on nodes selected using the Select and Exclude options.
After selecting nodes, run the row count diff by:
- Clicking the 3 dots (...) button at the top-right corner.
- Clicking Row Count Diff by Selector.
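Conceptually, a row count diff on a set of selected nodes compares one count per node in each environment. This Python sketch assumes the counts were already collected (for example with `select count(*)` against each environment's relation) into hypothetical `{node_id: row_count}` maps, where a missing node or `None` means the table does not exist in that environment.

```python
from typing import Dict, Optional, Tuple

Counts = Dict[str, Optional[int]]


def row_count_diff(
    base: Counts, current: Counts
) -> Dict[str, Tuple[Optional[int], Optional[int], Optional[int]]]:
    """Return {node: (base_rows, current_rows, delta)}; delta is None if either side is missing."""
    result = {}
    for node in sorted(base.keys() | current.keys()):
        b, c = base.get(node), current.get(node)
        delta = c - b if b is not None and c is not None else None
        result[node] = (b, c, delta)
    return result
```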
Node Details
The node details panel shows information about a node, such as node type, schema and row count changes, and allows you to perform diffs on the node using the options accessed via the Explore Change button.
Schema Diff
Schema Diff shows added, removed, and renamed columns. Click a model in the Lineage Diff to open the node details and view the Schema Diff.
Note
Schema Diff requires catalog.json in both environments. This file is generated by dbt docs generate.
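Because the column information comes from catalog.json, a schema diff can be sketched as a comparison of two column lists. In dbt's catalog.json, each entry under `nodes` has a `columns` map whose values include `name`, `type`, and `index`. The rename rule below (a removed and an added column at the same position with the same type) is a hypothetical heuristic for illustration; Recce's own detection logic may differ.

```python
from typing import Dict, List, Tuple


def load_columns(catalog: dict, unique_id: str) -> Dict[str, str]:
    """Return {column_name: data_type} for one node of a parsed catalog.json, ordered by index."""
    columns = catalog["nodes"][unique_id]["columns"].values()
    return {c["name"]: c["type"] for c in sorted(columns, key=lambda c: c["index"])}


def schema_diff(base: Dict[str, str], current: Dict[str, str]) -> Dict[str, list]:
    added = [c for c in current if c not in base]
    removed = [c for c in base if c not in current]
    base_pos = {c: i for i, c in enumerate(base)}
    cur_pos = {c: i for i, c in enumerate(current)}
    renamed: List[Tuple[str, str]] = []
    # Hypothetical heuristic: a removed and an added column at the same
    # position with the same type are reported as a rename.
    for old in list(removed):
        for new in list(added):
            if base_pos[old] == cur_pos[new] and base[old] == current[new]:
                renamed.append((old, new))
                removed.remove(old)
                added.remove(new)
                break
    return {"added": added, "removed": removed, "renamed": renamed}
```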
Row Count Diff
Row Count Diff shows the difference in row count between the base and current environments.
- Click the model in the Lineage DAG.
- Click the Explore Change button in the node details panel.
- Click Row Count Diff.
Code Diff
Code Diff shows the model code that has changed for a particular model.
- Click the model in the Lineage DAG.
- Click the Explore Change button in the node details panel.
- Click Code Diff.
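A code diff is essentially a line-based text diff of the model's SQL in the two environments. The sketch below uses Python's standard `difflib.unified_diff`; the `code_diff` function name and the `base/` and `current/` file labels are illustrative assumptions, not part of Recce.

```python
import difflib


def code_diff(base_sql: str, current_sql: str, name: str) -> str:
    """Unified diff of a model's SQL between the base and current environments."""
    lines = difflib.unified_diff(
        base_sql.splitlines(keepends=True),
        current_sql.splitlines(keepends=True),
        fromfile=f"base/{name}",
        tofile=f"current/{name}",
    )
    return "".join(lines)
```

An unchanged model produces an empty string, since `unified_diff` yields nothing when the inputs are identical.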
Value Diff
Value Diff shows the matched count and percentage for each column in the table. It uses the primary key(s) to uniquely identify records of the model across both environments.
The primary key is automatically inferred from the first column that has a unique test. If no primary key is detected, you must specify at least one column as the primary key.
- Added: Newly added PKs.
- Removed: Removed PKs.
- Matched: For a column, the number of common PKs whose values match.
- Matched %: For a column, the ratio of matched values to common PKs.
Note
Value Diff uses the compare_column_values macro from audit-helper. To use Value Diff, ensure that audit-helper is installed in your project.
View mismatched values at the row level by clicking the show mismatched values option on a column name.
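Recce computes Value Diff in the warehouse through audit-helper, but the bookkeeping behind the Added, Removed, Matched, and Matched % figures can be illustrated in plain Python. In this sketch, rows are hypothetical in-memory dictionaries, `infer_primary_key` mimics the "first column with a unique test" rule using a hypothetical set of unique-tested column names, and two `None` values count as a match.

```python
from typing import Any, Dict, List, Optional, Sequence, Set

Row = Dict[str, Any]


def infer_primary_key(columns: Sequence[str], unique_tested: Set[str]) -> Optional[str]:
    """Return the first column (in model order) that has a unique test, or None."""
    return next((c for c in columns if c in unique_tested), None)


def value_diff(base: List[Row], current: List[Row], pk: str, columns: Sequence[str]) -> Dict[str, Any]:
    """Summarize added/removed PKs and per-column matches over the common PKs."""
    base_by_pk = {row[pk]: row for row in base}
    cur_by_pk = {row[pk]: row for row in current}
    common = base_by_pk.keys() & cur_by_pk.keys()
    summary: Dict[str, Any] = {
        "added": len(cur_by_pk.keys() - base_by_pk.keys()),    # PKs only in current
        "removed": len(base_by_pk.keys() - cur_by_pk.keys()),  # PKs only in base
        "columns": {},
    }
    for col in columns:
        matched = sum(1 for k in common if base_by_pk[k].get(col) == cur_by_pk[k].get(col))
        pct = 100.0 * matched / len(common) if common else None
        summary["columns"][col] = {"matched": matched, "matched_pct": pct}
    return summary
```

If `infer_primary_key` returns `None`, the caller has to supply a key column explicitly, which mirrors the rule described above.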
Profile Diff
Profile Diff compares basic statistics (e.g. count, distinct count, min, max, average) for each column in models between two environments.
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Profile Diff.
Please refer to the dbt-profiler documentation for the definitions of profiling stats.
Note
Profile Diff uses the get_profile macro from dbt-profiler. To use Profile Diff, ensure that dbt-profiler is installed in your project.
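dbt-profiler computes its statistics in SQL. To illustrate the comparison itself, the sketch below profiles a single column's values in each environment and pairs the results; the statistic names are hypothetical and do not necessarily match dbt-profiler's output columns. As with SQL aggregates, nulls are excluded from the distinct count, min, max, and average.

```python
from statistics import mean
from typing import Any, Dict, List, Optional, Tuple


def profile(values: List[Optional[float]]) -> Dict[str, Any]:
    """Basic column statistics; None values are excluded from all but row_count."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "not_null_count": len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": mean(non_null) if non_null else None,
    }


def profile_diff(
    base: List[Optional[float]], current: List[Optional[float]]
) -> Dict[str, Tuple[Any, Any]]:
    """Pair each statistic as (base_value, current_value)."""
    b, c = profile(base), profile(current)
    return {stat: (b[stat], c[stat]) for stat in b}
```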
Histogram Diff
Histogram Diff compares the distribution of a numeric column in an overlay histogram chart.
A Histogram Diff can be generated in two ways.
Via the Explore Change button menu:
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Histogram Diff.
- Select a column to diff.
- Click Execute.
Via the column options menu:
- Select the model from the Lineage DAG.
- Hover over the column in the Node Details panel.
- Click the vertical 3 dots (...).
- Click Histogram Diff.
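For two histograms to overlay meaningfully, both environments need the same bin edges. The sketch below shows one way to do that, assuming equal-width bins spanning the combined minimum and maximum of both value lists (which must be non-empty); the default of 10 bins is an arbitrary choice, not Recce's setting.

```python
from typing import List, Tuple


def histogram_diff(
    base: List[float], current: List[float], bins: int = 10
) -> Tuple[List[float], List[int], List[int]]:
    """Bin both value lists with shared equal-width edges; both lists must be non-empty."""
    lo = min(min(base), min(current))
    hi = max(max(base), max(current))
    width = (hi - lo) / bins or 1.0  # avoid zero-width bins when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]

    def counts(values: List[float]) -> List[int]:
        out = [0] * bins
        for v in values:
            # The maximum value falls exactly on the last edge; keep it in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            out[idx] += 1
        return out

    return edges, counts(base), counts(current)
```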
Top-K Diff
Top-K Diff compares the distribution of a categorical column. By default, the top 10 elements are shown; this can be expanded to the top 50 elements.
A Top-K Diff can be generated in two ways.
Via the Explore Change button menu:
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Top-K Diff.
- Select a column to diff.
- Click Execute.
Via the column options menu:
- Select the model from the Lineage DAG.
- Hover over the column in the Node Details panel.
- Click the vertical 3 dots (...).
- Click Top-K Diff.
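A Top-K Diff can be sketched as counting category frequencies in each environment and reporting both counts for the most frequent categories. Ranking by the combined count across both environments is an assumption made here so that a category that disappears from one side can still appear; `k=10` mirrors the default shown in the UI.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def top_k_diff(
    base: List[Hashable], current: List[Hashable], k: int = 10
) -> List[Tuple[Hashable, int, int]]:
    """Return (category, base_count, current_count) for the k most frequent categories."""
    base_counts = Counter(base)
    cur_counts = Counter(current)
    # Rank by combined frequency; a Counter returns 0 for categories it has not seen.
    ranked = (base_counts + cur_counts).most_common(k)
    return [(value, base_counts[value], cur_counts[value]) for value, _ in ranked]
```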
Multi Node Selection
Multiple nodes can be selected to perform diffs on all nodes as part of a single check.
Select Nodes
- Click the Select nodes button.
- Select one or more nodes.
- Alternatively, right-click on a node and click Select parent nodes or Select child nodes.
- Click the action to perform in the control bar.
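Selecting parent or child nodes amounts to collecting every node reachable in one direction of the DAG. The sketch below assumes these menu options select all ancestors or descendants (not only direct neighbors) and uses hypothetical adjacency maps: pass a parents map to collect upstream nodes, or a children map to collect downstream nodes.

```python
from typing import Dict, List, Set


def reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All nodes reachable from `start` by following `edges`, excluding `start` itself."""
    seen: Set[str] = set()
    stack = list(edges.get(start, []))
    # Iterative depth-first search; `seen` prevents revisiting shared ancestors.
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen
```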
Row Count Diff
An example of a multi-node Row Count Diff showing the row count difference between the base and current environments.
Value Diff
An example of a multi-node Value Diff showing the matched percentage difference between the base and current environments.
Screenshot
The diff result includes a Copy to Clipboard button. It copies the result as an image so you can paste it into a PR comment.
Note
Firefox does not support copying images to the clipboard, so Recce shows a modal instead. From the modal, you can download the image or right-click it to copy it.
Add to Checklist
On the lineage page, you can run different types of checks. You may want to add a check to the checklist to:
- Keep the check so you can rerun it after changing your code.
- Record the result and your interpretation for review purposes.
Lineage Diff
Lineage diff by selector
- Select nodes using Select and Exclude in the top control bar.
- Click ... at the top-right corner.
- Click Lineage diff.
Lineage diff by multi-node selection
- Click the Select nodes button at the top-right corner.
- Select nodes.
- Click the Add lineage diff check button.
Schema Diff
Schema diff by node selector
- Select nodes using Select and Exclude in the top control bar.
- Click ... at the top-right corner.
- Click the Schema diff button.
Schema diff by multi-node selection
- Click the Select nodes button at the top-right corner.
- Select nodes.
- Click the Add schema check button.
Schema diff for a single node
- Select a node to open the node details panel.
- Click the Add check button in the node details panel.
- Click Schema check.
Row Count Diff
Row count diff by node selector
- Select nodes using Select and Exclude in the top control bar.
- Click ... at the top-right corner.
- Click Row Count Diff by Selector to run the row count diff.
- Click Add to checklist on the result page.
Row count diff by multi-node selection
- Click the Select nodes button.
- Select nodes.
- Click Row count diff to run the row count diff.
- Select a node to view its run result.
- Click Add to checklist.
Other Diffs
- Execute the diff
- Click Add to checklist