What is Recce?
Recce is a data validation toolkit designed to enhance the pull request (PR) review process for dbt projects. Recce provides enhanced visibility into the data impact of dbt modeling changes by comparing the data in dev and prod environments. Using Recce for data impact assessment before merging a PR helps keep production data stable and accurate.
Key Features
Manual and Automated Data Checks
Recce checks help you assess data impact and explore data changes, both manually and automatically.
- Manual checks - Create a Recce Checklist of data checks that help to validate your data modeling work during development, including data profile comparisons, structural comparisons, and row-level data checks.
- Automated checks - Integrate Recce Checks into your CI process and post a data impact summary automatically to your PR thread when opening a PR.
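As a rough illustration of what a structural comparison and a row-level check look for, here is a minimal Python sketch over in-memory toy data. It is not Recce's API: the function names, the sample schemas, and the sample rows are all hypothetical, and a real check would query the dev and prod environments rather than use hard-coded dictionaries.

```python
from __future__ import annotations

# Hypothetical illustration only: none of these names come from Recce.


def compare_schemas(base: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Structural comparison: list columns that were added, removed, or retyped."""
    shared = set(base) & set(current)
    return {
        "added": sorted(set(current) - set(base)),
        "removed": sorted(set(base) - set(current)),
        "retyped": sorted(col for col in shared if base[col] != current[col]),
    }


def compare_rows(base_rows: list[dict], current_rows: list[dict], key: str) -> dict[str, list]:
    """Row-level check: list primary keys that were added, removed, or changed."""
    base_by_key = {row[key]: row for row in base_rows}
    current_by_key = {row[key]: row for row in current_rows}
    shared = set(base_by_key) & set(current_by_key)
    return {
        "added": sorted(set(current_by_key) - set(base_by_key)),
        "removed": sorted(set(base_by_key) - set(current_by_key)),
        "changed": sorted(k for k in shared if base_by_key[k] != current_by_key[k]),
    }


# Toy column name -> type mappings for one model in each environment (made up).
prod_schema = {"customer_id": "integer", "lifetime_value": "numeric"}
dev_schema = {"customer_id": "integer", "lifetime_value": "float", "first_order": "date"}

# Toy rows for the same model in each environment (made up).
prod_rows = [
    {"customer_id": 1, "lifetime_value": 10},
    {"customer_id": 2, "lifetime_value": 20},
]
dev_rows = [
    {"customer_id": 1, "lifetime_value": 10},
    {"customer_id": 2, "lifetime_value": 25},
    {"customer_id": 3, "lifetime_value": 5},
]

schema_diff = compare_schemas(prod_schema, dev_schema)
row_diff = compare_rows(prod_rows, dev_rows, key="customer_id")
print(schema_diff)
print(row_diff)
```

In this toy case the structural comparison reports one added column and one retyped column, while the row-level check reports one new customer and one customer whose value changed.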
Collaboration and Replication
Share Recce checks with your team for stakeholder and PR review. Check results can be shared individually, or your full Recce environment can be exported and replicated with one command.
Why Recce
dbt has brought software engineering best practices to data projects, but “bad merges” still happen, allowing erroneous data and silent errors to make their way into prod data.
Understand data impact
Recce provides data and analytics engineers with a toolkit to explore the data impact caused by dbt data modeling changes. The varying levels of Recce checks enable holistic or fine-grained impact assessment, so you can drill down to find the root cause of a data change.
Improved confidence merging
The improved visibility into data impact gives PR reviewers the confidence to sign off on PRs, knowing that prod data will not change unexpectedly.
How Recce Works
Recce compares dbt environments using the dbt artifacts from both dev and prod environments.
- Generate artifacts for the prod environment.
- Switch to your dev branch and generate dev artifacts.
- Start your Recce Instance.
Open the Recce web UI to start exploring and understanding data impact, and validating your work.
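To make the idea of comparing artifacts from two environments more concrete, the sketch below diffs two dbt manifests by node checksum. It assumes the manifest.json layout in which each entry under "nodes" carries a "checksum" object. The function names and the toy manifests are hypothetical, and Recce's own comparison logic may work differently.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Read a dbt manifest.json from disk (the path is supplied by the caller)."""
    return json.loads(Path(path).read_text())


def node_checksums(manifest: dict) -> dict[str, str | None]:
    """Map each node's unique id to its checksum value, or None if absent."""
    return {
        unique_id: node.get("checksum", {}).get("checksum")
        for unique_id, node in manifest.get("nodes", {}).items()
    }


def diff_manifests(base: dict, current: dict) -> dict[str, list[str]]:
    """Classify nodes as added, removed, or modified between two manifests."""
    base_sums = node_checksums(base)
    current_sums = node_checksums(current)
    shared = set(base_sums) & set(current_sums)
    return {
        "added": sorted(set(current_sums) - set(base_sums)),
        "removed": sorted(set(base_sums) - set(current_sums)),
        "modified": sorted(uid for uid in shared if base_sums[uid] != current_sums[uid]),
    }


# Toy manifests standing in for the prod and dev artifacts (contents made up).
prod_manifest = {
    "nodes": {
        "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
        "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "bbb"}},
    }
}
dev_manifest = {
    "nodes": {
        "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
        "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "ccc"}},
        "model.shop.revenue": {"checksum": {"name": "sha256", "checksum": "ddd"}},
    }
}

manifest_diff = diff_manifests(prod_manifest, dev_manifest)
print(manifest_diff)
```

With real artifacts, the two dictionaries would come from calling load_manifest on each environment's manifest.json instead of the inline toy data.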
What you get
Interactive impact assessment environment
The recce server command launches a web UI with an interactive impact assessment environment. Use the tools in Recce to explore how your branch changes impact your data models.
Focused data impact exploration
The main interface to Recce is the lineage DAG, which shows modified nodes and potentially impacted downstream nodes. You can quickly see if critical nodes are within the impact radius and focus your data validation efforts.
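The notion of an impact radius can be sketched as a breadth-first walk over the lineage graph, starting from the modified nodes. This is only an illustration under stated assumptions: the child map mirrors the parent-to-children shape of the child_map in a dbt manifest, the model names are made up, and the function is hypothetical rather than part of Recce.

```python
from __future__ import annotations

from collections import deque


def impacted_downstream(child_map: dict[str, list[str]], modified: set[str]) -> set[str]:
    """Return every node reachable downstream of the modified nodes.

    The modified nodes themselves are excluded from the result.
    """
    seen = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - set(modified)


# Toy lineage: each key lists its direct downstream nodes (names are made up).
child_map = {
    "stg_orders": ["orders"],
    "stg_customers": ["customers"],
    "orders": ["customers", "revenue_report"],
    "customers": [],
    "revenue_report": [],
}

modified = {"stg_orders"}
impacted = impacted_downstream(child_map, modified)

# Check whether any business-critical node falls inside the impact radius.
critical = {"revenue_report"}
critical_at_risk = critical & impacted

print(sorted(impacted))          # ['customers', 'orders', 'revenue_report']
print(sorted(critical_at_risk))  # ['revenue_report']
```

Nodes such as stg_customers that sit outside the downstream path are left out, which is what lets a reviewer focus validation on the affected part of the DAG.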
Getting Started
Try the 5-minute tutorial that uses dbt’s Jaffle Shop project, or take the online demo for a test run, which includes an actual PR and related Recce Instance.
What does Recce mean?
Recce (/ˈrɛki/), pronounced 'reh-kee', is short for 'reconnaissance'. We chose this name as it's the perfect fit for a tool you'll use to perform a 'data reconnaissance' to discover and assess the impact of data modeling changes. Add a Data Recce to your pull request workflow and stop pushing breaking changes to production!