Recce enabled a formalized and streamlined approach to developing with dbt, providing complete visibility into data impact from modeling changes and eliminating the risk of merging bad data into production.
Decreased dbt pull-request review-time from 1 day to an hour or less
PR authors can now see and understand the data impact from dbt modeling changes
A single source of truth for discussion with stakeholders and PR reviewers
The data team at the Prefeitura do Rio de Janeiro (City of Rio de Janeiro) health department uses Recce, and the workflows that Recce encourages, to substantially improve the pull request process on their large-scale electronic health record (EHR) integration data project.
The health department's data team, consisting of six data scientists and engineers, was tasked with integrating EHR data from various sources covering around 7 million users and 100 million events.
In my job I have to understand data change, and Recce was a game changer. It made our PR review much quicker - From a day to less than an hour to merge.
Daily work on the project involved defining and maintaining a growing, increasingly complex dbt project. This resulted in frequent PRs requiring thorough review. Thiago, the head of data and engineering, was responsible for reviewing the majority of PRs and needed a way to streamline the PR review process while ensuring data quality and consistency.
Before implementing Recce, the PR review process was unstructured and time-consuming. The manual approach sometimes resulted in errors making their way into production.
Thiago and the team adopted Recce, which helped facilitate a more structured PR review process. This allowed the data team to demonstrate the impact of code changes on data, and better equipped Thiago to review and sign off on the changes.
The new PR review process that Recce enabled allowed the data team to:
Lineage Diff helps with model refactoring work by showing the removal of a model and a new source for a modified model
After implementing Recce into their PR review process, the Health Department of Rio de Janeiro saw significant improvements in the following areas:
The health department's data team used Recce in refactoring work that involved changing data sources, and in other jobs including addressing existing data quality issues and modifying data models.
Recce specifically helped to:
These represent just a handful of the real-world situations in which Recce helped to validate the impact of dbt data model code changes on data, and enabled the team to merge PRs with confidence that there would be no unexpected data impact.
The intuitive nature of Recce, and its ease of implementation into existing data workflows, made it a perfect fit for the team at Prefeitura do Rio de Janeiro. Recce not only provided the tools to perform data validation, but also encouraged a formalized PR review process that spans from development-time data modeling right through to the final PR review.
The suite of tools offered by Recce provides unparalleled insight into data impact, making it an essential part of any data or analytics engineer’s toolkit. Join other teams like the Prefeitura do Rio de Janeiro and stop merging to prod in the dark - merge with confidence after performing a Data Recce.
The team created a script to help analysts easily generate the necessary dbt artifacts and start a Recce Instance with one command. Check out the script, recce.sh (or recce.ps1 for Windows users), in their data project repo.
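To give a rough idea of what a one-command launcher like this might do, here is a minimal Python sketch. It is not the team's recce.sh; it is an illustration under stated assumptions: that the base environment is a dbt target named `prod`, that base artifacts are written to `target-base`, and that `dbt` and `recce` are both on the PATH. The function names `build_commands` and `main` are hypothetical.

```python
import subprocess
from typing import List


def build_commands(base_target: str = "prod") -> List[List[str]]:
    """Return the commands a launcher might run, in order.

    Assumption: `base_target` names the dbt target used as the comparison
    baseline; "prod" is only a placeholder.
    """
    return [
        # Generate baseline artifacts into a separate directory.
        ["dbt", "docs", "generate", "--target", base_target,
         "--target-path", "target-base"],
        # Build the models in the current development environment.
        ["dbt", "run"],
        # Generate artifacts for the development environment.
        ["dbt", "docs", "generate"],
        # Start the Recce server for interactive review.
        ["recce", "server"],
    ]


def main(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run=True)
```

With `dry_run=True` the sketch only prints the commands, which makes it safe to try outside a dbt project; the actual steps and targets in the team's script may differ.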
Find out more about how Recce can improve your data PR review process