See what our users are saying about Recce

Case Study

Rio de Janeiro Department of Health

Prefeitura do Rio de Janeiro uses Recce to ensure data correctness of the health records for 7 million people

Recce enabled a formalized and streamlined approach to developing with dbt, providing complete visibility into data impact from modeling changes and eliminating the risk of merging bad data into production.

The Stats

Pull Requests Now Merge in 1 Hour

Decreased dbt pull request review time from one day to an hour or less

Complete Visibility into Data Impact

PR authors can now see and understand the data impact from dbt modeling changes

Enhanced Stakeholder Communication

A single source of truth for discussions with stakeholders and PR reviewers

The challenges of maintaining the health records for 7 million people

The data team at the Prefeitura do Rio de Janeiro (City of Rio de Janeiro) health department uses Recce, and the workflows that Recce encourages, to substantially improve the pull request process on their large-scale electronic health record (EHR) integration data project.

The health department's data team, consisting of six data scientists and engineers, was tasked with integrating EHR data from various sources covering around 7 million users and 100 million events.

In my job I have to understand data change, and Recce was a game changer. It made our PR review much quicker - From a day to less than an hour to merge.

Thiago Trabach

Head of Data Science & Engineering
Prefeitura do Rio de Janeiro

A better dbt pull request review process was needed

Daily work on the project involved defining and maintaining a growing, increasingly complex dbt project. This resulted in frequent PRs requiring thorough review. Thiago, the Head of Data Science & Engineering, was responsible for reviewing the majority of PRs and needed a way to streamline the PR review process while ensuring data quality and consistency.

Before implementing Recce, the PR review process was unstructured and time-consuming. The manual approach sometimes resulted in errors making their way into production.

The solution to data integrity

Thiago and the team adopted Recce, which helped establish a more structured PR review process. This allowed the data team to demonstrate the impact of code changes on data, and better equipped Thiago to review and sign off on the changes.

The new PR review process that Recce enabled has three parts:

  • Understand data impact
    By using Recce to check their work during development, the data team ensures that code changes are having the expected impact.
  • Provide proof-of-correctness
    By using Recce to generate visualizations, lineage diff diagrams, data profiles, and more, the team has concrete evidence of the data changes resulting from their code modifications.
  • PR review validation
    Thiago reviews the evidence presented in the PR, focusing on the key metrics and visualizations generated by Recce, to ensure that the changes align with the developer's stated intentions and do not introduce unexpected issues.
Lineage Diff helps with model refactoring work by showing the removal of a model and a new source for the modified model

The result after implementing Recce in the data team

After integrating Recce into their PR review process, the Health Department of Rio de Janeiro saw significant improvements in the following areas:

  • Reduced PR review time
    The time required to review a pull request decreased dramatically from a day to under an hour.
  • Increased data accuracy
    The structured process and visual evidence provided by Recce helped to identify and rectify data inconsistencies before they reached production, minimizing the risk of errors.
  • Improved communication
    Recce checks provided a single source of truth on data impact that facilitated discussion around the changes.

Examples of where Recce was used

The health department's data team used Recce in refactoring work that involved changing data sources, as well as in other jobs such as addressing existing data quality issues and modifying data models.

Recce specifically helped to:

  • Uncover situations in which duplicate records existed.
  • Demonstrate the improved accuracy of patient records by removing duplicate rows.
  • Compare data profiling stats after SQL logic changes, or updates to source tables, to understand how these changes may impact downstream data.
  • Prove that records previously containing null values have been fixed.
  • Use Lineage Diff to ensure that the scope of impact was as expected.
Actual screenshots of query diff and row count diff results from the Prefeitura do Rio de Janeiro data project

These represent just a handful of the real-world situations in which Recce helped to validate the impact of dbt data model code changes on data and enabled the team to merge PRs with confidence that there would be no unexpected data impact.
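
To make the kinds of checks listed above concrete, here is a minimal sketch in plain Python of what a before-and-after comparison of row counts, duplicate rows, and null values can look like. It is an illustration only: it does not use Recce's API and is not how Recce or the Rio de Janeiro team implement these checks. The table names (`base_patients`, `current_patients`), the column names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def profile(conn, table, key, column):
    """Return the row count, duplicate-key rows, and null count for one table.

    Assumes `key` is never NULL, since COUNT(DISTINCT ...) skips NULLs.
    Table and column names are trusted here; this is a sketch, not production code.
    """
    cur = conn.cursor()
    rows = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct_keys = cur.execute(
        f"SELECT COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()[0]
    nulls = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]
    return {"rows": rows, "duplicate_rows": rows - distinct_keys, "nulls": nulls}


def diff_profiles(base, current):
    """Keep only the stats that changed, as (base, current) pairs."""
    return {
        name: (base[name], current[name])
        for name in base
        if base[name] != current[name]
    }


conn = sqlite3.connect(":memory:")

# Hypothetical "base" (production) version: one duplicate row, one null birth date.
conn.execute("CREATE TABLE base_patients (patient_id INTEGER, birth_date TEXT)")
conn.executemany(
    "INSERT INTO base_patients VALUES (?, ?)",
    [(1, "1980-01-01"), (1, "1980-01-01"), (2, None), (3, "1975-05-05")],
)

# Hypothetical "current" (PR branch) version: duplicate removed, null filled in.
conn.execute("CREATE TABLE current_patients (patient_id INTEGER, birth_date TEXT)")
conn.executemany(
    "INSERT INTO current_patients VALUES (?, ?)",
    [(1, "1980-01-01"), (2, "1990-02-02"), (3, "1975-05-05")],
)

base_profile = profile(conn, "base_patients", "patient_id", "birth_date")
current_profile = profile(conn, "current_patients", "patient_id", "birth_date")
changes = diff_profiles(base_profile, current_profile)

for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

A reviewer reading output like this can confirm at a glance whether a change removed duplicates and fixed nulls without altering anything else, which is the kind of evidence the team attached to their PRs.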

Conclusion

The intuitive nature of Recce, and ease-of-implementation into existing data workflows, made it a perfect fit for the team at Prefeitura do Rio de Janeiro. Recce not only provided the tools to perform data validation, but also encouraged a formalized PR review process that spans from development-time data modeling, right through to the final PR review.

The suite of tools offered by Recce provides unparalleled insight into data impact, making it an essential part of any data or analytics engineer’s toolkit. Join other teams like the Prefeitura do Rio de Janeiro and stop merging to prod in the dark - merge with confidence after performing a Data Recce.

Bonus

The team created a script to help analysts easily generate the necessary dbt artifacts and start a Recce Instance with one command. Check out the script, recce.sh (or recce.ps1 for Windows users), in their data project repo.

Book a demo

Find out more about how Recce can improve your data PR review process

Book us with Cal.com