
The Recce Blog

Here you can find articles about dbt best practices and how to get the most out of Recce. For more articles, don't forget to also check out our Medium publication.


Meet Recce at Data Council and Data Reboot, Data Renegades Happy Hour

Data Council is one of our favorite conferences: thousands of Data & AI practitioners, hundreds of cutting-edge talks, and endless opportunities to connect and be inspired. If you're in the data space, it's the event you don’t want to miss.

At Recce, we love talking to people who’ve encountered the challenges we’re passionate about. We believe data development is fundamentally different from software development. Yes, modern data teams borrow practices from software — version control, CI/CD, testing. But data comes with ambiguity. A 0.5% change in daily active customers might not mean much without business context. Is it correct? Is it broken? It depends.

Recce Is Now SOC 2 Type 1 Compliant!

At Recce, we take the security and integrity of your data seriously. Today, we're excited to announce that Recce has successfully achieved SOC 2 Type 1 compliance, marking a significant milestone in our commitment to protecting your data and maintaining the highest standards of security.

Find details in our Trust Center.

Recce has achieved SOC 2 Type 1 compliance

What Does SOC 2 Type 1 Compliance Mean?

Developed by the American Institute of Certified Public Accountants (AICPA), SOC 2 (System and Organization Controls) is an industry-leading standard designed to ensure companies manage customer data securely. A SOC 2 Type 1 audit evaluates and confirms that an organization has robust security controls, policies, and practices in place at a specific point in time.

Recce: Your data change management toolkit

Whether you’re the author of a pull request or the one reviewing it, you’ve got a tough job: figuring out what changed, verifying that the PR does what it’s supposed to, and making sure nothing breaks in production. In large or business-critical dbt projects, this can be a slow, frustrating process. That’s why we built Recce - an open-source toolkit that’s here to make your data modeling validation and pull request (PR) reviews a breeze.

Build the ultimate PR comment to validate your dbt data modeling changes

What is Recce?

Recce (pronounced “reh-kee”, short for “reconnaissance”) is a suite of change management tools designed to help you compare dbt environments, assess data impacts, and streamline your PR reviews. Recce gives you visibility into the effects of your data modeling changes before they hit production. With Recce, you can take two dbt environments, such as dev and prod, and compare them using the suite of diff tools.
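To make the idea of comparing two environments concrete, here is a minimal Python sketch of a row-count and schema diff. It is not Recce's implementation or API: the table names `prod_orders` and `dev_orders`, the helper functions, and the use of an in-memory SQLite database as a stand-in for a warehouse are all assumptions made for illustration.

```python
import sqlite3


def row_count(conn, table):
    # Table names cannot be bound as SQL parameters; this sketch trusts its inputs.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def compare_tables(conn, base, current):
    """Summarize row-count and column differences between two tables."""
    base_cols, cur_cols = columns(conn, base), columns(conn, current)
    return {
        "row_count": (row_count(conn, base), row_count(conn, current)),
        "added_columns": [c for c in cur_cols if c not in base_cols],
        "removed_columns": [c for c in base_cols if c not in cur_cols],
    }


def build_demo():
    # Hypothetical data: "prod" and "dev" builds of the same model.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE prod_orders (id INTEGER, amount REAL);
        CREATE TABLE dev_orders (id INTEGER, amount REAL, currency TEXT);
        INSERT INTO prod_orders VALUES (1, 10.0), (2, 20.0);
        INSERT INTO dev_orders VALUES (1, 10.0, 'USD'), (2, 20.0, 'USD'), (3, 5.0, 'USD');
        """
    )
    return conn


if __name__ == "__main__":
    print(compare_tables(build_demo(), "prod_orders", "dev_orders"))
```

Even a summary this small shows the kind of question a reviewer asks of a PR: did the row count move, and did the schema change in ways the author intended?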

Explore data impact and focus on tracking data validations with Recce's new interface

There’s nothing worse than being distracted while you’re in the middle of working on a complex task or debugging an issue. All the things that you were juggling in your mind come crashing down, and you have to pick up the pieces and start again.

Recce's updated interface lets you stay on track while assessing and exploring data impact in your dbt project, whether you're changing dbt data models or performing a dbt PR review.

Track and record data impact assessment results

Staying in context is especially important when you’re running data validations to verify your dbt data models, exploring data change and impact, or finding the root cause of a data incident. You need to:

The Ultimate PR Comment Template Boilerplate for dbt data projects

If you’re looking to level up your dbt game and improve the PR review process for your team, then using a PR comment template is essential.

A PR template is a markdown-formatted boilerplate comment that you copy and paste into the comment box when opening a PR. You then fill out the relevant sections and submit the comment with your PR.

Example of a PR comment with comprehensive data validation checks

The benefits of using a PR comment template

The sections in the template help by providing a systematic approach to defining and checking your work. Having a template also helps to avoid ambiguous or superficial comments which make PRs difficult to review. The benefits of using a PR comment template for modern dbt dataops are:

Meet Recce at Coalesce 2024 and The Data Renegade Happy Hour

The Recce team will be joining Coalesce 2024 in Las Vegas! Meet our founder, CL Kao, and product manager, Karen Hsieh, who has also been hosting the Taipei dbt meetups. As a company focused on helping data teams prevent bad merges and improve data quality, we believe Coalesce is the perfect venue to connect with fellow data professionals, share insights, and gain fresh perspectives.

We are attending Coalesce 2024

At Recce, our mission is to transform the data PR review process, ensuring that data pipelines not only run smoothly but also deliver accurate, validated results. We believe that data should be correct, collaborative, and continuously improved. Coalesce 2024 offers an ideal platform for these crucial conversations, gathering experts across the field to discuss the future of data management. Whether it’s gaining new insights into best practices or forging valuable partnerships, Coalesce is where we aim to make an impact.

The Guide to Supporting Self-Serve Data and Analytics with Comprehensive PR Review

dbt, the platform that popularized ELT, has revolutionized the way data teams create and maintain data pipelines. The key is in the ‘T’ of ELT. Rather than transforming data before it hits the data warehouse, as in traditional ETL, dbt flips this and promotes loading raw data into your data warehouse and transforming it there, hence ELT.

This, along with bringing analytics inside the data project, makes dbt an interesting solution for data teams looking to maintain a single source of truth (SSoT) for their data, ensuring data quality and integrity.

Move Fast and DON'T Break Prod

From DevOps to DataOps: A Fireside Chat on Practical Strategies for Effective Data Productivity

Top priorities for data-driven organizations are data productivity, cost reduction, and error prevention. The four strategies to improve DataOps are:

  1. start with small, manageable improvements,
  2. follow a clear blueprint,
  3. conduct regular data reviews, and
  4. gradually introduce best practices across the team.

In a recent fireside chat, CL Kao, founder of Recce, and Noel Gomez, co-founder of Datacoves, shared their combined experience of over two decades in the data and software industry. They discussed practical strategies to tackle these challenges, the evolution from DevOps to DataOps, and the need for companies to focus on data quality to avoid costly mistakes.

Data Productivity - Beyond DevOps & dbt

Identify and Automate Data Checks on Critical dbt Models

Do you know which are the critical models in your data project?

I’m sure the answer is yes. Even if you don’t rank models, you can definitely point to which models you should tread carefully around.

Do you check these critical models for data impact with every pull request?

Maybe some, but it’s probably on a more ad-hoc basis. If they really are critical models, you need to be aware of unintended impact. The last thing you want to do is mistakenly change historical metrics, or lose data.

Every dbt project has critical models
Impacted Lineage DAG from Recce showing modified and impacted models on the California Integrated Travel Project dbt project

Identifying critical models

Knowing the critical models in your project comes from your domain knowledge. You know these models have:

Use Histogram Overlay and Top-K Charts to Understand Data Change in dbt

Data profiling stats are a really efficient way to get an understanding of the distribution of data in a dbt model. You can immediately see skewed data and spot data outliers, something which is difficult to do when checking data at the row level. Here's how Recce can help you make the most of these high-level data stats:

Visualize data change with histogram and top-k charts

Profiling stats become even more useful when applied to data change validation. Let’s say you’ve updated a data model in dbt and changed the calculation logic for a column. How can you get an overview of how the data was changed or impacted? This is where comparing the top-k values, or the histogram, from before and after your changes comes in handy. But there’s one major issue...

The best way to visualize data change in a histogram chart

Something’s not right

If you generate a histogram graph from prod data, then do the same for your dev branch, you’ve got two distinct graphs. The axes don’t match, and it’s difficult to compare:
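One general way around mismatched axes is to compute the bin edges from the combined range of both datasets and then count each dataset against those same edges, so the two histograms can be overlaid bin for bin. The following is a minimal pure-Python sketch of that idea, not Recce's implementation; the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def shared_bin_edges(prod: Sequence[float], dev: Sequence[float], bins: int = 10) -> List[float]:
    """Return equal-width bin edges spanning the combined range of both datasets."""
    combined = list(prod) + list(dev)
    if not combined:
        raise ValueError("need at least one value")
    lo, hi = min(combined), max(combined)
    if lo == hi:
        hi = lo + 1.0  # avoid zero-width bins when every value is identical
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; the last bin includes its right edge."""
    bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def overlay(prod: Sequence[float], dev: Sequence[float], bins: int = 10) -> List[Tuple[float, float, int, int]]:
    """Return (bin_start, bin_end, prod_count, dev_count) rows on one shared axis."""
    edges = shared_bin_edges(prod, dev, bins)
    prod_counts = histogram(prod, edges)
    dev_counts = histogram(dev, edges)
    return [(edges[i], edges[i + 1], prod_counts[i], dev_counts[i]) for i in range(bins)]


if __name__ == "__main__":
    # Hypothetical column values from the prod and dev builds of a model.
    for row in overlay([1, 2, 3, 4], [3, 4, 5, 9], bins=4):
        print(row)
```

Because both sets of counts share the same edges, a shift in the distribution shows up as a difference within a bin rather than as two charts with different scales.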