
2024

Explore data impact and focus on tracking data validations with Recce's new interface

There’s nothing worse than being distracted while you’re in the middle of working on a complex task or debugging an issue. All the things that you were juggling in your mind come crashing down, and you have to pick up the pieces and start again.

Recce's updated interface helps you stay on track while you assess and explore data impact in your dbt project, whether you're making data model changes or performing a dbt PR review.

Track and record data impact assessment results

Staying in context is especially important when you’re running data validations to verify your dbt data models, exploring data change and impact, or finding the root cause of a data incident. You need to:

The Ultimate PR Comment Template Boilerplate for dbt data projects

If you’re looking to level-up your dbt game and improve the PR review process for your team, then using a PR comment template is essential.

A PR template is a markdown-formatted boilerplate comment that you copy and paste into the comment box when opening a PR. You then fill out the relevant sections and submit the comment with your PR.

Example of a PR comment with comprehensive data validation checks

The benefits of using a PR comment template

The sections in the template provide a systematic approach to defining and checking your work. Having a template also helps you avoid ambiguous or superficial comments that make PRs difficult to review. The benefits of using a PR comment template for modern dbt dataops are:

Meet Recce at Coalesce 2024 and The Data Renegade Happy Hour

The Recce team will be joining Coalesce 2024 in Las Vegas! Meet our founder, CL Kao, and product manager, Karen Hsieh, who has also been hosting the Taipei dbt meetups. As a company focused on helping data teams prevent bad merges and improve data quality, we believe Coalesce is the perfect venue to connect with fellow data professionals, share insights, and gain fresh perspectives.

We are attending Coalesce 2024

At Recce, our mission is to transform the data PR review process, ensuring that data pipelines not only run smoothly but also deliver accurate, validated results. We believe that data should be correct, collaborative, and continuously improved. Coalesce 2024 offers an ideal platform for these crucial conversations, gathering experts across the field to discuss the future of data management. Whether it’s gaining new insights into best practices or forging valuable partnerships, Coalesce is where we aim to make an impact.

The Guide to Supporting Self-Serve Data and Analytics with Comprehensive PR Review

dbt, the platform that popularized ELT, has revolutionized the way data teams create and maintain data pipelines. The key is in the ‘T’ of ELT. Rather than transforming data before it reaches the data warehouse, as in traditional ETL, dbt flips the order: raw data is loaded into the data warehouse first and transformed there, hence ELT.

This, along with bringing analytics inside the data project, makes dbt an interesting solution for data teams looking to maintain a single source of truth (SSoT) for their data, ensuring data quality and integrity.

Move Fast and DON'T Break Prod

From DevOps to DataOps: A Fireside Chat on Practical Strategies for Effective Data Productivity

Top priorities for data-driven organizations are data productivity, cost reduction, and error prevention. The four strategies to improve DataOps are:

  1. start with small, manageable improvements,
  2. follow a clear blueprint,
  3. conduct regular data reviews, and
  4. gradually introduce best practices across the team.

In a recent fireside chat, CL Kao, founder of Recce, and Noel Gomez, co-founder of Datacoves, shared their combined experience of over two decades in the data and software industry. They discussed practical strategies to tackle these challenges, the evolution from DevOps to DataOps, and the need for companies to focus on data quality to avoid costly mistakes.

Data Productivity - Beyond DevOps & dbt

Identify and Automate Data Checks on Critical dbt Models

Do you know which are the critical models in your data project?

I’m sure the answer is yes. Even if you don’t rank models, you can definitely point to which models you should tread carefully around.

Do you check these critical models for data impact with every pull request?

Maybe some, but it’s probably on a more ad-hoc basis. If they really are critical models, you need to be aware of unintended impact. The last thing you want to do is mistakenly change historical metrics, or lose data.
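To make "check critical models on every pull request" concrete, here is a minimal sketch of one such automated check: comparing row counts between prod and a dev build and flagging critical models that lost rows. It assumes you have already collected counts per model (for example with a `select count(*)` against each environment); the model names and the `tolerance` parameter are hypothetical, and this is not how Recce implements its checks.

```python
from typing import Dict, List, Optional, Tuple

Counts = Dict[str, int]
Flag = Tuple[Optional[int], Optional[int]]


def row_count_regressions(
    prod_counts: Counts,
    dev_counts: Counts,
    critical_models: List[str],
    tolerance: float = 0.0,
) -> Dict[str, Flag]:
    """Flag critical models whose dev row count fell below prod.

    `tolerance` is the fraction of rows (hypothetical knob) that may be lost
    before a model is flagged; 0.0 flags any decrease.
    """
    flagged: Dict[str, Flag] = {}
    for model in critical_models:
        before = prod_counts.get(model)
        after = dev_counts.get(model)
        if before is None or after is None:
            # Model missing from one environment: always worth a closer look.
            flagged[model] = (before, after)
            continue
        if after < before * (1 - tolerance):
            flagged[model] = (before, after)
    return flagged
```

For example, with prod counts `{"orders": 1000, "customers": 200}` and dev counts `{"orders": 950, "customers": 200}`, only `orders` is flagged. A row count check won't catch changed historical metrics on its own, but it is a cheap first signal for lost data.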

Every dbt project has critical models
Impacted Lineage DAG from Recce showing modified and impacted models on the California Integrated Travel Project dbt project
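An impacted lineage view boils down to a graph walk: start from the modified models and collect everything downstream of them. The sketch below does this with a breadth-first search over the `child_map` section of dbt's `manifest.json` (written to `target/` by dbt commands). It is an illustration of the idea, not Recce's implementation; the `resource_prefix` filter is an assumption used to keep only models (dbt unique IDs look like `model.<project>.<name>`) and drop tests and other node types.

```python
import json
from collections import deque
from typing import Dict, Iterable, List, Set

ChildMap = Dict[str, List[str]]


def load_child_map(manifest_path: str) -> ChildMap:
    """Read the child_map section of a dbt manifest.json."""
    with open(manifest_path) as f:
        return json.load(f)["child_map"]


def impacted_nodes(
    modified: Iterable[str],
    child_map: ChildMap,
    resource_prefix: str = "model.",
) -> Set[str]:
    """Return nodes downstream of the modified ones (BFS over child_map)."""
    start = set(modified)
    seen: Set[str] = set()
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # Keep only the requested resource type and exclude the starting nodes.
    return {n for n in seen - start if n.startswith(resource_prefix)}
```

If a critical model shows up in the returned set, that is a signal to run data checks on it for the pull request, even if its SQL was not touched.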

Identifying critical models

Knowing the critical models in your project comes from your domain knowledge. You know these models have:

Use Histogram Overlay and Top-K Charts to Understand Data Change in dbt

Data profiling stats are a really efficient way to get an understanding of the distribution of data in a dbt model. You can immediately see skewed data and spot data outliers, something which is difficult to do when checking data at the row level. Here's how Recce can help you make the most of these high-level data stats:

Visualize data change with histogram and top-k charts

Profiling stats become even more useful when applied to data change validation. Let’s say you’ve updated a data model in dbt and changed the calculation logic for a column. How can you get an overview of how the data was changed or impacted? This is where comparing the top-k values, or the histogram, from before and after your changes comes in handy. But there’s one major issue...
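A before-and-after top-k comparison is easiest to read when both versions are lined up against the same set of values. The sketch below uses Python's `collections.Counter` on two in-memory lists of column values; in practice you would get those values (or pre-aggregated counts) from queries against prod and dev. The union of each side's top-k is used so that a value which entered or dropped out of the top-k still appears, with a count of 0 on the side where it is absent.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_k_diff(
    before: Iterable[Hashable], after: Iterable[Hashable], k: int = 3
) -> List[Tuple[Hashable, int, int]]:
    """Return (value, count_before, count_after) for the union of both top-k sets."""
    b, a = Counter(before), Counter(after)
    keys = {v for v, _ in b.most_common(k)} | {v for v, _ in a.most_common(k)}
    # Counter returns 0 for values it has never seen.
    rows = [(v, b[v], a[v]) for v in keys]
    # Order by the larger of the two counts, then by value, for a stable view.
    rows.sort(key=lambda r: (-max(r[1], r[2]), str(r[0])))
    return rows
```

With a hypothetical `status` column, `top_k_diff(before, after, k=2)` could show `("paid", 5, 6)`, `("pending", 3, 1)` and `("cancelled", 0, 2)`, making it obvious that a new value appeared after the change.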

The best way to visualize data change in a histogram chart

Something’s not right

If you generate a histogram graph from prod data, then do the same for your dev branch, you’ve got two distinct graphs. The axes don’t match, and it’s difficult to compare:
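One way to avoid mismatched axes is to bin both versions of the column with the same edges, computed from the combined range, so the two sets of counts can be overlaid on one chart. This is a plain-Python sketch of that idea, assuming numeric, non-empty inputs and equal-width bins; it is not necessarily how Recce's histogram overlay chooses its bins.

```python
from typing import List, Sequence, Tuple


def overlay_histograms(
    before: Sequence[float], after: Sequence[float], bins: int = 5
) -> Tuple[List[float], List[int], List[int]]:
    """Bin both versions with shared equal-width edges so they can be overlaid."""
    lo = min(min(before), min(after))
    hi = max(max(before), max(after))
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]

    def count(values: Sequence[float]) -> List[int]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(idx, bins - 1)] += 1  # the maximum value goes in the last bin
        return counts

    return edges, count(before), count(after)
```

Because both count lists share `edges`, each bar pair compares exactly the same value range, which is what makes the overlay readable.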

Hands-On Data Impact Analysis for dbt Data Projects with Recce

dbt data projects aren’t getting any smaller and, with the increasing complexity of DAGs, properly validating your data modeling changes has become a difficult task. The adoption of best practices, such as pull request templates for data projects and other ‘pull request guard rails’, has increased merge times and prolonged the QA process for pull requests.

Validate data modeling changes in dbt projects by comparing two environments with Recce

The difficulty comes from your responsibility to check not only the model SQL code but also the data, which is a product of your code. Even when the code looks right, silent errors and hard-to-notice bugs can make their way into the data. A proper pull request review is not complete without data validation.
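Validating data by comparing two environments usually means matching rows on a primary key and reporting what was added, removed, or changed. The sketch below shows that comparison on small in-memory result sets; it assumes the key column is unique in each environment and that rows arrive as dictionaries (for example from a database cursor). The `id` and `amount` columns are hypothetical, and this is a simplified illustration rather than Recce's diff.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_by_key(prod_rows: List[Row], dev_rows: List[Row], key: str) -> Dict[str, Any]:
    """Compare two result sets on a unique key column."""
    prod = {r[key]: r for r in prod_rows}
    dev = {r[key]: r for r in dev_rows}
    added = sorted(dev.keys() - prod.keys())
    removed = sorted(prod.keys() - dev.keys())
    changed: Dict[Any, List[str]] = {}
    for k in sorted(prod.keys() & dev.keys()):
        # Columns present in either row whose values differ between environments.
        cols = [c for c in prod[k].keys() | dev[k].keys() if prod[k].get(c) != dev[k].get(c)]
        if cols:
            changed[k] = sorted(cols)
    return {"added": added, "removed": removed, "changed": changed}
```

A result like `{"added": [4], "removed": [3], "changed": {2: ["amount"]}}` tells the reviewer exactly which rows to look at, which is the kind of evidence a code-only review cannot provide.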

Next-Level Data Validation Toolkit for dbt Data Projects — Introducing Recce

Build the ultimate PR comment to validate your data modeling changes
Recce: Data Validation Toolkit for dbt

Validating data modeling changes and reviewing pull requests for dbt projects can be a challenging task. Because both the code and the data need review, a proper ‘code review’ for data projects is difficult, so the data validation stage is often skipped, poorly implemented, or so slow that it drastically delays time-to-merge for your time-sensitive data updates.

How can you maintain data best practices, but speed up the validation and review process?