
Demo Tutorial

Estimated Time: 20 minutes

Note

Recce Cloud is currently in private alpha and scheduled for general availability soon. Sign up to the Recce newsletter to be notified, or email product@datarecce.io to join our design partnership program for early access.

The following guide uses the official Jaffle Shop DuckDB project from dbt-labs, and provides everything you need to get started with Recce Cloud. By the end of the guide you'll be able to create and sync Recce checks with a GitHub PR via Recce Cloud.

To see what you'll get, watch the first section of the Loom video that accompanies this guide.

Clone Jaffle Shop to your own private repository

  1. Create a private repository in your GitHub account.
  2. Clone the Jaffle Shop DuckDB dbt data project:
    git clone git@github.com:dbt-labs/jaffle_shop_duckdb.git
    cd jaffle_shop_duckdb
    
  3. Change the remote URL to the repository you just created:
    git remote set-url origin git@github.com:<owner>/<repo>.git
    
  4. Push to your newly created repository:
    git push
    
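Before pushing, you may want to confirm that origin now points at your own repository and not at dbt-labs. The following is a minimal shell sketch. check_origin is a hypothetical helper, not part of git or Recce, and it relies only on the standard git remote get-url command.

```shell
# Hypothetical helper (not part of git or Recce): succeed only if the
# "origin" remote no longer points at the upstream dbt-labs repository.
check_origin() {
  url="$(git remote get-url origin 2>/dev/null)" || {
    echo "no 'origin' remote is configured" >&2
    return 1
  }
  case "$url" in
    *dbt-labs/jaffle_shop_duckdb*)
      echo "origin still points at $url" >&2
      return 1
      ;;
  esac
  echo "origin: $url"
}

# Usage, from inside the cloned jaffle_shop_duckdb directory:
#   check_origin && git push
```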

Authorize Recce Cloud to access the repository

Recce Cloud needs access to your data project's repository in order to sync the status of your checks to the pull request thread.

  1. Visit Recce Cloud. If this is your first time logging in, click the Continue with GitHub button to authorize the Recce Cloud integration to access your GitHub account.
  2. Click the Install button to install the Recce Cloud GitHub app to your personal or organization account.
  3. On the app installation page, authorize Recce Cloud to access the repository you created in the previous section.
  4. Authorized repositories will then be shown in your Recce Cloud account.

Configure the Jaffle Shop DuckDB data project

Set up the Jaffle Shop project and install Recce.

  1. Create a new Python virtual environment:
    python -m venv venv
    source venv/bin/activate
    
  2. Install the requirements and Recce:
    pip install -r requirements.txt
    pip install recce
    
  3. Add a production environment to the project by editing ./profiles.yml and adding the following target:
    jaffle_shop:
      target: dev
      outputs:
        dev:
          type: duckdb
          path: 'jaffle_shop.duckdb'
          threads: 24
    +   prod:
    +     type: duckdb
    +     path: 'jaffle_shop.duckdb'
    +     schema: prod
    +     threads: 24
    
  4. Create a ./packages.yml file in the root of your project with the following packages, which some Recce features require (highly recommended):
    packages:
    - package: dbt-labs/audit_helper
      version: 0.12.0
    - package: data-mie/dbt_profiler
      version: 0.8.2
    
    Install the packages:
    dbt deps
    
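To confirm that the virtual environment is active and that both CLIs installed above are on your PATH, you could run a small check like the one below. It is a sketch: require_cmds is a hypothetical helper name, and it uses only the POSIX command -v builtin.

```shell
# Hypothetical helper: report every command that is missing from PATH and
# return non-zero if at least one of them is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the virtual environment activated:
#   require_cmds dbt recce
```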

Prepare the base environment

Recce compares two environments. The base environment is your point of reference (the known-good state), and the target environment represents your PR or development branch.

  1. Prepare the production (base) environment. Note the custom --target-path, which writes the dbt artifacts to ./target-base:
    dbt seed --target prod
    dbt run --target prod
    dbt docs generate --target prod --target-path ./target-base
    
  2. Add the target-base/ folder to the .gitignore file:
     target/
    +target-base/
     dbt_packages/
     dbt_modules/
     logs/
    
  3. Remove the existing GitHub Actions workflows:
    rm -rf .github/
    
  4. Push the changes to remote:
    git add .
    git commit -m 'Configure project and prep for Recce'
    git push 
    

Important

By default, Recce expects the dbt artifacts for the base environment to be located in a folder named target-base.

The base environment preparation is now complete. The data in the prod schema and the artifacts in the target-base folder represent the stable (production) state.
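Before launching Recce, you can check that the artifacts it compares are in place. This sketch assumes that Recce reads the manifest.json and catalog.json files that dbt docs generate writes to its target path; has_dbt_artifacts is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the given dbt target path contains the
# manifest.json and catalog.json files written by `dbt docs generate`.
# Assumption: these are the artifacts Recce needs from each environment.
has_dbt_artifacts() {
  dir="$1"
  for f in manifest.json catalog.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
}

# Usage:
#   has_dbt_artifacts target-base   # base environment, prepared above
#   has_dbt_artifacts target        # development environment, after `dbt docs generate`
```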

As a PR author, you'll be working on data models, making changes to the project, and validating your work for correctness.

Prepare the review state for the PR

In this section, you'll make a new branch, update a data model, and create a pull request.

  1. Check out a new branch:

    git checkout -b feature/recce-getting-started
    

  2. Edit the staging model located in ./models/staging/stg_payments.sql as follows:

    ...
    
    renamed as (
             payment_method,
    
    -        -- `amount` is currently stored in cents, so we convert it to dollars
    -        amount / 100 as amount
    +        amount
    
             from source
    )
    

  3. Run dbt on the development environment (the default target):

    dbt seed
    dbt run
    dbt docs generate
    

  4. Commit and push the change:

    git add models/staging/stg_payments.sql
    git commit -m 'Update the model'
    git push -u origin feature/recce-getting-started 
    

  5. Create a pull request for this branch in your GitHub repository.

Important

Don't forget to create a branch for the commit above before continuing with this tutorial.

Launch a Recce Instance to validate your change

In this section, you will launch a Recce Instance, create validation checks, and sync those checks with Recce Cloud so they can be reviewed by your PR reviewer.

Prepare a GitHub Token and Recce State password

To access the repository, your local Recce Instance requires a GitHub Token (Classic).

  1. Create a GitHub Token (Classic) in your GitHub account, and grant the new token the repo scope.
  2. Set the following environment variables:
    export GITHUB_TOKEN=<github-token>
    export RECCE_STATE_PASSWORD=mypassword
    
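Because Recce reads both values from the environment, a missing variable is easy to overlook. The following sketch refuses to continue unless both variables are set and non-empty. require_env is a hypothetical helper; the only Recce command it is paired with is recce server --cloud from this guide.

```shell
# Hypothetical helper: fail if any named environment variable is unset or empty.
require_env() {
  rc=0
  for name in "$@"; do
    # Look up the variable's value by name (POSIX-compatible indirection).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "environment variable $name is not set" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   require_env GITHUB_TOKEN RECCE_STATE_PASSWORD && recce server --cloud
```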

Run Recce in Cloud Mode

Start a Recce Instance in cloud mode:

recce server --cloud

Open the Recce Instance in your browser. By default, it is available at http://0.0.0.0:8000.

Create a Recce Check

Switch to the Query tab and paste the following query:

select * from {{ ref("orders") }} order by 1
  1. Enter order_id as the primary key and click the Run Diff button.
  2. Click the Add to Checklist button to add the query result to your Checklist.
  3. On the Checklist page, you'll find three checks. Row count diff and Schema diff are default Preset Checks, and Query diff is your newly added check. Leave the checks unapproved.
  4. Go back to the command line and terminate the Recce Instance (for example, with Ctrl+C). Your Recce State file, containing your checklist and other artifacts, will be encrypted and uploaded to Recce Cloud.
  5. Go to the PR page in your GitHub repository and scroll to the bottom. Notice that Recce Cloud shows that the checks are not approved.

Note

Recce checks sync in real time. However, because the State file must be encrypted, compressed, and transferred, the sync may be slightly delayed. Always terminate your Recce Instance from the CLI and wait for the State to finish syncing. This ensures your checks are saved to Recce Cloud.

Review the PR

As a PR reviewer, your job is to review and approve the Checks created by the PR author. Once they are approved, the PR can be merged.

Note

As this tutorial uses DuckDB, which is a file-based database, the reviewer needs a copy of the same DuckDB file to follow the reviewer's steps.
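One way for the author and the reviewer to confirm they are working from the same DuckDB file is to compare checksums. A minimal sketch, assuming either sha256sum (GNU coreutils) or shasum (common on macOS) is installed; file_digest is a hypothetical helper name.

```shell
# Hypothetical helper: print the SHA-256 digest of a file, using whichever
# checksum tool is available on this machine.
file_digest() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$1" | cut -d ' ' -f 1
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
}

# Usage: the author and the reviewer each run this and compare the output.
#   file_digest jaffle_shop.duckdb
```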

Run Recce in Review mode

The PR reviewer should prepare their own GitHub token but must use the same RECCE_STATE_PASSWORD as the PR author. The password is used to decrypt the State file, so it must match.

  1. Check out the PR branch:
    git checkout feature/recce-getting-started
    
  2. Configure the required environment variables:
    export GITHUB_TOKEN=<github-token>
    export RECCE_STATE_PASSWORD=mypassword
    
  3. Run Recce with the --cloud and --review flags:
    recce server --cloud --review
    
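The reviewer steps above assume you have checked out the PR branch. A small sketch that confirms this before starting the server, using git symbolic-ref; require_branch is a hypothetical helper, and the branch name in the usage line is the one created earlier in this guide.

```shell
# Hypothetical helper: fail unless HEAD is on the expected local branch.
require_branch() {
  expected="$1"
  # symbolic-ref fails in a detached-HEAD state; that counts as a mismatch.
  current="$(git symbolic-ref --short HEAD 2>/dev/null)"
  if [ "$current" != "$expected" ]; then
    echo "on '${current:-detached HEAD}', expected '$expected'" >&2
    return 1
  fi
}

# Usage:
#   require_branch feature/recce-getting-started && recce server --cloud --review
```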

Approve Recce Checks

When Recce loads, click the Checklist tab to review the Checks that have been prepared by the PR author.

If everything looks good to you, approve all the checks.


The approval status of each check is automatically synced to Recce Cloud.

Merge the PR

Back on the GitHub PR page, you'll notice that the Recce Cloud check status has been updated automatically to show "All checks are approved".


In a real-world situation, you could now merge the PR with confidence that the PR author has checked their work and that the reviewer both understands and has signed off on the changes.