5 Minute Tutorial

Note

Recce Cloud is currently in private alpha and scheduled for general availability later this year. Sign up to the Recce newsletter to be notified, or email product@datarecce.io to join our design partnership program for early access.

Jaffle Shop is an example project officially provided by dbt-labs. This tutorial uses jaffle_shop_duckdb so that you can start using Recce Cloud from scratch within five minutes.

Clone Jaffle Shop to your Private Repository

  1. Create a private repository in your GitHub account.
  2. Clone the Jaffle Shop dbt data project
    git clone git@github.com:dbt-labs/jaffle_shop_duckdb.git
    cd jaffle_shop_duckdb
    
  3. Change the remote URL so that origin points to your repository.
    git remote set-url origin git@github.com:<owner>/<repo>.git
    
  4. Push to your newly created repository.
    git push
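
Before pushing, it can help to confirm that origin now points at your own repository rather than at dbt-labs. The sketch below is one way to do that. The helper name is_expected_remote and the my-org/my-repo values are hypothetical placeholders, not part of git or Recce.

```shell
# Hypothetical helper: succeeds only if the given URL is the GitHub SSH URL
# for <owner>/<repo>, in the same form used when setting the remote.
is_expected_remote() {
  url="$1"
  owner="$2"
  repo="$3"
  [ "$url" = "git@github.com:${owner}/${repo}.git" ]
}

# Example (run inside the cloned project, with your own owner and repo):
#   is_expected_remote "$(git remote get-url origin)" my-org my-repo && git push
```

Comparing the full URL string is deliberately strict: an HTTPS remote or a typo in the owner name makes the check fail instead of pushing to the wrong place.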
    

Authorize the repository for Recce Cloud

  1. Go to Recce Cloud. If this is your first time logging in, click Continue with GitHub and authorize your GitHub account for the Recce Cloud GitHub App.
  2. Click the Install button to install the Recce Cloud GitHub App to your personal or organization account.
  3. On the app installation page in GitHub, authorize the newly created repository for the app.
  4. Recce Cloud then lists all the authorized repositories.

Prepare the base environment

  1. Prepare a virtual environment
    python -m venv venv
    source venv/bin/activate
    
  2. Installation
    pip install -r requirements.txt
    pip install recce
    
  3. Provide an additional environment to compare against. Edit ./profiles.yml to add a prod target.
    jaffle_shop:
      target: dev
      outputs:
        dev:
          type: duckdb
          path: 'jaffle_shop.duckdb'
          threads: 24
    +   prod:
    +     type: duckdb
    +     path: 'jaffle_shop.duckdb'
    +     schema: prod
    +     threads: 24
    
  4. Add the dbt packages for Recce. Create ./packages.yml with the following content
    packages:
    - package: dbt-labs/audit_helper
      version: 0.12.0
    - package: data-mie/dbt_profiler
      version: 0.8.2
    
    and run
    dbt deps
    
  5. Prepare production environment
    dbt seed --target prod
    dbt run --target prod
    dbt docs generate --target prod --target-path ./target-base
    
  6. Add the target-base/ folder to .gitignore
     target/
    +target-base/
     dbt_packages/
     dbt_modules/
     logs/
    
  7. Remove the existing GitHub Actions workflow.
    rm -rf .github/
    
  8. Push to remote
    git add .
    git commit -m 'Add recce changes'
    git push
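
A quick sanity check of the base environment might look like the sketch below. It assumes that dbt docs generate wrote manifest.json and catalog.json into target-base/ (the standard dbt artifacts) and that .gitignore lists target-base/ on its own line. The function name check_base_env is made up for this example.

```shell
# Hypothetical helper: verify the base artifacts exist and are git-ignored.
# $1 is the project directory (defaults to the current directory).
check_base_env() {
  project_dir="${1:-.}"
  status=0
  for artifact in manifest.json catalog.json; do
    if [ ! -f "$project_dir/target-base/$artifact" ]; then
      echo "missing target-base/$artifact" >&2
      status=1
    fi
  done
  # -x matches the whole line, -F treats the pattern as a fixed string.
  if ! grep -qxF 'target-base/' "$project_dir/.gitignore" 2>/dev/null; then
    echo "target-base/ is not listed in .gitignore" >&2
    status=1
  fi
  return "$status"
}

# Example (run from the project root):
#   check_base_env . && echo "base environment looks ready"
```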
    

Prepare the review state for the PR

As a PR author, you can prepare the Recce review state and persist it in Recce Cloud.

  1. Check out a new branch
    git checkout -b feature/recce-getting-started
    
  2. Prepare the development environment. First, edit the existing model ./models/staging/stg_payments.sql.
    ...
    
    renamed as (
             payment_method,
    
    -        -- `amount` is currently stored in cents, so we convert it to dollars
    -        amount / 100 as amount
    +        amount
    
             from source
    )
    
    Then build the project in the development environment.
    dbt seed
    dbt run
    dbt docs generate
    
  3. Commit the change and push the new branch. The branch has no upstream yet, so set one on the first push.
    git add models/staging/stg_payments.sql
    git commit -m 'Update the model'
    git push -u origin feature/recce-getting-started
    
  4. Create a pull request for this branch in your GitHub repository.
  5. Create a GitHub token in your account. The token needs the repo permission.
  6. Ensure you have configured these environment variables.
    export GITHUB_TOKEN=<github-token>
    export RECCE_STATE_PASSWORD=mypassword
    
  7. Run the Recce server in cloud mode
    recce server --cloud
    
    Open the link http://0.0.0.0:8000
  8. Switch to the Query tab and add this query
    select * from {{ ref("orders") }} order by 1
    
    Add the primary key order_id and click the Run Diff button.
  9. Click the + button to add the query result to the checklist.
  10. The Checks page now lists three checks.
  11. Terminate the server. On exit, it stores the state in Recce Cloud.
  12. On the GitHub PR page, the PR shows a failed check because not all checks have been approved yet.
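
To see why the query diff on orders reports changes, consider what the edit to stg_payments.sql does to a single value: the base environment still divides amount by 100, while the PR branch passes it through unchanged. The sketch below mimics that with made-up amounts (they are not taken from the Jaffle Shop seed data), and both function names are hypothetical.

```shell
# Base environment: amount is stored in cents and converted to dollars.
base_amount() {
  awk -v cents="$1" 'BEGIN { printf "%.2f", cents / 100 }'
}

# PR branch: the conversion was removed, so the raw value passes through.
pr_amount() {
  printf '%s' "$1"
}

# Made-up sample values in cents.
for cents in 1000 2500 399; do
  printf 'raw=%s base=%s pr=%s\n' "$cents" "$(base_amount "$cents")" "$(pr_amount "$cents")"
done
```

Every PR value is 100 times its base value, which is the kind of difference a reviewer would expect the diff to surface for any column derived from amount.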

Review the PR

As a PR reviewer, you can review the PR by using the state stored in Recce Cloud. If the checks all look good, you can approve them.

  1. Check out the PR branch
    git checkout feature/recce-getting-started
    
  2. Ensure you have configured these environment variables.
    export GITHUB_TOKEN=<github-token>
    export RECCE_STATE_PASSWORD=mypassword
    
  3. Run the recce server
    recce server --cloud --review
    
  4. You can see the lineage diff and the checklist prepared by the PR author.
  5. Approve all the checks if everything looks good to you.
  6. Go back to the GitHub PR page. The Recce check is now marked as passed.
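
Both the author and reviewer flows depend on GITHUB_TOKEN and RECCE_STATE_PASSWORD being set before the server starts. A small preflight check, sketched here with a hypothetical require_env helper, can catch a missing variable early.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env GITHUB_TOKEN RECCE_STATE_PASSWORD && recce server --cloud --review
```

The helper only reports variable names, never their values, so the token and password do not end up in terminal output.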

Note

In this tutorial, we use DuckDB as the warehouse, which is file-based. To run queries, the reviewer needs the same DuckDB file that the PR author used.
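
If the author shares the DuckDB file with the reviewer, comparing checksums is one way to confirm that both sides have the same file. This is only a sketch: file_digest is a hypothetical helper, and it assumes that either sha256sum (GNU coreutils) or shasum (commonly available on macOS) is installed.

```shell
# Hypothetical helper: print the SHA-256 digest of a file.
file_digest() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Example: author and reviewer each run this and compare the output.
#   file_digest jaffle_shop.duckdb
```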