Introduction to dbt (data build tool)
dbt is the “T” (Transform) in ELT. It allows Data Engineers and Analysts to write transformations in simple SQL while dbt handles the complex work of dependency management, version control, and data testing.dbt is a development framework that combines SQL with software engineering best practices (version control, testing, CI/CD) to build data transformations.
Core Concepts
- T in ELT: dbt does not move data; it transforms data already loaded into your warehouse (Snowflake, BigQuery, etc.).
- SQL-based: If you know
SELECTstatements, you can use dbt. - Modular: It turns monolithic scripts into reusable models.
Official Resources
🔗 Learning Resources
- dbt Learn: dbt Fundamentals - The gold standard course for learning the modern data stack.
- Best Practices: The dbt Viewpoint - Essential reading on why dbt works the way it does.
- Documentation: dbt Core & Cloud Docs
💎 dbt (Data Build Tool) Practice Tasks
dbt sits on top of your Data Warehouse. It allows you to transform “Raw” data into “Clean” data using SQL and Jinja templating.
📝 Practice Tasks: Analytics Engineering
Task 1: Project Initialization & Profiles
Goal: Set up the dbt environment.
- Requirement: Run
dbt init grocery_project. - Challenge: Configure your
profiles.ymlto connect to a local SQLite database or a cloud warehouse (Snowflake/BigQuery). - Verification: Run
dbt debugto ensure the connection is successful.
Task 2: Building Modular Models (The ref function)
Goal: Create a dependency between two models.
- Requirement: 1. Create a “Staging” model
stg_sales.sqlthat selects all data from your raw sales table. 2. Create a “Fact” modelfct_daily_summary.sqlthat uses the{{ ref('stg_sales') }}function to calculate daily totals. - Discussion: Why is using
ref()better than hardcoding the table name likeFROM my_db.my_schema.stg_sales?
Task 3: Data Documentation & YAML
Goal: Document your data for the business team.
- Requirement: Create a
schema.ymlfile in yourmodels/folder. - Challenge: Add a description for the
transaction_idandtotal_revenuecolumns. - Command: Run
dbt docs generatefollowed bydbt docs serve. Can you see your documentation in the browser?
Task 4: Automated Testing (Data Quality)
Goal: Catch bad data before it hits the dashboard.
- Requirement: In your
schema.yml, add the following tests to thetransaction_idcolumn:uniquenot_null
- Challenge: Add an
accepted_valuestest to thecategorycolumn to ensure only ‘Produce’, ‘Dairy’, and ‘Bakery’ are allowed. - Command: Run
dbt testand check if your data passes.