Remote OpenClaw
Menu
SkillsMCPPluginsGuideAgentsAdvertise
Remote OpenClaw
SkillsMCPPluginsGuideAgentsAdvertise
Skills/aradotso/data-skills/snowflake-dbt-airbnb-analytics

snowflake-dbt-airbnb-analytics

aradotso/data-skills
624 installs2 stars

Installation

npx skills add https://github.com/aradotso/data-skills --skill snowflake-dbt-airbnb-analytics

Summary

Inside Airbnb data warehouse built with Snowflake and dbt, demonstrating modern analytics engineering patterns with staging, intermediate, and mart layers.

SKILL.md

Snowflake dbt Airbnb Analytics

Skill by ara.so — Data Skills collection.

This project demonstrates a complete analytics engineering workflow using Snowflake, dbt, and Streamlit. It loads Inside Airbnb open data into Snowflake, transforms it through a layered dbt architecture (staging → intermediate → marts), validates data quality with tests, and serves insights via a Streamlit dashboard.

What This Project Does

  • Raw data ingestion: Loads CSV/GZIP files from Inside Airbnb into Snowflake internal stages
  • Layered transformations: Implements staging (clean/cast), intermediate (joins/enrichment), and mart (dimensions/facts) layers
  • Incremental modeling: Uses Snowflake merge strategy for fact tables
  • Data quality: Generic and singular dbt tests validate uniqueness, relationships, and business rules
  • Analytics dashboard: Streamlit app queries marts for neighbourhood and listing performance

Data sources: listings.csv.gz, calendar.csv.gz, reviews.csv.gz, neighbourhoods.csv from Inside Airbnb

Installation

# Clone and set up environment
git clone https://github.com/analyticsdurgesh/Snowflake_DBT_Project.git
cd Snowflake_DBT_Project
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

1. Snowflake Credentials

Create local-only credential files (ignored by git):

cp profiles.yml.example profiles.yml
cp config/local_credentials.example.json config/local_credentials.json

profiles.yml (dbt connection):

airbnb_snowflake:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: YOUR_ACCOUNT
      user: YOUR_USER
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: YOUR_ROLE
      database: AIRBNB_DB
      warehouse: COMPUTE_WH
      schema: ANALYTICS
      threads: 4
      client_session_keep_alive: False

config/local_credentials.json (Streamlit connection):

{
  "account": "YOUR_ACCOUNT",
  "user": "YOUR_USER",
  "password": "YOUR_PASSWORD",
  "role": "YOUR_ROLE",
  "warehouse": "COMPUTE_WH",
  "database": "AIRBNB_DB",
  "schema": "ANALYTICS"
}

Use environment variables in production:

export SNOWFLAKE_PASSWORD="your_password"

2. Download Inside Airbnb Data

Place raw files in data/raw/:

data/raw/listings.csv.gz
data/raw/calendar.csv.gz
data/raw/reviews.csv.gz
data/raw/neighbourhoods.csv

Recommended dataset: New York City from Inside Airbnb.

Loading Raw Data

The Python loader creates Snowflake objects and stages data:

python scripts/load_inside_airbnb_to_snowflake.py

What it does:

  1. Executes setup/snowflake_setup.sql to create database, schemas, and stage
  2. Uploads raw files to INSIDE_AIRBNB_STAGE
  3. Creates raw tables with headers from CSV files
  4. Copies staged data into RAW schema tables

Key loader code patterns:

import snowflake.connector
import json

# Load credentials
with open('config/local_credentials.json') as f:
    creds = json.load(f)

# Connect to Snowflake
conn = snowflake.connector.connect(
    account=creds['account'],
    user=creds['user'],
    password=creds['password'],
    role=creds['role'],
    warehouse=creds['warehouse']
)

# Upload to stage
conn.cursor().execute(f"PUT file://data/raw/listings.csv.gz @INSIDE_AIRBNB_STAGE")

# Copy into raw table
conn.cursor().execute("""
    COPY INTO RAW.LISTINGS
    FROM @INSIDE_AIRBNB_STAGE/listings.csv.gz
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
""")

dbt Model Architecture

Layer Structure

LayerPathPurposeExample
Sourcesmodels/staging/sources.ymlDefine raw tablesRAW.LISTINGS
Stagingmodels/staging/stg_*.sqlClean, cast, standardizestg_airbnb__listings
Intermediatemodels/intermediate/int_*.sqlJoins, enrichment, business logicint_airbnb__listing_enriched
Martsmodels/marts/Dimensions, facts, aggregatesdim_listings, fct_listing_calendar

Staging Layer Example

models/staging/stg_airbnb__listings.sql:

with source as (
    select * from {{ source('airbnb_raw', 'listings') }}
),

cleaned as (
    select
        id::bigint as listing_id,
        name::varchar as listing_name,
        host_id::bigint as host_id,
        host_name::varchar as host_name,
        neighbourhood_cleansed::varchar as neighbourhood,
        room_type::varchar as room_type,
        price::varchar as price_raw,
        minimum_nights::int as minimum_nights,
        number_of_reviews::int as number_of_reviews,
        last_review::date as last_review_date,
        reviews_per_month::float as reviews_per_month,
        availability_365::int as availability_365
    from source
)

select * from cleaned

Key patterns:

  • Use {{ source() }} for raw table references
  • Cast types explicitly with ::
  • Standardize column names (snake_case)
  • Preserve raw columns when cleaning needed downstream

Intermediate Layer Example

models/intermediate/int_airbnb__calendar_enriched.sql:

with calendar as (
    select * from {{ ref('stg_airbnb__calendar') }}
),

listings as (
    select * from {{ ref('int_airbnb__listing_enriched') }}
),

enriched as (
    select
        c.listing_id,
        c.calendar_date,
        c.available,
        c.price,
        c.adjusted_price,
        c.minimum_nights,
        c.maximum_nights,
        l.listing_name,
        l.neighbourhood,
        l.room_type,
        l.host_id,
        l.host_name,
        -- Revenue proxy: price when unavailable
        case
            when c.available = false and c.price > 0
            then c.price
            else 0
        end as estimated_revenue
    from calendar c
    left join listings l
        on c.listing_id = l.listing_id
)

select * from enriched

Key patterns:

  • Use {{ ref() }} for model dependencies
  • Join staging/intermediate models
  • Add calculated business logic (revenue proxy)
  • Keep intermediate models focused on reusable logic

Incremental Fact Table Example

models/marts/fct_listing_calendar.sql:

{{
    config(
        materialized='incremental',
        unique_key=['listing_id', 'calendar_date'],
        merge_update_columns=['available', 'price', 'estimated_revenue']
    )
}}

with calendar_enriched as (
    select * from {{ ref('int_airbnb__calendar_enriched') }}
)

select
    listing_id,
    calendar_date,
    available,
    price,
    adjusted_price,
    minimum_nights,
    maximum_nights,
    neighbourhood,
    room_type,
    host_id,
    estimated_revenue
from calendar_enriched

{% if is_incremental() %}
    where calendar_date > (select max(calendar_date) from {{ this }})
{% endif %}

Key patterns:

  • materialized='incremental' for large fact tables
  • unique_key for merge strategy (update existing, insert new)
  • merge_update_columns specifies which columns to update
  • is_incremental() filters new records only on subsequent runs
  • Use --full-refresh flag to rebuild from scratch

Aggregate Mart Example

models/marts/agg_neighbourhood_monthly_performance.sql:

with listing_monthly as (
    select * from {{ ref('agg_listing_monthly_performance') }}
)

select
    neighbourhood,
    year_month,
    count(distinct listing_id) as total_listings,
    sum(total_days) as total_days,
    sum(available_days) as total_available_days,
    sum(unavailable_days) as total_unavailable_days,
    round(avg(availability_rate), 2) as avg_availability_rate,
    round(sum(estimated_revenue), 2) as total_estimated_revenue,
    round(avg(avg_price), 2) as avg_listing_price
from listing_monthly
group by neighbourhood, year_month
order by neighbourhood, year_month

Key patterns:

  • Aggregate from lower-level marts
  • Use round() for clean reporting metrics
  • Group by dimensions for dashboards

dbt Commands

# Test connection
dbt debug --profiles-dir .

# Run all models
dbt run --profiles-dir .

# Run specific model and downstream dependencies
dbt run --select dim_listings+ --profiles-dir .

# Run incremental models with full refresh
dbt run --full-refresh --select fct_listing_calendar+ --profiles-dir .

# Run tests
dbt test --profiles-dir .

# Test specific model
dbt test --select stg_airbnb__listings --profiles-dir .

# Generate and serve documentation
dbt docs generate --profiles-dir .
dbt docs serve --profiles-dir .

Common workflows:

# New data load workflow
python scripts/load_inside_airbnb_to_snowflake.py
dbt run --full-refresh --select fct_listing_calendar+ --profiles-dir .
dbt test --profiles-dir .

# Development workflow (iterative)
dbt run --select +fct_reviews --profiles-dir .  # Run model and upstream deps
dbt test --select fct_reviews --profiles-dir .

Data Quality Tests

Generic Tests in Schema Files

models/staging/schema.yml:

version: 2

models:
  - name: stg_airbnb__listings
    columns:
      - name: listing_id
        tests:
          - unique
          - not_null
      - name: room_type
        tests:
          - accepted_values:
              values: ['Entire home/apt', 'Private room', 'Shared room', 'Hotel room']
      - name: price
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"

  - name: stg_airbnb__calendar
    columns:
      - name: listing_id
        tests:
          - relationships:
              to: ref('stg_airbnb__listings')
              field: listing_id

Singular Tests

tests/no_duplicate_listing_dates.sql:

-- Test for duplicate listing-date combinations in fact table
select
    listing_id,
    calendar_date,
    count(*) as record_count
from {{ ref('fct_listing_calendar') }}
group by listing_id, calendar_date
having count(*) > 1

Key patterns:

  • Generic tests in schema.yml for standard validations
  • Singular tests in tests/ for custom business rules
  • Tests return records that FAIL the condition
  • Use dbt_utils package for advanced tests

Install dbt packages (packages.yml):

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
dbt deps --profiles-dir .

Streamlit Dashboard

dashboard/streamlit_app.py:

import streamlit as st
import snowflake.connector
import pandas as pd
import json

# Load credentials
with open('config/local_credentials.json') as f:
    creds = json.load(f)

@st.cache_resource
def get_connection():
    return snowflake.connector.connect(
        account=creds['account'],
        user=creds['user'],
        password=creds['password'],
        role=creds['role'],
        warehouse=creds['warehouse'],
        database=creds['database'],
        schema=creds['schema']
    )

def run_query(query):
    conn = get_connection()
    return pd.read_sql(query, conn)

st.title("Inside Airbnb Analytics Dashboard")

# Neighbourhood performance
st.header("Top Neighbourhoods by Estimated Revenue")
query = """
    SELECT
        neighbourhood,
        total_estimated_revenue,
        avg_availability_rate,
        total_listings
    FROM agg_neighbourhood_monthly_performance
    WHERE year_month = (SELECT MAX(year_month) FROM agg_neighbourhood_monthly_performance)
    ORDER BY total_estimated_revenue DESC
    LIMIT 10
"""
df = run_query(query)
st.dataframe(df)
st.bar_chart(df.set_index('NEIGHBOURHOOD')['TOTAL_ESTIMATED_REVENUE'])

# Room type pricing
st.header("Average Price by Room Type")
query = """
    SELECT
        room_type,
        ROUND(AVG(price), 2) as avg_price
    FROM dim_listings
    WHERE price > 0
    GROUP BY room_type
    ORDER BY avg_price DESC
"""
df = run_query(query)
st.bar_chart(df.set_index('ROOM_TYPE')['AVG_PRICE'])

Run dashboard:

streamlit run dashboard/streamlit_app.py

Key patterns:

  • Use @st.cache_resource for connection pooling
  • Query marts directly for performance
  • Filter to latest snapshot with MAX(year_month)
  • Keep credentials in separate JSON file

Common Patterns

Adding a New Staging Model

  1. Define source in models/staging/sources.yml:
sources:
  - name: airbnb_raw
    database: AIRBNB_DB
    schema: RAW
    tables:
      - name: new_table
  1. Create staging model models/staging/stg_airbnb__new_table.sql:
with source as (
    select * from {{ source('airbnb_raw', 'new_table') }}
),

cleaned as (
    select
        id::bigint as record_id,
        field::varchar as clean_field
    from source
)

select * from cleaned
  1. Add tests in models/staging/schema.yml:
models:
  - name: stg_airbnb__new_table
    columns:
      - name: record_id
        tests:
          - unique
          - not_null

Creating a Dimension Table

models/marts/dim_hosts.sql:

with listings as (
    select * from {{ ref('int_airbnb__listing_enriched') }}
),

host_agg as (
    select
        host_id,
        max(host_name) as host_name,
        count(*) as total_listings,
        round(avg(price), 2) as avg_listing_price,
        sum(number_of_reviews) as total_reviews
    from listings
    group by host_id
)

select * from host_agg

Key patterns:

  • Aggregate from enriched intermediate layer
  • Use max() to select representative values
  • Include business metrics (counts, averages)

Monthly Aggregation Pattern

with daily_facts as (
    select * from {{ ref('fct_listing_calendar') }}
)

select
    listing_id,
    to_char(calendar_date, 'YYYY-MM') as year_month,
    count(*) as total_days,
    sum(case when available then 1 else 0 end) as available_days,
    sum(case when not available then 1 else 0 end) as unavailable_days,
    round(avg(case when available then 1.0 else 0.0 end), 2) as availability_rate,
    round(sum(estimated_revenue), 2) as estimated_revenue,
    round(avg(price), 2) as avg_price
from daily_facts
group by listing_id, to_char(calendar_date, 'YYYY-MM')

Key patterns:

  • Use to_char(date, 'YYYY-MM') for month grouping in Snowflake
  • Calculate rates with avg(case when condition then 1.0 else 0.0 end)
  • Aggregate revenue as sum, prices as average

Troubleshooting

dbt Connection Issues

Error: Database Error in model [...] (...) 250001 (08001): Failed to connect to DB

Solution:

  1. Verify profiles.yml has correct Snowflake account identifier
  2. Test connection: dbt debug --profiles-dir .
  3. Check Snowflake credentials and network access
  4. Ensure warehouse is running

Incremental Model Not Updating

Error: New data not appearing in incremental fact table

Solution:

# Force full rebuild
dbt run --full-refresh --select fct_listing_calendar --profiles-dir .

Check unique_key matches grain in config:

{{
    config(
        unique_key=['listing_id', 'calendar_date']  -- Must match table grain
    )
}}

Test Failures on Price Data

Error: dbt_utils.expression_is_true fails on price column

Solution: Raw price data may contain non-numeric values or currency symbols.

Clean in staging layer:

-- Remove $ and commas, cast to numeric
replace(replace(price, '$', ''), ',', '')::decimal(10,2) as price

Streamlit Connection Timeout

Error: OperationalError: 250001 (08001): Failed to connect

Solution:

  1. Check config/local_credentials.json credentials
  2. Verify Snowflake warehouse is running
  3. Add timeout config:
conn = snowflake.connector.connect(
    ...,
    login_timeout=30,
    network_timeout=30
)

Missing Stage Files

Error: File not found when running loader script

Solution:

  1. Verify raw files exist in data/raw/
  2. Check file names match loader script expectations
  3. Ensure files are compressed (.gz) where expected

dbt Model Dependency Errors

Error: Compilation Error: Model 'X' depends on a node named 'Y' which was not found

Solution:

  1. Check {{ ref('model_name') }} matches actual model file name
  2. Run dbt deps --profiles-dir . to install packages
  3. Verify model exists in models/ directory

Environment Variables for Production

Use environment variables instead of local credential files:

dbt profiles.yml:

airbnb_snowflake:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('SNOWFLAKE_ROLE') }}"
      database: "{{ env_var('SNOWFLAKE_DATABASE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      schema: ANALYTICS
      threads: 4

Streamlit connection:

import os

conn = snowflake.connector.connect(
    account=os.getenv('SNOWFLAKE_ACCOUNT'),
    user=os.getenv('SNOWFLAKE_USER'),
    password=os.getenv('SNOWFLAKE_PASSWORD'),
    role=os.getenv('SNOWFLAKE_ROLE'),
    warehouse=os.getenv('SNOWFLAKE_WAREHOUSE'),
    database=os.getenv('SNOWFLAKE_DATABASE'),
    schema='ANALYTICS'
)

Project Resources

  • GitHub: analyticsdurgesh/Snowflake_DBT_Project
  • Inside Airbnb: insideairbnb.com
  • dbt Docs: docs.getdbt.com
  • Snowflake Docs: docs.snowflake.com

Featured

QwikClaw — one-click deploy OpenClaw logoQwikClaw — one-click deploy OpenClaw

Your own always-on OpenClaw agent, live in 60 seconds. No server, no setup — pick a model, connect Telegram, done.

Deploy your agent →
SetupClaw: done-for-you OpenClaw for founders & exec teams logoSetupClaw: done-for-you OpenClaw for founders & exec teams

White-glove OpenClaw for founders and exec teams (4–50+ employees): we install, harden, integrate your tools, and maintain it — secured from day one.

Get it set up for you →
MoltAwards - Agent internet for government contracts + jobs. logoMoltAwards - Agent internet for government contracts + jobs.

MoltAwards is an agent-native social layer for matchawards.com.

Learn more →
CLN.Work — Stop prompting, start hiring AI employees logoCLN.Work — Stop prompting, start hiring AI employees

Turn your Claude agents into a real team — onboard them, assign tasks, and manage them like staff.

Hire AI employees →
Deploy your own AI agent logoDeploy your own AI agent

Launch OpenClaw or Hermes on Hostinger in about 60 seconds, keep your agent live 24/7, earn 20%-40% on your next referral up to $25-$45, and give your friend 20% off.

Launch on Hostinger →
Build the next $50K/mo OpenClaw wrapper logoBuild the next $50K/mo OpenClaw wrapper

Founders are earning with OpenClaw wrappers. Get the whole stack — auth, billing, deploy — and ship today, not in 3 months.

See the kit →
View on GitHub

Recommended skills

Browse all →

find-skills

vercel-labs/skills

2.3M installsInstall

frontend-design

anthropics/skills

617K installsInstall

vercel-react-best-practices

vercel-labs/agent-skills

520K installsInstall

agent-browser

vercel-labs/agent-browser

505K installsInstall

grill-me

mattpocock/skills

439K installsInstall

web-design-guidelines

vercel-labs/agent-skills

432K installsInstall

Browse

Skills by category

Frontend250Git198Data154Testing120Design105Docs103Security96Automation87Backend76Devops37Productivity29Mcp23

Advertise on Remote OpenClaw

Get your AI tool in front of 67,000+ AI enthusiasts a month

See placements & pricing →

Remote OpenClaw

AI agent skills directory, marketplace, and workflow hub for OpenClaw, Hermes Agent, Claude Code, Codex, and MCP-powered operator stacks.

Explore

  • Home
  • Skills Directory
  • Claude Code Skills
  • Codex Skills
  • Marketplace
  • Hermes Ecosystem
  • Agents
  • Guide
  • Learn
  • Blog

More

  • Playbook
  • Free Tools
  • Shipping
  • Contact
  • Terms
  • Privacy
© 2026 Remote OpenClaw
Fazier badgeFeatured on Twelve ToolsFeatured on Wired BusinessListed on Turbo0