Skip to content

Quickstart

This walkthrough starts from scratch with a public weather dataset and then shows the same Semduck flow through both the CLI and the Python API.

If you are using dbt-duckdb, skip this page and go to the dbt interface guide. The dbt setup and query flow are different enough that they are documented separately.

It uses Vega Datasets weather.csv, which contains daily weather observations for Seattle and New York. Source: weather.csv, dataset metadata

1. Install And Download The Dataset

pip install semduck
curl -L https://cdn.jsdelivr.net/npm/vega-datasets@3.2.1/data/weather.csv -o weather.csv

If you prefer uv, install Semduck with:

uv add semduck

2. Create A DuckDB Database

Use DuckDB through Python to create a local database file and load the CSV into a table named weather_raw:

import duckdb

conn = duckdb.connect("weather.duckdb")
conn.execute(
    """
    create or replace table weather_raw as
    select *
    from read_csv(?, auto_detect=true)
    """,
    ["weather.csv"],
)
conn.close()

Semduck also needs its registry schema initialized in that same database before you load any semantic definitions.

semduck init --db weather.duckdb
import duckdb
from semduck import init_registry

conn = duckdb.connect("weather.duckdb")
init_registry(conn)
conn.close()

3. Define The Semantic View

You can define the same semantic view in either YAML or semantic DDL.

Create a file named weather_semantic.yaml with:

name: weather
description: Daily weather observations grouped by location and weather type
tables:
  - name: weather
    base_table:
      table: weather_raw
    time_dimensions:
      - name: date
        expr: date
        data_type: date
    dimensions:
      - name: location
        expr: location
        data_type: varchar
      - name: weather
        expr: weather
        data_type: varchar
    facts:
      - name: precipitation
        expr: precipitation
        data_type: double
      - name: temp_max
        expr: temp_max
        data_type: double
      - name: temp_min
        expr: temp_min
        data_type: double
      - name: wind
        expr: wind
        data_type: double
    metrics:
      - name: day_count
        expr: count(*)
        description: Number of daily observations
      - name: avg_temp_max
        expr: avg(temp_max)
        description: Average maximum daily temperature
      - name: total_precipitation
        expr: sum(precipitation)
        description: Total precipitation

Create a file named weather_semantic.sql with:

create semantic view weather as
table weather_raw as weather
  time_dimensions (
    date as date type date
  )
  dimensions (
    location as location type varchar,
    weather as weather type varchar
  )
  facts (
    precipitation as precipitation type double,
    temp_max as temp_max type double,
    temp_min as temp_min type double,
    wind as wind type double
  )
  metrics (
    count(*) as day_count description 'Number of daily observations',
    avg(temp_max) as avg_temp_max description 'Average maximum daily temperature',
    sum(precipitation) as total_precipitation description 'Total precipitation'
  );

The weather_raw table name in both versions comes from step 2.

4. Load The Semantic View

Load the definition file you created into the initialized database:

semduck load --db weather.duckdb --file weather_semantic.yaml
import duckdb
from semduck import load_semantic_yaml_file

conn = duckdb.connect("weather.duckdb")
load_semantic_yaml_file(conn, "weather_semantic.yaml")
conn.close()
semduck load --db weather.duckdb --format ddl --file weather_semantic.sql
import duckdb
from semduck import load_semantic_ddl_file

conn = duckdb.connect("weather.duckdb")
load_semantic_ddl_file(conn, "weather_semantic.sql")
conn.close()

The CLI supports --format auto|yaml|ddl for semantic definition files. In auto mode, Semduck infers the format from the file extension or the first non-empty line. The default mode is auto.

5. Compile A Request

The compile step allows you to identify if a given request is valid without executing the query in the database.

semduck compile --db weather.duckdb --request "weather dimensions location, weather metrics day_count, avg_temp_max"
import duckdb
from semduck import compile_request_sql

conn = duckdb.connect("weather.duckdb")
sql = compile_request_sql(
    conn,
    "weather dimensions location, weather metrics day_count, avg_temp_max",
)
print(sql)
conn.close()

6. Execute A Request

Use query to retrieve data.

semduck query --db weather.duckdb --request "weather dimensions location, weather metrics day_count, avg_temp_max"
import duckdb
from semduck import execute_request

conn = duckdb.connect("weather.duckdb")
result = execute_request(
    conn,
    "weather dimensions location, weather metrics day_count, avg_temp_max",
)
print(result.fetchall())
conn.close()

7. Try A Derived Time Dimension

Once the basic query works, try a derived dimension:

semduck query --db weather.duckdb --request "weather dimensions date_trunc('month', date) as month, location metrics total_precipitation"
import duckdb
from semduck import execute_request

conn = duckdb.connect("weather.duckdb")
result = execute_request(
    conn,
    "weather dimensions date_trunc('month', date) as month, location metrics total_precipitation",
)
print(result.fetchall())
conn.close()

8. Use The Larger Example Project

The repository also includes a richer dbt-backed example in examples/dbt_example.

  • Standalone Python example against the checked-in dbt example database: packages/semduck/examples/query_existing_db.py
  • CLI wrapper around that database: packages/semduck/examples/query_existing_db_cli.sh
  • End-to-end dbt project: examples/dbt_example

Next Steps