
Rime

dbt-style pipelines across Python, R, JavaScript, and SQL. Declare your data retrievals and transforms in one file; Rime handles caching, logs, validation, outputs, and reports.

Rime is a runtime for reproducible data work. You declare the pipeline once in pipeline.dag.yaml. Rime runs the graph, caches each node, captures logs, validates outputs, and writes artifacts.

specification_version: "2.1"
nodes:
  - id: raw_orders
    kind: sql
    source: queries/load_orders.sql
  - id: order_metrics
    kind: derive
    inputs: [raw_orders]
    as: revenue
    expr: "[unit_price] * [quantity]"
  - id: sales_chart
    kind: python
    source: scripts/plot_sales.py
    in:
      orders: order_metrics

Here, SQL loads the data, the derive node computes one reviewable feature with Rime’s expression language, and Python plots the result. Rime captures intermediate data and script side effects, then produces a report with a runtime overview like this.

A Rime DAG where a SQL node feeds a derive node, then a Python node.
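The execution order of a graph like this follows from each node's declared inputs. As a minimal sketch (not Rime's actual scheduler), Python's standard-library `graphlib` can derive a valid order for the three example nodes. The `dependencies` mapping and the `execution_order` helper are hypothetical stand-ins for what a runtime would read from pipeline.dag.yaml.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory view of the example pipeline:
# node id -> set of upstream node ids it reads from.
dependencies = {
    "raw_orders": set(),
    "order_metrics": {"raw_orders"},
    "sales_chart": {"order_metrics"},
}

def execution_order(deps):
    """Return node ids so that every node appears after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)  # ['raw_orders', 'order_metrics', 'sales_chart']
```

Because the example is a simple chain, only one order is valid; a wider graph could admit several.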

⚡ Functions, not jobs

A node is a function over dataframes, not a task that wires I/O. You write what each step computes; the runtime owns reading, writing, serialization, and language boundaries. The dbt mental model, extended past SQL.
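To make the idea concrete, here is a hedged sketch of a node written as a plain function. Lists of dicts stand in for dataframes, and `run_node` is a hypothetical miniature runtime rather than Rime's API. The point is only that the step itself contains no file, path, or database code.

```python
def order_metrics(raw_orders):
    """The step: add a revenue column. No reads, writes, or paths in here."""
    return [
        {**row, "revenue": row["unit_price"] * row["quantity"]}
        for row in raw_orders
    ]

def run_node(fn, inputs, store):
    """Hypothetical runtime: look up inputs by name, call the step, keep the result."""
    args = [store[name] for name in inputs]
    store[fn.__name__] = fn(*args)
    return store[fn.__name__]

# Stand-in for data produced by an upstream node (illustrative values only).
store = {
    "raw_orders": [
        {"unit_price": 2.5, "quantity": 4},
        {"unit_price": 10.0, "quantity": 1},
    ]
}
result = run_node(order_metrics, ["raw_orders"], store)
print(result)
```

The function can be tested with an in-memory list and no fixtures on disk, which is the practical payoff of keeping I/O out of the step.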

🧰 One DAG, four languages

SQL for joins, Python for ML, R for stats, JavaScript for everything else. Same pipeline, named slots, typed boundaries. Dataframes cross language borders through Arrow-backed payloads instead of ad hoc CSV handoffs.

🔒 Reproducible by default

Content-addressed caching, deterministic outputs, freeze-able snapshots. Same script plus same inputs means the same artifact, every time. No “works on my machine.”
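One common way to get "same script plus same inputs means the same artifact" is to derive the cache key from content rather than from file names or timestamps. The sketch below illustrates that general technique with `hashlib`; the `cache_key` function and its byte-level layout are assumptions for illustration, not a description of how Rime computes its keys.

```python
import hashlib

def cache_key(script_source: bytes, input_payloads: dict[str, bytes]) -> str:
    """Hash the script text plus every named input into one hex digest."""
    digest = hashlib.sha256()
    digest.update(len(script_source).to_bytes(8, "big"))
    digest.update(script_source)
    # Sort by slot name so dict ordering cannot change the key.
    for name in sorted(input_payloads):
        for part in (name.encode("utf-8"), input_payloads[name]):
            # Length prefix keeps adjacent parts from blurring together.
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()

script = b"plot(orders)"
key_a = cache_key(script, {"orders": b"row-1,row-2"})
key_b = cache_key(script, {"orders": b"row-1,row-2"})
key_c = cache_key(script, {"orders": b"row-1,row-3"})
print(key_a == key_b, key_a == key_c)  # True False
```

A node whose key already exists in the cache can be skipped; any change to the script or to an upstream payload produces a new key and forces a rerun.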

📄 Publishable narratives

Render a publishable HTML report directly from your DAG. Tables, stats, stdout, figures, and node status: one render step, one document, one source of truth.

  • Airflow and Prefect orchestrate recurring jobs; Rime runs locally, one run at a time.
  • In Airflow and Prefect, reads, writes, retries, error handling, and persistence are usually coded inside each task.
  • Rime owns dataframe handoff, execution order, caching, logs, validation, and outputs.
Airflow / Prefect
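import pandas as pd
from prefect import flow, task

# read_sql and plot stand in for project-specific helpers.

@task
def load_orders():
    orders = read_sql("SELECT * FROM orders")
    # The task persists its own output.
    orders.to_parquet("outputs/raw_orders.parquet")

@task
def plot_sales():
    # The task reads its own input back from disk.
    orders = pd.read_parquet("outputs/raw_orders.parquet")
    plot(orders)

@flow
def nightly_sales():
    load_orders()
    plot_sales()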
Rime
nodes:
  - id: raw_orders
    kind: sql
    source: queries/load_orders.sql
  - id: sales_chart
    kind: python
    source: scripts/plot_sales.py
    in:
      orders: raw_orders

The Editor and the CLI both consume the same pipeline.dag.yaml. Start visually, drop to YAML, or run the same project in CI.