Overview — Pawrly

Introduction

Pawrly gives you a single SQL interface over heterogeneous data: local files (Parquet, CSV, JSON), SQLite databases, REST APIs (e.g. GitHub), and OpenAI-compatible AI models — all joinable in one statement, all served from one config file. No ETL pipelines, no warehouse to stand up, no learning a different query language per source.

It's a single Rust binary, pawrly, that is also embeddable as a library. Under the hood:

- DataFusion plans and executes every query, so you write one SQL dialect.
- DuckDB (in-memory) acts as a sub-engine for sources DuckDB already speaks.
- HTTP and AI sources are native query providers, so a REST API or a model call is just another table or function in your SQL.
- Caching is opt-in per table and writes Parquet plus a JSON manifest to disk, so it survives restarts and is shared safely between processes.

Pawrly is built for two audiences:

- Data engineers who want SQL over APIs and files without scheduling extracts or running a warehouse.
- AI agents that need a deterministic, audited query surface. Pawrly ships a first-class MCP server so assistants can query the same workspace your CLI uses.

The same engine is reachable three ways: in-process (the default), over a local daemon (pawrly serve), or over the network — and every frontend produces identical results.

Installation

Tested on macOS (Apple Silicon and Intel) and Linux (x86_64, aarch64).

Install a prebuilt binary (recommended)

curl -fsSL https://raw.githubusercontent.com/CITGuru/pawrly/main/scripts/install.sh | sh

This installs the pawrly binary to ~/.local/bin (override with PAWRLY_INSTALL_DIR). The script detects your OS and architecture, verifies the SHA-256 checksum, and falls back to building from source with cargo if no prebuilt binary matches your platform.
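The checks described above can be sketched in plain POSIX shell. This is a minimal illustration, not the contents of install.sh: the function names detect_platform, sha256_of, and verify_sha256 are hypothetical, and the real script may detect platforms or compare digests differently.

```shell
# Hypothetical helpers for illustration; not taken from install.sh.

# Print "<os>-<arch>" for the current machine, e.g. "linux-x86_64".
detect_platform() {
  os=$(uname -s | tr '[:upper:]' '[:lower:]')
  arch=$(uname -m)
  printf '%s-%s\n' "$os" "$arch"
}

# Print the SHA-256 digest of a file using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeed only when the digest of file $1 equals the expected hex string $2.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}
```

A caller would typically run verify_sha256 against the downloaded file and its published digest, and abort on a non-zero exit status before unpacking anything.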

Pin a version or change where it lands:

curl -fsSL https://raw.githubusercontent.com/CITGuru/pawrly/main/scripts/install.sh \
  | PAWRLY_VERSION=v0.1.0 PAWRLY_INSTALL_DIR=/usr/local/bin sh

On Windows (PowerShell):

irm https://raw.githubusercontent.com/CITGuru/pawrly/main/scripts/install.ps1 | iex

With Cargo, straight from source:

cargo install --git https://github.com/CITGuru/pawrly pawrly-cli

Build from source

Build the full workspace with Cargo when you want to hack on Pawrly itself.

Prerequisites

- Rust ≥ 1.85 with the 2024 edition. The repository pins the toolchain, so rustup installs the right version automatically the first time you run cargo.
- A C/C++ toolchain (DuckDB builds from source):
  - macOS: xcode-select --install
  - Debian/Ubuntu: sudo apt-get install build-essential pkg-config libssl-dev cmake
  - Fedora: sudo dnf install @development-tools openssl-devel cmake
- git.
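
A quick way to confirm the prerequisites are present before building is to look each command up on PATH. The sketch below assumes the tools are exposed under the names rustup, cargo, cc, c++, and git; the helper missing_tools is hypothetical and not part of Pawrly.

```shell
# Print each named command that is not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Assumed command names for the prerequisites above; adjust for your platform.
missing_tools rustup cargo cc c++ git || echo "install the tools listed above, then retry"
```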

Build

git clone https://github.com/CITGuru/pawrly.git
cd pawrly
cargo build --workspace --release

The binary lands at ./target/release/pawrly. Add ./target/release to your PATH, or invoke the binary directly. The rest of the docs assume pawrly is on your PATH.
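
For the current shell session, putting the release directory on PATH might look like the following. It assumes a POSIX-style shell and that you run it from the repository root; add the export line to your shell profile to make it permanent.

```shell
# Run from the repository root after the release build finishes.
export PATH="$PWD/target/release:$PATH"

# Prints the resolved path if the binary has been built, or a hint otherwise.
command -v pawrly || echo "pawrly not found; has the release build finished?"
```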

Quickstart

1. Run a query with no setup

Start with the engine itself — no sources, no network, no config:

pawrly sql "SELECT 1 AS hello"

You get a single-row table back. With no pawrly.yaml in the current directory, Pawrly runs against an empty workspace — enough to exercise the SQL engine end-to-end without credentials.

2. Query local files

Create a tiny dataset:

mkdir -p data
cat > data/customers.csv <<'CSV'
id,name,plan
1,Acme Corp,enterprise
2,Globex,starter
3,Initech,growth
CSV

cat > data/orders.csv <<'CSV'
id,customer_id,amount_cents
100,1,49900
101,1,12000
102,2,2900
103,3,15000
104,3,15000
CSV

Drop a pawrly.yaml in the same directory:

version: 1
name: quickstart

sources:
  - name: data
    kind: file
    tables:
      - name: customers
        path: ./data/customers.csv
        format: csv
      - name: orders
        path: ./data/orders.csv
        format: csv

Now join across both files in one statement:

pawrly sql "
  SELECT c.name,
         c.plan,
         COUNT(o.id)             AS order_count,
         SUM(o.amount_cents)/100 AS total_dollars
  FROM data.customers c
  LEFT JOIN data.orders o ON o.customer_id = c.id
  GROUP BY c.name, c.plan
  ORDER BY total_dollars DESC
"

Acme Corp comes out on top with two orders totalling $619 (49900 + 12000 cents). To query Parquet instead, set format: parquet and point path at a .parquet file; the SQL stays identical.
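
To sanity-check those totals without Pawrly, the same aggregation can be reproduced with awk over the two CSV files. This is only an independent cross-check of the arithmetic, written against copies of the quickstart data in a scratch directory; it is not how Pawrly executes the query.

```shell
# Recreate the quickstart CSVs in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/customers.csv" <<'CSV'
id,name,plan
1,Acme Corp,enterprise
2,Globex,starter
3,Initech,growth
CSV
cat > "$tmp/orders.csv" <<'CSV'
id,customer_id,amount_cents
100,1,49900
101,1,12000
102,2,2900
103,3,15000
104,3,15000
CSV

# Pass 1 (orders.csv): count orders and sum cents per customer_id.
# Pass 2 (customers.csv): emit name,plan,order_count,total_dollars.
totals=$(awk -F, '
  FNR == 1  { next }                              # skip both header rows
  NR == FNR { n[$2]++; cents[$2] += $3; next }    # first file: orders
  { printf "%s,%s,%d,%d\n", $2, $3, n[$1], cents[$1] / 100 }
' "$tmp/orders.csv" "$tmp/customers.csv" | sort -t, -k4,4nr)

printf '%s\n' "$totals"
# Acme Corp,enterprise,2,619
# Initech,growth,2,300
# Globex,starter,1,29

rm -rf "$tmp"
```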

Two more commands you'll use constantly:

pawrly schema      # list every table the workspace knows about
pawrly validate    # sanity-check pawrly.yaml without running anything

3. (Optional) Run as a daemon

For faster CLI invocations, start the local daemon once; subsequent commands auto-discover it over a Unix socket and skip engine warm-up:

pawrly serve &                                  # background daemon
pawrly status                                   # confirms it's up
pawrly sql "SELECT COUNT(*) FROM data.orders"   # auto-discovers the daemon

Local mode and daemon mode return identical output by design.

4. (Optional) Open the Console

Prefer a browser? Launch the web Console for the same workspace — browse sources, the catalog, and semantic models, and run SQL with live-streaming results:

pawrly console        # → http://127.0.0.1:8787

It's read-only and binds to the loopback address by default (no token needed).

Where to next

- Add more sources: see Sources.
- Shape pawrly.yaml: see Configuration.
- Define business models for humans and agents: see Semantic layer.
- Connect an AI assistant: see MCP server.
- Browse and query in the browser: see Console.
- Full command reference: see CLI.