Introduction
Pawrly gives you a single SQL interface over heterogeneous data: local files (Parquet, CSV, JSON), SQLite databases, REST APIs (e.g. GitHub), and OpenAI-compatible AI models — all joinable in one statement, all served from one config file. No ETL pipelines, no warehouse to stand up, no learning a different query language per source.
It's a single Rust binary, pawrly, that is also embeddable as a library. Under the hood:
- DataFusion plans and executes every query — you write one SQL dialect.
- DuckDB (in-memory) acts as a sub-engine for sources DuckDB already speaks.
- HTTP and AI sources are native query providers, so a REST API or a model call is just another table or function in your SQL.
- Caching is opt-in per table and writes Parquet + a JSON manifest to disk, so it survives restarts and is shared safely between processes.
Pawrly is built for two audiences:
- Data engineers who want SQL over APIs and files without scheduling extracts or running a warehouse.
- AI agents that need a deterministic, audited query surface. Pawrly ships a first-class MCP server so assistants can query the same workspace your CLI uses.
The same engine is reachable three ways: in-process (the default), over a local daemon (pawrly serve), or over the network — and every frontend produces identical results.
Installation
Tested on macOS (Apple Silicon and Intel) and Linux (x86_64, aarch64).
Install a prebuilt binary (recommended)
curl -fsSL https://raw.githubusercontent.com/CITGuru/pawrly/main/scripts/install.sh | sh
This installs the pawrly binary to ~/.local/bin (override with PAWRLY_INSTALL_DIR).
It detects your OS/arch, verifies the SHA-256 checksum, and falls back to building
from source with cargo if no prebuilt binary matches your platform.
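To make the platform-detection and checksum steps concrete, here is a minimal sketch of the same kind of check done by hand. It is not the contents of install.sh: the helper names sha256_of and verify_sha256 and the stand-in file are hypothetical, and the "published" digest is simply the well-known SHA-256 of empty input.

```shell
# Sketch only: illustrates the idea, not the real install.sh logic.
os=$(uname -s | tr '[:upper:]' '[:lower:]')   # e.g. linux, darwin
arch=$(uname -m)                               # e.g. x86_64, aarch64, arm64
echo "platform: ${os}-${arch}"

# Print the SHA-256 hex digest of a file, using whichever tool is available:
# sha256sum on most Linux systems, shasum on macOS.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Succeed only when the file's digest equals the expected one.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Stand-in for a downloaded archive: an empty file, whose SHA-256 is well known.
tmp=$(mktemp -d)
: > "$tmp/archive"
expected=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

if verify_sha256 "$tmp/archive" "$expected"; then
  echo "checksum ok"
else
  echo "checksum mismatch" >&2
fi
```

With a real download, the expected digest would come from wherever the release publishes its checksums rather than from a hard-coded constant.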
Pin a version or change where it lands:
curl -fsSL https://raw.githubusercontent.com/CITGuru/pawrly/main/scripts/install.sh \
  | PAWRLY_VERSION=v0.1.0 PAWRLY_INSTALL_DIR=/usr/local/bin sh
On Windows (PowerShell):
irm https://raw.githubusercontent.com/CITGuru/pawrly/main/scripts/install.ps1 | iex
With Cargo, straight from source:
cargo install --git https://github.com/CITGuru/pawrly pawrly-cli
Build from source
Build the full workspace with Cargo when you want to hack on Pawrly itself.
Prerequisites
- Rust ≥ 1.85 with the 2024 edition. The repository pins the toolchain, so rustup installs the right version automatically the first time you run cargo.
- A C/C++ toolchain (DuckDB builds from source):
  - macOS: xcode-select --install
  - Debian/Ubuntu: sudo apt-get install build-essential pkg-config libssl-dev cmake
  - Fedora: sudo dnf install @development-tools openssl-devel cmake
- git.
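Before building, it can help to confirm the prerequisites are actually on your PATH. The loop below is a small sketch of such a check. The tool list is an assumption drawn from the prerequisites above, with cc standing in for whichever C compiler your platform package installed; add cmake to the list on Linux, where the package lists above include it.

```shell
# Report which build prerequisites resolve on PATH.
# The tool list is illustrative; adjust it for your platform.
missing=""
for tool in cargo cc git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok       $tool"
  else
    echo "missing  $tool"
    missing="$missing $tool"
  fi
done

# Summarise anything that still needs installing.
if [ -n "$missing" ]; then
  echo "Install before building:$missing"
else
  echo "All listed prerequisites found."
fi
```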
Build
git clone https://github.com/CITGuru/pawrly.git
cd pawrly
cargo build --workspace --release
The binary lands at ./target/release/pawrly. Add ./target/release to your PATH, or invoke the binary directly. The rest of the docs assume pawrly is on your PATH.
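One way to put the freshly built binary on your PATH for the current shell session is sketched below. It assumes you run it from the repository root; nothing here is specific to Pawrly beyond the output path quoted above.

```shell
# Run from the repository root after the release build finishes.
export PATH="$PWD/target/release:$PATH"

# Report where the shell now resolves pawrly, if the build produced it.
if command -v pawrly >/dev/null 2>&1; then
  echo "pawrly resolves to: $(command -v pawrly)"
else
  echo "pawrly not found on PATH; check that the build succeeded"
fi
```

The change lasts only for the current session. To make it permanent, add an equivalent export line with the absolute path to your shell profile.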
Quickstart
1. Run a query with no setup
Start with the engine itself — no sources, no network, no config:
pawrly sql "SELECT 1 AS hello"
You get a single-row table back. With no pawrly.yaml in the current directory, Pawrly runs against an empty workspace — enough to exercise the SQL engine end-to-end without credentials.
2. Query local files
Create a tiny dataset:
mkdir -p data
cat > data/customers.csv <<'CSV'
id,name,plan
1,Acme Corp,enterprise
2,Globex,starter
3,Initech,growth
CSV
cat > data/orders.csv <<'CSV'
id,customer_id,amount_cents
100,1,49900
101,1,12000
102,2,2900
103,3,15000
104,3,15000
CSV
Drop a pawrly.yaml in the same directory:
version: 1
name: quickstart
sources:
  - name: data
    kind: file
    tables:
      - name: customers
        path: ./data/customers.csv
        format: csv
      - name: orders
        path: ./data/orders.csv
        format: csv
Now join across both files in one statement:
pawrly sql "
  SELECT c.name,
         c.plan,
         COUNT(o.id) AS order_count,
         SUM(o.amount_cents)/100 AS total_dollars
  FROM data.customers c
  LEFT JOIN data.orders o ON o.customer_id = c.id
  GROUP BY c.name, c.plan
  ORDER BY total_dollars DESC
"
Acme Corp comes out on top with two orders totalling 619 dollars. Set format: parquet and point path at a .parquet file, and the SQL stays identical.
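The totals from that join can be cross-checked without Pawrly. The sketch below recreates the quickstart orders file in a scratch directory and sums amount_cents per customer_id with awk. It is only an arithmetic check on the sample data, not a description of how Pawrly executes the query, and the scratch path and variable names are illustrative.

```shell
# Recreate the quickstart orders file in a scratch directory (same rows as above).
tmp=$(mktemp -d)
cat > "$tmp/orders.csv" <<'CSV'
id,customer_id,amount_cents
100,1,49900
101,1,12000
102,2,2900
103,3,15000
104,3,15000
CSV

# Skip the header, then count orders and sum cents per customer_id.
# Each output line is: customer_id order_count total_dollars,
# sorted by total_dollars, highest first.
totals=$(awk -F, 'NR > 1 { n[$2]++; s[$2] += $3 }
  END { for (k in s) printf "%s %d %d\n", k, n[k], s[k] / 100 }' \
  "$tmp/orders.csv" | sort -k3,3nr)
echo "$totals"
```

Customer 1 (Acme Corp) has two orders of 49900 and 12000 cents, which is 61900 cents, or 619 dollars, matching the first row of the query result.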
Two more commands you'll use constantly:
pawrly schema # list every table the workspace knows about
pawrly validate # sanity-check pawrly.yaml without running anything
3. (Optional) Run as a daemon
For faster CLI invocations, start the local daemon once; subsequent commands auto-discover it over a Unix socket and skip engine warm-up:
pawrly serve & # background daemon
pawrly status # confirms it's up
pawrly sql "SELECT COUNT(*) FROM data.orders" # auto-discovers the daemon
Local mode and daemon mode return identical output by design.
4. (Optional) Open the Console
Prefer a browser? Launch the web Console for the same workspace — browse sources, the catalog, and semantic models, and run SQL with live-streaming results:
pawrly console # → http://127.0.0.1:8787
It's read-only and binds loopback by default (no token needed).
Where to next
- Add more sources — see Sources.
- Shape pawrly.yaml — see Configuration.
- Define business models for humans and agents — see Semantic layer.
- Connect an AI assistant — see MCP server.
- Browse and query in the browser — see Console.
- Full command reference — see CLI.