pawrly

MCP server

Pawrly ships a Model Context Protocol server so AI assistants — Claude Desktop, Cursor, Codex, and others — can query your workspace directly. The MCP server is a frontend: it runs the same engine as the CLI (in-process by default, or proxied to a pawrly serve daemon), so an agent sees exactly the data and semantic models you do.

Running it

pawrly mcp-stdio

This speaks MCP over stdio — the transport assistants launch as a subprocess. It honors the global engine-selection flags, so you can point it at a shared daemon:

pawrly mcp-stdio --remote uds:///path/to/pawrly.sock

Run several agents against one daemon and they share one engine and one cache.

Over HTTP

For network clients, run the HTTP transport instead:

pawrly mcp-http --addr 127.0.0.1:8090

It serves JSON-RPC at POST /mcp and a liveness probe at GET /healthz. Without a token it refuses to bind anything but a loopback address. To accept remote connections, require a bearer token — resolved from the config's secret backend or an environment variable of the same name:

pawrly mcp-http --addr 0.0.0.0:8090 --bearer-token-from MCP_TOKEN

Every /mcp request must then carry Authorization: Bearer <token>.

Connecting Claude Desktop

Add Pawrly to your MCP client config (for Claude Desktop, claude_desktop_config.json):

{
  "mcpServers": {
    "pawrly": {
      "command": "pawrly",
      "args": ["mcp-stdio", "--config", "/absolute/path/to/pawrly.yaml"]
    }
  }
}

Use an absolute path to the binary if pawrly isn't on the client's PATH. Other stdio-capable clients (Cursor, Codex, …) use the same command and args.

Tools

The server exposes these tools:

Tool Input Returns
query { sql, max_rows?, query_id? } { columns, rows, row_count, truncated }
cancel_query { query_id } { cancelled } — aborts an in-flight query with that id
list_sources {} the configured sources, their kinds, status, and table counts
list_tables { source? } the tables across configured sources
search_tables { query, source?, limit? } tables whose name or description match the keywords, ranked; { tables, match_count, truncated }
list_columns { table?, source?, name?, limit? } columns flattened one-per-row across tables; name greps column name/description; { columns, column_count, truncated }
describe_table { table } one table's columns, descriptions, pushdown affordances, examples, and agent-facing wiki notes
get_schema { sources?, compact? } a compact catalog overview for grounding an LLM
refresh_table { table } forces a cache refresh; returns rows written, size, and expiry
materialize { name, sql? | file? | url?, format?, params? } persists a named, self-backed table; { name, file_path, row_count, size_bytes }
drop_materialized { name } drops a materialized table; { dropped }
list_semantic_models {} the semantic models with dimension/measure counts
describe_semantic_model { name } one model's full spec — dimensions, measures, relationships
semantic_query a structured query (below) { columns, rows, row_count, truncated }

query

Run raw SQL. max_rows (default 1000) caps the rows returned. Pass a query_id so a concurrent cancel_query can abort a long-running scan.

{ "sql": "SELECT status, COUNT(*) FROM data.orders GROUP BY status", "max_rows": 100 }

cancel_query

Abort an in-flight query or semantic_query that was started with the same query_id. Returns { cancelled }false if no query with that id was running. The cancel arrives on a separate request, so it is effective over the HTTP transport (where a second connection can reach the server mid-query).

{ "query_id": "report-42" }

list_sources

List every configured source with its kind, connection status, and table count. Takes no arguments.

{}

list_tables

List tables across all sources, or limit to one with source. Each row carries the table's schema, name, kind, description, cache flag, and any required filters.

{ "source": "github" }

search_tables

Keyword discovery for large catalogs. Matches the query terms against table names and descriptions (case-insensitive; every term must appear), ranking name hits ahead of description-only hits. Returns { tables, match_count, truncated }. Reach for this before describe_table when a source has hundreds of tables.

{ "query": "pull request review", "source": "github", "limit": 20 }

list_columns

List columns flattened to one row per column — the column-level counterpart to list_tables. Scope with table (one table), source (one source), and/or name, a case-insensitive keyword over column name and description. Use name to find which tables expose a column like created_at or email. Returns { columns, column_count, truncated }.

{ "name": "created_at", "source": "github" }

describe_table

Full detail for one fully-qualified <schema>.<table>: column schema, pushdown affordances, example queries, and agent-facing wiki usage notes.

{ "table": "github.pulls" }

get_schema

A compact catalog overview for grounding an LLM in one call — every schema, its tables, and a one-line column list per table. Limit to named sources, or set compact: false for fuller detail.

{ "sources": ["github", "warehouse"], "compact": true }

refresh_table

Force an immediate cache refresh of a fully-qualified table. Only valid for tables with caching enabled; returns the rows written, size on disk, and next expiry.

{ "table": "github.pulls" }

materialize

Persist data as a named, self-backed table queryable as <namespace>.materialized.<name> (see materialize). Provide exactly one origin: sql (a query), file (a local CSV/Parquet/JSON path), or url (an http(s) file). Create-or-replace by name; the table is pinned and never auto-evicted. Returns { name, file_path, row_count, size_bytes }.

{ "name": "top_customers", "sql": "SELECT * FROM data.customers ORDER BY revenue DESC LIMIT 100" }

drop_materialized

Drop a materialized table by name. Returns { dropped }false if no such table existed.

{ "name": "top_customers" }

list_semantic_models

List the semantic-layer models with their dimension and measure counts. Takes no arguments.

{}

describe_semantic_model

Full spec for one model: its dimensions, measures, relationships, named segments (reusable filter sets you can pass in segments), and any required filters to satisfy up front.

{ "name": "orders" }

semantic_query

Run a structured query against the semantic layer — the recommended surface for agents, because models give them a curated business vocabulary instead of raw column names.

{
  "measures": ["orders.revenue"],
  "dimensions": ["orders.order_date.month", "orders.status"],
  "filters": [{ "member": "orders.status", "op": "equals", "values": ["paid"] }],
  "order_by": [{ "member": "orders.order_date.month", "direction": "asc" }],
  "limit": 100,
  "params": { "tenant_id": "acme" },
  "max_rows": 1000
}

params binds ${param:NAME} placeholders used by a model's row-level-security predicates. If a model requires a param and the agent omits it, the query is refused before any scan — so an agent can't accidentally read across tenants.

Grounding agents

The intended flow for an assistant:

  1. list_semantic_models to see what's available.
  2. describe_semantic_model to learn a model's dimensions, measures, and any required filters.
  3. semantic_query (or query for ad-hoc SQL) to get results.

Because describe_semantic_model advertises required filters and RLS params up front, an agent can satisfy them in the very next call.