Pawrly sources expose data as tables (SELECT * FROM github.pulls). But many useful operations are call-shaped rather than table-shaped: a search endpoint parameterized by a query string, a lookup that takes an input, a glob over a pattern. Table-valued functions model these: named, reusable operations with ordered, typed arguments that return rows.
SELECT i.title, i.state
FROM github.search_issues('is:open label:bug', 50) AS i
WHERE i.state = 'open';Functions are either builtin (shipped with Pawrly) or declared in pawrly.yaml. A declared function is either attached to a source (inheriting its connection and auth) or standalone (with its own namespace, kind, and config).
Calling functions from SQL
-- simple call
SELECT * FROM github.search_issues('is:open label:bug', 50);
-- alias, join, post-filter (WHERE applies on top of the function result)
SELECT i.title
FROM github.search_issues('is:open', 100) AS i
JOIN github.pulls p ON p.number = i.number
WHERE i.user_login <> 'dependabot[bot]';
-- CTEs / subqueries / UNION branches all work
WITH hot AS (SELECT * FROM github.search_issues('label:p0', 20))
SELECT count(*) FROM hot;
-- builtin
SELECT file_name, size_bytes FROM file.glob('./data/*.parquet') ORDER BY 1;Rules:
- Positional arguments only, matched against the declared order. Trailing optional/defaulted arguments may be omitted.
- Arguments must be literals (strings, numbers, booleans, including negatives). Column references and expressions are rejected at plan time, a DataFusion constraint on table-function args. Use the CLI / MCP named-arg forms when you want name=value ergonomics.
${param:KEY}substitution runs before the call is resolved, so parameterized queries can feed function arguments.- A bare two-part name without parentheses (
FROM github.issues) is always an ordinaryschema.tablelookup, never a function.
Builtins
| Function | Signature | Returns |
|---|---|---|
file.glob |
file.glob(pattern varchar) |
One row per matched file: path, file_name, size_bytes, modified. Relative patterns resolve against the workspace dir; ~ expands to home. Zero matches → zero rows. |
file.grep |
file.grep(pattern varchar, glob varchar) |
One row per line matching the regex pattern across the files matched by glob: path, line_number, line. Binary/non-UTF-8 files are skipped. |
http.get |
http.get(url varchar, path varchar = '$') |
Generic GET; path is a JSONPath into the response. Each matched element is returned as a JSON string in the single body column. |
Declaring functions
Source-attached (inherits the source's config)
sources:
- name: github
kind: http
config:
base_url: https://api.github.com
auth:
type: header
headers:
- { name: Authorization, bearer: "${secret:GITHUB_TOKEN}" }
functions:
- name: search_issues
description: Search issues with the GitHub search syntax.
endpoint: /search/issues # {arg} placeholders substitute; bare args → ?q=...
args:
- { name: q, type: varchar, required: true }
- { name: limit, type: int, default: "50" }
response:
path: $.items
pagination:
type: page
param: page
returns:
- { name: number, type: bigint }
- { name: title, type: varchar }
- { name: state, type: varchar }
- { name: user_login, type: varchar, source: $.user.login }
- { name: q, type: varchar, source: arg } # echo the bound argumentCall it (the namespace is the source name):
SELECT number, title, user_login
FROM github.search_issues('is:open label:bug repo:org/repo', 100)
WHERE state = 'open';Attached functions omit namespace, kind, and config; all are inherited. They are only valid on http, mcp, and file sources, and may not share a name with a table in the same source.
An MCP function attaches the same way and uses the same tool-mapping vocabulary as a declarative MCP table:
sources:
- name: linear
kind: mcp
config:
transport: streamable_http
url: https://mcp.linear.app/mcp
auth:
type: header
headers: [{ name: Authorization, bearer: "${secret:LINEAR_TOKEN}" }]
functions:
- name: search_issues
tool: search # MCP tool to invoke
tool_args: { state: open } # static tool arguments
args:
- { name: q, type: varchar, required: true, tool_arg: query }
- { name: limit, type: int, default: "25" }
rows_path: [issues]
pagination: { cursor_arg: cursor, response_cursor_path: [nextCursor] }
returns:
- { name: key, type: varchar }
- { name: title, type: varchar, source: $.fields.title }Call it (the q argument is sent to the tool as query):
SELECT key, title FROM linear.search_issues('priority:high', 25);Standalone (explicit namespace + config)
functions:
- name: geocode
namespace: geo
kind: http
config:
base_url: https://nominatim.openstreetmap.org
endpoint: /search?format=json&q={address}
args:
- { name: address, type: varchar, required: true }
returns:
- { name: lat, type: double, source: $.lat }
- { name: lon, type: double, source: $.lon }
- { name: display_name, type: varchar }
- name: logs
namespace: ops
kind: file
path: ./logs/{service}/*.jsonl
args:
- { name: service, type: varchar, required: true }
returns:
- { name: path, type: varchar }
- { name: file_name, type: varchar }Call them (each uses its declared namespace):
SELECT display_name, lat, lon FROM geo.geocode('Eiffel Tower, Paris');
-- the {service} placeholder is filled from the call argument
SELECT file_name FROM ops.logs('api') ORDER BY file_name;Reference
Common fields (both shapes):
| Field | Required | Notes |
|---|---|---|
name |
yes | Function name; a valid SQL identifier, no __. |
namespace |
standalone only | SQL qualifier; inherited from the source name when attached. |
kind |
standalone only | http | mcp | file; inherited from the source kind when attached. |
description, wiki, examples |
no | Documentation surfaced by describe and the MCP describe_function tool. |
args |
no | Ordered argument declarations; list order is the positional call order. |
returns |
yes | Output columns; non-empty. The schema is fixed at plan time. |
config |
standalone only | Connection block, same shape as the matching source kind's config. |
Each args entry:
| Field | Default | Notes |
|---|---|---|
name |
— | Valid SQL identifier, no __. |
type |
varchar |
Column-vocabulary type (int, bigint, double, bool, timestamp, …). |
required |
false |
Mutually exclusive with default; required args must precede optional/defaulted ones. |
default |
— | Value used when the call omits this trailing arg. |
tool_arg |
— | mcp only: wire name of the tool argument when it differs from name. |
Each returns column:
| Field | Notes |
|---|---|
name, type |
Column name and column-vocabulary type. |
source |
A JSONPath into each response row ($.user.login), or the literal arg to inject the bound call argument of the same name. |
description |
Optional. |
Kind-specific body:
| Kind | Fields | Notes |
|---|---|---|
http |
endpoint, method, headers, body, response.path, pagination |
An arg that appears as a {placeholder} in endpoint/body substitutes there; otherwise it is sent as a query parameter. Standalone needs config.base_url unless endpoint is absolute. |
mcp |
tool, tool_args, rows_path, pagination (cursor_arg, response_cursor_path), limit_binding |
Each arg may set tool_arg to rename its wire argument. Standalone needs a transport + command/url connection block. |
file |
path (glob with {arg} placeholders) |
Declared file functions return file metadata, with the fixed schema path, file_name, size_bytes, modified. Line-content search is the builtin file.grep. |
Reserved namespaces (http, file, mcp, materialized) and the __ separator can't be used in function / namespace / argument names.
CLI
pawrly function list [--json]
pawrly function describe <ns.name> [--json]
pawrly function call <ns.name> [ARG]... [--arg name=value]... [--limit N] [--format table|json|csv|ndjson]call fetches the declaration, orders the literals (numeric/bool unquoted, strings escaped), composes SELECT * FROM ns.name(...), and runs it through the normal query path.
MCP tools
| Tool | Input | Output |
|---|---|---|
list_functions |
— | { functions: [{namespace, name, kind, builtin, signature, description}] } |
describe_function |
{ function: "ns.name" } |
Full spec: args, returns, examples |
call_function |
{ function, args: {name: value, ...}, max_rows } |
{ columns, rows, row_count, truncated } |
call_function composes SQL with the same renderer as the CLI and executes through the engine's query path.
Notes & current limits
- Functions are live in v1: calls are not cached (they don't participate in the Parquet snapshot cache).
WHEREfilters apply on top of the function result (the parameters come from the call args, not from filter pushdown).- An attached function shares its parent source's live connection (one rate-limiter / one MCP session), so it inherits the source's auth and quota state rather than opening a parallel client.