pawrly

Functions

Pawrly sources expose data as tables (SELECT * FROM github.pulls). But many useful operations are call-shaped rather than table-shaped: a search endpoint parameterized by a query string, a lookup that takes an input, a glob over a pattern. Table-valued functions model these: named, reusable operations with ordered, typed arguments that return rows.

SELECT i.title, i.state
FROM github.search_issues('is:open label:bug', 50) AS i
WHERE i.state = 'open';

Functions are either builtin (shipped with Pawrly) or declared in pawrly.yaml. A declared function is either attached to a source (inheriting its connection and auth) or standalone (with its own namespace, kind, and config).


Calling functions from SQL

-- simple call
SELECT * FROM github.search_issues('is:open label:bug', 50);

-- alias, join, post-filter (WHERE applies on top of the function result)
SELECT i.title
FROM github.search_issues('is:open', 100) AS i
JOIN github.pulls p ON p.number = i.number
WHERE i.user_login <> 'dependabot[bot]';

-- CTEs / subqueries / UNION branches all work
WITH hot AS (SELECT * FROM github.search_issues('label:p0', 20))
SELECT count(*) FROM hot;

-- builtin
SELECT file_name, size_bytes FROM file.glob('./data/*.parquet') ORDER BY 1;

Rules:

  • Positional arguments only, matched against the declared order. Trailing optional/defaulted arguments may be omitted.
  • Arguments must be literals (strings, numbers, booleans, including negatives). Column references and expressions are rejected at plan time, a DataFusion constraint on table-function args. Use the CLI / MCP named-arg forms when you want name=value ergonomics.
  • ${param:KEY} substitution runs before the call is resolved, so parameterized queries can feed function arguments.
  • A bare two-part name without parentheses (FROM github.issues) is always an ordinary schema.table lookup, never a function.

Builtins

Function Signature Returns
file.glob file.glob(pattern varchar) One row per matched file: path, file_name, size_bytes, modified. Relative patterns resolve against the workspace dir; ~ expands to home. Zero matches → zero rows.
file.grep file.grep(pattern varchar, glob varchar) One row per line matching the regex pattern across the files matched by glob: path, line_number, line. Binary/non-UTF-8 files are skipped.
http.get http.get(url varchar, path varchar = '$') Generic GET; path is a JSONPath into the response. Each matched element is returned as a JSON string in the single body column.

Declaring functions

Source-attached (inherits the source's config)

sources:
  - name: github
    kind: http
    config:
      base_url: https://api.github.com
      auth:
        type: header
        headers:
          - { name: Authorization, bearer: "${secret:GITHUB_TOKEN}" }
    functions:
      - name: search_issues
        description: Search issues with the GitHub search syntax.
        endpoint: /search/issues          # {arg} placeholders substitute; bare args → ?q=...
        args:
          - { name: q,     type: varchar, required: true }
          - { name: limit, type: int,     default: "50" }
        response:
          path: $.items
        pagination:
          type: page
          param: page
        returns:
          - { name: number,     type: bigint }
          - { name: title,      type: varchar }
          - { name: state,      type: varchar }
          - { name: user_login, type: varchar, source: $.user.login }
          - { name: q,          type: varchar, source: arg }   # echo the bound argument

Call it (the namespace is the source name):

SELECT number, title, user_login
FROM github.search_issues('is:open label:bug repo:org/repo', 100)
WHERE state = 'open';

Attached functions omit namespace, kind, and config; all are inherited. They are only valid on http, mcp, and file sources, and may not share a name with a table in the same source.

An MCP function attaches the same way and uses the same tool-mapping vocabulary as a declarative MCP table:

sources:
  - name: linear
    kind: mcp
    config:
      transport: streamable_http
      url: https://mcp.linear.app/mcp
      auth:
        type: header
        headers: [{ name: Authorization, bearer: "${secret:LINEAR_TOKEN}" }]
    functions:
      - name: search_issues
        tool: search                       # MCP tool to invoke
        tool_args: { state: open }         # static tool arguments
        args:
          - { name: q,     type: varchar, required: true, tool_arg: query }
          - { name: limit, type: int,     default: "25" }
        rows_path: [issues]
        pagination: { cursor_arg: cursor, response_cursor_path: [nextCursor] }
        returns:
          - { name: key,   type: varchar }
          - { name: title, type: varchar, source: $.fields.title }

Call it (the q argument is sent to the tool as query):

SELECT key, title FROM linear.search_issues('priority:high', 25);

Standalone (explicit namespace + config)

functions:
  - name: geocode
    namespace: geo
    kind: http
    config:
      base_url: https://nominatim.openstreetmap.org
    endpoint: /search?format=json&q={address}
    args:
      - { name: address, type: varchar, required: true }
    returns:
      - { name: lat,          type: double,  source: $.lat }
      - { name: lon,          type: double,  source: $.lon }
      - { name: display_name, type: varchar }

  - name: logs
    namespace: ops
    kind: file
    path: ./logs/{service}/*.jsonl
    args:
      - { name: service, type: varchar, required: true }
    returns:
      - { name: path,       type: varchar }
      - { name: file_name,  type: varchar }

Call them (each uses its declared namespace):

SELECT display_name, lat, lon FROM geo.geocode('Eiffel Tower, Paris');

-- the {service} placeholder is filled from the call argument
SELECT file_name FROM ops.logs('api') ORDER BY file_name;

Reference

Common fields (both shapes):

Field Required Notes
name yes Function name; a valid SQL identifier, no __.
namespace standalone only SQL qualifier; inherited from the source name when attached.
kind standalone only http | mcp | file; inherited from the source kind when attached.
description, wiki, examples no Documentation surfaced by describe and the MCP describe_function tool.
args no Ordered argument declarations; list order is the positional call order.
returns yes Output columns; non-empty. The schema is fixed at plan time.
config standalone only Connection block, same shape as the matching source kind's config.

Each args entry:

Field Default Notes
name Valid SQL identifier, no __.
type varchar Column-vocabulary type (int, bigint, double, bool, timestamp, …).
required false Mutually exclusive with default; required args must precede optional/defaulted ones.
default Value used when the call omits this trailing arg.
tool_arg mcp only: wire name of the tool argument when it differs from name.

Each returns column:

Field Notes
name, type Column name and column-vocabulary type.
source A JSONPath into each response row ($.user.login), or the literal arg to inject the bound call argument of the same name.
description Optional.

Kind-specific body:

Kind Fields Notes
http endpoint, method, headers, body, response.path, pagination An arg that appears as a {placeholder} in endpoint/body substitutes there; otherwise it is sent as a query parameter. Standalone needs config.base_url unless endpoint is absolute.
mcp tool, tool_args, rows_path, pagination (cursor_arg, response_cursor_path), limit_binding Each arg may set tool_arg to rename its wire argument. Standalone needs a transport + command/url connection block.
file path (glob with {arg} placeholders) Declared file functions return file metadata, with the fixed schema path, file_name, size_bytes, modified. Line-content search is the builtin file.grep.

Reserved namespaces (http, file, mcp, materialized) and the __ separator can't be used in function / namespace / argument names.


CLI

pawrly function list [--json]
pawrly function describe <ns.name> [--json]
pawrly function call <ns.name> [ARG]... [--arg name=value]... [--limit N] [--format table|json|csv|ndjson]

call fetches the declaration, orders the literals (numeric/bool unquoted, strings escaped), composes SELECT * FROM ns.name(...), and runs it through the normal query path.


MCP tools

Tool Input Output
list_functions { functions: [{namespace, name, kind, builtin, signature, description}] }
describe_function { function: "ns.name" } Full spec: args, returns, examples
call_function { function, args: {name: value, ...}, max_rows } { columns, rows, row_count, truncated }

call_function composes SQL with the same renderer as the CLI and executes through the engine's query path.


Notes & current limits

  • Functions are live in v1: calls are not cached (they don't participate in the Parquet snapshot cache).
  • WHERE filters apply on top of the function result (the parameters come from the call args, not from filter pushdown).
  • An attached function shares its parent source's live connection (one rate-limiter / one MCP session), so it inherits the source's auth and quota state rather than opening a parallel client.