GS SQL - LLM Agent Reference (Compact Core)

This is the compact CORE for AI agents — only the GS-specific things that prevent failures and that the agent wouldn't know to search for. For everything else (function syntax, full DDL, deep grammar, catalog/view syntax, table function details, edge cases, recipes), use DocsRetrieval / GrepDocs / ReadDoc on the developer HTML docs (gsqlgrammar.html, gsqltables.html, gsqlteq.html, gsqlddl.html, gsql_llm_ddl_reference.html, help_*.html).

0. Relationship to ANSI SQL and PostgreSQL

GS SQL is an ANSI SQL / PostgreSQL-flavored dialect with temporal extensions. Treat your existing PostgreSQL knowledge as the prior — most things that work in PostgreSQL work in GS SQL with the same semantics. Use this doc only for the GS-specific things that won't behave the way you expect. Behaves like ANSI / PostgreSQL (no need to look up): - Operators: || (string concat), ::type (cast), +, -, *, /, %, =, !=, <>, <, <=, >, >=, AND, OR, NOT, IS NULL, IS NOT NULL, BETWEEN, IN, EXISTS, LIKE (case-sensitive), ILIKE (case-insensitive), ~ (POSIX regex, case-sensitive, PG semantics), CAST(expr AS type) - Keyword forms: EXTRACT(field FROM expr), CASE/WHEN/THEN/ELSE, COALESCE, NULLIF - Type names (22 first-class, PG-aligned). Canonical ↔ PG display name: BOOLEAN↔Boolean, SHORT↔Smallint, INTEGER↔Integer, LONG↔Bigint, FLOAT↔Real, DOUBLE↔"Double precision", BIG_DECIMAL↔Numeric, STRING↔Varchar, BYTE_ARRAY↔Bytea (raw bytes inline, 1 MB cap), FILE↔File (URL/handle to separately-stored bytes), DATE↔Date, TIME↔Time, TIMESTAMP↔Timestamp, DATETIME↔Timestamptz, INTERVAL↔Interval, UUID↔UUID, JSON↔JSON, JSONB↔JSONB, LATITUDE/LONGITUDE/ELEVATION/DIRECTION360 (GS domain, all DOUBLE-backed). Canonical names are persisted; PG display names show in the UI; both work in CAST / :: / CREATE TABLE / JSON_TABLE / CSV_TABLE. - Standard SQL functions: full standard library — character_length, length, position, strpos, instr, lower, upper, trim/ltrim/rtrim, btrim(str[,chars]) (alias of trim), substr, replace, overlay(src,substr,start[,length]) (1-based start, replaces `length` chars), concat, concat_ws, md5, sha256, lpad, rpad, initcap, split_part, repeat, gen_random_uuid (uuid_generate_v4), date_trunc, date_part, to_timestamp, to_date, regexp_match, regexp_replace, regexp_substr(str,pat[,start[,n[,flags]]]) (n-th match), regexp_split_to_array(str,pat[,flags]) (GS deviation: returns a JSON-array text, not PG text[] — GS has no first-class ARRAY type), count, sum, avg, min, max, var, stddev, percentile, string_agg, abs, round, floor, ceil, sqrt, power, log, ln, exp, atan2(y,x), cot(x), cbrt(x) (cube root), factorial(n) (BIG_DECIMAL, capped n=10000), scale(numeric) (fractional-digit count), min_scale(numeric), trim_scale(numeric) (strips trailing zeros), width_bucket(operand,low,high,count) (histogram bucket index), pg_typeof, now(), current_date, current_time, current_timestamp, clock_timestamp() (true wall-clock), statement_timestamp() (wall-clock at statement start), transaction_timestamp() (alias of now), isfinite(date|timestamp|interval) (TRUE for any non-NULL input), justify_days(interval), justify_hours(interval), justify_interval(interval). Also implemented (name = params): div(a,b), gcd(a,b), lcm(a,b), to_number(text,fmt), to_char(expr,fmt), make_date(y,m,d), make_time(h,m,s), make_timestamp(y,mo,d,h,mi,s), date_bin(stride,ts,origin), age(ts | end,start) (returns PG-style INTERVAL: calendar years/months/days/time; 1-arg anchors at current_date midnight), translate(str,from,to), chr(n), ascii(str), octet_length(str), format(fmt,...), quote_ident(str), quote_literal(str), current_database(), current_agentSchema(), current_user, session_user, format_type(oid,typmod), pg_encoding_to_char(id), pg_client_encoding() (returns 'UTF8'), version(), current_setting(name[,missing_ok]), set_config(name,value,is_local) (session GUC read/write — no-op writes today), has_table_privilege([user,] table, priv), has_column_privilege([user,] table, col, priv), has_database_privilege([user,] db, priv), has_schema_privilege([user,] schema, priv), has_any_column_privilege([user,] table, priv) — (2026-06-05 security audit) org owners return TRUE for all privileges; non-owner members return TRUE only for SELECT and REFERENCES (FALSE for INSERT/UPDATE/DELETE/TRUNCATE/TRIGGER/etc.); unknown relation/column raises PG-shape error. Per-user RBAC is enforced separately inside the engine — do NOT add WHERE clauses based on these scalars; rely on the engine. Was previously blanket-TRUE for everyone, to_regclass(text), to_regtype(text), to_regrole(text) (name → OID or NULL), pg_function_is_visible(oid), pg_type_is_visible(oid) (always TRUE), obj_description(oid[,catalog]), col_description(oid,col), shobj_description(oid,catalog), earth_distance(point1,point2 | lat1,lon1,lat2,lon2) (great-circle METERS, PostgreSQL earthdistance parity; the 2-arg form takes ll_to_earth values exactly like PG; NULL on any NULL input), earth_distance_km(lat1,lon1,lat2,lon2) (kilometers; for miles divide meters by 1609.344 — PG has no converter fn), ll_to_earth(lat,lon) (earth point value for earth_distance/latitude/longitude; transient expression value, not storable), latitude(point), longitude(point) (degrees back out of an earth point), earth() (the PG radius constant 6378168 m), sec_to_gc(m), gc_to_sec(m) (chord ↔ great-circle meters), earth_bearing(lat1,lon1,lat2,lon2) (initial great-circle compass bearing in degrees [0, 360); 0=N, 90=E, 180=S, 270=W), earth_midpoint_lat(lat1,lon1,lat2,lon2) and earth_midpoint_lon(lat1,lon1,lat2,lon2) (latitude and longitude of the great-circle midpoint; longitude normalized to (-180, 180]). Window functions (2026-05-28 v3): ranking — row_number, rank, dense_rank, percent_rank, cume_dist, ntile(n); offset — lag(col[, offset[, default]]), lead(col[, offset[, default]]); positional — first_value, last_value, nth_value(col, n); aggregate windows — sum, avg, count, min, max, stddev/stdev, stddevp/stdevp, variance/var, varp, bool_or, bool_and OVER (…). PG OVER clause supports PARTITION BY, ORDER BY (with NULLS FIRST/LAST), named WINDOW w AS (…), ROWS and RANGE frame modes with UNBOUNDED PRECEDING, expr PRECEDING/FOLLOWING (frame offsets can be arbitrary expressions, not just literals), CURRENT ROW, EXCLUDE {CURRENT ROW | GROUP | TIES | NO OTHERS}, and aggregate FILTER (WHERE pred). RANGE BETWEEN N PRECEDING is value-based (peer-aware), so on an ORDER BY time column it gives a true time-windowed aggregate. GROUPS frame mode with numeric offsets and the typed-literal INTERVAL form are not yet supported. - Clauses: SELECT (DISTINCT), FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT [N] [OFFSET M] (each independently optional — bare OFFSET is supported), WITH (CTEs incl. RECURSIVE), JOIN (INNER, LEFT/RIGHT/FULL OUTER, CROSS), UNION/UNION ALL/INTERSECT/EXCEPT, INSERT (column list optional — values map to template stream order; PK column doubles as row identifier), UPDATE, DELETE, standalone VALUES (e.g. VALUES (1, 2), (3, 4) ORDER BY 1 LIMIT 5), subqueries (correlated and non-correlated), CASE expressions, aggregate DISTINCT (e.g. COUNT(DISTINCT x)) - DDL: CREATE/ALTER/DROP TABLE, VIEW, MATERIALIZED VIEW - public. schema prefix on table refs is silently stripped If you're writing standard SQL — any of the above — trust your training. Search the docs only for the GS-specific things below.

1. Two Query Modes + GS-Specific Feature Inventory

TEQ (Temporal Entity Query) — Component template IDs as table names; stream IDs as columns; components as rows. Default returns lastValue per stream. Supports FK JOINs. Recommended when the org has templates. TDQ (Temporal Deep Query) — Query system.stream / system.active_event / system.user_notification / system.system_notification / system.job_notification directly. Use when no templates exist, when accessing system tables, or when modifying sample data at specific timestamps. Cannot mix TEQ and TDQ tables in the same query. Pick one mode per statement. GS-specific features beyond ANSI SQL — search the docs for syntax when you need them: - Per-column temporal parameters — e.g. kwh(range(last=100)), temperature(cycleId='hour', stat='avg'). Each stream column can have independent time semantics. THE primary GS-specific feature; covered in detail in §6. - Catalogs — multi-namespace layer (Snowflake/Databricks-style). Reference as catalog.table.column. DDL/syntax in gsql_llm_ddl_reference.html. - VIEW / MATERIALIZED VIEW — saved queries used as tables. Internal (over org data) or external (over a JDBC connection). Materialized views cache results; non-materialized re-run on each reference. Syntax in gsql_llm_ddl_reference.html. - EXTERNAL_QUERY(connectionId, remoteSql) — inline federated query against a JDBC database. Remote SQL is in the remote DB's native dialect, not GS SQL. Search gsqlgrammar.html for syntax. - JSON_TABLE / CSV_TABLE — parse JSON / CSV into rows. Search gsqlgrammar.html for syntax. - PG JSON operators / functions — j->'k', j->>'k', j#>'{a,b}', j#>>'{a,b}', a @> b (contains), a <@ b (contained in — inverse of @>), jsonb_typeof, jsonb_array_length, jsonb_extract_path[_text], json_build_object, json_build_array, to_json, to_jsonb, json_agg, jsonb_agg, json_object_agg(k,v), jsonb_object_agg(k,v), jsonb_set(j,path,val), jsonb_strip_nulls, jsonb_pretty, jsonb_contains. RHS keys must be literals. See “PG JSON access operators” row in gsqlgrammar.html. - FK JOINs — streams configured as foreign keys can be JOINed in TEQ. See §3. - Window temporal parameters — lag, lead, sliding window (slide + slidestat), cumulative (running) — operate directly on sample arrays. See §6e. Faster than SQL OVER(). - FK-resolved derivations — derived stream variables can resolve targets via SQL with automatic temporal segmentation across relationship-change boundaries. Configuration in gsql_llm_ddl_reference.html and help_fk_deps.html. - User-defined session variables — SET @var = expr or SET @var = (subquery). Session-scoped, case-insensitive. Search docs for full rules. - Virtual columns / virtual views — standard SQL inline expressions (SELECT col*1.1 AS x) and FROM-subqueries (FROM (SELECT ...) v). Both fully supported. - Cost-based query planner — per-org table statistics drive index/join choices. ANALYZE STATS DDL refreshes synchronously; auto-refresh on JDBC import and on a 30-min cycle. - AI agent functions — inline per-row LLM calls: agent_string/agent_boolean/agent_double/agent_long/agent_json/agent_jsonb (+ agent_*_agg aggregates; JEP agentString/… in derived streams). Typed output by function name, named options (model by profile Name/tier Brain|Minion/labels/schema/tools/timeout_ms), classification, structured output, opt-in tool loop, multimodal FILE inputs. See §10. (Publish-gated — dark until the SOC2 cluster is live.)

2. Conventions and Quirks

- Case insensitive: table names, column names, functions, reserved words. - Comments: -- single-line, /* */ multi-line. - Datetimes are epoch-millisecond Longs, not PG timestamp values. EXTRACT(EPOCH FROM ts) returns milliseconds, NOT PG-style seconds (divide by 1000 for PG-compatible behavior). - NOW keyword for current time. - Missing optional values are '' (empty string), not NULL. - System columns are underscore-prefixed snake_case: _component_name, _component_uid, _component_id, _component_created_date, _folder_path, _name, _uid, _id, _last_value, _start_date, _end_date, _stream_type, _template_uid, _sample, _time, _time_sd, _time_ed. - LIKE / ILIKE: LIKE is case-sensitive (ANSI/PG), ILIKE is case-insensitive (PG). Both accept */? in addition to %/_; GLOB is the same as ILIKE. - REGEXP: Java java.util.regex.Pattern syntax. String literals process backslash escapes — use '\\d+' (double backslash) for \d+, or character classes like '[0-9]+'. - Stream IDs that are SQL keywords must be quoted: "select", [order], "group". - No transactions. Each statement is its own atomic unit. BEGIN / COMMIT are accepted no-ops. ROLLBACK (bare) is rejected with an error — the engine raises “ROLLBACK is not supported: transactions are not yet implemented. Prior statements have already committed and cannot be rolled back.” Do NOT suggest ROLLBACK for error handling — recommend explicit DELETE / UPDATE reversal instead. SAVEPOINT, RELEASE, and ROLLBACK TO [SAVEPOINT] name are accepted no-ops. See §9. - GRANT / REVOKE / TRUNCATE raise loud errors (2026-06-05). Verbatim wording — GRANT/REVOKE: “GRANT/REVOKE is not supported via SQL — manage permissions via the org RBAC UI. The statement was rejected; permissions were NOT changed.” TRUNCATE: “TRUNCATE is not supported. Use DELETE FROM ... instead. No rows were removed.” Was silent (0 rows) before; that was a privilege-confusion footgun (users believed their grants had landed). For permissions, use the org RBAC surface; for bulk row removal, use DELETE FROM template (cascading delete). See §9. - INTERVAL field qualifiers (2026-06-05): INTERVAL '1-2' YEAR TO MONTH, INTERVAL '1 day' DAY TO SECOND are accepted. Qualifier set:

YEAR | MONTH | DAY | HOUR | MINUTE | SECOND | YEAR TO MONTH | DAY TO HOUR | DAY TO MINUTE | DAY TO SECOND | HOUR TO MINUTE | HOUR TO SECOND | MINUTE TO SECOND

. Parse-and-swallow: the qualifier is accepted but does not coerce or validate — the interval-string parser is the authority for the unit set and magnitude. Useful for PG/ANSI script paste; do not rely on the qualifier to constrain the parsed value. - BETWEEN SYMMETRIC / BETWEEN ASYMMETRIC (2026-06-05): x BETWEEN SYMMETRIC b AND c lowers to (x BETWEEN b AND c) OR (x BETWEEN c AND b) — the bounds can be given in either order. ASYMMETRIC is the default and can be written explicitly. Composes with NOT (NOT BETWEEN SYMMETRIC = neither in-order range nor reversed range). Example: WHERE 5 BETWEEN SYMMETRIC 10 AND 1 → TRUE. - IS [NOT] TRUE / IS [NOT] FALSE / IS [NOT] UNKNOWN (2026-06-05): full three-valued logic. NULL IS NOT TRUE → TRUE; NULL IS NOT FALSE → TRUE; NULL IS NOT UNKNOWN → FALSE. Useful when a boolean expression can be NULL and you want a NULL-safe predicate. - VALUES as a table constructor in FROM (2026-06-05): SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, label). Without an AS t(c1, c2, ...) alias, columns default to column1..columnN (PG semantics). All rows must have equal width — mismatched-width rows raise a parse error. Composes anywhere a table reference is allowed (JOIN, subquery, CTE, set ops). Distinct from standalone VALUES (a top-level query), already supported. - STRING_AGG with ORDER BY (2026-06-05, per #104): STRING_AGG(name, ',' ORDER BY priority DESC, name ASC). Multi-key ASC/DESC supported. Default null ordering: NULLS LAST on ASC, NULLS FIRST on DESC (PG semantics). The inner ORDER BY is the aggregate ordering — not the outer query ordering — and only applies to this aggregate call. - Window-frame EXCLUDE GROUP / EXCLUDE TIES are supported (have been since 2026-05-28 v3 window-functions ship); ignore older docs that said otherwise. - Window functions (2026-05-28 v3): full PG OVER (…) grammar — supported. Ranking (ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE), offset (LAG, LEAD), positional (FIRST_VALUE, LAST_VALUE, NTH_VALUE), and aggregate windows (SUM, AVG, COUNT, MIN, MAX, STDDEV/STDEV, STDDEVP/STDEVP, VARIANCE/VAR, VARP, BOOL_OR, BOOL_AND). PARTITION BY, ORDER BY with NULLS FIRST/LAST, named WINDOW w AS (…), ROWS AND RANGE frame modes with arbitrary-expression offsets (ROWS BETWEEN col PRECEDING works), EXCLUDE CURRENT ROW/GROUP/TIES/NO OTHERS, and aggregate FILTER (WHERE pred). Time-series sweet spot: RANGE BETWEEN N PRECEDING AND CURRENT ROW on an ORDER BY time column gives a true value-windowed rolling aggregate (peers on the same order-key share the same frame, unlike ROWS which is purely positional). Not yet supported: GROUPS frame mode with numeric offsets, typed INTERVAL '5 minutes' literals. For rollup-backed time-series math, prefer GS temporal parameters (sample(running=…) etc.); see §6e — same surface, different performance and semantics. - No IS DISTINCT FROM, no PG roles/grants. See §7. - public. schema prefix is silently stripped on table refs. - Named function arguments (2026): call by parameter name with name => value (modern) or name := value (legacy), e.g. LPAD(string => '5', length => 3, fill => '0'). Named args may be reordered and may follow positional args, but a positional arg may NOT follow a named one. Names are case-insensitive; the call is rewritten to positional order at parse time (identical result). GS declares parameter names for its built-in scalar functions — a superset of PostgreSQL, whose built-ins mostly lack declared names, so do not assume a named call that works here also works in stock PG. Variadic and aggregate/window functions (CONCAT, COALESCE, GREATEST, SUM, …) reject named args with “does not support named arguments”. Errors are loud: unknown name, duplicate name, positional-after-named, and unfilled-earlier-parameter all raise. - || precedence sits at the OR slot rather than between additive and IS — minor divergence; rarely matters. May 2026 changes you may not have in training: - || repurposed from logical OR to ANSI/PG string concat. Use AND/OR keywords for logical ops. - && no longer accepted as logical AND. Use AND. - LIKE is now case-sensitive (ANSI/PG). ILIKE added for case-insensitive matching. Saved queries are auto-rewritten to ILIKE so existing semantics are preserved. - :: postfix cast added (e.g. '42'::LONG, price::NUMERIC). - Additional parse-time aliases (coerce to the listed canonical, distinct from first-class TIMESTAMP/TIMESTAMPTZ/etc. above): TEXT/VARCHAR=STRING; BOOL=BOOLEAN; INT/INT8/BIGINT/INT4/INTEGER/INT2/SMALLINT=LONG (use canonical SHORT or INTEGER for narrow-width storage); FLOAT8/FLOAT4/REAL/"DOUBLE PRECISION"=DOUBLE (use canonical FLOAT for 4-byte storage); NUMERIC/DECIMAL=BIG_DECIMAL; BYTEA=BYTE_ARRAY; TIMESTAMPTZ=DATETIME. - EXTRACT(field FROM expr) keyword form (in addition to function-style date_part). - TEQ now supports full ANSI relational semantics over template tables (UNION/INTERSECT/EXCEPT, HAVING, subqueries, CASE, aggregate DISTINCT, all JOIN types).

3. TEQ - Template Table Queries

Template ID = table name. Stream IDs = columns. Each component = one row. Default returns lastValue per stream. Full ANSI relational semantics over template tables: JOINs, GROUP BY, HAVING, UNION/INTERSECT/EXCEPT, DISTINCT, subqueries (correlated and non-correlated), CASE, aggregate DISTINCT — all work normally. System columns (always available): _component_name, _component_uid, _component_id, _component_created_date, _folder_path. Stream column syntax: temperature -- returns lastValue s.temperature -- qualified with alias temperature(range(last=100)) -- with temporal params (returns multiple rows per component) s.temperature(range(sd=-1d)) -- alias + temporal params Mixed temporal mode: Each stream column has independent temporal parameters. Columns without params return lastValue. Columns with params return time-series data. Mixing is supported and common — the engine LEFT JOINs the sampled subquery with a lastValue pivot subquery. FK JOINs: Streams configured as foreign keys can JOIN template tables. SELECT m._component_name, c.region FROM meter m JOIN customer c ON m.customerUid = c._component_uid Examples: SELECT _component_name, kwh FROM meter SELECT _component_name, kwh(range(last=100)) FROM meter SELECT _component_name, kwh(range(sd=-1d)), voltage FROM meter -- mixed temporal SELECT _component_name, kwh(cycleId='day', stat='sum', range(currentCycle='year')) FROM meter SELECT c.region, AVG(m.kwh) FROM meter m JOIN customer c ON m.customerUid = c._component_uid GROUP BY c.region SELECT _component_name, kwh(running='sum', range(currentCycle='month')) FROM meter -- Catalogs (multi-namespace) SELECT _component_name, kwh FROM utility.meter

4. TDQ - System Tables

system.stream — primary system table. One row per stream, NOT one per component. Use DISTINCT _component_name when you want unique components. Indexed columns (prefer in WHERE for performance): _component_name, _component_uid, _component_id, _uid, _id, _name, _folder_path, _template_uid. Composite index: (_component_name, _name). Index operators: =, >, >=, <, <=, BETWEEN, LIKE 'prefix%', ILIKE 'prefix%', STARTSWITH('prefix'). String indexes are stored case-folded (lowercase), so case-sensitive LIKE / =, BETWEEN, etc. still use the index but read a small superset that the runtime filter narrows. ILIKE and STARTSWITH match the index encoding directly. Indexed string values are capped at 256 bytes for the index entry only — values longer than that still index and query correctly (the runtime always re-checks the full value); only their selectivity may be reduced. Other columns: description, timeZoneId, unitUid, _stream_type (intvl_stream/rdm_stream/point_stream), _derivation_type, baseCycleUid, rollupCalendarUid, timeFilterUid, defaultRollupMethod, _sample_value_type, _start_date (Long), _end_date (Long), completedDate (Long), lastUpdated (Long), _last_value (varies), _last_value_type, _component_created_date. Sample columns (apply temporal params here): _sample (varies), _time, _time_sd, _time_ed. Other system tables (read-only): system.active_event, system.user_notification, system.system_notification, system.job_notification. Search gsqltables.html for column schemas. DML on system.sample only. system.stream and notification tables are read-only. Performance: The cost-based planner uses per-org statistics. After bulk imports or large component additions, run ANALYZE STATS to refresh synchronously. Examples: SELECT DISTINCT _component_name FROM system.stream SELECT _component_name, _name, _last_value FROM system.stream WHERE _component_id = 'meter1' SELECT _sample(range(sd=-1d)) FROM system.stream WHERE _component_id='gen1' AND _id='kw' SELECT _sample(cycleId='month', stat='sum', range(currentCycle='year')) FROM system.stream WHERE _component_id='gen1' AND _id='kw'

5. DML Operations

Permission settings (org-level; defaults shown): gsqlAllowInsert=true, gsqlAllowUpdate=true, gsqlAllowDelete=false, gsqlAllowInsertEntity=true, gsqlAllowUpdateEntity=true, gsqlAllowDeleteEntity=false. DDL permissions documented in gsql_llm_ddl_reference.html.

5a. INSERT - system.sample (TDQ)

INSERT OR REPLACE INTO system.sample [WITH (options)] (columns) VALUES (...) | SELECT ... OR REPLACE is mandatory. Required columns: (_COMPONENT_UID and _UID) or (_COMPONENT_ID and _ID), plus _TIME (or _TIME_SD/_TIME_ED for intervals), plus _SAMPLE. WITH options for auto-creating components when using IDs (case-insensitive, accept underscore or camelCase, parentheses optional): COMP_TMPL_ID, FOLDER_PATH (must start with /Components), CNAME, DESCRIPTION, TIMEZONE_ID, DERIVE_MAX_STREAMS, DERIVE_TIMEOUT. DERIVE_MAX_STREAMS (default 0, max 20) derives dependent streams synchronously after the insert, but throttled streams are ignored (not derived synchronously) — this includes derived streams that use AI agent functions (agent_*), which are born throttled and derive on the hourly throttled derivation lane. The dependency walk stops at a throttled stream, so streams below it are deferred too. To compute a throttled/agent stream on demand, run an explicit DERIVE STREAM (or the studio Re-derive Stream), which force-derives the chosen stream inline. See Derived Streams → "Derivation Throttling" (help_derived_streams.html#throttling) for the full when/what-derives-throttled rules. If a component or stream doesn't exist when IDs are used, it is auto-created. INSERT OR REPLACE INTO system.sample (_COMPONENT_UID, _UID, _TIME, _SAMPLE) VALUES('a1db...', '2e75...', 1736488800000, 2.0); INSERT OR REPLACE INTO system.sample WITH (COMP_TMPL_ID='accountTemplate') (_COMPONENT_ID, _ID, _TIME, _SAMPLE) VALUES('acct_100', 'balance', 1736488800000, 5000.00); INSERT OR REPLACE INTO system.sample WITH (COMP_TMPL_ID='tmpl1') (...) SELECT ... FROM another_table WHERE ...;

5b. INSERT - Template tables (TEQ)

INSERT INTO template_table (columns) VALUES (...) | SELECT ... NO "OR REPLACE" for template inserts. _component_id is mandatory. WITH clause supports CNAME, FOLDER_PATH, DESCRIPTION, TIMEZONE_ID, LATITUDE, LONGITUDE, ELEVATION. INSERT INTO sensor (_component_id) VALUES ('s1'); INSERT INTO sensor (_component_id, temperature) VALUES ('s2', 42.0); INSERT INTO sensor WITH (LATITUDE=40.7128, LONGITUDE='-74.0060') (_component_id) VALUES ('nyc_sensor'); INSERT INTO sensor (_component_id, temperature) SELECT name, reading FROM csv_table('...');

5c. UPDATE - system.sample (TDQ)

UPDATE system.sample SET _sample = value WHERE _uid = 'stream_uid' AND _time = epoch_ms Modifies a sample at a specific timestamp (INSERT OR REPLACE semantics internally).

5d. UPDATE - Template tables (TEQ)

UPDATE template_table SET column = value [, ...] [WHERE ...] Appends new stream values at NOW for matching components. Does NOT modify history. Only stream columns can be SET (not system columns). WHERE optional — omitting it updates ALL components. UPDATE sensor SET temperature = 100.0 WHERE _component_name = 'Sensor_A'; UPDATE sensor SET status = (SELECT k.status FROM knowledge k WHERE k.category = sensor.category) WHERE sensor.category IN (SELECT category FROM knowledge); UPDATE...FROM is NOT supported. Use correlated subqueries for SET-from-other-table.

5e. DELETE - system.sample (TDQ)

DELETE FROM system.sample WHERE _uid = '...' [AND _time >= start AND _time <= end] Without time conditions, all samples for that stream are deleted. Stream metadata preserved.

5f. DELETE - Template tables (TEQ)

DELETE FROM template_table WHERE ... Cascading delete: removes the component AND all its streams, samples, and index entries.

5g. DDL

All DDL (CREATE/ALTER/DROP for templates, cycles, calendars, views, materialized views, dashboards, catalogs, entity diagrams, connectors, runnables, etc.) is documented in gsql_llm_ddl_reference.html.

6. Sample Temporal Parameters (SELECT result columns ONLY)

CRITICAL RULE: Temporal parameters can ONLY appear on result columns in the SELECT clause. NEVER in WHERE, GROUP BY, ORDER BY, HAVING, or any other clause. The highest-leverage feature in GS SQL. Each stream column has independent time semantics, sample aggregation, gap filling, and window operations — applied directly on sample arrays. Faster than SQL OVER() / WINDOW / GROUP BY date_trunc(...) patterns. Use these for time-series queries. See §7 for composed examples. Forms: TDQ: SELECT _sample(params) FROM system.stream TEQ: SELECT streamId(params) FROM template Range — filters samples by time. Keywords: sd (start, inclusive), ed (end, exclusive), last=N (last N from ed), currentCycle='cycleId', previousCycle='cycleId'. Values: ISO8601 strings, epoch ms, now, streamSd, streamEd, or modifier expressions. Missing sd → streamSd; missing ed → streamEd; if sd uses a modifier, the modifier is applied relative to ed's datetime. Datetime modifiers (case-sensitive): s=second, m=minute, h=hour, d=day, w=week, M=month, Y=year. Capital M for month vs. lowercase m for minute is the easy confusion. CycleId / Stat — aggregates samples into intervals matching the cycle. Stat options (case insensitive): FIRST, LAST, MIN, MAX, AVG, SUM, MINOCCURRENCE, MAXOCCURRENCE, GAPCOUNT, NONGAPCOUNT, INTVLCOUNT, MILLISECCOUNT, NONGAPMILLISECCOUNT, STDDEVP, STDDEVA, SAMPLECOUNT, SUMSIN, SUMCOS. Most are pre-calculated during rollup for fast retrieval. Timezone mismatch → empty rollup — cycleId/stat buckets in the QUERY timezone; if it differs from the stream's component/base-cycle timezone, intervals misalign and the rollup returns nothing (null / empty / no rows / no data / blank) though data exists — raw range without cycleId still returns values. Fix: set query Time Zone to the component's timezone. Coarse cycles (day/week/month) most affected. TimeFilterId — applies a GS Time Filter; filtered intervals stay as gaps when GapFill is also used. GapFill — fills gaps in interval streams. Applied after temporal logic, before WHERE/aggregation. Subkeys: head (start gaps; NEXTVAL or literal), middle (middle gaps; PREVIOUSVAL, NEXTVAL, MIN, MAX, AVG, SPLINE, or literal), tail (end gaps; PREVIOUSVAL or literal). Process order: HEAD, TAIL, MIDDLE. String options quoted (head='NEXTVAL'); numeric values unquoted (head=0). Window parameters — operate directly on sample arrays. Applied after GapFill, before WHERE/aggregation. Positional: lag=N, lead=N, lagdefault=val, leaddefault=val (work with any type). Aggregate: slide=N + slidestat='fn' (sliding), running='fn' (cumulative); supported fns are sum, avg, min, max, first, last, count; require numeric types. Rules: lag/lead are mutually exclusive; slide/running are mutually exclusive; cannot mix positional with aggregate on the same column. Window params respect both the stream's reference time filter and the query-time TimeFilterId — filtered samples are skipped (lag/lead count only non-filtered positions; slide/running exclude filtered samples; filtered positions produce NULL output). Multi-sample merge — multiple _sample() result columns sharing range/cycle merge into one row per bucket per component (org setting gsqlMergeSampleCols; ON by default). When OFF, each _sample() emits a separate tuple set with NULLs in the other sample columns.

7. Temporal Parameters - Composed Examples

A few examples cover most usages. Aggregation (cycleId + stat) and Window operations (lag/lead/slide/running) are mutually exclusive, so they're shown separately. All-in-one aggregation — Range + CycleId + Stat + TimeFilterId + GapFill in one statement: -- TDQ: hourly weekend-afternoon sum for the current year, with gap fill SELECT _sample(Range(currentCycle='year'), cycleId='hour', stat='sum', timeFilterId='weekendAfternoon', GapFill(head='nextval', middle='avg', tail=0)) FROM system.stream WHERE _id='kw' -- TEQ: same idea per component SELECT _component_name, kwh(Range(currentCycle='year'), cycleId='hour', stat='sum', timeFilterId='weekendAfternoon', GapFill(head='nextval', middle='avg', tail=0)) FROM meter Side-by-side stats — multiple sample columns sharing range/cycle merge into one row per bucket: SELECT _component_name, _time, _sample(range(sd=-1d, ed=now), CycleId='hour', Stat='avg') AS hourly_avg, _sample(range(sd=-1d, ed=now), CycleId='hour', Stat='max') AS hourly_peak FROM system.stream WHERE _component_id='gen1' AND _id='kw' Window operations — direct on sample arrays, in place of SQL OVER(): -- Cumulative sum for the current month (TEQ) SELECT _component_name, kwh(running='sum', Range(currentCycle='month')) FROM meter -- 5-sample moving average over the last day (TEQ) SELECT _component_name, kwh(slide=5, slidestat='avg', Range(sd=-1d)) FROM meter -- Delta from previous reading (TDQ) SELECT _sample - _sample(lag=1, Range(last=100)) AS delta FROM system.stream WHERE _component_id='gen1' AND _id='kw' Mixed temporal mode (TEQ) — one column with temporal params, one without; engine LEFT JOINs sampled subquery against lastValue pivot: SELECT _component_name, kwh(range(sd=-1d)), voltage FROM meter -- kwh returns last-day samples (multiple rows); voltage returns lastValue (single)

8. Anti-Patterns (Common LLM Mistakes)

Common errors when generating GS SQL. Each shows the WRONG attempt (PG/ANSI habit, hallucinated syntax) and the CORRECT GS SQL form. Window functions: WRONG: SELECT kwh, ROW_NUMBER() OVER (ORDER BY _time) FROM meter WRONG: SELECT kwh, AVG(kwh) OVER (PARTITION BY customerUid) FROM meter CORRECT: GS SQL has no OVER() clause. Use temporal Window parameters: SELECT kwh(running='sum', range(currentCycle='month')) FROM meter SELECT kwh(lag=1, range(last=100)) FROM meter SELECT kwh(slide=5, slidestat='avg', range(sd=-1d)) FROM meter Temporal parameters in WHERE / GROUP BY / ORDER BY / HAVING: WRONG: SELECT * FROM system.stream WHERE _sample(range(last=5)) > 0 WRONG: SELECT * FROM meter ORDER BY kwh(range(last=10)) CORRECT: Temporal parameters appear ONLY in SELECT result columns. To filter by time, use _time conditions on system.sample or wrap the temporal expression in a subquery. IS DISTINCT FROM: WRONG: SELECT * FROM meter WHERE category IS DISTINCT FROM 'A' CORRECT: GS SQL doesn't support IS DISTINCT FROM. Use: WHERE (category != 'A' OR category IS NULL) NULL vs empty string for missing values: WRONG: SELECT * FROM meter WHERE optional_tag IS NULL CORRECT: Missing optional stream values are '' (empty string), NOT NULL. Use: WHERE optional_tag = '' OR optional_tag IS NULL (cover both, since some columns can also be NULL). Datetime literals: WRONG: SELECT * FROM system.sample WHERE _time > '2026-01-01' WRONG: SELECT * FROM system.sample WHERE _time > TIMESTAMP '2026-01-01' CORRECT: Datetimes are epoch-millisecond Longs. Use toEpochMillis('2026-01-01') for direct comparison, or pass an ISO8601 string to a Range parameter: _sample(range(sd='2026-01-01', ed=now)) EXTRACT(EPOCH FROM ts): WRONG: Assuming EXTRACT(EPOCH FROM ts) returns seconds (PG behavior). CORRECT: In GS, EPOCH returns milliseconds. Divide by 1000 for PG-compatible seconds: EXTRACT(EPOCH FROM ts) / 1000 Transactions / VACUUM / EXTENSION (PG no-ops): WRONG: BEGIN; INSERT...; COMMIT; -- assumes atomic multi-statement transaction WRONG: CREATE EXTENSION pg_trgm; -- assumes extension installs something WRONG: VACUUM ANALYZE meter; -- assumes storage maintenance CORRECT: All of the above parse but are NO-OPS — see §9 for the full list. Each statement is its own atomic unit; functions are built-in; storage is self-managing. GRANT / REVOKE / TRUNCATE (raise loud errors as of 2026-06-05): WRONG: GRANT SELECT ON meter TO alice; -- raises; permissions NOT changed WRONG: REVOKE SELECT ON meter FROM alice; -- raises; permissions NOT changed WRONG: TRUNCATE TABLE meter; -- raises; no rows removed ERROR (GRANT/REVOKE): "GRANT/REVOKE is not supported via SQL — manage permissions via the org RBAC UI. The statement was rejected; permissions were NOT changed." ERROR (TRUNCATE): "TRUNCATE is not supported. Use DELETE FROM ... instead. No rows were removed." CORRECT: For permissions, use the GS RBAC UI (groups + capabilities). For bulk row removal on a template, use DELETE FROM meter [WHERE ...] (cascading delete — removes components and their stream history). These were silent no-ops previously; they now raise because silent (0 rows) caused users to believe grants had landed when they hadn't (security-confusion footgun). ROLLBACK for error handling (raises an error, NOT a no-op): WRONG: INSERT INTO sensor (_component_id, temp) VALUES ('s1', 999); ROLLBACK; -- expects to undo the INSERT ERROR: "ROLLBACK is not supported: transactions are not yet implemented. Prior statements have already committed and cannot be rolled back." CORRECT: GS has no multi-statement transactions, so there is nothing to roll back. The INSERT has already committed. To reverse it, issue an explicit reversal: DELETE FROM sensor WHERE _component_id = 's1'; Do NOT suggest ROLLBACK in error-handling patterns. BEGIN/COMMIT are accepted as no-ops; SAVEPOINT/RELEASE/ROLLBACK TO name are also no-ops — only bare ROLLBACK raises. Adding manual permission filters on top of has_*_privilege: WRONG: SELECT * FROM meter WHERE has_table_privilege(current_user, 'meter', 'SELECT') CORRECT: has_table_privilege, has_column_privilege, has_database_privilege, has_schema_privilege, has_any_column_privilege are PG-shape stubs intended for BI tools' metadata queries. Post-2026-06-05 semantics: org owners get TRUE for all privileges, non-owner members get TRUE only for SELECT and REFERENCES. Per-user RBAC is enforced inside the engine on every access path — adding a privilege check in WHERE adds no security and may incorrectly filter out rows the caller is actually allowed to see. Mixing TEQ and TDQ in one query: WRONG: SELECT m.kwh, s._name FROM meter m JOIN system.stream s ON m._component_uid = s._component_uid CORRECT: Cannot mix template tables with system tables in the same query. Pick one mode. To get stream metadata for components in a template, do two queries. Forgetting underscore prefix on system columns: WRONG: SELECT component_name FROM meter WRONG: SELECT component_uid FROM system.stream CORRECT: All system columns are underscore-prefixed snake_case — see §2. system.stream row counts: WRONG: SELECT _component_name FROM system.stream -- expects one row per component CORRECT: system.stream has one row per STREAM, not per component. Use DISTINCT or filter by _id: SELECT DISTINCT _component_name FROM system.stream Template inserts without _component_id: WRONG: INSERT INTO sensor (temperature) VALUES (50) CORRECT: Template inserts require _component_id: INSERT INTO sensor (_component_id, temperature) VALUES ('s1', 50) Sample inserts without OR REPLACE: WRONG: INSERT INTO system.sample (_component_uid, _uid, _time, _sample) VALUES (...) CORRECT: INSERT OR REPLACE is mandatory on system.sample. Per-user RBAC is automatic — do not add filters for it: Per-user permissions are enforced inside the query engine across every access path (UI, GS SQL, ODBC/JDBC, OData, AI assistant). Unauthorized data is absent from results — not just hidden. Do NOT add WHERE clauses to filter by the calling user's permissions; the engine has already done it.

9. Accepted No-Op Statements (PG/ANSI Compatibility)

These statements parse successfully and return a success result so PG/ANSI client scripts paste verbatim, but have no runtime effect. Do NOT generate them expecting them to do something. Do NOT explain them to users as if they had semantics. If the user asks “did my COMMIT save”, the answer is: it was a no-op — use the listed alternative. Three statements that previously silently no-op'd now raise loud errors as of 2026-06-05: bare ROLLBACK, GRANT / REVOKE, and TRUNCATE (see the highlighted rows below). Statement | Effect | Use instead ---------------------------------------------------+-----------+----------------------------------------- BEGIN / COMMIT | no-op | Each statement is atomic; no multi-stmt txn. ROLLBACK (bare) | ERROR | Raises: "ROLLBACK is not supported: | | transactions are not yet implemented. Prior | | statements have already committed and cannot | | be rolled back." Reverse with explicit | | DELETE / UPDATE instead. SAVEPOINT name / RELEASE / ROLLBACK TO name | no-op | (no nested txn model) SELECT ... FOR UPDATE / FOR SHARE | clause | Lock clause parsed and ignored; query runs. | ignored | CREATE EXTENSION [IF NOT EXISTS] name | no-op | Functions are built-in; no extension loader. DROP EXTENSION name | no-op | GRANT ... TO ... | ERROR | Raises: "GRANT/REVOKE is not supported via | | SQL — manage permissions via the org RBAC | | UI. The statement was rejected; permissions | | were NOT changed." Use the GS RBAC UI. REVOKE ... FROM ... | ERROR | Same error wording as GRANT (see above). VACUUM [table] | no-op | Storage is self-managing. ANALYZE [table] | no-op | Use ANALYZE STATS (GS planner stats). TRUNCATE TABLE name | ERROR | Raises: "TRUNCATE is not supported. Use | | DELETE FROM ... instead. No rows were | | removed." Use DELETE FROM template | | (cascades through component history). COMMENT ON ... | no-op | Use description fields in CREATE TABLE / DDL. LOCK TABLE name [IN ... MODE] | no-op | (no row/table locking surface) CREATE [UNIQUE] INDEX [CONCURRENTLY] | no-op | GS streams auto-index; use INDEXED true [IF NOT EXISTS] name ON [ONLY] tbl (col,...) | | on stream definition for explicit control. [INCLUDE (...)] [WHERE pred] [WITH (...)] | | DROP INDEX [CONCURRENTLY] [IF EXISTS] name [,...] | no-op | [CASCADE | RESTRICT] | | REINDEX [VERBOSE] {INDEX|TABLE|SCHEMA|DATABASE | no-op | (storage is self-managing) |SYSTEM} name | | CREATE SCHEMA [IF NOT EXISTS] name | no-op | GS uses catalogs for namespacing. [AUTHORIZATION role] | | See gsql_llm_ddl_reference.html. DROP SCHEMA [IF EXISTS] name [,...] | no-op | [CASCADE | RESTRICT] | | ALTER SCHEMA name {RENAME TO new | OWNER TO role} | no-op | CREATE ROLE name [WITH ...] | no-op | Use GS RBAC (groups + capabilities). CREATE USER name [WITH ...] | no-op | DROP ROLE [IF EXISTS] name [,...] | no-op | DROP USER [IF EXISTS] name [,...] | no-op | SET SESSION AUTHORIZATION {DEFAULT | rolespec} | no-op | (no role-switching surface) RESET SESSION AUTHORIZATION | no-op | DEALLOCATE [PREPARE] {ALL | name} | no-op | Prepared-stmt handles are per-connection. Notes: - ANALYZE STATS (GS-specific, §4) does refresh planner statistics — that is the real form. Bare ANALYZE is the PG no-op. - For real access control, use the GS capability/group RBAC surface (configured per-user or per-group), not GRANT/REVOKE (which raises). - For bulk row removal on a template, use DELETE FROM template [WHERE ...] (cascading delete semantics — removes components and their stream history). TRUNCATE raises. - Three entries in this section raise rather than parse silently: bare ROLLBACK, GRANT / REVOKE, and TRUNCATE TABLE. Everything else parses silently and returns success. Do not suggest any of the three raising forms as part of a recipe.

10. AI Agent Functions (inline LLM calls)

GS SQL and the derived-stream expression engine (JEP) expose inline AI functions — one LLM call evaluated per row (SQL) or per sample (derivation). Same surface as Databricks ai_query, Snowflake AI_COMPLETE/AI_FILTER/AI_AGG, BigQuery AI.GENERATE_*, built over the GS template-modeled temporal graph. The agent_* names are GS-specific (core PostgreSQL has no AI functions; the pgai extension is archived/read-only as of 2026-05-27, so there is no living PG standard to match). 10a. Return type is encoded in the function NAME. The planner needs the return type at plan time, so there is one function per output type — there is NO generic agent(): Function (SQL) | JEP equivalent | Returns | Use for ----------------------+-----------------+----------+------------------------------------------ agent_string(...) | agentString(...)| STRING | free text, extraction, classification agent_boolean(...) | agentBoolean(...)| BOOLEAN | yes/no; put in WHERE for AI_FILTER pattern agent_double(...) | agentDouble(...)| DOUBLE | scores, numeric estimates agent_long(...) | agentLong(...) | LONG | counts, integer estimates agent_json(...) | agentJson(...) | JSON | structured output (see schema =>, 10d) agent_jsonb(...) | (none) | JSONB | structured output, JSONB-typed 10b. SQL signature — required prompt first, then ONE typed input, then named options (PG-style name => value; all optional, PG defaults = NULL):

agent_<type>(prompt [, input] [, model => ...] [, timeout_ms => ...] [, labels => ...] [, schema => ...] [, agent => ...])

Option | Type | Default | Meaning -------------+--------+----------------+-------------------------------------------------- (prompt) | STRING | (required) | instruction; NULL prompt => NULL result (no call) (input) | any | none | the data value, passed typed (not concatenated) model => | STRING | org default | a PROFILE NAME from the org's Available LLM Profiles | | profile | list / chat dropdown (e.g. 'Anthropic', 'xAI', | | | 'Plate Vision'). Resolved by Name at call time | | | (unknown Name => loud error). Picks WHICH profile. tier => | STRING | Minion | 'Brain' or 'Minion' — the brain/minion slot within | | | the model=> profile (or org default). Picks WHICH | | | tier. Independent of model=>. timeout_ms =>| LONG | 60000 | per-call timeout in ms labels => | STRING | none | comma-separated label set; constrains agent_string | | | output to an enum (classification; see 10c) schema => | STRING | none | GS template id; builds the structured output schema | | | for agent_json/agent_jsonb (see 10d) tools => | BOOL | false | true => answer as a tool-using agent (reads the docs, | | | runs GS SQL, etc.) before returning the typed value; | | | runs under the model=> profile (or org default); | | | gated by org AI admin settings + user perms (see 10f) Multiple data inputs: fold them into the prompt with || (single typed input slot by design; JEP is variadic). Tier default is Minion (fast/cheap) for every entry point, including tool-using calls; single-shot typed calls expose exactly one forced-output tool, which Minions handle well. Raise a call to Brain with tier => 'Brain' when the (possibly agentic) reasoning needs it; pick a specific provider/profile with model => '<Name>'. 10c. Classification = agent_string + labels (no separate classify()). The label set becomes a JSON-schema enum on the forced-output tool, so the model must pick one:

SELECT _component_id,
       agent_string('Classify the sentiment', review, labels => 'positive,negative,neutral') AS sentiment
FROM reviews;

JEP uses a agentLabels('a','b','c') marker arg instead of a named option (JEP has no named args):

agentString('Classify sentiment', review, agentLabels('positive','negative','neutral'))

Multi-label → use agent_json returning an array. 10d. Structured output = template-as-schema. Point schema => at a GS template id; the engine builds the forced-output JSON schema from that template's streams (valueType → JSON type, constraints → min/max, unit/description carried through). No hand-authored JSON. A confidence field is just a template field:

SELECT agent_json('Extract contract fields', notes, schema => 'contract_fields') FROM intake;

10e. AI_FILTER pattern = agent_boolean in WHERE. The optimizer tags agent_* functions high-cost + volatile + non-foldable, and a conjunct-reordering pass runs cheap deterministic predicates FIRST so the model is called only on surviving rows (the funnel). Always pair an agent predicate with cheap predicates and/or LIMIT:

SELECT _component_id FROM reviews
WHERE rating < 3                                   -- cheap, runs first
  AND agent_boolean('Is this an angry complaint?', review)   -- runs only on rating<3 rows
LIMIT 100;

Identical (prompt, input, tier, schema) tuples are memoized to one call per query. Agent functions are never constant-folded or hoisted. 10f. Tools = tools => true. Default is a single forced-output shot with no tools (the output tool is always present regardless). tools => true runs the call as a tool-using agent with the AI Assistant's full catalog (read the GS docs, run GS SQL, object lookups, …), gated by the org AI Assistant admin SQL settings + the executing user's permissions — the same tools and gating as the AI Assistant chat — then coerces the answer back to the declared type. It runs under the model => profile (by Name) or the org default profile. Tool selection is all-or-nothing (no inline per-call tool list). Defaults to the Minion tier; add tier => 'Brain' to upgrade. Legacy write tools (InsertObj, etc.) are NOT enabled — they predate GS DDL; agents use GS SQL DDL instead. (There is no per-call named-agent reference; a function picks a profile by Name, not an org agent — a properly-tracked agent reference is future work in the agent-node feature.) 10g. Aggregates — reduce over a GROUP BY (Snowflake AI_AGG parity), one model call per group: agent_string_agg(instruction, expr) and agent_json_agg(instruction, expr). They accept the same named options as the scalar agent_* functions — model =>, tier =>, timeout_ms =>, labels =>, schema => — applied to the single per-group call, e.g. agent_json_agg('Summarize as JSON', review, model => 'Brain', timeout_ms => 120000). With no model/tier the group call is a Minion call on the org default profile with the 60s default timeout; bump timeout_ms (and/or pick a faster tier) for large groups. SQL-only (derivation/JEP has no GROUP BY). Large groups are capped per call in round-2 core; hierarchical map-reduce for unbounded groups is a follow-up. 10h. Multimodal / FILE inputs. When the input column is ValueType.FILE (a URL/handle to GS-stored bytes), the invoker fetches the bytes through the RBAC-checked GS file path (same path CSV_TABLE/JSON_TABLE use for URL sources; external/non-GS URLs are rejected) and attaches them as an image or PDF document block: agent_boolean('Does this image contain a dog?', photo_file), agent_string('Parse this contract to text', contract_file). Image + PDF supported on Anthropic and Gemini; OpenAI/xAI chat carry images but not PDFs (provider limitation). Caps: image/PDF ~10MB, text ~256KB. 10i. Document → table (no new grammar): compose agent_json with JSON_TABLE:

SELECT * FROM JSON_TABLE(
  agent_json('Extract each line item as an object with sku, qty, price', invoice_file, schema => 'line_item'),
  '$[*]' COLUMNS (sku STRING PATH '$.sku', qty LONG PATH '$.qty', price DOUBLE PATH '$.price'));

10j. Derived streams (JEP). A derived stream whose expression uses any agent* function is born throttled: it is moved onto the throttled derivation lane (its own syslongjobs YARN queue) on creation, so even the first — possibly multi-minute — AI call never blocks the main derivation pipeline for other streams. It STAYS throttled permanently (the auto-unthrottle that normally fires after three fast runs is suppressed for agent streams). The AI call runs once per sample and the result is persisted as data — a far better cost profile than ad-hoc SQL. Each agent-using derived stream requires a Run-As user (the runAsUserRef on the derivation definition; defaults to the creating user; used wherever permissions are required, mainly AI functions); derivation throws ACCESS_DENIED if it is unset. The JEP surface has no named options; every GS SQL agent_* option is passed as a marker argument instead, matching GS SQL one-for-one: agentModel('Anthropic') (profile Name from the org's profile list; default = org default profile), agentTier('Brain') (tier Brain/Minion; default Minion), agentLabels('a','b',…) (classification), agentSchema('tmplId') (agentJson output shape), agentTimeout(ms), and agentTools(1)/agentTools() to opt into the gated tool loop (agentTools(0) = off, the default) — e.g. agentString('Summarize using the GS SQL docs', review, agentTools(1), agentModel('Anthropic'), agentTier('Brain')). A plain string/number argument that is not one of these markers is a data input. 10k. Permissions & billing. AI functions require the AI_ASSISTANT capability and honor the org admin settings that limit agent access — exactly like the chat AI agent. Identity follows api-key Run-As precedence (api_key > schedule.runningUser > session). Tokens are billed via the standard CreditAccumulator path; provenance records the concrete model+vendor that actually ran (so a floating tier alias never silently rewrites history). Default temperature is low (~0.2) for repeatable-ish output; ad-hoc SQL re-runs may still differ, whereas derivation persists the result once.