Derived Streams
Derived streams compute their values automatically from other streams across your organization. Where traditional platforms force you to build and maintain external pipeline code, batch jobs, and materialized views, GroveStreams handles time alignment, windowing, rollups, and cross-entity resolution natively.
There are three types of derived streams:
- Expression — Excel-style formulas that reference other streams as variables
- Aggregation — Statistical rollups across many streams (sum, avg, min, max, gap count)
- RSS Feed — Values pulled from internal or external RSS feeds
Expression Derived Streams
An expression derived stream is defined by a formula and a set of variables. Each variable maps to a stream on any component in your organization. The derivation engine evaluates the formula for each time interval (or sample), automatically loading, aligning, and rolling up the variable data.

Key benefits:
- No pipeline code to write or maintain
- Automatic time alignment — variables with different cycle sizes are rolled up to match the derived stream's cycle
- Automatic windowing — derivation only processes intervals where dependent data is available
- Cross-component — variables can reference streams on any component in the organization
- Chainable — derived streams can depend on other derived streams, creating computation graphs
Variables
Each variable in an expression maps to a dependent stream. A variable definition includes:

| Setting | Description |
| --- | --- |
| Name | Variable name used in the expression (e.g., n, cost_rate) |
| Stream | The source stream (component.stream format) |
| Offset | Interval offset for lookback. An offset of -1 uses the previous interval. Useful for rolling averages and period-over-period comparisons. |
| Cycle | Optional rollup cycle. If the dependent stream's native cycle differs from the derived stream's cycle, data is rolled up automatically. Leave blank for automatic selection. |
| Cycle Function | Rollup aggregation method: SUM, AVG, MAX, MIN, LAST, FIRST, TWA (time-weighted average), GAP_COUNT |
| Resolution SQL | Optional. A TEQ (Temporal Entity Query) SELECT that dynamically resolves which stream to use at derivation time. The streams referenced as foreign keys must be configured with their FK target template. Enables FK-resolved dependencies. |
| Fill Forward | When enabled, the derivation engine carries this variable's last known value forward to timestamps where other variables have data but this one does not. See Fill Forward for details. Default: enabled for FK-resolved (Resolution SQL) variables; disabled for direct variables. |
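As an illustration (hypothetical stream name, based on the rolling-average example later in this page), a 3-point rolling average could use three variables that all point to the same temperature stream:

| Name | Stream | Offset | Cycle | Cycle Function |
| --- | --- | --- | --- | --- |
| n | sensor1.temperature | 0 | Second | AVG |
| n1 | sensor1.temperature | -1 | Second | AVG |
| n2 | sensor1.temperature | -2 | Second | AVG |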
Functions and Operators
Expressions support a full set of mathematical and string operations. See Expression Capabilities for the complete reference, including:

- Arithmetic: +, -, *, /, %, ^
- Comparison: ==, !=, <, >, <=, >=
- Boolean: &&, ||, !
- Trig: sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, etc.
- Log/Exp: ln, log, lg, exp, pow, sqrt
- Statistical: avg(x1,x2,...), min(x1,x2,...), max(x1,x2,...), vsum(x1,x2,...), sum(x1,x2,...)
- Rounding: round, rint, floor, ceil
- String: left, right, mid, substr, lower, upper, len, trim, replaceall, replacefirst, fromBase
- Conditional: if(cond, trueval, falseval), isNull(x)
- JSON: isJSON(str, type)
- Time: time(), dtFormatter(epochTime, pattern, timeZoneId)
- Constants: pi, e, NULL
- System Variables: LAST_VALUE, SAMPLE_TIME, SAMPLE_SDTIME, SAMPLE_EDTIME
Expression Examples
- Simple conversion (Fahrenheit to Celsius)
- 3-point rolling average (variables: n offset 0, n1 offset -1, n2 offset -2, all pointing to the same stream with a Second cycle)
- Energy cost with time-of-use rate (variables: kwh — energy meter stream, cycle=Hour, function=SUM; rate — rate schedule point stream)
- Conditional alert value
- String result from numeric inputs
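Illustrative formulas for each pattern are sketched below. The original expressions are not preserved here, so treat these as minimal sketches built from the operators and functions listed above; variable names match the descriptions.

Fahrenheit to Celsius, where n is the source temperature stream:

```
(n - 32) * 5 / 9
```

3-point rolling average across the current and two prior intervals:

```
avg(n, n1, n2)
```

Energy cost, multiplying the hourly kWh rollup by the current rate:

```
kwh * rate
```

Conditional alert value, returning 1 when a reading crosses a hypothetical threshold of 100:

```
if(n > 100, 1, 0)
```

String result from numeric inputs, with hypothetical thresholds:

```
if(n > 100, "HIGH", if(n < 10, "LOW", "NORMAL"))
```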
Automatic Time Alignment and Windowing
One of the biggest challenges in traditional data pipelines is aligning data from sources with different sampling rates. GroveStreams handles this automatically.

- Each variable can have its own cycle and function. If the dependent stream has a smaller cycle than the derived stream, it is rolled up automatically. For example, a derived hourly stream with a second-cycle dependent will roll up 3,600 seconds into each hour using the specified function (SUM, AVG, MAX, etc.).
- Windowing is automatic. The derivation engine only processes intervals where all dependent data is available. There is no need to manually define window boundaries, trigger conditions, or buffer sizes.
- Mixed stream types coexist naturally. An expression can combine interval streams (regular time series), random (irregular) streams, and point streams (single values) in one formula. Point stream values are treated as constants available for every evaluation. Interval and random stream data is aligned by time.
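As a concrete sketch (hypothetical stream names): a derived stream on an Hour cycle could compute average kW from a watts stream that reports every minute by defining variable w with Cycle=Hour and Cycle Function=TWA, then using the formula:

```
w / 1000
```

The engine rolls the ~60 one-minute samples into a single time-weighted hourly value before the formula is evaluated.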
FK-Resolved Dependencies (Temporal Relationship Resolution)
We're not aware of another platform that handles this declaratively. On the platforms we've evaluated — time-series databases, stream processors, cloud data warehouses, and traditional RDBMS setups — handling relationship changes inside a derivation requires custom pipeline code. GroveStreams handles it as a single SQL expression on the variable definition. See the full FK-Resolved Dependencies guide for complete documentation, configuration examples, and fan-in aggregation.
Instead of pointing a variable directly at a specific stream, you provide a Resolution SQL statement that dynamically resolves which stream to use at derivation time. This enables:

- Temporal relationship resolution — When an FK relationship changes over time (a meter moves from Customer A to Customer B), the engine automatically segments the derivation range and uses the correct target for each period.
- Fan-in aggregation — When a SQL resolves to multiple targets (all meters connected to a customer), the engine aggregates them (SUM, AVG, MIN, MAX) into a single value for the expression.
- Multi-hop chains — Traverse chains of any depth (meter → customer → supplier) with automatic segmentation at every level.
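A minimal sketch of what a Resolution SQL might look like for the meter-to-customer case (hypothetical entity and stream names; @_component_uid is the built-in token that refers to the component being derived — see the FK-Resolved Dependencies guide for the actual TEQ grammar):

```sql
SELECT rate FROM customer
WHERE uid = (SELECT customerUid FROM meter WHERE uid = @_component_uid)
```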
For example, if the meter changes customers mid-year, the engine detects the change, splits the range, and derives each segment using the correct customer's rate.
→ Full FK-Resolved Dependencies Guide — includes SQL patterns, fan-in aggregation, fill forward behavior, time filter interaction, and 7 worked examples with variable grids covering energy, finance, pharma, logistics, HR, and smart buildings.
Fill Forward
When a derived stream references variables that report data at different times, there will be timestamps where some variables have values and others do not. By default, a variable with no value at a given timestamp contributes nothing — and the formula cannot produce a result for that timestamp.

Fill Forward solves this by carrying each variable's last known value forward to timestamps where it has no data but other variables do. This is essential for scenarios where a slowly-changing value (such as a rate, a threshold, or a configuration setting) must be combined with a frequently-updating measurement.
Example: A cost stream is derived from reading * rate. The reading stream reports every 15 minutes. The rate stream changes once a month. Without Fill Forward, cost would only be calculated at timestamps where both streams happen to have data — effectively only when the rate changes. With Fill Forward enabled on the rate variable, the last known rate is carried forward to every reading timestamp, and cost is calculated at every 15-minute interval.
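A small worked illustration of that example (hypothetical values):

| Timestamp | reading | rate (fill forward) | cost = reading * rate |
| --- | --- | --- | --- |
| 08:00 | 12.0 | 0.10 (reported) | 1.20 |
| 08:15 | 11.5 | 0.10 (carried) | 1.15 |
| 08:30 | 13.2 | 0.10 (carried) | 1.32 |

Without Fill Forward, only the 08:00 row — where both variables actually reported — would produce a value.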
Defaults
- FK-resolved (Resolution SQL) variables: Fill Forward is enabled by default. These variables typically resolve to slowly-changing streams on related components, so carrying forward is almost always desired.
- Direct variables: Fill Forward is disabled by default. Direct variables usually share a similar reporting cadence, so timestamps naturally align.
Either default can be overridden per variable via the FILL_FORWARD option.
Fill Forward with Rollup Cycles
Fill Forward works with variables that have a rollup cycle and function (e.g., SUM over an hourly cycle). When the engine needs to seed the fill-forward value, it loads the rolled-up value from the prior interval — not the raw last sample. For example, if a variable uses MAX over an hourly cycle, the fill-forward seed is the MAX value from the previous hour, not the last raw data point.

Restrictions

Fill Forward is not available for:

- Point stream variables — Point streams have no time dimension, so Fill Forward does not apply.
Interaction with Arrival Mode
Fill Forward works with both arrival modes:

- ALL_ARRIVED — Derivation waits until every variable has reported at least once within the derivation range. Fill Forward then fills any gaps between variables within that range.
- ANY_ARRIVED — Derivation runs as soon as any variable reports. Fill Forward carries forward values for variables that have not yet reported new data at the current timestamp.
Derivation Triggers
GroveStreams derivation runs through two paths, depending on the stream configuration:

Real-Time Path (Non-FK Dependencies)

When a derived stream's dependencies are all direct (no FK resolution SQL), derivation can run immediately as data arrives:

- API feed arrives (HTTP PUT or MQTT publish) with data for a dependent stream
- The derivation engine walks up the dependency tree, deriving parent streams
- This tree walk continues up to 20 levels deep in a single pass
- Beyond 20 levels, or for streams that weren't triggered inline, the background derivation job (runs every 1–3 minutes) picks up remaining work
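For instance, a feed call might look like the following curl sketch (the endpoint and parameter names here are illustrative, not the documented Feed API signature — consult the GroveStreams API reference for the real one):

```
curl -X PUT "https://grovestreams.com/api/feed?compId=meter1&streamId=kwh&data=42.5&api_key=YOUR_KEY"
```

Any derived stream depending on meter1.kwh would then be derived inline, within the 20-level tree walk described above.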
Background Path (FK Dependencies)
Derived streams with FK-resolved dependencies are handled by the background derivation job. The job periodically resolves each FK dependency's SQL to determine the current source stream, then creates precedent links on the resolved source streams. Once these links exist, data arriving at the source stream automatically increments the derived stream's dirty counter — the same mechanism used by direct dependencies.

The FK resolution cycle runs approximately every 10 minutes. When a relationship changes (e.g., a meter is reassigned to a different customer), the next resolution cycle detects the change, removes the old precedent link, creates a new one on the new source stream, and updates the stored resolution on the dependent variable. Subsequent data arrivals at the new source stream then trigger derivation automatically.
What Triggers a Derivation Run
- Dependent data arrives — any append or modification to a dependent stream's data
- Background job — the derivation job runs every 1–3 minutes, scanning for streams with pending work
- Manual re-derive — the "Re-derive Stream" action in the web studio
- Settings change — any change to a stream's derivation settings triggers a full re-derive
- Historical data modification — when a dependent's historical data changes, the derived stream is re-derived from the change point (see Auto-Recalculation)
How Derivation Knows What Range to Calculate
Every stream maintains three internal timestamps:

| Timestamp | Description |
| --- | --- |
| Start | Start datetime of the first interval in the store |
| End | End datetime of the last interval in the store |
| Calculated Up To | The derivation frontier. For non-derived streams, equals End. For derived streams, represents how far derivation has progressed. |
Derivation is an appending process. It only calculates from the derived stream's Calculated Up To date forward. It does not re-derive the entire history on every run. This keeps derivation fast and scalable even for streams with years of data.
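A worked illustration (hypothetical times): suppose a derived stream's Calculated Up To is 09:00 and its two dependents are calculated up to 10:00 and 11:00. Under ALL_ARRIVED (below), derivation runs from 09:00 up to 10:00 — the earliest dependent frontier. Under ANY_ARRIVED, it runs up to 11:00, with NULLs wherever the slower dependent has no data yet.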
ALL_ARRIVED (Default)

Derivation runs when the derived stream's Calculated Up To date is earlier than the earliest dependent's Calculated Up To date. It derives up to that earliest date. This ensures all dependents have data available for the range being calculated.

ANY_ARRIVED
Derivation runs when the derived stream's Calculated Up To date is earlier than the latest dependent's Calculated Up To date. Only one dependent needs new data for derivation to proceed. Use with caution — results depend on data arrival order and missing dependents produce NULLs.

Closed Intervals
Once a derived interval's end datetime is less than or equal to the stream's Calculated Up To date, that interval enters a "closed" state. It will not be re-derived in normal operation. This avoids expensive dependency tree lookups on every data arrival. To force re-derivation of closed intervals, delete the derived stream's data and the engine will recalculate from the beginning.

Auto-Recalculation on Historical Changes
When a dependent stream's historical data is modified (not just appended), the derivation engine detects the change and automatically re-derives from the earliest modification point. The derived stream's data is deleted from that point forward, resetting its Calculated Up To date, and derivation proceeds from there.

This works for all direct (non-FK) dependencies across the full dependency tree.
FK dependency note: The background job re-resolves FK dependencies approximately every 10 minutes. When a relationship changes (e.g., updating the customerUid stream to point to a different customer), the next resolution cycle detects the change, migrates the precedent link to the new source stream, and subsequent data arrivals trigger derivation automatically.

Retroactive re-derivation on FK changes is not immediate. The next precedent reconciliation cycle (within ~10 minutes) walks the full FK history, migrates stale precedent links, and flags the dependent stream for re-derivation from the earliest affected timestamp. The following derivation cycle then re-derives the historical range automatically. If you need an immediate refresh, use the Re-derive Stream action to trigger it without waiting for the reconciliation cycle.
Expression Derivation and Change
Expression derivation variables store references to their dependent streams. What happens to those references when you copy a component, copy a folder, or reconcile a template? The answer depends on whether the variable uses a fixed selection (direct component/stream pointer) or an FK-resolved dependency (Resolution SQL).

1. Copying a Single Component — Internal Dependencies

When you copy a component, the system creates a new component with new UIDs for all of its streams. Any expression variable that points to a stream on the same component (an internal dependency) is automatically remapped to the corresponding stream on the new copy. Internal dependencies always survive a copy.

2. Copying a Single Component — External Dependencies
Variables that point to streams on other components (external dependencies) behave differently depending on how they are defined:

| Variable Type | Behavior After Copy |
| --- | --- |
| Fixed selection | The copy retains the original component/stream UIDs. Both the original and the copy now derive from the same external source. This may be what you want (e.g., both meters derive from a shared rate schedule), but if you intended each copy to have its own external source, the references must be manually updated. |
| FK-resolved (Resolution SQL) | The copy works correctly with no manual intervention. The Resolution SQL resolves relative to the current component (@_component_uid), so the new copy resolves its own FK relationships at derivation time. If the copy's FK stream (e.g., customerUid) points to a different customer than the original, each component automatically derives from its own correct target. |
This is a key advantage of FK-resolved dependencies. Fixed selections embed specific UIDs that break or become shared after a copy. FK-resolved dependencies use SQL that resolves dynamically, so copies always derive from their own correct targets without manual rewiring.
3. Copying a Folder
When you copy a folder (or subfolder tree), every component and stream inside the folder is duplicated. The system builds a mapping from old UIDs to new UIDs for all copied components and streams.

For each expression variable with a fixed selection, the system checks whether both the referenced component and the referenced stream are inside the copy scope (i.e., both appear in the UID mapping):
- If both are in the mapping — the variable is remapped to the new equivalent component/stream inside the copied folder. This works correctly.
- If the variable points to a component/stream outside the folder — it is not remapped. The copy retains the original UIDs, pointing to the external source. As with single-component copies, this may or may not be what you want.
- FK-resolved variables need no remapping — their Resolution SQL resolves relative to @_component_uid and does not depend on stored UIDs for external targets. Each copied component's FK streams determine which external entities it resolves to at derivation time.
Summary: Folder copy remaps internal references correctly. External fixed-selection references survive unchanged (pointing to the original external source). FK-resolved references always work because they resolve dynamically at derivation time relative to each component.
4. Template Reconcile
When you modify a component template (add, remove, or change streams), the template's reconcile operation pushes those changes to all linked components. For expression derivation, reconciliation replaces the component's derivation settings with the template's current settings — with important exceptions controlled by reconcile locks.

How reconcile processes derivation dependencies:
- The template's current derivation settings (expression, variables, cycles) are applied to the linked component
- For each variable on the template, the system looks for a matching variable on the component
- If a variable has a reconcile lock enabled, the component's current setting for that variable is preserved — the template's value is not applied
- If a variable has Resolution SQL (an FK-resolved dependency), it is automatically skipped during reconciliation — the component's FK resolution is preserved
Reconcile Locks
Reconcile locks prevent the template from overwriting specific settings on a linked component. There are three types of reconcile lock relevant to derivation:
| Lock Type | Applies To | What It Protects |
| --- | --- | --- |
| Expression Dep Lock | Individual expression variable | Prevents reconcile from changing this variable's component/stream reference. Use this when you've manually pointed a variable at a specific external component and don't want the template to overwrite it. |
| FK Reconcile Lock | Stream (FK target template reference) | Prevents reconcile from changing the stream's FK target template reference. This protects the stream-level FK configuration. |
| Stream Group Lock | Aggregation stream group reference | Prevents reconcile from changing the stream group reference used for aggregation derivation. |
During reconciliation, the system saves all locked settings and external references before applying the template, then restores them after. This ensures locked values survive even a complete template overwrite.
FK-resolved dependencies are automatically safe during reconcile. Variables with Resolution SQL are never overwritten by template reconciliation, regardless of lock settings. This is because FK-resolved variables resolve dynamically at derivation time — the template defines the SQL pattern, but each component resolves its own targets based on its own FK stream data. There is nothing component-specific to overwrite.
Derived Streams by Aggregation
Aggregation derived streams compute statistics across a collection of streams rather than combining a few streams in a formula. This is useful when you need to roll up data from many components of the same type.

Use cases:
- Total energy consumption across all meters in a facility
- Average temperature across all sensors in a warehouse zone
- Maximum pressure across all pumps in a system
- Gap count for quality monitoring across all data feeds
To set up an aggregation derived stream:

- Select the source streams (by component folder, template, or manual selection)
- Choose the aggregation function: SUM, AVG, MIN, MAX, FIRST, LAST, TWA, GAP_COUNT
- Aggregation runs on a schedule (configurable) or can be triggered manually
- Results are stored as a normal stream — fully queryable, chartable, and usable as a dependent in expression derivations
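As a sketch (hypothetical names): a facility-total stream might select every kwh stream on components linked to a Meter template, apply SUM, and refresh on an hourly schedule. Because the result is a normal stream, it can then serve as a variable in a further expression, e.g. a facility cost stream defined as:

```
total_kwh * rate
```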
How GroveStreams Compares
The following table compares GroveStreams derived streams with the approaches required on other platforms to achieve equivalent functionality.

| Capability | Traditional / Multi-System Approach | GroveStreams |
| --- | --- | --- |
| Formula Derivation | Custom pipeline code (Flink jobs, Spark, dbt models, stored procedures). Must define windows, triggers, watermarks, late-data handling. | Excel-style formula with automatic windowing and time alignment. No code. |
| Cross-Entity Joins | SQL JOINs with temporal validity ranges (SCD2 tables, temporal JOINs, AS OF clauses). Custom code to maintain history tables and triggers. | FK-resolved variables with automatic temporal segmentation. One SQL expression per variable. |
| Relationship Changes | No platform handles this natively inside a derivation engine. Requires custom code to detect changes, split time ranges, resolve targets per range, derive each range, and stitch results. | Automatic. The engine loads FK stream history, detects change points, segments the derivation range, and combines results. |
| Multi-Hop FK Chains | Multiple temporal JOINs with nested SCD2 lookups. Complexity grows exponentially with chain depth. | Nested subquery SQL. Each hop adds one nesting level. Engine walks the chain recursively. |
| Time Alignment | Manual window definitions, watermark strategies, custom interpolation/aggregation code. | Automatic rollup via cycle and function settings on each variable. |
| Dependency Chains | DAG orchestrators (Airflow, Dagster, Prefect). Must define task dependencies, retries, and scheduling. | Built-in dependency tree walking (up to 20 levels inline). Background job handles the rest. No orchestrator needed. |
| Historical Recalc | Backfill jobs, manual DAG reruns, custom change detection. | Automatic re-derivation from the point of change for direct dependencies. |
| Aggregation | Materialized views, batch roll-up jobs (Spark/Flink), manual scheduling. | Built-in aggregation derivation. Select streams, choose function, set schedule. |
| Setup Complexity | Weeks to months. Infrastructure provisioning, pipeline development, testing, deployment, monitoring. | Minutes. Define variables, write formula, save. Derivation starts automatically. |
Platform-Specific Comparisons
| Platform | Derived Computation | Temporal Relationship Handling |
| --- | --- | --- |
| InfluxDB | Flux tasks with explicit window/aggregateWindow calls. Must define every window boundary and trigger. | No relationship model. Tags are immutable — a tag change creates a new series. No way to join across tag changes. |
| TimescaleDB | Continuous aggregates (materialized views with refresh policies). SQL-based but requires manual window/GROUP BY definitions. | Standard SQL temporal JOINs. Must build SCD2 tables manually with triggers. No automatic segmentation. |
| Apache Flink | Streaming SQL or DataStream API. Powerful but requires Java/Scala code, cluster management, checkpoint configuration. | Temporal table joins (FOR SYSTEM_TIME AS OF). Single-point lookup only — does not segment a range across relationship changes. |
| KDB+ / q | q expressions with asof joins (aj). High performance but requires learning q language. | As-of joins look up the most recent value at a point in time. No automatic range segmentation when relationships change. |
| Snowflake / BigQuery | Scheduled queries, dbt models. SQL-based but requires explicit orchestration and scheduling. | Standard SQL temporal JOINs. Must build and maintain SCD2 patterns manually. |
| Databricks | Delta Live Tables, Spark Structured Streaming. Powerful but complex infrastructure. | SCD2 with Delta tables. Manual implementation of temporal lookups. |
| Palantir Foundry | Pipeline Builder and Code Repositories with scheduled transforms. Powerful orchestration but requires significant implementation effort. | Ontology links are current-state pointers. No native temporal history on relationships — link changes overwrite the previous state. Tracking relationship history requires custom pipeline code and separate versioning datasets. |
The key differentiator: Every platform listed above can compute derived values. What we're not aware of any of them doing natively is resolving which entity to derive from at each point in time when relationships change. On those platforms, that requires custom pipeline code. In GroveStreams, it's a single SQL expression on the variable definition.
Performance and Scalability
Real-Time Inline Derivation
For direct dependencies (no FK resolution), derivation runs inline with data ingestion. When a feed arrives, the engine walks up the dependency tree, deriving parent streams immediately. This tree walk handles up to 20 dependency levels in a single pass. For most use cases, derived streams update within seconds of source data arriving.

Background Derivation Job

A background job runs every 1 to 3 minutes, handling:

- Dependency chains deeper than 20 levels
- FK-resolved dependencies (derivation triggered via materialized precedent links)
- FK precedent link resolution (re-resolves FK targets every ~10 minutes to detect relationship changes)
- Scheduled derivation streams — when a derivation schedule fires, it marks target streams as dirty and the background job picks them up for derivation in the next cycle
- Streams that were not triggered by inline ingestion
- Catch-up derivation for large data imports
FK Dependency Performance
FK dependency handling has two phases:

- Resolution phase (~every 10 minutes): The background job re-resolves each FK dependency's SQL, updates materialized precedent links, and stores the resolved source stream UIDs. This is lightweight — it only runs SQL and updates a few columns per FK dependency.
- Derivation phase (on data arrival): Once precedent links exist, FK-dependent derivation is triggered automatically by the same dirty-flag mechanism used for direct dependencies. The derivation itself re-resolves FK targets at runtime for correctness, builds temporal segments, and derives each segment. FK streams typically have very few data points — a meter doesn't change customers often — so loading FK history at each hop is fast.
Derivation Throttling and Error Display
GroveStreams automatically monitors derivation health. If a derived stream repeatedly fails or runs slowly, the platform moves it to a separate hourly derivation job to prevent expensive expressions from impacting the regular derivation pipeline.

Automatic Throttling

The derivation engine tracks a consecutive issue counter for each stream. An "issue" is either:

- A slow run — derivation takes longer than 20 seconds
- A derivation error — the expression throws an exception (division by zero, null reference, bad FK resolution, etc.)
After 3 consecutive issues, the stream is throttled and moved to the hourly derivation job. A single successful, fast derivation run resets the counter to zero — so if a stream has 2 slow runs followed by 1 fast run, the counter resets and throttling does not engage.
Automatic Unthrottling
Throttled streams can automatically return to the regular derivation pipeline. When the hourly throttled derivation job runs a throttled stream and it completes in under 20 seconds, a consecutive fast run counter is incremented. After 3 consecutive fast runs, the stream is automatically unthrottled:

- The throttle flag is cleared
- Both the slow and fast run counters are reset
- The stream resumes normal derivation on data arrival
Last Derivation Error
When a derivation error occurs, the error message and timestamp are stored directly on the stream. The derivation panel displays a red error box showing:

- The error message (truncated to 500 characters if longer)
- The date and time the error occurred
Clearing Throttle
To immediately resume automatic derivation after throttling (without waiting for auto-unthrottle), open the component stream's derivation panel and click Clear Throttle. This:

- Resets the consecutive issue counter to zero
- Removes the throttle flag
- Clears the last error
After clearing, save the stream. Automatic derivation resumes on the next data arrival. If the underlying issue has not been fixed (bad expression, missing dependent, etc.), the stream will throttle again after 3 more consecutive issues.
Throttled Stream Visibility
The Linked Components and Stream Status window (accessible from the component template context menu) shows a Throttled column with an orange warning icon for throttled streams. The tooltip indicates that the stream is running hourly via the throttled derivation job. Use the Throttled filter button at the bottom of the window to show only throttled streams.

Resetting Throttles Across Template Instances

For templates with many component instances, individual throttle clearing would be impractical. The Reset Instance Throttles button (in the Advanced section of the derivation panel, visible for template streams only) resets throttle state across all instance streams linked to that template stream in a single operation:

- Resets the consecutive issue counter to zero on all instances
- Clears the throttle flag on all instances
- Clears the last derivation error on all instances
Troubleshooting
See the Derivation Troubleshooting page for detailed diagnostic procedures including:

- Using the Drill Into view to inspect variable values per interval
- Checking Calculated Up To dates across the dependency tree
- Understanding Deps Changed counters
- Debugging NULL results and expression errors
