The bug that costs the most

If you build trading signals or any system where decisions are made on data that arrives over time, this is the bug that costs the most. We've seen it distort backtest performance estimates by entire percentage points, and we've seen it nearly sink a quant strategy that looked great until it was deployed.

The bug: your historical dataset contains revisions that were not yet known at the timestamps they appear to belong to. The price feed publishes a value, then publishes a correction six hours later. Your dataset stores only the corrected value. Your backtest, run today, sees the correction at the original timestamp — and your model "predicts" something it could not actually have known.

This is point-in-time data leak, and it is everywhere in financial data. Earnings restatements. Index reconstitutions. Vessel ownership changes that get backdated when a flag transfer is registered late. ENTSO-E electricity demand revisions. Refinery throughput re-estimates. The data you query today is not the data that was knowable on the date you're querying.

Bitemporal modeling is how we represent that fact.

Two times, not one

Uni-temporal data has one timestamp per row: when the event happened. Bitemporal data has two timestamps:

  • Valid time: when the fact was true in business reality. The price was 87.32 starting at 14:32 UTC on 22 April.
  • Transaction time: when the system recorded the fact. The system received the price at 14:32:03 UTC on 22 April, or — if it's a revision — at 09:14 UTC on 23 April after a correction was issued.

These are not the same thing. Conflating them is what creates point-in-time leak. Modeling them separately is what gives you backtest correctness and audit traceability for free.

VALID TIME (when the fact was true in reality)
─────────────────────────────────────────────►

  EVENT: price 87.32  ────────────────────────►  price 87.41 (revised)
         from 14:32                              from 14:32 (same!)

TRANSACTION TIME (when system knew about it)
  │
  ▼  RECORDED AT 14:32:03   ─►  row 1: valid 14:32, value 87.32
  │
  ▼  RECORDED AT 09:14 +1d  ─►  row 2: valid 14:32, value 87.41

Both rows describe the SAME valid time. They differ in transaction time.
A backtest run between 14:32:03 and 09:14+1d should see 87.32.
A backtest run after 09:14+1d should see 87.41.

The minimum viable bitemporal table

Skip the academic literature for a moment. The pragmatic schema we use as a default in BigQuery and Snowflake looks like this:

-- BigQuery
CREATE TABLE price_observations (
  asset_id      STRING      NOT NULL,
  -- valid time: when the fact was/is true in the world
  valid_from    TIMESTAMP   NOT NULL,
  valid_to      TIMESTAMP,                              -- NULL = current
  -- transaction time: when our system recorded this fact
  recorded_at   TIMESTAMP   NOT NULL,
  superseded_at TIMESTAMP,                              -- NULL = active record
  -- the fact itself
  price         FLOAT64     NOT NULL,
  source        STRING      NOT NULL,
  revision_id   STRING                                  -- optional: trace upstream
)
PARTITION BY DATE(recorded_at)
CLUSTER BY asset_id, valid_from;

The four time columns are the load-bearing piece. Read them as two pairs:

  • (valid_from, valid_to) — the period this fact is true. valid_to is null for the most recent value. When a new fact arrives for the same asset, we close the previous row by setting its valid_to.
  • (recorded_at, superseded_at) — the period our system believed this row. When a revision arrives, we keep the original row but set its superseded_at. The revision becomes a new row with its own recorded_at.

That's it. Everything else flows from these four columns.
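
To make the supersession mechanics concrete, here is a minimal sketch of how a revision could be ingested against this schema. The @-parameters are illustrative placeholders for the incoming revision, not part of the schema, and the two statements belong in a single transaction so readers never observe the intermediate state:

-- Ingest a revision (sketch; @-parameters are placeholders)
-- Step 1: mark the currently active row for this asset and valid time as superseded.
UPDATE price_observations
SET superseded_at = CURRENT_TIMESTAMP()
WHERE asset_id = @asset_id
  AND valid_from = @valid_from
  AND superseded_at IS NULL;

-- Step 2: insert the revised fact as a new row with its own recorded_at.
INSERT INTO price_observations
  (asset_id, valid_from, valid_to, recorded_at, superseded_at,
   price, source, revision_id)
VALUES
  (@asset_id, @valid_from, NULL, CURRENT_TIMESTAMP(), NULL,
   @revised_price, @source, @revision_id);

Note what never happens: the original row's fact columns are never rewritten. Only its superseded_at changes, and that is what preserves the audit trail.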

The point-in-time query

To get the dataset as it was knowable at a specific moment in transaction time — for example, what your system would have shown at 22 April 14:32:03 — you query like this:

-- "What did our system know at 14:32:03 UTC on 22 April?"
SELECT asset_id, price, source
FROM price_observations
WHERE recorded_at <= '2026-04-22 14:32:03 UTC'
  AND (superseded_at IS NULL
       OR superseded_at > '2026-04-22 14:32:03 UTC')
  AND valid_from <= '2026-04-22 14:32:00 UTC'
  AND (valid_to IS NULL
       OR valid_to > '2026-04-22 14:32:00 UTC');

The query reads cleanly: "give me the rows whose business validity spans the moment I care about, but only as my system understood them at the knowledge cutoff." Note that the two cutoffs are independent: here the valid-time moment (14:32:00, when the price became true) and the transaction-time cutoff (14:32:03, when the system had recorded it) differ by the ingestion delay. Two predicates per pair of timestamps: four predicates that distinguish a correct backtest from a fictional one.
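
One way to keep consumers honest is to wrap the transaction-time half of this logic in a table function, so no downstream query can forget a predicate. A sketch in BigQuery syntax, with dataset and function names of our own invention; valid-time filtering is deliberately left to the caller, because a backtest usually wants the full business-time history as known at the cutoff:

-- Sketch: fix the knowledge cutoff once, in one place
CREATE OR REPLACE TABLE FUNCTION analytics.price_observations_as_known_at(as_of TIMESTAMP)
AS (
  SELECT asset_id, valid_from, valid_to, price, source
  FROM analytics.price_observations
  WHERE recorded_at <= as_of
    AND (superseded_at IS NULL OR superseded_at > as_of)
);

-- Usage: the valid-time predicates stay with the caller
SELECT *
FROM analytics.price_observations_as_known_at(TIMESTAMP '2026-04-22 14:32:03 UTC')
WHERE valid_from <= TIMESTAMP '2026-04-22 14:32:00 UTC'
  AND (valid_to IS NULL OR valid_to > TIMESTAMP '2026-04-22 14:32:00 UTC');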

Common implementations and their trade-offs

The schema above is one of three patterns we see in the wild. Each has trade-offs.

Pattern A: Two time pairs (the one above)

Pros: explicit, queryable directly with simple SQL, cluster-friendly. Cons: requires application code to maintain the superseded_at column on revisions; one extra row per revision, which for heavily revised feeds can approach double the row count of a uni-temporal version.

Pattern B: Append-only event log

Every observation is a new row. No updates, no deletes. The "current view" is a query that filters to the latest recorded_at per (asset_id, valid_from).

-- Get the current view from append-only log
SELECT *
FROM price_observations
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY asset_id, valid_from
  ORDER BY recorded_at DESC
) = 1;

Pros: no updates, fully audit-friendly, dead simple to ingest. Cons: every query pays the cost of the window function; expensive at scale unless you maintain a current-view materialized layer.

Pattern C: SCD Type 2 with revision tracking

Slowly Changing Dimension Type 2 plus a revision counter. Familiar to anyone from a Kimball background.

Pros: natural in dimensional warehouses. Cons: SCD Type 2 was designed for slowly-changing reference data, not high-frequency observations; the row count gets out of hand for time-series.
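
For completeness, a sketch of the shape Pattern C typically takes; the dimension-style names here are ours, purely illustrative:

-- Sketch: SCD Type 2 plus a revision counter (names illustrative)
CREATE TABLE dim_vessel (
  vessel_key      INT64      NOT NULL,   -- surrogate key
  vessel_id       STRING     NOT NULL,   -- natural key
  owner           STRING,
  flag            STRING,
  effective_from  TIMESTAMP  NOT NULL,   -- business validity, SCD2 style
  effective_to    TIMESTAMP,             -- NULL = current version
  revision_no     INT64      NOT NULL,   -- incremented when a closed version is restated
  is_current      BOOL       NOT NULL    -- convenience flag for current-view queries
);

The is_current flag keeps current-view queries cheap, at the price of one more column to maintain correctly on every restatement.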

Our default

Pattern A for medium-volume bitemporal data with clear "current value" semantics (vessel master data, contract terms, customer records). Pattern B for high-frequency observations where revisions are rare but possible (price feeds, sensor telemetry). The two patterns can coexist in the same warehouse — one pattern per logical entity.

What goes wrong (and how to spot it)

Mistake 1: Conflating the two times

The most common bug: developers treat recorded_at as if it equals valid_from. For most rows it does — the system records a new fact almost immediately after it becomes true. But for revisions, the gap can be hours or days, and that's exactly when the bug bites.

Detection: count rows where ABS(TIMESTAMP_DIFF(recorded_at, valid_from, MINUTE)) > 5. If that's a non-trivial fraction of your data, you have a revision pattern that requires bitemporal handling.
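
A sketch of that check against the Pattern A schema above; the 5-minute threshold is this section's heuristic, so tune it to your feed's normal ingestion latency:

-- What fraction of rows were recorded well after their business validity began?
SELECT
  COUNTIF(ABS(TIMESTAMP_DIFF(recorded_at, valid_from, MINUTE)) > 5) / COUNT(*)
    AS revision_fraction
FROM price_observations;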

Mistake 2: Not handling supersession

Teams ingest revisions but forget to close out the original row. Now the table has two "active" facts for the same valid time, and queries non-deterministically return one or the other. Test for this by counting rows where superseded_at IS NULL per (entity_id, valid_from) — should always be at most one.
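
A sketch of that invariant test, again against the Pattern A schema:

-- At most one active row per (asset, valid time); any result is a supersession bug
SELECT asset_id, valid_from, COUNT(*) AS active_rows
FROM price_observations
WHERE superseded_at IS NULL
GROUP BY asset_id, valid_from
HAVING COUNT(*) > 1;

It is cheap enough to run as a scheduled data test on every ingestion cycle.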

Mistake 3: Performance pitfalls

Bitemporal queries apply four time predicates per row. Without thoughtful clustering, this gets expensive at TB scale. We default to:

  • Partition by recorded_at at day granularity (queries usually filter on transaction time first).
  • Cluster by (entity_id, valid_from) so business-time scans are localised.
  • Materialise a "current view" that filters out superseded rows; refresh hourly or via dynamic table (a sketch follows this list).
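
A sketch of that current-view layer in Snowflake dynamic-table syntax; the lag and warehouse name are illustrative, and in BigQuery a scheduled query into a plain table plays the same role:

-- Sketch: current view as a Snowflake dynamic table
CREATE OR REPLACE DYNAMIC TABLE price_current
  TARGET_LAG = '1 hour'
  WAREHOUSE = transform_wh
AS
SELECT asset_id, valid_from, valid_to, price, source
FROM price_observations
WHERE superseded_at IS NULL;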

Mistake 4: Treating bitemporal as a religion

Not every entity in your warehouse needs to be bitemporal. Customer email addresses do not need transaction time tracking; you don't backtest against them. Apply bitemporal where the cost of point-in-time leak is measurable — typically: anything fed into ML models or trading signals, anything referenced by regulators, anything where revisions are part of the data source's normal behaviour.

A real-world example

One of the systems we operate, Straitsmonitor, ingests vessel positions, energy demand data, and commodity prices to compute trading signals around the Strait of Hormuz. Three reasons the dataset has to be bitemporal:

  • AIS vessel observations get refined as additional broadcasts arrive — speed and heading values stabilise after the first reading.
  • Vessel master data (ownership, flag, P&I insurance) changes through reflag and sale events that are recorded retroactively when the registry catches up. A "Greek-owned tanker" yesterday might be a "Marshall-Islands shadow fleet asset" on tomorrow's records — referencing the same valid time.
  • ENTSO-E electricity demand is published as provisional values that are corrected within 24 hours; trading signals computed before the correction must reproduce the provisional reading exactly, not the corrected one.

The signal computation engine reads the dataset with recorded_at <= signal_compute_time and never sees future revisions. The backtest reproducibility this gives us is, frankly, the difference between a defensible signal and a research curiosity.

When you don't need bitemporal

The cost of bitemporal modelling is real. Extra rows for every revision, more complex queries, more careful application logic. We don't apply it where the cost exceeds the benefit. Specifically:

  • Static reference data: country codes, ISO currency codes, calendar tables. These don't get revised in ways that affect analytics.
  • Append-only event streams that are never revised: clickstream events, sensor readings where late-arriving corrections aren't part of the source's contract.
  • Real-time-only systems: if you never backtest and never audit, the bitemporal apparatus is overhead with no payoff.

The default we apply: ask whether a regulator, a quant researcher, or a future audit would ever need to reproduce what the system knew at a specific moment in time. If the answer is yes, the data is bitemporal from day one. Adding it later, on a table that already has terabytes of uni-temporal history, is one of the more painful migration projects we've seen.

Closing thought

Bitemporal modelling is one of those topics that sounds academic until the first time a backtest you trusted blows up in production. After that, the four-column overhead becomes a small price for sleeping at night.

The hardest part isn't the SQL. It's the discipline to maintain it correctly through ingestion, transformations and consumer queries. That discipline is, in our experience, the single biggest differentiator between data platforms that hold up to audit and those that don't.

If your team is building trading signals, regulatory reporting, or any system where point-in-time correctness matters, AK-DataGeek can help with bitemporal architecture and migration. Get in touch.