Lacuna v1
Methodology · Version 1.0

How we reason about prediction markets.

A calibrated AI analyst is only as good as the methodology you can audit. Here's ours, in full.

I. The calibration thesis

Most AI benchmarks measure whether a model can do a task. We measure something harder: when a model says "I am 70% confident," is it actually right 70% of the time? This distinction matters in prediction markets because the product of our work is a number, and the only way to check it is to collect thousands of them over time.
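The check described above can be made concrete: bucket forecasts by stated confidence and compare each bucket's stated probability to its empirical hit rate. A minimal sketch (the bucketing scheme and function name are illustrative, not our production pipeline):

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: list of (stated_probability, outcome) with outcome in {0, 1}.

    Groups forecasts into ten confidence buckets and reports, per bucket,
    the forecast count and the fraction that resolved true. For a calibrated
    forecaster, each bucket's hit rate tracks its stated probability:
    the ~70% bucket should resolve true ~70% of the time.
    """
    buckets = defaultdict(list)
    for prob, outcome in predictions:
        key = min(int(prob * 10), 9) / 10  # e.g. 0.72 lands in the 0.7 bucket
        buckets[key].append(outcome)
    return {
        key: (len(outcomes), sum(outcomes) / len(outcomes))
        for key, outcomes in sorted(buckets.items())
    }
```

Note why "thousands of them over time" matters: a bucket with four forecasts tells you almost nothing; the hit-rate estimate only tightens as counts grow.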

II. Sterile date filtering

Our backtest infrastructure enforces sterility at the tool level. When the agent runs a search at simulation time T, our search layer filters results to documents available before T, and runs a second classifier pass to catch backdated content.
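The two-stage filter can be sketched as follows. This is an illustration under assumptions: the `Document` schema, field names, and the `looks_backdated` classifier hook are hypothetical, not our production interfaces.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Document:
    url: str
    published_at: datetime  # metadata timestamp from the search layer
    body: str

def sterile_filter(results, simulation_time, looks_backdated):
    """Stage 1: drop any document whose metadata timestamp is at or after
    the simulation time T.
    Stage 2: run a second classifier pass (`looks_backdated`, a hypothetical
    hook) to catch documents whose metadata predates T but whose content
    betrays knowledge of later events.
    """
    pre_t = [d for d in results if d.published_at < simulation_time]
    return [d for d in pre_t if not looks_backdated(d, simulation_time)]
```

The key design point is that the filter sits at the tool level, below the agent: the model never sees post-T documents, so it cannot leak future information even by accident.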

III. Multi-model consensus

Every market is analyzed by multiple model families, and we publish a directional signal only when they agree. Because the families don't share a training distribution, requiring agreement decorrelates their errors: a blind spot common to one family is less likely to survive the consensus gate.
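The consensus gate reduces to a small decision rule. A minimal sketch, assuming each family emits a probability that the market resolves YES (the family names, threshold, and abstain behavior are illustrative):

```python
def consensus_signal(family_probabilities, threshold=0.5):
    """family_probabilities: per-model-family probability of YES,
    e.g. {"family_a": 0.71, "family_b": 0.66}.

    Publish a directional signal only when every family lands on the
    same side of the threshold; otherwise abstain.
    """
    sides = {p > threshold for p in family_probabilities.values()}
    if len(sides) != 1:
        return None  # families disagree: no signal is published
    return "YES" if sides.pop() else "NO"
```

Abstaining on disagreement trades coverage for reliability: fewer signals, but each one is backed by independent agreement.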

IV. Scoring

Every signal is timestamped, stored in an append-only ledger, and scored on accuracy, Brier score, and calibration curves. All three are visible in the public track record.
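Of the three metrics, the Brier score is worth spelling out, since it rewards both accuracy and honest confidence in a single number. A minimal sketch (the function name and input shape are illustrative):

```python
def brier_score(forecasts):
    """forecasts: list of (stated_probability, outcome) with outcome in {0, 1}.

    Mean squared error between stated probabilities and realized outcomes.
    0.0 is a perfect score; an uninformative always-0.5 forecaster scores
    0.25; confident wrong calls are punished quadratically.
    """
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
```

For example, calling an event at 0.9 and being wrong costs 0.81, while hedging at 0.6 and being wrong costs only 0.36, which is why the score cannot be gamed by empty overconfidence.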

V. What we publish, what we refuse

Everything we write is public. Signals can't be edited after publishing; corrections are appended. We don't delete losing calls. We don't claim numbers we can't reproduce from a sterile backtest.