I have a private confession.
Every time I finish backtesting a new strategy on this site (cleaning the data, running walk-forward across three windows, sweeping the parameter neighborhood, writing the post in plain language), there's a single sentence I can never write cleanly. It shows up at the end of every draft. It's the one line my methodology cannot fully answer.
"This assumes the future looks enough like the past for the strategy to keep working."
Every article on this site talks about biases. Survivorship, hindsight, cherry-picking, overfitting, p-hacking. We've published post-mortems on strategies that looked great until honest testing killed them. We have a dedicated 3-test stack. All of it helps.
But there's a bias sitting underneath all of them, one that no walk-forward check can catch, no parameter sweep can eliminate, and no amount of data can truly fix.
Every backtest is a bet on regime continuity.
That's the philosopher's old problem in a trading-specific wrapper: "The past few years happened this way. The next few years will probably happen similarly."
We quietly assume that. Every one of us. Every backtest we ever run.
The turkey who trusted data
The Scottish philosopher David Hume framed it nearly three centuries ago:
Just because the sun has risen every morning of your life does not logically guarantee it will rise tomorrow. You're assuming the future will resemble the past. That assumption has worked so far, but it cannot be proven.
More brutally, Nassim Taleb updated the thought experiment:
A turkey is fed every morning for 1,000 days by a kind farmer. Every morning, the turkey's data gets stronger: "the farmer loves me." Its confidence grows. Then comes the day before Thanksgiving.
The turkey had a great backtest. 1,000 out of 1,000 mornings confirmed the thesis. Walk-forward in 3 independent sub-periods? Pass, pass, pass. Parameter robustness? The farmer loved the turkey regardless of breed, age, or coop position.
Then the regime changed.
Here's what keeps me awake
Let me make this concrete for our site.
I tested every bot on /bots against 6-8 years of Bitcoin data. In that window, BTC crashed 77%, recovered 9×, went through a full halving cycle, survived FTX, weathered US regulatory chaos, and got ETF-approved. That's diverse. Six different macro environments. Three distinct volatility regimes.
It's also one narrative: Bitcoin emerging as a scarce asset in a zero-interest-rate world that gradually normalized back to higher rates. Every data point I have comes from that single story.
What if the next eight years are completely different?
- What if central banks stop buying gold and treasury yields go to zero for a decade?
- What if Bitcoin stops having 70% drawdowns and becomes range-bound, the way gold was from 2013 to 2019?
- What if institutional adoption inverts and the ETFs see net outflows for three straight years?
- What if some risk vector I can't even imagine today becomes the dominant correlation?
I don't know. Nobody does. That's the entire point: we cannot test on data that doesn't exist yet.
Concrete examples from our own bots
You can see the induction problem showing up in specific bot cards on our site:
The Hedge Hopper rotates between Bitcoin and Gold Miners based on 40-day momentum. Backtest: +5,532% over 5.7 years, 3 of 3 walk-forward windows beat HODL.
The caveat I wrote into its bot card: "The 2020-2024 gold run was historically unusual: central bank buying, geopolitical hedging, AI-era inflation worries. Future cycles may differ."
That's the induction problem showing up in plain sight. The strategy worked because gold and crypto had uncorrelated cycles in this specific macro regime. If the next regime sees both trading as "dollar hedges" at the same time, the rotation has nothing to rotate between.
The Tri-Rotator rotates BTC / ETH / SOL on 30-day momentum. Backtest: +34,929% over 5.7 years.
Caveat: "Only 5.7 years of data, one market cycle. SOL can crash harder than BTC." Same story: one cycle observed, the pattern held, but one cycle is not a distribution.
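The rotation rule both bots share fits in a few lines. The sketch below is illustrative only: the function names and price series are invented, the data is a clean deterministic trend, and a real implementation would need fees, slippage, and rebalancing rules.

```python
def momentum(prices, lookback=30):
    """Trailing return over the lookback window."""
    return prices[-1] / prices[-1 - lookback] - 1.0

def pick_asset(price_history, lookback=30):
    """Hold whichever asset shows the strongest trailing momentum.

    price_history: dict of asset name -> list of daily closes.
    """
    scores = {asset: momentum(p, lookback) for asset, p in price_history.items()}
    return max(scores, key=scores.get)

def trend(start, daily_return, days=60):
    """Toy price series compounding at a fixed daily rate."""
    prices = [start]
    for _ in range(days):
        prices.append(prices[-1] * (1 + daily_return))
    return prices

# Hypothetical data: SOL has the steepest trend, so the rule rotates into it.
history = {
    "BTC": trend(100, 0.001),
    "ETH": trend(100, 0.0005),
    "SOL": trend(100, 0.003),
}
print(pick_asset(history))  # SOL: the strongest 30-day trend wins
```

The rule has no opinion about *why* the trend exists, which is exactly the induction problem: it rotates correctly only as long as recent momentum keeps predicting near-term returns, as it did in our sample.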
Halving-Countdown (from our BotLab) buys 6 months before each halving and sells 12 months after. +1,706% over 8 years. Beats HODL handsomely with a better drawdown too.
The pattern has fired 4 times in a row (2012, 2016, 2020, 2024). Four data points. If the halving-as-catalyst relationship is structural, we'd expect it to continue. If it was an artifact of Bitcoin's early adoption phase, we'd expect it to weaken as the asset matures. Genuinely don't know which is true until the 2028 cycle prints.
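The timing rule is simple enough to state as code. A minimal sketch, using the approximate published halving dates and a naive 6-months-before / 12-months-after window; the helper name is ours, not the BotLab's:

```python
from datetime import date, timedelta

# Approximate Bitcoin halving dates observed so far.
HALVINGS = [date(2012, 11, 28), date(2016, 7, 9),
            date(2020, 5, 11), date(2024, 4, 19)]

def in_position(today, halvings=HALVINGS):
    """True if the rule says to hold: from roughly 6 months before
    a halving until 12 months after it."""
    for h in halvings:
        if h - timedelta(days=182) <= today <= h + timedelta(days=365):
            return True
    return False

print(in_position(date(2020, 3, 1)))  # True: inside the 2020 halving window
print(in_position(date(2022, 6, 1)))  # False: the dead zone between cycles
```

Note that the entire edge lives in four historical windows; the code is trivial, the inductive leap is not.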
Our 8-year sample vs. the possible futures we can't test
The shaded band is what our backtests saw. Everything outside it is a regime draw we can only guess at.
How we treat our own sample
This is the visualization I keep coming back to mentally. Our 6-8 year backtest sample is one shaded band in a much longer timeline of possible Bitcoin futures.
Each era looks different: different volatility regime, different dominant narrative, different correlation structure. The bands we've actually observed and tested on are a tiny slice. Everything outside that slice is a regime draw we can't test.
Why walk-forward doesn't fix this
Our methodology page describes walk-forward as "catching regime-overfit strategies." That's true โ inside the sample.
What walk-forward does:
- Splits historical data into independent sub-periods
- Tests the strategy in each sub-period
- Rejects strategies that only worked in one historical regime
What walk-forward cannot do:
- Predict whether future regimes will resemble historical regimes
- Protect against regime shifts that haven't happened yet
- Distinguish between "robust across all historical regimes" and "robust across all possible regimes"
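The mechanical half of walk-forward, slicing history into independent sub-periods and scoring each one, is easy to show. A minimal sketch under simplifying assumptions (equal-sized windows, simple-summed daily returns, toy data); our real harness is more involved:

```python
def walk_forward_windows(n_days, n_splits=3):
    """Split n_days of history into n_splits contiguous, non-overlapping
    sub-periods, returned as (start, end) index pairs."""
    size = n_days // n_splits
    return [(i * size, (i + 1) * size if i < n_splits - 1 else n_days)
            for i in range(n_splits)]

def passes_walk_forward(strategy_returns, hodl_returns, n_splits=3):
    """Count the sub-periods in which the strategy beats buy-and-hold.

    A 3/3 score means it won in every historical slice we cut.
    It says nothing about regimes that haven't happened yet.
    """
    wins = 0
    for start, end in walk_forward_windows(len(strategy_returns), n_splits):
        if sum(strategy_returns[start:end]) > sum(hodl_returns[start:end]):
            wins += 1
    return wins

# Toy example: the strategy edges out HODL in 2 of the 3 windows.
strategy = [0.02] * 100 + [0.00] * 100 + [0.01] * 100
hodl     = [0.01] * 100 + [0.01] * 100 + [0.005] * 100
print(passes_walk_forward(strategy, hodl))  # 2
```

Every window the loop visits is drawn from the same historical sample; no loop over the past can iterate over the future.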
A strategy with 3/3 walk-forward is more reliable than one with 1/3. But "more reliable" is not "guaranteed." The future is a draw from a distribution we've only partially observed. Our walk-forward score tells us how well a strategy held up across what already happened. It tells us nothing about the regime we haven't seen yet.
What we do about it (imperfectly)
We cannot solve the induction problem. Nobody can. What we can do, and what we try to do, is structure our trading so we don't get caught like Taleb's turkey:
1. Real capital stays conservative. The Watchdog is the only bot on real money. It sits in cash 60% of the time and does roughly one trade per year. We chose the most parameter-robust strategy, the one that passed across the most diverse regimes, for the real stakes, because the future probably won't look exactly like the past and we want a strategy robust enough to tolerate that.
2. Live verification is mandatory before promotion. Tactician 2.0, Rotator, Tri-Rotator, Hedge Hopper, and The Contrarian are all paper-traded. Before any of them graduates to real capital, they need 45+ days of live performance that matches the backtest. Live data is our only real sample of "the future as it actually happens."
3. Every failure is published. Our post-mortems document every bot we've retired, with root cause. When a strategy stops working, it's usually because the regime shifted and our historical sample didn't cover the new regime. Publishing these teaches readers what regime-shift failure looks like in practice.
4. Every claim carries an explicit caveat. Every bot card on /bots has a "Past ≠ Future" pill next to its tier badge. Not as legal cover. As methodological honesty. "Beat HODL 3/3 walk-forward" is a strong statement within our sample. It is not a prediction about 2026-2030.
5. We don't sell signals as certainty. This is why we don't run a paid signal service. The economics look tempting; the epistemics don't. Charging for signals would imply a confidence in future performance that the induction problem doesn't allow. Teaching the methodology (so you can build your own bots, spot your own regime shifts, run your own post-mortems) is honest. Selling forecasts is not.
What this means for you
When you read any trading strategy โ ours or anyone else's โ with a backtest attached, ask three questions:
- What regime did the backtest live in? Bull market? Bear? Mixed? How wide was the sample?
- Has it been tested live, and for how long? Live performance is drawn from the one regime that matters: the current one.
- What would have to change about the world for this strategy to stop working? If the author can't name the conditions, they haven't thought about it.
The first two come from our methodology. The third is the induction check.
The honest bottom line
A good backtest is evidence. It is not proof. Walk-forward is stronger evidence. It is not proof either. Parameter robustness is stronger still. Still not proof.
The only thing that moves from evidence toward something approaching proof is time elapsed trading live in whatever regime the future turns out to be.
That's why we keep the real-capital bot conservative, insist on live verification periods for paper bots, and publish every death certificate when a strategy fails.
The past is not the future. Our job is to make decisions that respect that fact โ not to pretend we've solved it.
Further reading: Live ≠ Backtest: the six ways a perfect backtest still loses on real capital. · The 3-Test Stack methodology: what we check before deploying. · Post-mortems: every retired bot with root cause.




