
Look-Ahead Bias: When Your AI Knows Tomorrow's Newspaper

I spent $4.64 on 135 Claude analyses for an 8-year backtest. Result: −1.4 percentage points. Here's why.

Dominic Tschan
April 14, 2026 · 6 min read

"I ran 135 AI analyses on historical stocks. It didn't help."

That's the honest headline of today's article. $4.64 spent on Claude API calls. 135 company evaluations across 8 years. Result: −1.4 percentage points vs plain quantitative screening.

Why? Because Claude already knew how the story ended.

This is look-ahead bias. It's the most subtle of the ten.

The One-Sentence Version

Look-ahead bias is when your backtest uses information that wasn't available at the time of the trade. Even if you didn't mean to.

The Obvious Form

You're testing a strategy. On March 1, 2022, your rule says "buy if earnings beat expectations." But the earnings report wasn't released until March 15.

If your backtest uses March 15 data for a March 1 trade, that's look-ahead. You bought with information you couldn't have had.

Most serious backtest frameworks prevent this. Bybit's API gives you proper timestamps. SimFin includes publish dates.
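The March example above comes down to a single filter: never let a record into the signal until its publish date has passed. A minimal sketch of that cutoff, assuming each record stores the date it became public (field names and figures are invented for illustration):

```python
from datetime import date

def known_as_of(records, trade_date):
    """Keep only fundamentals whose publish date is on or before the
    trade date. Each record carries a 'published' field alongside its
    reporting-period values (field names are illustrative)."""
    return [r for r in records if r["published"] <= trade_date]

# Q4 earnings published March 15 must not inform a March 1 trade:
records = [
    {"ticker": "ACME", "period": "2021-Q4", "eps": 1.42, "published": date(2022, 3, 15)},
    {"ticker": "ACME", "period": "2021-Q3", "eps": 1.10, "published": date(2021, 11, 2)},
]
visible = known_as_of(records, date(2022, 3, 1))  # only the Q3 record survives
```

The key design choice: filter on the publish date, never on the reporting-period end date, because the two can differ by weeks.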

But there's a sneakier version. And it destroyed my Claude experiment.

The Claude Version

I wanted to add Sandra's qualitative scoring to her backtest. Moat strength. Management quality. Those things she judges by gut.

Claude should be able to help. Feed it financials plus a business description, get back a structured moat score. Do this for every stock in every year from 2019 to 2026. Use those scores to re-rank the selection.
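That scoring loop reduces to prompt assembly plus response parsing. The prompt wording, JSON schema, helper names, and figures below are illustrative, not the exact ones from the experiment; the actual API call is omitted:

```python
import json

def build_moat_prompt(ticker, year, financials, description):
    """Assemble the scoring request (wording is illustrative)."""
    return (
        f"Evaluate {ticker} as of January {year}, using only {year}-era knowledge.\n"
        f"Business: {description}\n"
        f"Financials: {json.dumps(financials)}\n"
        'Reply with JSON only: {"moat": 0-100, "mgmt": 0-100}'
    )

def parse_scores(reply):
    """Extract both scores from the model's JSON reply, clamped to 0-100."""
    data = json.loads(reply)
    return {k: max(0, min(100, int(data[k]))) for k in ("moat", "mgmt")}

prompt = build_moat_prompt("NVDA", 2019, {"revenue_b": 11.7}, "GPU designer")
scores = parse_scores('{"moat": 85, "mgmt": 70}')  # {'moat': 85, 'mgmt': 70}
```

Note the "using only 2019-era knowledge" instruction in the prompt: as the rest of this section shows, the model cannot actually honor it.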

On paper, brilliant. In practice, silent poison.

Claude Sonnet 4.5 was trained on data up to early 2025. That means Claude knows:

  • NVIDIA became the AI chip king
  • Netflix survived the 2022 subscriber panic
  • Meta's metaverse was a disaster but their ad business roared back
  • Tesla's competitive moat crumbled against BYD
  • Silicon Valley Bank collapsed

Even when I explicitly told Claude "evaluate this company as of January 2019, using only 2019-era knowledge," it couldn't un-know.

It's like asking a friend who already saw the sports results to bet on the game. They can try to pretend, but the outcome tugs at every judgment.

The Experiment

For each year 2019 through 2026, I ran the same pipeline:

  1. Quantitative screen picks top 20 by fundamentals
  2. Claude analyzes all 20 (moat score 0-100, mgmt score 0-100)
  3. Re-rank using: quant × 0.5 + moat × 0.3 + mgmt × 0.2
  4. Take top 10
  5. Apply Sandra's execution
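Steps 2 through 4 reduce to one small function. A minimal sketch, assuming the quant, moat, and mgmt scores all sit on a 0-100 scale (tickers and numbers invented):

```python
def rerank(candidates, top_n=10):
    """Blend quant and qualitative scores with the 0.5/0.3/0.2 weights
    from the pipeline, then keep the top N by combined score."""
    def combined(c):
        return 0.5 * c["quant"] + 0.3 * c["moat"] + 0.2 * c["mgmt"]
    return sorted(candidates, key=combined, reverse=True)[:top_n]

candidates = [
    {"ticker": "AAA", "quant": 90, "moat": 40, "mgmt": 50},  # combined 67.0
    {"ticker": "BBB", "quant": 70, "moat": 95, "mgmt": 90},  # combined 81.5
    {"ticker": "CCC", "quant": 60, "moat": 50, "mgmt": 40},  # combined 53.0
]
top = rerank(candidates, top_n=2)  # BBB first, then AAA
```

With these weights the quant screen still dominates, which is one reason the qualitative layer struggles to move the final list much.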

Over 8 years, that's 135 unique company-year analyses. At about 3.4 cents each via Sonnet 4.5, the total came to $4.64.

Results:

Approach               Aggregate Return   Mean per Ticker
Quant-only rolling     +28.5%             +27.7%
+Claude qualitative    +27.1%             +29.5%

Claude's qualitative layer made the portfolio slightly worse. Mean per stock went up a tiny bit. Aggregate went down. Net effect: zero alpha, small negative due to rotation costs.

I was genuinely surprised.

Why Adding Intelligence Made It Worse

Three possibilities. Probably all three.

1. Look-ahead bias cancels out the value. If Claude's moat scores are partly contaminated by "I know NVIDIA won," then its high moat score for NVIDIA in 2019 is obvious in hindsight. Everyone who bought NVIDIA in 2019 won. Claude's hindsight picks the same winner as plain quant. No alpha.

2. Qualitative judgment is harder than we think. Maybe moat and management really don't add much on top of the quantitative signals. The fundamentals already capture what matters. The narrative is post-hoc.

3. Annual rotation kills compounding. Even if Claude picked slightly better stocks, forcing a sell on the first of each year cuts off the long-run winners. The top 10 list reshuffles. Compounding breaks.

My guess: it's mostly #3 plus #2. Look-ahead probably didn't even help Claude — it just prevented Claude from being obviously worse.

The Broader Lesson

Any backtest that uses modern data to make historical decisions is suspect. This includes:

  • AI models trained past your test date. GPT-4 knows about 2023. Claude Sonnet 4.5 knows about 2024-2025. Any use of these for "historical" research has a look-ahead stain.

  • Academic factor data. The Fama-French factor library publishes slightly revised historical factors each year. If you use the latest version for 2015 data, that version might include minor corrections from 2020 research. Tiny, but real.

  • Dividend-adjusted prices. If a stock split in 2020 and you're pulling "adjusted close" from Yahoo today, those 2015 prices are adjusted using information from 2020 on. For backtest purposes, small. For dividend-heavy strategies, matters.

  • Sentiment tools. Many sentiment scores are recalibrated periodically. The "fear and greed index" you pull today might not match the value someone saw in real-time on the test date.
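The adjusted-price point is easy to demonstrate: back-adjustment divides every earlier price by each later split ratio, so the "adjusted" 2015 close you download today was computed with 2020 information. A toy sketch with yearly closes (all numbers invented):

```python
def back_adjust(closes, splits):
    """Back-adjust raw closes the way chart providers do: each split
    divides every EARLIER price, so a 2020 2-for-1 split rewrites the
    'adjusted' 2015 close using information from 2020."""
    adjusted = {}
    for day, price in closes.items():
        factor = 1.0
        for split_day, ratio in splits.items():
            if day < split_day:
                factor *= ratio
        adjusted[day] = price / factor
    return adjusted

closes = {2015: 100.0, 2019: 120.0, 2021: 70.0}
before_split = back_adjust(closes, {})          # in 2019: 2015 shows as 100.0
after_split = back_adjust(closes, {2020: 2.0})  # today: 2015 shows as 50.0
```

Split factors mostly cancel out in return calculations, which is why this is a small effect for price strategies; dividend back-adjustment shifts returns themselves, which is why it matters more there.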

The Cure

For AI specifically: Don't use modern LLMs for qualitative historical scoring. Either use quantitative signals (which are time-stamped) or accept that qualitative scoring is forward-only.

For data in general: Use point-in-time databases where every value has a "as reported on" timestamp. SimFin does this. CRSP does this. Free data usually doesn't.
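A point-in-time query is just "the latest value reported on or before the date I'm simulating". A minimal sketch with an invented restatement (the record shape and figures are illustrative, not from any real database):

```python
def as_of(history, query_date):
    """Return the latest value reported on or before query_date.
    'history' is a list of (reported_on, value) pairs - a minimal
    stand-in for one row of a point-in-time database."""
    known = [(d, v) for d, v in history if d <= query_date]
    return max(known)[1] if known else None

# 2015 revenue (in billions), restated in 2020:
revenue_2015 = [("2016-02-10", 4.80), ("2020-06-01", 4.55)]
as_of(revenue_2015, "2016-12-31")  # 4.80 - what a 2016 backtest should see
as_of(revenue_2015, "2024-01-01")  # 4.55 - what a naive pull today returns
```

A naive database keeps only the latest value; the point-in-time version keeps every vintage so the backtest can ask what was known, not what is true.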

For your own work: If you catch yourself "remembering" how a stock did to make a decision about what would have happened — you have look-ahead. You can't fix your memory. The only fix is discipline.

The Rule I Now Follow

For my Andromeda bot and for the Momentum Live bot on bearbullradar.com, rule one: nothing that generates signals has seen post-test data.

  • Prices: adjusted close is fine, slight look-ahead but material only for dividend strategies
  • Fundamentals: strict publish-date cutoff
  • No LLMs in the signal loop
  • No sentiment indices built after the test period
  • No "quality screens" using criteria developed after the test

The result: an honest number. Probably a smaller one. But honest.

What This Cost Me

Remember Sandra's +28.5 percent rolling test? That was my honest baseline. I expected Claude's layer to bump it to +40 or +50 percent. When it instead dropped to +27.1 percent, I assumed I had a bug.

I didn't. I had look-ahead bias — in reverse. Claude's hindsight didn't add value because Sandra's quantitative screen was already capturing it.

Alpha from qualitative judgment that a computer can produce: zero, to the limit of my ability to measure.

Alpha from qualitative judgment that a human with decades of experience can produce: unknown, but not zero.

Sandra still has her edge. Just probably not an automatable one.


-> Previous: Survivorship Bias — how Sandra's 14 picks hid 12 losers
-> Next: Hindsight Bias — "it was obvious, wasn't it?"
-> Back to pillar


Your Dominic, who spent $4.64 to learn that AI can't help you trade the past.


Disclaimer: Not financial advice. Past performance does not guarantee future results.


Dominic Tschan

MSc Physics, ETH Zurich · Physics teacher · Crypto investor · Bot builder

ETH physicist who tested 200+ trading strategies on 6 years of real market data. Runs 5 tier-labeled bots — 1 on real capital, 3 paper, 1 backtest-only. Here I share everything: results, mistakes, and lessons.
