
Data-Snooping: 3.2 Million Parameter Combinations. Your "Winner" Is Noise.

Test every combo. Find the best one. Call it a discovery. Statistics says 80 percent of the time, you're lying to yourself.

Dominic Tschan
April 16, 2026 · 6 min read

Let me do some math with you.

Your strategy has 5 parameters:

  • RSI period
  • Entry threshold
  • Exit threshold
  • Stop-loss distance
  • Max holding days

You test each parameter at 20 possible values.

Total combinations: 20^5 = 3.2 million.

Somewhere in those 3.2 million combinations, there is a version that looks absolutely brilliant on your test data. Not because the strategy works. Because with 3.2 million random draws, at least one is going to fit any noise you feed it.

This is data-snooping. The most sophisticated form of overfitting.

What It Is

Data-snooping: exhaustively searching parameter space until you find the combination that makes the backtest look great, then presenting that combination as if it were discovered, not constructed.

The statistician's name for it is "multiple testing." The trader's name for it is "optimization." They're the same thing.

The Arithmetic That Should Scare You

If you test 1 random strategy, the probability it looks significant by pure chance is about 5 percent (the standard p < 0.05 bar).

Test 20: you expect one false positive (20 × 5 percent).

Test 3.2 million: you expect on the order of 160,000 "significant" random hits (5 percent of 3.2 million).

The hit you select out of those 3.2 million will look perfect. Max return, min drawdown, beautiful equity curve. And it will be noise.

Real strategies don't look perfect. They have messy periods. Real backtests show variance. Optimized backtests are too clean.
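The arithmetic above can be made concrete with a quick simulation: generate strategies that are pure noise, keep the best one, and watch it look like an edge. A minimal sketch in Python (all numbers illustrative):

```python
import random
import statistics

random.seed(42)  # illustrative; any seed shows the same effect

def annualized_sharpe(daily_returns):
    mu = statistics.mean(daily_returns)
    sd = statistics.stdev(daily_returns)
    return (mu / sd) * (252 ** 0.5)

# 2,000 "strategies" that are pure coin flips: zero-mean daily returns.
# None of them has any edge by construction.
sharpes = [
    annualized_sharpe([random.gauss(0.0, 0.01) for _ in range(252)])
    for _ in range(2_000)
]

# The one you would have "discovered" by picking the best:
print(f"best Sharpe among 2,000 random strategies: {max(sharpes):.2f}")
print(f"median Sharpe: {statistics.median(sharpes):.2f}")
```

With no edge anywhere, the best of 2,000 typically shows an annualized Sharpe above 3 — exactly the kind of number that gets a strategy sold. The median, the honest summary, sits near zero.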

The Clean-Chart Warning Sign

When you see a backtest equity curve that goes smoothly up and to the right, barely any dips, few bad months — be suspicious.

Real strategies have:

  • 10-30 percent drawdowns occasionally
  • Multi-month losing streaks
  • Entire years underperforming benchmarks
  • Messy trade-level variance

If the chart looks like a perfect line, it's been data-snooped. Every bump was tuned out.

My Close Call

Building the Momentum strategy for Sandra, I had multiple knobs:

  • Lookback: 3, 6, 9, 12 months
  • Skip: 0, 21, 42 days
  • Top N: 3, 5, 10, 15, 20, 30
  • Market cap floor: $1B, $5B, $10B, $25B, $50B
  • Trend filter: yes/no
  • Rebalance frequency: weekly, biweekly, monthly, quarterly

That's 4 × 3 × 6 × 5 × 2 × 4 = 2,880 combinations.

If I had tested all 2,880 and picked the best, I'd have been p-hacking / data-snooping. At least one would have shown +1000 percent.

What I did instead:

  • Fixed lookback at 12 months (from academic literature)
  • Fixed skip at 21 days (from academic literature)
  • Fixed rebalance monthly (from academic literature)
  • Tested only Top N (5, 10, 20) and market-cap floor ($5B, $10B), plus a handful of lookback, cost, and trend-filter variants as deliberate failure checks

11 variants total. All tested. All published. Not cherry-picked.

The winner: Top 10, $10B floor, 12M lookback, 21 skip, monthly rebalance. That's not an accident. It's theory-driven, with 3 degrees of freedom left for practical optimization.

The Honesty Test

Ask a trader: "How did you choose your parameters?"

Honest answer: "Academic literature" / "Theory" / "Intuition tested on small sample."

Snooping answer: "Tried different values, these worked best."

The snooping answer means: they optimized. Which means: their backtest is shinier than the reality.

The Paper That Should Change Your Mind

Campbell Harvey, a Duke finance professor, co-wrote (with Yan Liu and Heqing Zhu) a paper called "...and the Cross-Section of Expected Returns" in 2016. They analyzed 316 "factors" published in academic finance.

Their conclusion: a large share of those factors are likely false. Using a stricter multiple-testing-corrected t-statistic (around 3.0 instead of the usual 2.0), most of the 316 candidate factors fail to clear the bar. They survived publication because thousands of researchers were testing thousands of ideas, and at the conventional 5% threshold, noise slips through.
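A quick way to see how much stricter the bar gets: the crudest multiple-testing correction, Bonferroni, simply divides the significance level by the number of tests. Python's standard library can compute the corresponding normal-distribution thresholds (Harvey's paper uses more refined corrections that land near 3.0; Bonferroni is deliberately conservative):

```python
from statistics import NormalDist

n_tests = 316   # factors examined in the 2016 paper
alpha = 0.05

# Conventional single-test threshold (two-sided 5%):
naive_t = NormalDist().inv_cdf(1 - alpha / 2)                   # ~1.96

# Bonferroni: divide alpha by the number of tests performed.
bonferroni_t = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))  # roughly 3.8

print(f"single test: {naive_t:.2f}   316 tests: {bonferroni_t:.2f}")
```

A factor that clears t = 2.0 but not t = 3.0+ was, under this lens, never a discovery — just one of the expected false positives.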

Think about that. Published, peer-reviewed, citation-heavy academic finance. Majority noise.

Retail trading strategies, without peer review, are probably 99 percent noise.

The Right Way to Use Parameters

1. Set parameters from theory first. Why 12 months for momentum? Because Jegadeesh-Titman showed 3-12 months is the continuation window. Not because you tested 5, 10, 15 and 12 looked best.

2. Limit your knobs. Every parameter is a degree of freedom that invites data-snooping. Under 5 parameters is acceptable. Over 10 is almost always snooped.

3. Sensitivity analysis. Does the strategy work across a RANGE of similar parameters? If +/- 10% on each parameter keeps it working, it's robust. If only one exact combo works, it's snooped.

4. Out-of-sample verification. Design on 70 percent of data. Test on the 30 percent you haven't touched. If it breaks, it was snooped.

5. Publish the grid. If you tested 11 variants, publish all 11 performances. Not just the winner. This way readers can see if the winner was a single spike or a broad plateau.
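Points 3 and 4 can be sketched in a few lines. The backtest here is a hypothetical stand-in (a smooth response surface with a plateau), purely to show the mechanics:

```python
import itertools

# --- 3. Sensitivity analysis -------------------------------------
# Stand-in for a real backtest: a HYPOTHETICAL response surface
# with a broad plateau around lookback=252 days, top_n=10.
def backtest_sharpe(lookback, top_n):
    return 1.2 - 0.002 * abs(lookback - 252) - 0.03 * abs(top_n - 10)

# Perturb each parameter by +/-10% and check every neighbor.
neighborhood = [
    backtest_sharpe(round(252 * a), round(10 * b))
    for a, b in itertools.product([0.9, 1.0, 1.1], repeat=2)
]
# Robust: every point near the chosen one still works.
print(f"min Sharpe in +/-10% neighborhood: {min(neighborhood):.2f}")

# --- 4. Out-of-sample verification -------------------------------
def chronological_split(series, train_frac=0.70):
    """Never shuffle a time series: the holdout must come strictly
    later in time than the design data."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

daily_returns = list(range(1000))   # placeholder for 1,000 daily bars
design, holdout = chronological_split(daily_returns)
print(len(design), len(holdout))    # 700 300
```

If the minimum over the neighborhood collapses while the chosen point shines, only one exact combo works — the signature of snooping.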

The Momentum Grid I Published

From my Alpha Hunt article:

Strategy             Return   CAGR   MaxDD
Top 5, 12M, $10B     +830%    +36%   -66%
Top 10, 12M, $10B    +742%    +34%   -53%
Top 20, 12M, $5B     +573%    +30%   -52%
Top 5, 12M, $5B      +292%    +21%   -61%
Top 10, 12M, $5B     +420%    +25%   -63%
Top 10, 6M, $5B      +300%    +21%   -72%
Top 10, 3M, $5B       -39%     -7%   -90%
(plus variants with costs, trend filters, etc.)

Notice: 10 of 11 beat SPY. That's a broad plateau, not a single spike. If I had published ONLY Top 5, 12M, $10B (+830%), I'd be snooping. Publishing the grid shows the strategy is robust to parameter choice. That makes it trustworthy.

The Top 10, 3M, 5B variant lost 39 percent. That's a failure mode — 3-month lookback is too short, catches noise. Publishing that shows the failure case too.

The Mental Shift

Data-snooping feels like "optimization." Make the strategy better.

It's actually "fitting to noise." Make the strategy more fragile.

The optimized version is the fragile version. More parameters = more fit to the past = less fit to the future.

Simpler strategies with fewer knobs generalize better. This is a deep truth in machine learning (Occam's razor, regularization, early stopping) and applies 1:1 to trading.

What to Do

When you see a strategy claim:

  • Check parameter count. Under 5 is OK.
  • Ask how parameters were chosen. "Theory" is good. "Tested a range" is suspect.
  • Demand the full grid, not just the winner.
  • Verify out-of-sample performance.

When you build a strategy:

  • Set parameters from theory first
  • Test a small range around theory values
  • Publish the full grid
  • Look for robustness, not peak performance

The variant you deploy should sit slightly below the single best point of your grid. The peak is partly random noise. The median of the plateau is closer to the signal.
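A tiny simulation makes the point: eleven noisy measurements of the same true edge, and the best one overstates it. All numbers are assumed, for illustration only:

```python
import random
import statistics

random.seed(7)  # illustrative

# 11 grid variants of ONE strategy whose TRUE Sharpe is 0.8,
# each measured with backtest noise (std 0.4). Assumed numbers.
measured = [random.gauss(0.8, 0.4) for _ in range(11)]

print(f"best point:   {max(measured):.2f}")    # optimistically biased
print(f"median point: {statistics.median(measured):.2f}")  # typically nearer the truth
```

Selecting the maximum of noisy measurements is itself a biased estimator — the same mechanism that makes the snooped backtest too good.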

That's the mindset that separates real edges from data-snooped illusions.




Your Dominic, who tests 11 variants of ONE theory instead of 3.2 million random ideas.



Disclaimer: This is not financial advice. All backtests are based on historical data and do not guarantee future results. Only invest what you can afford to lose.

Dominic Tschan

MSc Physics, ETH Zurich · Physics teacher · Crypto investor · Bot builder

ETH physicist who tested 200+ trading strategies on 6 years of real market data. Runs 5 tier-labeled bots — 1 on real capital, 3 paper, 1 backtest-only. Here I share everything: results, mistakes, and lessons.
