Let me do some math with you.
Your strategy has 5 parameters:
- RSI period
- Entry threshold
- Exit threshold
- Stop-loss distance
- Max holding days
Each parameter you test at 20 possible values.
Total combinations: 20^5 = 3.2 million.
Somewhere in those 3.2 million combinations, there is a version that looks absolutely brilliant on your test data. Not because the strategy works. Because with 3.2 million random draws, at least one is going to fit any noise you feed it.
This is data-snooping. The most sophisticated form of overfitting.
What It Is
Data-snooping: exhaustively searching parameter space until you find the combination that makes the backtest look great, then presenting that combination as if it were discovered, not constructed.
The statistician's name for it is "multiple testing." The trader's name for it is "optimization." They're the same thing.
The Arithmetic That Should Scare You
If you test 1 random strategy at the usual 5 percent significance threshold, the probability it looks significant by pure chance is about 5 percent.
Test 20: expect roughly one false hit on average.
Test 3.2 million: even though neighboring grid points are correlated, roughly 5 percent of them, on the order of 160,000, will clear the bar by chance alone.
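Here's that arithmetic as a quick sketch (pure Python; it assumes every grid point is an independent test at the 5 percent level, which real parameter grids aren't, so treat the numbers as rough upper bounds):

```python
# Back-of-the-envelope multiple-testing arithmetic.
# Assumes each combination is an independent test at the 5 percent level (an upper bound).
ALPHA = 0.05

def false_positive_stats(n_tests: int) -> tuple[float, float]:
    """Expected number of false 'significant' hits, and probability of at least one."""
    expected_hits = ALPHA * n_tests
    p_at_least_one = 1 - (1 - ALPHA) ** n_tests
    return expected_hits, p_at_least_one

combos = 20 ** 5  # 5 parameters x 20 values each = 3,200,000
for n in (1, 20, combos):
    hits, p_any = false_positive_stats(n)
    print(f"{n:>9,} tests -> ~{hits:,.1f} false hits expected, P(at least one) = {p_any:.3f}")
```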
The hit you select out of those 3.2 million will look perfect. Max return, min drawdown, beautiful equity curve. And it will be noise.
Real strategies don't look perfect. They have messy periods. Real backtests show variance. Optimized backtests are too clean.
The Clean-Chart Warning Sign
When you see a backtest equity curve that goes smoothly up and to the right, barely any dips, few bad months — be suspicious.
Real strategies have:
- 10-30 percent drawdowns occasionally
- Multi-month losing streaks
- Entire years underperforming benchmarks
- Messy trade-level variance
If the chart looks like a perfect line, it's been data-snooped. Every bump was tuned out.
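If you want to check a curve mechanically rather than by eye, here's a minimal sketch (NumPy assumed; `equity` is a made-up monthly series, swap in your own backtest output). A "perfect line" backtest simply can't produce these numbers:

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline, as a fraction (e.g. -0.35 = -35%)."""
    running_peak = np.maximum.accumulate(equity)
    return float(np.min(equity / running_peak - 1.0))

def longest_losing_streak(equity: np.ndarray) -> int:
    """Longest run of consecutive negative periods (months, here)."""
    returns = np.diff(equity) / equity[:-1]
    streak = best = 0
    for r in returns:
        streak = streak + 1 if r < 0 else 0
        best = max(best, streak)
    return best

# Hypothetical monthly equity curve -- plug in your own backtest output.
equity = np.array([100, 104, 99, 103, 110, 95, 92, 101, 108, 115, 112, 120], dtype=float)
print(f"Max drawdown: {max_drawdown(equity):.1%}")          # real strategies: -10% to -30%+
print(f"Longest losing streak: {longest_losing_streak(equity)} months")
```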
My Close Call
Building the Momentum strategy for Sandra, I had multiple knobs:
- Lookback: 3, 6, 9, 12 months
- Skip: 0, 21, 42 days
- Top N: 3, 5, 10, 15, 20, 30
- Market cap floor: $1B, $5B, $10B, $25B, $50B
- Trend filter: yes/no
- Rebalance frequency: weekly, biweekly, monthly, quarterly
That's 4 × 3 × 6 × 5 × 2 × 4 = 2,880 combinations.
If I had tested all 2,880 and picked the best, I'd have been p-hacking / data-snooping. At least one would have shown +1000 percent.
What I did instead:
- Fixed lookback at 12 months (from academic literature)
- Fixed skip at 21 days (from academic literature)
- Fixed rebalance monthly (from academic literature)
- Tested only the practical knobs: Top N (5, 10, 20) and market-cap floor ($5B, $10B), plus a handful of lookback and trend-filter sensitivity checks
11 combinations total. All tested. All published. Not cherry-picked.
The winner: Top 10, $10B floor, 12M lookback, 21 skip, monthly rebalance. That's not an accident. It's theory-driven, with 3 degrees of freedom left for practical optimization.
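In code, the difference between the trap and what I did is just which grid you enumerate. A sketch with the same knob values as above (the backtest itself is omitted):

```python
from itertools import product

# The trap: every knob free. Pick the best of these and you are data-snooping.
lookbacks     = [3, 6, 9, 12]                 # months
skips         = [0, 21, 42]                   # days
top_ns        = [3, 5, 10, 15, 20, 30]
mcap_floors   = [1e9, 5e9, 10e9, 25e9, 50e9]  # dollars
trend_filters = [True, False]
rebalances    = ["weekly", "biweekly", "monthly", "quarterly"]

full_grid = list(product(lookbacks, skips, top_ns, mcap_floors, trend_filters, rebalances))
print(len(full_grid))  # 2880

# What I did: lookback, skip and rebalance fixed from the literature,
# only the practical knobs swept.
LOOKBACK_M, SKIP_DAYS, REBALANCE = 12, 21, "monthly"
small_grid = list(product([5, 10, 20], [5e9, 10e9]))  # Top N x market-cap floor
print(len(small_grid))  # 6 core variants, small enough to publish in full
```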
The Honesty Test
Ask a trader: "How did you choose your parameters?"
Honest answer: "Academic literature" / "Theory" / "Intuition tested on small sample."
Snooping answer: "Tried different values, these worked best."
The snooping answer means: they optimized. Which means: their backtest is shinier than the reality.
The Paper That Should Change Your Mind
Campbell "Cam" Harvey, a Duke finance professor, co-wrote a 2016 paper with Liu and Zhu called "...and the Cross-Section of Expected Returns". They catalogued 316 "factors" from the academic finance literature.
Their conclusion: a large share of those factors are likely false. Using a stricter multiple-testing-corrected t-statistic (around 3.0 instead of the usual 2.0), most of the 316 candidate factors fail to clear the bar. They survived publication because thousands of researchers were testing thousands of ideas, and at the conventional 5% threshold, noise slips through.
Think about that. Published, peer-reviewed, citation-heavy academic finance. Majority noise.
Retail trading strategies, without peer review, are probably 99 percent noise.
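To get a feel for what "multiple-testing-corrected" means, here's a tiny sketch using a plain Bonferroni correction (standard library only). Harvey, Liu and Zhu use more refined corrections that also account for all the tests that never got published, which is roughly where their 3.0 cutoff comes from; Bonferroni over 316 tests lands a bit higher:

```python
from statistics import NormalDist

def bonferroni_t_threshold(alpha: float, n_tests: int) -> float:
    """Two-sided z/t threshold after a Bonferroni correction for n_tests tests."""
    per_test_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)

print(f"  1 test  : |t| > {bonferroni_t_threshold(0.05, 1):.2f}")    # ~1.96, the usual bar
print(f"316 tests : |t| > {bonferroni_t_threshold(0.05, 316):.2f}")  # ~3.8, far stricter
```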
The Right Way to Use Parameters
1. Set parameters from theory first. Why 12 months for momentum? Because Jegadeesh-Titman showed 3-12 months is the continuation window. Not because you tested 5, 10, 15 and 12 looked best.
2. Limit your knobs. Every parameter is a degree of freedom that invites data-snooping. Under 5 parameters is acceptable. Over 10 is almost always snooped.
3. Sensitivity analysis. Does the strategy work across a RANGE of similar parameters? If shifting each parameter by +/- 10 percent keeps it working, it's robust. If only one exact combination works, it's snooped.
4. Out-of-sample verification. Design on 70 percent of your data, then test once on the 30 percent you haven't touched. If it breaks there, it was snooped. (A code sketch combining points 3 and 4 follows this list.)
5. Publish the grid. If you tested 11 variants, publish all 11 performances. Not just the winner. This way readers can see if the winner was a single spike or a broad plateau.
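Here's the sketch of points 3 and 4 combined. `run_backtest` is a hypothetical stand-in for your own backtest (it returns random numbers, so the output is meaningless); the point is the structure: sweep the small grid on the in-sample window only, then run the chosen variant exactly once on the held-out window.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(0)

def run_backtest(top_n: int, mcap_floor: float, start: str, end: str) -> float:
    """Hypothetical stand-in returning a CAGR. Replace with your real backtest."""
    return float(rng.normal(0.10, 0.05))  # random placeholder, no real signal

# Sensitivity grid: designed and evaluated ONLY on the in-sample window (~70% of data).
grid = list(product([5, 10, 20], [5e9, 10e9]))  # Top N x market-cap floor
in_sample = {p: run_backtest(*p, "2000-01-01", "2016-12-31") for p in grid}

for params, cagr in sorted(in_sample.items(), key=lambda kv: -kv[1]):
    print(f"Top {params[0]:>2}, floor {params[1] / 1e9:.0f}B: {cagr:+.1%}")

# Robust means the whole neighborhood works, not one spike.
cagrs = np.array(list(in_sample.values()))
print(f"best {cagrs.max():+.1%} vs median {np.median(cagrs):+.1%}")

# Out-of-sample: the chosen variant runs ONCE on the ~30% of data never touched above.
chosen = max(in_sample, key=in_sample.get)
print(f"{chosen} out-of-sample: {run_backtest(*chosen, '2017-01-01', '2024-12-31'):+.1%}")
```

If the best point towers over the median, you've found a spike, not a plateau.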
The Momentum Grid I Published
From my Alpha Hunt article:
| Strategy | Return | CAGR | MaxDD |
|---|---|---|---|
| Top 5, 12M, 10B | +830% | +36% | -66% |
| Top 10, 12M, 10B | +742% | +34% | -53% |
| Top 20, 12M, 5B | +573% | +30% | -52% |
| Top 5, 12M, 5B | +292% | +21% | -61% |
| Top 10, 12M, 5B | +420% | +25% | -63% |
| Top 10, 6M, 5B | +300% | +21% | -72% |
| Top 10, 3M, 5B | -39% | -7% | -90% |

Plus the remaining variants (with transaction costs, trend filters, etc.) that round out the 11.
Notice: 10 of 11 beat SPY. That's a broad plateau, not a single spike. If I had published ONLY Top 5, 12M, $10B (+830%), I'd be snooping. Publishing the grid shows the strategy is robust to parameter choice. That makes it trustworthy.
The Top 10, 3M, 5B variant lost 39 percent. That's a failure mode — 3-month lookback is too short, catches noise. Publishing that shows the failure case too.
The Mental Shift
Data-snooping feels like "optimization." Make the strategy better.
It's actually "fitting to noise." Make the strategy more fragile.
The optimized version is the fragile version. More parameters = more fit to the past = less fit to the future.
Simpler strategies with fewer knobs generalize better. This is a deep truth in machine learning (Occam's razor, regularization, early stopping) and applies 1:1 to trading.
What to Do
When you see a strategy claim:
- Check parameter count. Under 5 is OK.
- Ask how parameters were chosen. "Theory" is good. "Tested a range" is suspect.
- Demand the full grid, not just the winner.
- Verify out-of-sample performance.
When you build a strategy:
- Set parameters from theory first
- Test a small range around theory values
- Publish the full grid
- Look for robustness, not peak performance
The variant you pick from your grid should look slightly worse than the single best point. Because the best point is part signal, part lucky noise. The median of the grid is the honest estimate of what to expect.
That's the mindset that separates real edges from data-snooped illusions.
Sources
- Harvey, Liu, and Zhu (2016), "...and the Cross-Section of Expected Returns", Review of Financial Studies 29(1): the 316-factors paper
- Lo and MacKinlay (1990), "Data-Snooping Biases in Tests of Financial Asset Pricing Models", Review of Financial Studies 3(3): the classic reference
- Occam's razor in machine learning: the parallel argument for preferring simpler models
Your Dominic, who tests 11 variants of ONE theory instead of 3.2 million random ideas.
Disclaimer: Not financial advice. Past performance does not guarantee future results.




