Here's a cute statistical fact that should unsettle you.
If you test 20 random strategies on Bitcoin data, one of them will likely look statistically significant. Just by chance.
Not because any of them works. Because 20 random tries at a 5-percent threshold yield, on average, one random hit.
That one hit is what the guru sells you.
What P-Hacking Is
P-hacking (from "p-value hacking") is the academic name for a simple trap: test enough strategies and, by pure luck, some will look like a real edge.
The statistics work against you. At the standard 5-percent significance level, 1 in every 20 random patterns will pass the test. Even if all 20 are worthless.
If someone tests 100 strategies and shows you the 5 that passed, you're looking at nothing but noise.
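You can watch this happen in a simulation. The sketch below generates 20 strategies with zero true edge (pure random daily returns) and tests each one for "mean return different from zero"; on average about one of them clears the 5-percent bar anyway. All names here are mine, and the normal approximation to the t-test is a simplification:

```python
import random
import statistics
from math import erf, sqrt

random.seed(7)

def random_strategy_pvalue(n_days=500):
    """P-value for 'mean daily return != 0' on a strategy with zero true edge."""
    returns = [random.gauss(0.0, 0.01) for _ in range(n_days)]
    mean = statistics.mean(returns)
    se = statistics.stdev(returns) / sqrt(n_days)
    t = mean / se
    # normal approximation to the two-sided p-value (fine at n=500)
    return 2 * (1 - 0.5 * (1 + erf(abs(t) / sqrt(2))))

pvalues = [random_strategy_pvalue() for _ in range(20)]
hits = sum(p < 0.05 for p in pvalues)
print(f"{hits} of 20 worthless strategies look 'significant' at p < 0.05")
```

Run it with different seeds and the hit count bounces around, but the long-run average sits near one. That single lucky hit is the chart the guru shows you.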
The Retail Version
You see Twitter threads like this:
"I tested 50 trading strategies on Bitcoin. These 3 beat HODL. [chart]"
That's p-hacking made visible.
50 strategies at a 5-percent noise threshold: 2.5 should look significant by pure random chance. Finding 3 is right in line with the expected noise.
None of those 3 strategies have proven anything. They might be real. They might be statistical ghosts.
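The "2.5 expected hits" arithmetic, and how unsurprising 3 hits really is, falls straight out of the binomial distribution. A minimal sketch, using the thread's numbers (50 strategies, 5-percent threshold):

```python
from math import comb

n, alpha = 50, 0.05            # 50 worthless strategies, 5% significance level
expected_hits = n * alpha      # false positives expected by pure chance

# P(3 or more of the 50 pass) if every strategy is pure noise
p_at_least_3 = 1 - sum(
    comb(n, k) * alpha**k * (1 - alpha)**(n - k) for k in range(3)
)
print(f"expected false positives: {expected_hits}")
print(f"P(3 or more pass by luck): {p_at_least_3:.2f}")
```

The probability of 3 or more lucky passes comes out around 46 percent, roughly a coin flip. The Twitter thread's result needs no edge at all to explain it.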
The Academic Version — More Sinister
Researchers do this too. Differently.
A finance professor tests 10 hypotheses. Nine fail. One passes p < 0.05. Guess which one becomes the paper?
The published paper looks like a clean discovery. Behind it: 9 invisible failures.
In academic finance, Harvey, Liu and Zhu (2016) documented 316 "factors" that had passed the 5% significance bar in published papers. After correcting for multiple testing, they concluded that a large share of those factors are likely false discoveries — the 1-in-20 random hits, dressed up as science. Marcos López de Prado proved the related False Strategy Theorem, which quantifies the same effect from the strategy-selection side.
This is why so many "proven" factors from 2010 papers stopped working after 2015. They weren't real. They got p-hacked into publication, then disappeared when tested on new data.
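The standard statistical defense Harvey, Liu and Zhu argue for is to tighten the significance bar as the number of tests grows. The simplest version is the Bonferroni correction: divide the threshold by the test count. A sketch, with made-up p-values for illustration:

```python
def bonferroni_survivors(pvalues, alpha=0.05):
    """Keep only the p-values that survive the Bonferroni-corrected threshold."""
    threshold = alpha / len(pvalues)
    return [p for p in pvalues if p < threshold]

# 100 tested strategies; five look 'significant' at the naive 5% bar
pvalues = [0.04, 0.03, 0.0003, 0.02, 0.045] + [0.5] * 95
naive = [p for p in pvalues if p < 0.05]
corrected = bonferroni_survivors(pvalues)
print(len(naive), "pass naively,", len(corrected), "survive correction")
```

With 100 tests the corrected threshold drops to 0.0005, and four of the five "discoveries" evaporate. Bonferroni is deliberately conservative; less brutal corrections (Benjamini-Hochberg, or the higher t-stat hurdles Harvey et al. propose) exist, but all of them punish the 9 invisible failures.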
My Own Close Call
In the Alpha Hunt for Sandra's portfolio, I tested 11 momentum variants:
- Top 5, 12M lookback, $5B floor
- Top 5, 12M, $10B floor
- Top 10, 12M, $5B floor
- Top 10, 12M, $10B floor
- Top 20, 12M, $5B floor
- Top 10, 6M, $5B floor
- Top 10, 3M, $5B floor
- Top 10, 12M, $5B, trend filter
- Top 5, 12M, $5B, trend filter
- Top 10, 12M, $5B, with 10bps costs
- Top 5, 12M, $5B, with 10bps costs
9 of 11 beat SPY. Looked great.
But wait: 11 tests at a 5-percent threshold means an expected 0.55 random hits. Nine is far above that. The result could still be real, or still be partly p-hacked.
What saved me from full p-hacking: all 11 were variants of the same academic theory (Jegadeesh-Titman 1993 momentum), not random unrelated strategies. Each test is partially redundant with the others.
That's actually the right defense. Don't test 20 wild ideas and pick the 1 that works. Test 1 idea in 20 ways and look for robustness.
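That defense can be made mechanical: run the one idea across its whole parameter grid and demand that most variants clear the benchmark, not just the single best one (which is exactly what p-hacking selects). A toy sketch; the return numbers are invented and `robust_edge` is my name, not a real library function:

```python
def robust_edge(variant_returns, benchmark, min_fraction=0.8):
    """An edge is 'robust' if most parameter variants beat the benchmark,
    not just the cherry-picked best one."""
    winners = sum(1 for r in variant_returns if r > benchmark)
    return winners / len(variant_returns) >= min_fraction

# annualized returns of 11 momentum variants vs. a 10% benchmark (made up)
variants = [0.14, 0.13, 0.15, 0.12, 0.11, 0.13, 0.09, 0.16, 0.14, 0.12, 0.08]
print(robust_edge(variants, benchmark=0.10))  # 9 of 11 beat -> robust
```

A p-hacked result typically fails this check: one spectacular variant, ten duds. A real edge degrades gracefully as you perturb the parameters.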
How to Spot P-Hacking in a Claim
Red flag 1: "I tested X strategies and these Y worked." Where X >> Y. That's p-hacking by ratio.
Red flag 2: oddly specific parameters. "RSI(14) threshold 27.3 with exit at RSI 71.5, max hold 17 days." That specificity screams "I tuned to exactly one data sample."
Red flag 3: no out-of-sample test. If the strategy was both designed and validated on the same data, it's untrustworthy.
Red flag 4: new parameters, fresh discoveries. Every quarter there's a "new factor" that beats everything. Most evaporate by the next quarter. That's the p-hack graveyard.
Red flag 5: magic numbers from random tests. If a strategy's logic depends on a specific SMA period "23" and nothing near 23 works, that's a p-hack.
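Red flag 5 suggests a direct test: if SMA(23) "works" but the periods right next to it do not, the 23 was fit to noise. A sketch of the neighborhood check; `backtest` here is a placeholder for your real backtester (the Sharpe ratios below are hypothetical):

```python
def neighborhood_is_stable(backtest, best_param, radius=3, tolerance=0.5):
    """True if parameters near the 'magic number' perform comparably.
    A sharp performance cliff around one value is a p-hacking signature."""
    best = backtest(best_param)
    neighbors = [backtest(best_param + d) for d in range(-radius, radius + 1) if d]
    return min(neighbors) >= best * tolerance

# hypothetical Sharpe ratios by SMA period: smooth hill vs. isolated spike
smooth = {20: 0.9, 21: 1.0, 22: 1.1, 23: 1.2, 24: 1.1, 25: 1.0, 26: 0.9}
spiky  = {20: 0.1, 21: 0.0, 22: 0.1, 23: 1.2, 24: 0.1, 25: 0.0, 26: 0.1}

print(neighborhood_is_stable(smooth.get, 23))  # gentle hill around 23
print(neighborhood_is_stable(spiky.get, 23))   # lone spike at exactly 23
```

Both parameter sets peak at 23 with the same Sharpe, but only the smooth one survives the neighborhood test. The spike is the magic number.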
How to Test for It in Your Own Work
1. Control your hypothesis count. Before testing, write down what you're testing and why. Each new test AFTER the first is a degree of freedom. Budget them.
2. Use out-of-sample holdout. Design on 70 percent of the data. Test on the unseen 30 percent. If the strategy breaks on the holdout, it was p-hacked.
3. Test on different markets. If momentum works in US stocks, does it also work in European stocks? Japanese? If yes, it's likely real. If it only works in your test market, it's a p-hack.
4. Be extra skeptical of "new" discoveries. Markets have been studied academically for 98 years. Everything simple has been tried. If you found "the one weird trick" nobody else found, it's probably noise.
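Step 2 is easy to enforce in code. The one rule that matters for time series: split chronologically, never shuffle, or future data leaks into the design set. A minimal sketch with a stand-in price series:

```python
def time_split(prices, train_fraction=0.7):
    """Chronological split: design on the early 70%, validate on the
    unseen late 30%. Shuffling would leak the future into the design set."""
    cut = int(len(prices) * train_fraction)
    return prices[:cut], prices[cut:]

prices = list(range(100))          # stand-in for a daily price series
train, holdout = time_split(prices)
print(len(train), len(holdout))    # 70 30
```

Tune every parameter on `train`, touch `holdout` exactly once at the end, and treat a big performance drop on the holdout as the verdict, not as an invitation to re-tune.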
The Deeper Lesson
The finance industry runs on p-hacked strategies. Fund managers justify expensive fees with backtests. Those backtests are usually p-hacked, retroactively fit to the data.
This is why most active funds underperform index funds long-term. Their "edge" was a statistical ghost.
Vanguard's John Bogle figured this out in the 1970s. Most "alpha" is illusion. The few real edges are tiny and hard to capture. For most investors, an index fund beats trying to be clever.
Momentum is an exception, documented across 98 years and 50+ markets. Value is another. Quality is a third. These survived decades of replication. That's how you separate real from p-hacked.
What We Do on BearBullRadar
Every strategy in the BotLab has:
- Single, theory-driven hypothesis (not "test 50 things")
- At least 6 years of data
- Multiple timeframes tested
- Transparent parameter count
- Walk-forward validation when possible
When I can't do one of these, I label it "experimental" and keep it in the lab, not deployed.
The two strategies we actually trust enough to deploy with real money — Andromeda (DM+LD filter on BTC) and the new Momentum Top 10 — both pass all five tests.
The others are still learning. Including the Volume Spike bot, which is now labeled "p-hacking example: looked great on 16 months, broke on 6 years."
The Simple Rule
If a strategy "discovered" its edge, be skeptical.
If a strategy applies an edge that was discovered decades ago and replicated across markets, trust it more.
Trend is not your friend. Proven theory is.
Sources
- Harvey, Liu, Zhu (2016) — "...and the Cross-Section of Expected Returns" — documented 316 published "factors" and the multiple-testing problems behind them
- López de Prado — The False Strategy Theorem — math of how often false positives look real
- Ioannidis — Why Most Published Research Findings Are False — foundational paper on the broader problem
Your Dominic, who tested 11 variants of ONE theory so he wouldn't accidentally p-hack 11 random ideas.
Disclaimer: Not financial advice. Past performance does not guarantee future results.