This morning I found what looked like a real upgrade.
I tested seventeen variations of momentum trading on BTC daily. Different lookback periods, dual-momentum combos, mean-reversion overlays, volatility filters. The winner was so clean I thought I had a publishable result:
Momentum-14d. Same construction as our existing Tactician bot (which uses 30-day momentum), but with a 14-day lookback instead. The numbers on a 5-year window were so good I extended the test to 8 years on Binance daily data.
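Before the numbers, the construction itself. A minimal long/cash sketch of the rule (long when price is above its level `lookback` days ago, otherwise in cash); the function names are illustrative, not the actual bot code:

```python
import numpy as np

def momentum_signal(closes: np.ndarray, lookback: int) -> np.ndarray:
    """1 = long, 0 = cash. Long when price is above its level `lookback` days ago."""
    sig = np.zeros(len(closes), dtype=int)
    sig[lookback:] = (closes[lookback:] > closes[:-lookback]).astype(int)
    return sig

def strategy_returns(closes: np.ndarray, lookback: int) -> np.ndarray:
    """Daily strategy returns, lag-aligned so there is no lookahead."""
    daily = np.diff(closes) / closes[:-1]         # simple daily returns
    sig = momentum_signal(closes, lookback)[:-1]  # signal at day t applies to the t -> t+1 return
    return sig * daily
```

The real bots presumably add fees, position sizing, and execution details; this captures only the signal logic.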
The 8-year result:
- Total return: +2,053% vs HODL +654%
- Walk-Forward: 3 of 3 windows beat HODL ← the gold standard
- MaxDD: -52% vs HODL -77%
- 1.01 trades per week (versus Tactician's 0.74)
Three out of three windows. Beating HODL by 1,400 percentage points. Lower drawdown. Higher trading frequency. By every standard test, this looked like a legitimate replacement for Tactician — call it "The Sharpshooter."
Six hours later I had retired it without ever deploying it. Here's what happened.
The One Test That Killed It
After every walk-forward result, I run one more check that I learned the hard way: parameter robustness in a neighborhood.
The question: does the lookback parameter (14 days) need to be exact, or do nearby values (12, 13, 15, 16 days) produce similar results?
A real edge has a smooth response surface. If 14 days works, 13 and 15 should also work — maybe slightly less, but in the same neighborhood. If only 14 days works, what you've found is not an edge. You've found one specific number that happened to align with the noise pattern of your historical data.
Here's the parameter sweep I ran:
| Lookback | Full Return | Walk-Forward |
|---|---|---|
| M-12 | +665% | 1/3 |
| M-13 | +437% | 1/3 |
| M-14 | +2,053% | 3/3 |
| M-15 | +768% | 1/3 |
| M-16 | +662% | 1/3 |
M-14 is a single isolated peak. Its neighbors, M-13 and M-15, just one day away on either side, score 1/3 on walk-forward and earn roughly a third to a fifth of the profit. Only M-14 itself is magic.
That's not an edge. That's overfitting wearing the costume of an edge.
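Running a neighborhood sweep like this is mechanical. A minimal sketch, assuming daily closes in a NumPy array and the simple long/cash rule (long when price is above its level `lookback` days ago); function names are mine, not from the actual harness:

```python
import numpy as np

def total_return(closes: np.ndarray, lookback: int) -> float:
    """Compounded return of the long/cash momentum rule (placeholder backtest)."""
    daily = np.diff(closes) / closes[:-1]
    in_market = (closes[lookback:-1] > closes[:-lookback - 1]).astype(float)
    sig = np.concatenate([np.zeros(lookback), in_market])  # flat during warmup
    return float(np.prod(1 + sig * daily) - 1)

def neighborhood_sweep(closes: np.ndarray, center: int = 14, radius: int = 2) -> dict:
    """Total return for every lookback in [center - radius, center + radius]."""
    return {lb: total_return(closes, lb)
            for lb in range(center - radius, center + radius + 1)}
```

This simplified rule won't reproduce the exact table numbers (the real backtest presumably includes fees and sizing), but the shape of the sweep is what the test cares about.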
In a real edge, you'd see something like:
- M-12: 2/3 WF, +700%
- M-13: 3/3 WF, +900%
- M-14: 3/3 WF, +1,100%
- M-15: 3/3 WF, +1,000%
- M-16: 2/3 WF, +800%
A plateau of working configurations. The parameter you choose is somewhere in the middle of a robust region. Small changes don't break it.
Instead, M-14 is a stalagmite in flat terrain. Move one day in either direction and you fall off a cliff. That's a sign that the M-14 result was generated by the specific noise pattern of the 2018-2026 BTC history happening to align with the 14-day lookback — not by any underlying market structure that 14 days captures particularly well.
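One way to make "stalagmite vs. plateau" mechanical is to compare the peak against the median of its neighbors. This is my own rough heuristic, not a standard test, and the 2x threshold is an arbitrary illustration:

```python
def is_knife_edge(sweep: dict, center: int, max_peak_ratio: float = 2.0) -> bool:
    """Flag `center` as knife-edge when its return dwarfs the median of its
    neighbors (upper middle for even counts). Returns are expressed as
    multiples (20.53 = +2,053%). The 2x threshold is illustrative, not calibrated."""
    neighbors = sorted(r for lb, r in sweep.items() if lb != center)
    median_nbr = neighbors[len(neighbors) // 2]
    return sweep[center] > max_peak_ratio * max(median_nbr, 1e-9)
```

On the sweep table above, M-14 at +2,053% against a neighbor median of about +665% triggers the flag; the hypothetical plateau numbers do not.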
Why This Matters Even at 3/3 Walk-Forward
The walk-forward check (does the strategy beat HODL in each independent sub-period?) is a strong filter. Of the 17 momentum variants I tested earlier, only one (M-14) passed 3/3. The others were 0/3 or 1/3. So M-14 looked uniquely robust.
But walk-forward and parameter robustness are complementary tests, not interchangeable ones.
- Walk-forward catches regime overfitting — strategies that worked in one market era and broke in others.
- Parameter robustness catches noise overfitting — strategies whose return depends critically on one specific parameter choice rather than capturing a real signal.
A strategy can pass walk-forward and still be noise-overfit. It can beat the benchmark in three independent sub-periods if its returns in each come from accidental alignment with random patterns, patterns that happen to share a similar shape but have no underlying causal mechanism.
When the parameter-sensitivity test then shows the result depends critically on one specific number, with no support from neighbors, the most likely explanation is coincidence accumulated over enough opportunities. Test 17 lookbacks across 8 years of data; one of them is going to look great by chance even with no real edge.
This is exactly the data-snooping bias that the Trading Bias series warns about. Test enough configurations, one will pass any given threshold by luck. Walk-forward alone doesn't protect against it.
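The walk-forward mechanics can be sketched as an equal split of the price history, with the strategy compared against buy-and-hold inside each window. This is a simplified sketch of the idea, not the exact windowing my harness uses:

```python
import numpy as np

def walk_forward_score(closes: np.ndarray, lookback: int, n_windows: int = 3) -> int:
    """Count windows in which the long/cash momentum rule beats buy-and-hold."""
    wins = 0
    for chunk in np.array_split(closes, n_windows):
        daily = np.diff(chunk) / chunk[:-1]
        in_market = (chunk[lookback:-1] > chunk[:-lookback - 1]).astype(float)
        sig = np.concatenate([np.zeros(lookback), in_market])  # flat during warmup
        strat = np.prod(1 + sig * daily) - 1
        hodl = chunk[-1] / chunk[0] - 1
        wins += strat > hodl
    return int(wins)
```

A 3/3 score from this function is necessary but, as the M-14 case shows, not sufficient.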
My Own Memory Rule, Confirmed Again
I have a rule pinned in my project notes that comes from a previous failed bot deployment:
"Prefer strategies whose success is parameter-robust (many nearby configs positive) over those with a single narrow sweet spot, even if the return is lower."
I wrote that rule after S01, an early backtest that made a 78% drawdown look like an acceptable price for triple-digit returns. Then live deployment showed S01's parameters were over-tuned to a specific historical regime that didn't repeat.
The lesson from S01: any single magic number that beats HODL by an outlier amount, while its neighbors don't, is almost always lottery-winning rather than skill.
M-14 today is the third confirmation of that rule in this project. The pattern keeps repeating:
- A backtest reveals an exciting result
- Walk-forward "passes" (1-3 windows)
- Parameter sweep shows the result is a single peak
- Honest conclusion: don't deploy
If I had only run the continuous backtest, M-14 would be live now. If I had only added the walk-forward check, M-14 would be live. The parameter robustness test is what catches the difference between "real edge" and "configurational lottery."
This is why post-mortems exist on this site. Every test that fails is documented publicly so the methodology stays sharp.
What Happens to The Tactician
Tactician (M-30) survives this analysis. Its full-period numbers are less impressive than M-14:
- Tactician (M-30): +959% over 8 years, 2/3 WF
- M-14: +2,053% over 8 years, 3/3 WF (but knife-edge)
But the Tactician's parameter neighborhood looks different. M-21 through M-24 cluster around +500% to +960% returns. The 30-day parameter sits in a noisier neighborhood, but its 2/3 WF score is consistent with a moderately robust signal rather than a single magic configuration.
That's why Tactician stays as our paper-tier bot, and M-14 doesn't graduate even to paper tier.
The takeaway isn't "M-14 is bad." It's "M-14 is not distinguishable from luck given the parameter sweep evidence." We default to don't deploy when we can't tell the difference.
What Happens to The Sharpshooter Concept
The Sharpshooter — a faster, higher-frequency BTC momentum bot — was a legitimate research goal. Tactician at 0.74 trades/week is slow. Many traders (rightly) want more decision points per year. Higher frequency could mean better risk-adjusted returns if the strategy class supports it.
The conclusion from this research: on BTC daily data 2018-2026, no momentum lookback in the 7-30 range produces a parameter-robust improvement over Tactician's 30-day variant. Single lookbacks (M-14, M-20) show isolated good results, but the parameter neighborhoods don't support those as edges.
This means one of three things:
1. Higher-frequency BTC momentum has no edge worth capturing on daily timeframes (the strategy class is at its natural ceiling at ~30d)
2. The edge exists but requires a different timeframe (intraday data, 4h or 1h bars, where I haven't tested yet)
3. The edge exists but requires a different signal (not pure price momentum; maybe momentum + volume + funding rate fusion)
Future research can explore (2) and (3). For now, no Sharpshooter deployment. The Sharpshooter joins the post-mortems page as the sixth retired strategy.
The Methodology Lesson, Sharpened
If you take one process improvement from this post-mortem, take this:
For any new strategy, run all three tests in order. Each one filters out a different failure mode.
1. Continuous backtest — does the strategy show positive returns over the longest window you have data for? (Filters obviously broken strategies.)
2. 3-window walk-forward — does it beat HODL (or your benchmark) in each independent sub-period? (Filters regime-overfit strategies.)
3. Parameter-neighborhood sweep — do the neighbors of your chosen parameter show similar results? (Filters noise-overfit strategies.)
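Wired together, the three filters form a single deployment gate. A sketch under my own illustrative thresholds (these are placeholders, not my actual cutoffs):

```python
def deploy_gate(full_return: float, wf_wins: int, sweep: dict, center: int,
                wf_required: int = 3, max_peak_ratio: float = 2.0) -> str:
    """Apply the three filters in order; any failure vetoes deployment.
    Returns are multiples (20.53 = +2,053%); thresholds are illustrative."""
    if full_return <= 0:                                   # test 1: continuous backtest
        return "reject: no positive full-period return"
    if wf_wins < wf_required:                              # test 2: walk-forward
        return "reject: failed walk-forward"
    neighbors = sorted(r for lb, r in sweep.items() if lb != center)
    median_nbr = neighbors[len(neighbors) // 2]
    if sweep[center] > max_peak_ratio * max(median_nbr, 1e-9):  # test 3: neighborhood
        return "reject: knife-edge parameter (noise overfit)"
    return "pass: candidate for paper trading"
```

Fed the M-14 numbers, the gate clears tests 1 and 2 and rejects on test 3, which is exactly what happened here.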
Most retail backtests stop at #1. Some sophisticated tools include #2. Almost nobody runs #3. That's why almost every "validated strategy" you see on the internet still fails when actually deployed.
The Sharpshooter would almost certainly have failed once deployed. It looked perfect through tests 1 and 2. Test 3 stopped it. That's the entire point of doing test 3.
Related Reading
- Methodology page — The 3-test stack and why parameter robustness is non-negotiable
- Behavioral Biases Pillar — the 12 ways your brain corrupts trading decisions
- Beat HODL or Don't Bother — the benchmark every strategy must clear
- Data-Snooping Bias — why testing many configurations produces false winners
- Overfitting Bias — the broader category that includes parameter-overfitting
- Scout v2 Post-Mortem — different strategy, same lesson
- Post-Mortems ledger — the full failure record
Not financial advice. This article documents a research finding for educational purposes. Past performance does not guarantee future results.