Backtesting: How to PROPERLY Test Trading Strategies (and Why 90% Do It Wrong)

"My strategy did +340% in the backtest!"

Dominic Tschan
April 3, 2026 · 12 min read

Some guy on Twitter wrote that to me. Proud as a peacock. TradingView screenshot, green line shooting up. Looks impressive.

I asked him one question: "How many parameters did you optimize?"

He never replied.

Why am I telling you this? Because backtesting is the most powerful tool in trading — and simultaneously the most dangerous. Used correctly, it shows you what works. Used incorrectly, it shows you what you WANT TO HEAR.

I've tested over 200 strategies on 6 years of Bitcoin data. And made every beginner mistake myself along the way. This article is what I should have read BEFORE my first test.

You'll learn:

  • What a backtest really is (and what it isn't)
  • The 7 traps that make almost EVERY backtest worthless
  • How I test today — with a method that prevents self-deception
  • A checklist you can apply to EVERY backtest

Let's go.


What Is a Backtest?

Imagine a time machine.

You get in, travel back to 2019, and say: "Every time Bitcoin crosses above a certain moving average, I buy. Every time it drops below, I sell."

Then you play through the next 6 years. Day by day. Every candle. Every trade.

At the end, you check: did the strategy make money? How much? How often was it right? How deep was the maximum drawdown?

That's a backtest. You simulate a strategy on historical data.

Sounds simple, right?

It is. The problem isn't the test itself. The problem is you.
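The time-machine idea fits in a few lines of code. What follows is a deliberately naive sketch, not my actual test harness: the prices are made up, and a 3-day moving average stands in for the 100-day line from the example.

```python
# Naive backtest sketch: buy when today's close is above yesterday's
# moving average, sell when it falls below. All numbers are made up.

def sma(prices, window, i):
    """Simple moving average of the `window` closes ending at index i."""
    return sum(prices[i - window + 1:i + 1]) / window

def backtest(prices, window):
    cash, coins = 100.0, 0.0                 # start with 100 units of cash
    for i in range(window, len(prices)):
        avg = sma(prices, window, i - 1)     # average known before today
        price = prices[i]
        if coins == 0 and price > avg:       # cross above -> buy
            coins, cash = cash / price, 0.0
        elif coins > 0 and price < avg:      # drop below -> sell
            cash, coins = coins * price, 0.0
    return cash + coins * prices[-1]         # final portfolio value

prices = [10, 11, 12, 11, 13, 14, 12, 11, 15, 16]
final = backtest(prices, window=3)
```

That loop is the whole concept. Everything else in this article is about the subtle ways this simple loop can lie to you.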


The 7 Traps — and Why Your Backtest Is Probably Garbage

Trap 1: Look-Ahead Bias — You're Peeking at the Future

The most common trap. And the most insidious.

Look-ahead bias means: your strategy uses information that didn't exist yet at the time of the trade.

An example: you calculate the "Fear & Greed Index" on March 15. But the index isn't published until the end of the day. In your backtest, you buy on the morning of March 15 based on the March 15 value.

You're peeking at the future. Without realizing it.

Even more subtle: Your backtest calculates at the end of day X: "Bitcoin crossed the 100-day moving average — BUY!" And then it books the purchase at the closing price of day X. Sounds fair, right?

The problem: in reality, you only KNOW at the end of day X that the average was crossed. You can buy at the earliest on the morning of day X+1. And by then, the price might already be 2% higher.

So your backtest gives itself a free 2% advantage on EVERY trade. Over 20 trades, that's roughly 40% in phantom returns that never existed in reality.

Remember: If your strategy uses data that didn't exist at the time of the decision, your entire backtest is worthless.
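Here is a sketch of how much a same-day fill flatters a backtest. The signals and prices are made-up illustration data; the standard fix is shifting execution by one day (`lag=1`), so the day-t signal only earns the day t+1 return.

```python
# Look-ahead sketch: a signal computed at the close of day t can only
# be traded on day t+1. All signals and prices here are invented.

signals = [0, 0, 1, 1, 1, 0, 0]              # 1 = "be in the market"
prices  = [100, 100, 110, 112, 111, 113, 112]

def total_return(signals, prices, lag):
    """Compound return when the signal from day t-lag sets day-t exposure."""
    total = 1.0
    for t in range(1, len(prices)):
        ret = prices[t] / prices[t - 1] - 1  # return earned over day t
        if t - lag >= 0 and signals[t - lag] == 1:
            total *= 1 + ret
    return total - 1

biased = total_return(signals, prices, lag=0)  # same-day fill: peeks ahead
honest = total_return(signals, prices, lag=1)  # fills one day later
```

In this toy series the biased version books +11% while the honest one books under +3%, because the signal fires right after the big up-move and the same-day fill quietly pockets that move.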

Still with me? It gets worse.

Trap 2: Overfitting — You Found Noise, Not a Strategy

This is the classic. And I fell for it myself.

Overfitting means: you optimize your strategy until it fits the past PERFECTLY. SMA with exactly 97 days, RSI threshold at exactly 32, stop-loss at exactly 4.7%.

The result looks fantastic. +340%! Screenshot! Twitter!

But you didn't find a market effect. You found NOISE. Random patterns in the data that will never occur again.

How do you spot overfitting?

The neighbor test: Swap your parameters slightly. SMA 97 → SMA 95, SMA 100, SMA 105. Do ALL of them work? Good, you probably found something real. Does ONLY 97 work? You found randomness.

I have a rule for my tests: I never test one setting. I test HUNDREDS. For my best strategy, I calculated 384 parameter combinations. 100% of them were profitable. THAT is robust. THAT is real.

A strategy that only works with a single setting is like a key that only fits the lock during a full moon. Forget it.
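The neighbor test can be sketched like this. `run_backtest` is a toy stand-in for a real backtest engine, and the price series is synthetic; the point is the sweep, not the strategy.

```python
# Neighbor-test sketch: sweep the SMA window over a grid and measure
# what fraction of settings end profitable. All data is made up.

def run_backtest(prices, window):
    """Toy SMA strategy; returns final equity starting from 1.0."""
    equity, holding = 1.0, False
    for i in range(window, len(prices)):
        if holding:
            equity *= prices[i] / prices[i - 1]   # earn day i's return
        avg = sum(prices[i - window:i]) / window  # average of prior closes
        holding = prices[i] > avg                 # decide for the next day
    return equity

def robustness(prices, windows):
    """Fraction of neighboring parameter settings that are profitable."""
    results = [run_backtest(prices, w) for w in windows]
    return sum(1 for eq in results if eq > 1.0) / len(results)

prices = [100 + i + (3 if i % 7 == 0 else 0) for i in range(60)]
score = robustness(prices, windows=range(5, 30, 5))
```

A score near 1.0 means the neighborhood works, not just one magic number. A score near zero means you found a full-moon key.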

Trap 3: Survivorship Bias — You Only Test Winners

You test "Buy the Dip" on Bitcoin. Result: profitable.

But did you also test it on Luna? On FTT? On the 15,000 coins that no longer exist?

Survivorship bias means: you only test on assets that survived. Bitcoin, Ethereum, Solana — the WINNERS. Obviously every strategy looks good when you only test on winners.

My countermeasure: I test every strategy on at least 2-3 different assets. Does it only work on Bitcoin? Then the profit might not be the strategy — but simply Bitcoin.

Trap 4: Time Period Bias — Your Window Lies

"My strategy did +200% from January 2023 to March 2024!"

Do you know what happened from January 2023 to March 2024? Bitcoin went from $16,000 to $73,000. EVERYTHING did +200%. Even close-your-eyes-and-buy.

If you only test in a bull market, every buy strategy works. If you only test in a bear market, every short strategy works.

The solution: ALWAYS test over at least one full cycle. Bull market AND bear market. For Bitcoin, that means: at least 4 years, preferably 6.

My strategies must SURVIVE the 2022 bear market. Not just RIDE the 2021 bull market.

Trap 5: Ignoring Costs — The Silent Killer

0.1% fee per trade. Sounds like nothing.

Let's do a test.

You trade 3x per week. That's ~150 round trips per year. At 0.1% per buy and 0.1% per sell, that's 0.2% per round trip.

The correct calculation (compound effect): 0.998^150 ≈ 0.74. About 26% of your capital per year. Just fees. Gone. Irrecoverable.

Your strategy makes +25% gross? After fees: ~-1%. You're working for the exchange, not for yourself.
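The fee arithmetic above, as a two-line sanity check:

```python
# Fee-drag sketch: each round trip costs 0.2% (0.1% buy + 0.1% sell),
# compounded over 150 round trips a year.

fee_per_side = 0.001                    # 0.1% per buy and 0.1% per sell
round_trips_per_year = 150

kept = (1 - 2 * fee_per_side) ** round_trips_per_year  # 0.998^150
lost = 1 - kept                                        # share eaten by fees
```

`lost` comes out to roughly 0.26, i.e. about a quarter of your capital per year before the strategy has earned a single cent.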

That's why I prefer strategies with few trades. My bot does 3-4 trades per YEAR. Not per week. Fees are practically zero.

Remember: The more you trade, the more certain you are to lose. Fees are the only guaranteed loser in trading.

Trap 6: Ignoring Walk-Forward — You're Fooling Yourself

This is the trap the fewest people know about. And the most important one.

Imagine you test a strategy on data from 2019-2024. You optimize until it fits perfectly. Then you say: "My strategy works on 2019-2024!"

Of course it does. You OPTIMIZED it for that.

That's like taking an exam where you already know the answers. Obviously you pass. But can you do it WITHOUT the answers?

Walk-forward testing solves this problem:

  1. Train on data from 2019-2021 (in-sample)
  2. Test on data from 2022-2024 (out-of-sample)
  3. The strategy must NEVER have seen the test data before

If the strategy only works in training but fails in testing — overfitting. Throw it out.

My method goes further: I run a "continuous walk-forward" analysis with a window that slides through the entire time series. Every window must be individually profitable. Not the average — EVERY SINGLE ONE.

Why? Because I learned early on: one window with +300% and three with -20% averages out to a positive result. But in reality, you lost money three times and got lucky once.
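The every-window rule can be sketched in a few lines. The yearly returns below are invented to mirror the lucky-once scenario; the sliding-window logic is the part that matters.

```python
# Walk-forward sketch: slide a fixed-size window through a series of
# (made-up) yearly returns and demand EVERY window be positive.

def window_returns(returns, size):
    """Compound return of every sliding window of `size` periods."""
    out = []
    for start in range(len(returns) - size + 1):
        total = 1.0
        for r in returns[start:start + size]:
            total *= 1 + r
        out.append(total - 1)
    return out

def passes_walk_forward(returns, size):
    """True only if EVERY window is individually profitable."""
    return all(r > 0 for r in window_returns(returns, size))

lucky  = [3.0, -0.2, -0.2, -0.2]   # one +300% year, three -20% years
steady = [0.10, 0.15, 0.08, 0.12]  # boring but consistent
```

The `lucky` series more than doubles your money overall, yet fails the test: most of its windows lose. The `steady` series passes every window. That is exactly the average-vs-every-window distinction.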

Trap 7: Data Snooping — You Test 100 Ideas and Pick the Best

This is the most subtle trap of all.

You test 100 different strategies on the same data. 99 lose. 1 wins. "Found it!" you shout. "This strategy works!"

No. With 100 attempts, it's statistically almost CERTAIN that one will randomly perform well. That's not a signal. That's noise.

My countermeasure: When I find a promising strategy, I test it on a COMPLETELY DIFFERENT dataset. Different time period. Different asset. If it works there too, it's probably real. If not, it was randomness.
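The multiple-comparisons math behind this trap is simple. Assume (purely for illustration) that a worthless strategy has a 3% chance of looking great on any given dataset by luck alone:

```python
# Data-snooping sketch: with n junk strategies, each with a small
# chance p of a lucky result, a false "winner" is almost guaranteed.
# p = 0.03 is an illustrative assumption, not a measured number.

p = 0.03   # assumed chance one junk strategy looks great by luck
n = 100    # strategies tested on the same data

prob_false_winner = 1 - (1 - p) ** n   # chance of at least one fluke
```

With these numbers, the probability of at least one random "winner" is about 95%. Finding one profitable strategy out of 100 tells you almost nothing.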


How I Test — The Method That Prevents Self-Deception

After 200 tests and several painful failures, I developed a system. Not rocket science. But it works.

Step 1: Write down the hypothesis BEFOREHAND

Before I write a single line of code, I write down: "I believe that X works because Y."

Why? Because it forces me to think BEFOREHAND. Not invent an explanation afterward for why the randomness makes sense.

Step 2: Broad parameter sweep

I never test ONE setting. I test a grid. SMA from 50 to 200 in steps of 10. RSI threshold from 20 to 40 in steps of 5. Hundreds of combinations.

Then I look: are MANY of them profitable? Or just a handful?

Step 3: Walk-forward with 3-window check

I divide the data into at least 3-4 windows. Every window must be INDIVIDUALLY profitable. Not the average. Every. Single. One.

Does even ONE window fail? Out.

Step 4: Stress test

What happens in the worst case? Maximum drawdown. How long are you in the red? How deep? Can you handle that emotionally?

A strategy with +50% per year but -80% drawdown is USELESS — you'll sell at -40% out of panic. Guaranteed.
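The two numbers I stress-test against can be computed like this. The equity curve is made up; the functions are generic.

```python
# Stress-test sketch: deepest peak-to-trough loss and time spent
# below the previous peak, on a made-up equity curve.

def max_drawdown(equity):
    """Deepest peak-to-trough loss, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def periods_underwater(equity):
    """How many periods the curve spent below its previous peak."""
    peak, count = equity[0], 0
    for value in equity:
        peak = max(peak, value)
        if value < peak:
            count += 1
    return count

equity = [100, 120, 90, 95, 130, 70, 110, 140]   # made-up equity curve
dd = max_drawdown(equity)   # the 130 -> 70 crash: about 46%
```

Both numbers matter: a 46% drawdown is brutal, but so is sitting in the red for months while the backtest assures you everything is fine.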

Step 5: Paper trading

Before real money flows, the strategy runs 1-3 months as an observer. Do the live signals match the backtest? Are there execution issues?

Only then: real money. Small amount. Then scale up.


The Backtest Checklist

Print this out. Pin it above your screen. Before you trust a backtest — check EVERY point:

1. No look-ahead? — Does the strategy only use data that was available at the time of the trade?

2. Parameter-robust? — Does it work with MANY settings, not just one?

3. Walk-forward passed? — Does it work on data it's never seen?

4. Full cycle? — Tested over at least one bull AND bear market?

5. Costs included? — Fees, slippage, spread accounted for?

6. Multiple assets? — Does it work on more than JUST the one asset that survived?

7. Drawdown acceptable? — Can you emotionally handle the maximum loss?

If even one point is a "No": hands off. No matter how good the returns look.


The Reality: 30 Strategies, 20 Failed

Want to know what this looks like in practice? Here are real results from one of my test series. 30 strategies. Same data. Same time period. Same rules.

| # | Strategy | Win% | Return | Max DD | Trades | Robust? | Verdict |
|---|----------|------|--------|--------|--------|---------|---------|
| A | Multi-Signal Trend Filter | 71.4% | +19.7% | 1.1% | 21 | ✅ 100% | ✅ STRONG |
| B | Multi-Asset Breakout | 69.2% | +16.5% | 2.9% | 13 | ✅ 89% | ✅ STRONG |
| C | Breakout + Adaptive Exits | 69.6% | +15.1% | 1.2% | 23 | ✅ 92% | ✅ STRONG |
| D | Trend-Breakout + Momentum | 68.2% | +14.6% | 0.8% | 22 | ✅ 85% | 🟡 Promising |
| E | Monthly Open Reclaim | 72.0% | +13.7% | 1.5% | 25 | 🟡 62% | 🟡 Promising |
| F | Monthly Low Bounce | 68.4% | +8.1% | 1.8% | 38 | 🟡 58% | 🟡 Promising |
| G | Bollinger Squeeze | 42.5% | +2.2% | 5.9% | 47 | ❌ 21% | ❌ Rejected |
| H | Weekly Hammer | 42.5% | +1.4% | 3.2% | 40 | ❌ 18% | ❌ Rejected |
| I | Funding Rate Flip | 48.6% | -0.3% | 6.3% | 74 | ❌ 12% | ❌ Rejected |
| J | 3-Day-Red Bounce | 30.8% | -4.3% | 5.4% | 13 | ❌ 8% | ❌ Rejected |
| K | Funding Streak | 39.3% | -7.1% | 9.7% | 61 | ❌ 5% | ❌ Rejected |
| L | RSI Extreme + Trend | 27.6% | -15.6% | 15.6% | 58 | ❌ 3% | ❌ Rejected |
| M | Regime-Adaptive Complex | 33.0% | -15.5% | 18.0% | 112 | ❌ 2% | ❌ Rejected |

Table shortened — 30 strategies tested in total, 20 of them rejected. "Robust?" = percentage of parameter variants that are profitable.

See the Robust? column? That's the key. Strategy A has 100% parameter robustness — EVERY variant is profitable. Strategy M has 2% — only one tiny setting works. Guess which one survives in practice.

Sorted by return, the pattern is clear: the winners have few trades, low drawdown, and win rates above 65%. The losers have many trades, high drawdown, and often win rates below 45%.

And the most popular YouTube strategies — RSI, Bollinger Squeeze, 3-Day-Red — are ALL in the bottom third. None of them managed more than a meager +2.2%.

That's the difference between "I saw it on YouTube" and "I tested it on 6 years of data."


What I Learned from This

My first strategy looked fantastic in the backtest. +180%. I was thrilled.

Then I ran the walk-forward test. Of 4 time windows, 3 were negative. The overall average was only positive because ONE window happened to catch the 2021 bull market.

The strategy was garbage. But without walk-forward, I would have lost real money on it.

Remember: A good backtest protects you from yourself. It's not proof that something works — it's a filter that catches the garbage.

The 5 strategies that survived my process? They all had something in common: they were boring. Few trades. Broad parameter robustness. Positive results in EVERY time window.

None of them looked impressive on Twitter.

But they work.


If you want to know which strategies survived my backtest process — and which ones crashed and burned — start here:

200 Strategies Tested, 7 Mistakes Made — My personal lessons

HODL Beats 27 Out of 30 Strategies — The sobering result

The Most Boring Strategy That Works — 3x better than HODL in backtesting

Trading Calculator — Is your trading worth it? Do the math

Our 3 Bots — The strategies that survived the backtest process

Write me if you have questions. I read every message.

Your Dominic, the guy who tested 200 strategies so you don't have to.


Disclaimer: This is not investment advice. Backtests are based on historical data and do not guarantee future results. Only invest what you can afford to lose.

Dominic Tschan

MSc Physics, ETH Zurich · Physics teacher · Crypto investor · Bot builder

ETH physicist who tested 200+ trading strategies on 6 years of real market data. Runs 5 tier-labeled bots — 1 on real capital, 3 paper, 1 backtest-only. Here I share everything: results, mistakes, and lessons.

Bot Alerts & Trading Lies

Get notified instantly when the bot buys or sells. Plus: free PDF, weekly myth-busting and bot performance updates.
