Danish Khan

ML-Based Trading, Explained

What actually works, what doesn’t, and why most quantitative strategies fail.

Every year, thousands of engineers build machine learning models to trade stocks. Most of them lose money. Not because ML doesn’t work in finance — it does, when applied correctly. But because the gap between a promising backtest and a profitable strategy is enormous, and almost nobody talks about what lives in that gap. Since 1926, only 42.1% of US stocks have outperformed risk-free Treasury bills. The median stock lifespan is just 7 years. Marcos López de Prado, the most cited quant researcher alive, wrote that “the hardest problem in finance is not prediction — it’s validation.” This explainer shows you why, using real Nifty 50 data.

I.

Reading the Chart

ML models don’t consume raw stock prices. They consume features — mathematical transformations of price and volume data that encode patterns a model can learn from. The three most important families of technical indicators are RSI, MACD, and Bollinger Bands.

RSI (Relative Strength Index) measures momentum on a 0–100 scale. Above 70 means “overbought” — the stock may have risen too fast. Below 30 means “oversold.” The formula: RSI = 100 − 100/(1 + RS), where RS = average gain / average loss over 14 periods.
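The formula translates directly into a few lines of pandas. A minimal sketch using simple 14-period averages, as in the text (production implementations often use Wilder's smoothing instead):

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + RS), RS = avg gain / avg loss over `period`."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(period).mean()
    rs = avg_gain / avg_loss      # -> inf when the window had no losses
    return 100 - 100 / (1 + rs)  # infinite RS maps cleanly to RSI = 100
```

A series that only rises pins the RSI at 100; a random walk stays strictly inside the 0–100 band.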

MACD (Moving Average Convergence Divergence) captures trend changes. It’s the difference between two exponential moving averages (12-period minus 26-period). When the MACD line crosses above its 9-period signal line, that’s a buy signal. When it crosses below, sell.
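The same calculation in pandas, a minimal sketch built on its exponential moving averages:

```python
import numpy as np
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, sig: int = 9):
    """MACD line = EMA(fast) - EMA(slow); signal = EMA(sig) of the MACD line."""
    line = (close.ewm(span=fast, adjust=False).mean()
            - close.ewm(span=slow, adjust=False).mean())
    signal = line.ewm(span=sig, adjust=False).mean()
    return line, signal, line - signal  # histogram: cross above -> buy

# in a steady uptrend the fast EMA leads, so the MACD line stays positive
line, signal, hist = macd(pd.Series(np.linspace(100.0, 200.0, 100)))
```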

Bollinger Bands wrap a 2-standard-deviation envelope around a 20-period moving average. When the bands squeeze tight, volatility is low and a breakout often follows. When price touches the upper band, the stock may be overextended.[1]

John Bollinger invented these bands in the 1980s. The key insight isn’t the bands themselves — it’s the squeeze. Periods of low volatility tend to precede bursts of high volatility. ML models exploit this by using Bollinger Band width as a feature.
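Both the bands and the width feature are a few lines of pandas. A minimal sketch:

```python
import pandas as pd

def bollinger(close: pd.Series, period: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Midline = 20-period SMA; bands at +/- k rolling standard deviations."""
    mid = close.rolling(period).mean()
    sd = close.rolling(period).std()
    out = pd.DataFrame({"mid": mid, "upper": mid + k * sd, "lower": mid - k * sd})
    out["width"] = (out["upper"] - out["lower"]) / mid  # the "squeeze" feature
    return out
```

A perfectly flat price has zero rolling deviation, so the bands collapse onto the midline and the width feature reads zero — the tightest possible squeeze.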

Technical Indicator Explorer: Nifty 50 (2022–2024)

Toggle between indicators to see how each reads the same Nifty 50 price data differently. RSI catches overbought/oversold extremes, MACD captures trend shifts, Bollinger Bands reveal volatility squeezes.
II.

Momentum vs Mean Reversion

Two forces govern stock prices, and they pull in opposite directions.

Momentum says winners keep winning. Jegadeesh and Titman showed in 1993 that stocks that performed well over the past 3–12 months tend to continue performing well. This is the most robust anomaly in all of finance — it works across countries, asset classes, and time periods.

Mean reversion says what goes up must come down. Over longer horizons — 1 to 5 years — stocks that have risen sharply tend to underperform, and beaten-down stocks tend to recover. DeBondt and Thaler documented this in 1985.[2]

The coexistence of momentum and mean reversion isn’t a contradiction. Momentum operates on investor underreaction (news takes time to be fully priced in). Mean reversion operates on overreaction (investors eventually push prices too far). Different timescales, different behavioral causes.

The critical insight: the same stock can be a momentum buy and a mean reversion sell at the same time, depending on which timescale you look at. This is why the lookback period you choose for your ML features matters enormously — and why many strategies fail when the market regime shifts.
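The timescale dependence is easy to see in code. A hypothetical toy rule (the function and labels are illustrative, not a real strategy): the same trailing return produces opposite trades depending on which camp reads it.

```python
import numpy as np
import pandas as pd

def trade_signal(close: pd.Series, lookback: int, style: str) -> int:
    """Trailing return over `lookback` bars; momentum buys recent
    winners, mean reversion fades them (toy rule for illustration)."""
    trailing = close.iloc[-1] / close.iloc[-lookback - 1] - 1
    if style == "momentum":
        return 1 if trailing > 0 else -1
    return -1 if trailing > 0 else 1  # mean reversion fades the move
```

On the same rally, the momentum reader returns +1 and the mean-reversion reader returns −1; only the lookback and the behavioral story differ.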

Time Horizon Explorer: When Signals Flip

Drag the slider from 1 month to 24 months. Watch the signal flip from momentum (short lookback) to mean reversion (long lookback) on the same Nifty 50 data.
III.

The Overfitting Trap

Here’s where most ML traders get destroyed. You build a model. You backtest it on 5 years of data. The Sharpe ratio is 8.2. You feel like a genius.

Then you trade it live. Sharpe ratio: 0.3.

The gap between 8.2 and 0.3 is overfitting. Your model didn’t learn the market — it memorized your training data. Modern computing lets you test billions of parameter combinations. With enough parameters, you can fit any historical pattern. But those patterns are noise, not signal.[3]

López de Prado estimates that testing 100 parameter combinations on 5 years of daily data gives you a ~50% probability of finding a “significant” pattern by pure chance. Test 10,000 combinations and you’re virtually guaranteed to find several. This is why most published trading strategies fail out of sample.
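The selection-bias effect is easy to reproduce. The sketch below (parameters are illustrative) backtests 1,000 random strategies with zero true edge; the best in-sample Sharpe still looks impressive:

```python
import numpy as np

# 1,000 random daily "strategies" over one year, zero true edge anywhere
rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.01, size=(1000, 252))
sharpes = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
best = sharpes.max()
# the best backtest typically shows an annualized Sharpe near 3,
# earned entirely by selection bias, not skill
```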

The rule of thumb: the more complex your model, the bigger the gap between training and testing performance. A simple moving-average crossover might have modest in-sample performance, but it degrades gracefully. A 50-parameter neural net might look incredible in-sample, but it falls off a cliff out-of-sample.

As the economist Ronald Coase put it: “If you torture the data long enough, it will confess to anything.” The solution is walk-forward validation — train on window 1, test on window 2, retrain on windows 1+2, test on window 3. Never let your model see test data during training.
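A minimal sketch of expanding-window walk-forward splits (the same idea as scikit-learn's TimeSeriesSplit): each fold trains only on data that precedes its test window.

```python
import numpy as np

def walk_forward_splits(n: int, n_folds: int = 5):
    """Expanding-window splits: fold k trains on [0, k*width) and
    tests on [k*width, (k+1)*width). Test data never leaks backward."""
    width = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield np.arange(0, k * width), np.arange(k * width, (k + 1) * width)
```

Retraining on windows 1+2 before testing window 3 falls out naturally: each successive training window subsumes the previous one.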

The Overfitting Cliff

Drag the slider from simple (1) to complex (10). Watch the in-sample Sharpe soar while the out-of-sample Sharpe crashes. The gap is the overfitting penalty.
IV.

Measuring What Matters

Returns mean nothing without risk context. A strategy that returns 40% but has a 60% max drawdown will get abandoned long before it pays off — because no human can stomach watching their account lose more than half its value.

The Sharpe ratio normalizes return by volatility: (Return − Risk-Free Rate) / Volatility. A Sharpe of 1.0 is good. 2.0 is excellent. The market long-term Sharpe is about 0.4–0.5.
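A minimal sketch of the annualized Sharpe calculation on daily returns (252 trading days assumed):

```python
import numpy as np

def sharpe(daily_returns: np.ndarray, risk_free_daily: float = 0.0) -> float:
    """(Return - risk-free rate) / volatility, annualized by sqrt(252)."""
    excess = daily_returns - risk_free_daily
    return float(excess.mean() / excess.std(ddof=1) * np.sqrt(252))
```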

Max drawdown is the worst peak-to-trough decline. The 2020 COVID crash produced a −38% drawdown on Nifty 50 in just 5 weeks. Recovery took 5 months. The “underwater period” — the time spent below the previous peak — is what actually breaks traders psychologically.[4]

The Calmar ratio = Annual Return / Max Drawdown captures this tradeoff directly. A Calmar above 1.0 means your annual return exceeds your worst drawdown. Below 1.0, and you’re in territory where behavioral psychology predicts strategy abandonment.
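Max drawdown and Calmar are a few lines of NumPy. A minimal sketch:

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Worst peak-to-trough decline, as a positive fraction of the peak."""
    peaks = np.maximum.accumulate(equity)
    return float(((peaks - equity) / peaks).max())

def calmar(annual_return: float, mdd: float) -> float:
    """Calmar ratio: annual return divided by max drawdown."""
    return annual_return / mdd
```

An equity curve 100 → 120 → 90 → 130 has a max drawdown of 25% (the 120-to-90 leg), so a 20% annual return gives a Calmar of 0.8 — drawdown-abandonment territory.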

Drawdown Underwater Chart

Toggle between strategies. The red underwater area shows how long each strategy spent below its peak. Buy & Hold had the deepest drawdown; Momentum had the longest underwater period.
V.

What the Model Learns

You train a gradient-boosted model on 20 features. It achieves a decent out-of-sample Sharpe. But which features actually mattered? This is where SHAP values — the same explainability tool used in credit scoring — become essential.

SHAP decomposes each prediction into individual feature contributions. A positive SHAP value pushes toward “Buy,” a negative one toward “Sell.” The surprise: prior day’s return is often the single strongest predictor, more important than any technical indicator. Volume change ranks second. RSI and MACD matter, but less than most traders assume.[5]

This finding is consistent across multiple studies. The most sophisticated features often contribute less than simple lagged returns. This doesn’t mean technical indicators are useless — they help at the margin — but it humbles the engineer who spent weeks engineering the perfect RSI variant.

The deeper insight: SHAP shows you correlation, not causation. A feature with high SHAP importance might be genuinely predictive, or it might be overfitted noise. Feature importance without proper validation is just a fancier way to fool yourself.
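For a linear model, SHAP values have a closed form, which makes the "local accuracy" property easy to verify without the shap library. A toy sketch (the feature names and weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 2))   # toy features: prev-day return, scaled RSI
w = np.array([0.8, 0.1])        # linear weights: lagged return dominates

def shap_linear(x_row: np.ndarray, X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Exact SHAP values for a linear model: phi_i = w_i * (x_i - E[x_i])."""
    return w * (x_row - X.mean(axis=0))

phi = shap_linear(X[0], X, w)
# local accuracy: contributions sum to f(x) minus the average prediction
importance = np.abs(w * (X - X.mean(axis=0))).mean(axis=0)
```

The mean absolute SHAP value per feature is the usual global importance ranking; here the lagged-return column dominates by construction.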

Feature Importance Waterfall

Each bar shows one feature’s contribution. Green pushes toward Buy, red toward Sell. Notice how the same features flip direction on bullish vs bearish days.
VI.

The Efficient Frontier

Harry Markowitz showed in 1952 that you can get the same return with less risk — or more return with the same risk — by combining assets that don’t move together. The efficient frontier is the set of all portfolios that offer the maximum return for each level of risk.

Individual stocks are scattered below the frontier. Only by combining them can you reach it. The key inputs are each stock’s expected return, its volatility, and the correlations between all pairs. The output is the optimal weight for each stock at each risk level.

The catch: Markowitz optimization is notoriously unstable. Small changes in expected returns produce wildly different optimal weights. This is why risk parity — allocating so each asset contributes equally to portfolio risk — has become the practical alternative for many quant funds.[6]
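The simplest risk-parity approximation is inverse-volatility weighting, which is exact only when all pairwise correlations are equal. A minimal sketch (the volatility numbers are illustrative):

```python
import numpy as np

def inverse_vol_weights(vols: np.ndarray) -> np.ndarray:
    """Naive risk parity: weight each asset by 1/vol so each contributes
    roughly equal risk (exact only under equal pairwise correlations)."""
    w = 1.0 / vols
    return w / w.sum()

# illustrative 60/40-style pair: stock vol 15%, bond vol 5%
weights = inverse_vol_weights(np.array([0.15, 0.05]))
# -> stocks 25%, bonds 75%: risk, not capital, is what gets balanced
```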

Bridgewater’s All Weather Fund, one of the most successful hedge funds in history, is built on risk parity. The insight: a traditional 60/40 stock/bond portfolio has ~90% of its risk from stocks. Risk parity equalizes this, producing smoother returns across market regimes.

Efficient Frontier Explorer: 5 Nifty 50 Stocks

Stocks: Reliance, TCS, HDFC Bank, Infosys, ITC
Drag the slider from conservative to aggressive. Watch the portfolio weights shift and the dot move along the efficient frontier. Individual stocks sit below the curve — diversification pushes you above them.

The uncomfortable truth. Most retail ML traders spend 95% of their time on signal generation — finding patterns, engineering features, training models. But the research is clear: portfolio construction and risk management matter more than prediction accuracy. A mediocre signal with great risk management beats a great signal with no risk management, every time.

Based on the work of Marcos López de Prado (Advances in Financial Machine Learning, 2018), Harry Markowitz (Portfolio Selection, 1952), Jegadeesh & Titman (Momentum, 1993), DeBondt & Thaler (Mean Reversion, 1985), and Bessembinder (Stock Returns, 2018). Nifty 50 data from NSE India.