Danish Khan

ML-Based Trading, Explained

What actually works, what doesn’t, and why most quantitative strategies fail.

Every year, thousands of engineers build machine learning models to trade stocks. Most of them lose money. Not because ML doesn’t work in finance — it does, when applied correctly. But because the gap between a promising backtest and a profitable strategy is enormous, and almost nobody talks about what lives in that gap. Since 1926, only 42.1% of US stocks have outperformed risk-free Treasury bills. The median stock lifespan is just 7 years. Marcos López de Prado, the most cited quant researcher alive, wrote that “the hardest problem in finance is not prediction — it’s validation.” This explainer shows you why, using real Nifty 50 data.

I.

Reading the Chart

ML models don’t consume raw stock prices. They consume features — mathematical transformations of price and volume data that encode patterns a model can learn from. The three most important families of technical indicators are RSI, MACD, and Bollinger Bands.

RSI (Relative Strength Index) measures momentum on a 0–100 scale. Above 70 means “overbought” — the stock may have risen too fast. Below 30 means “oversold.” The formula: RSI = 100 − 100/(1 + RS), where RS = average gain / average loss over 14 periods.
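The formula translates directly into a few lines of pandas. A minimal sketch using simple 14-period averages, as in the text (production implementations often use Wilder's smoothing instead):

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + RS), RS = avg gain / avg loss over `period`."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(period).mean()
    rs = avg_gain / avg_loss      # -> inf when the window had no losses
    return 100 - 100 / (1 + rs)  # infinite RS maps cleanly to RSI = 100
```

A series that only rises pins the RSI at 100; a random walk stays strictly inside the 0–100 band.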

MACD (Moving Average Convergence Divergence) captures trend changes. It’s the difference between two exponential moving averages (12-period minus 26-period). When the MACD line crosses above its 9-period signal line, that’s a buy signal. When it crosses below, sell.
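The same calculation in pandas, a minimal sketch built on its exponential moving averages:

```python
import numpy as np
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, sig: int = 9):
    """MACD line = EMA(fast) - EMA(slow); signal = EMA(sig) of the MACD line."""
    line = (close.ewm(span=fast, adjust=False).mean()
            - close.ewm(span=slow, adjust=False).mean())
    signal = line.ewm(span=sig, adjust=False).mean()
    return line, signal, line - signal  # histogram: cross above -> buy

# in a steady uptrend the fast EMA leads, so the MACD line stays positive
line, signal, hist = macd(pd.Series(np.linspace(100.0, 200.0, 100)))
```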

Bollinger Bands wrap a 2-standard-deviation envelope around a 20-period moving average. When the bands squeeze tight, volatility is low and a breakout often follows. When price touches the upper band, the stock may be overextended.[1]

John Bollinger invented these bands in the 1980s. The key insight isn’t the bands themselves — it’s the squeeze. Periods of low volatility tend to precede bursts of high volatility. ML models exploit this by using Bollinger Band width as a feature.
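Both the bands and the width feature are a few lines of pandas. A minimal sketch:

```python
import pandas as pd

def bollinger(close: pd.Series, period: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Midline = 20-period SMA; bands at +/- k rolling standard deviations."""
    mid = close.rolling(period).mean()
    sd = close.rolling(period).std()
    out = pd.DataFrame({"mid": mid, "upper": mid + k * sd, "lower": mid - k * sd})
    out["width"] = (out["upper"] - out["lower"]) / mid  # the "squeeze" feature
    return out
```

A perfectly flat price has zero rolling deviation, so the bands collapse onto the midline and the width feature reads zero — the tightest possible squeeze.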

Technical Indicator Explorer: Nifty 50 (2022–2024)

Toggle between indicators to see how each reads the same Nifty 50 price data differently. RSI catches overbought/oversold extremes, MACD captures trend shifts, Bollinger Bands reveal volatility squeezes.
II.

Momentum vs Mean Reversion

Two forces govern stock prices, and they pull in opposite directions.

Momentum says winners keep winning. Jegadeesh and Titman showed in 1993 that stocks that performed well over the past 3–12 months tend to continue performing well. This is the most robust anomaly in all of finance — it works across countries, asset classes, and time periods.

Mean reversion says what goes up must come down. Over longer horizons — 1 to 5 years — stocks that have risen sharply tend to underperform, and beaten-down stocks tend to recover. DeBondt and Thaler documented this in 1985.[2]

The coexistence of momentum and mean reversion isn’t a contradiction. Momentum operates on investor underreaction (news takes time to be fully priced in). Mean reversion operates on overreaction (investors eventually push prices too far). Different timescales, different behavioral causes.

The critical insight: the same stock can be a momentum buy and a mean reversion sell at the same time, depending on which timescale you look at. This is why the lookback period you choose for your ML features matters enormously — and why many strategies fail when the market regime shifts.
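The timescale dependence is easy to see in code. A hypothetical toy rule (the function and labels are illustrative, not a real strategy): the same trailing return produces opposite trades depending on which camp reads it.

```python
import numpy as np
import pandas as pd

def trade_signal(close: pd.Series, lookback: int, style: str) -> int:
    """Trailing return over `lookback` bars; momentum buys recent
    winners, mean reversion fades them (toy rule for illustration)."""
    trailing = close.iloc[-1] / close.iloc[-lookback - 1] - 1
    if style == "momentum":
        return 1 if trailing > 0 else -1
    return -1 if trailing > 0 else 1  # mean reversion fades the move
```

On the same rally, the momentum reader returns +1 and the mean-reversion reader returns −1; only the lookback and the behavioral story differ.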

Time Horizon Explorer: When Signals Flip

Drag the slider from 1 month to 24 months. Watch the signal flip from momentum (short lookback) to mean reversion (long lookback) on the same Nifty 50 data.
III.

The Overfitting Trap

Here’s where most ML traders get destroyed. You build a model. You backtest it on 5 years of data. The Sharpe ratio is 8.2. You feel like a genius.

Then you trade it live. Sharpe ratio: 0.3.

The gap between 8.2 and 0.3 is overfitting. Your model didn’t learn the market — it memorized your training data. Modern computing lets you test billions of parameter combinations. With enough parameters, you can fit any historical pattern. But those patterns are noise, not signal.[3]

López de Prado estimates that testing 100 parameter combinations on 5 years of daily data gives you a ~50% probability of finding a “significant” pattern by pure chance. Test 10,000 combinations and you’re virtually guaranteed to find several. This is why most published trading strategies fail out of sample.
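The selection-bias effect is easy to reproduce. The sketch below (parameters are illustrative) backtests 1,000 random strategies with zero true edge; the best in-sample Sharpe still looks impressive:

```python
import numpy as np

# 1,000 random daily "strategies" over one year, zero true edge anywhere
rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.01, size=(1000, 252))
sharpes = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
best = sharpes.max()
# the best backtest typically shows an annualized Sharpe near 3,
# earned entirely by selection bias, not skill
```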

The rule of thumb: the more complex your model, the bigger the gap between training and testing performance. A simple moving-average crossover might have modest in-sample performance, but it degrades gracefully. A 50-parameter neural net might look incredible in-sample, but it falls off a cliff out-of-sample.

As the economist Ronald Coase put it: “If you torture the data long enough, it will confess to anything.” The solution is walk-forward validation — train on window 1, test on window 2, retrain on windows 1+2, test on window 3. Never let your model see test data during training.
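A minimal sketch of expanding-window walk-forward splits (the same idea as scikit-learn's TimeSeriesSplit): each fold trains only on data that precedes its test window.

```python
import numpy as np

def walk_forward_splits(n: int, n_folds: int = 5):
    """Expanding-window splits: fold k trains on [0, k*width) and
    tests on [k*width, (k+1)*width). Test data never leaks backward."""
    width = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield np.arange(0, k * width), np.arange(k * width, (k + 1) * width)
```

Retraining on windows 1+2 before testing window 3 falls out naturally: each successive training window subsumes the previous one.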

The Overfitting Cliff

Drag the slider from simple (1) to complex (10). Watch the in-sample Sharpe soar while the out-of-sample Sharpe crashes. The gap is the overfitting penalty.
IV.

Measuring What Matters

Returns mean nothing without risk context. A strategy that returns 40% but has a 60% max drawdown will get abandoned long before it pays off — because no human can stomach watching their account lose more than half its value.

The Sharpe ratio normalizes return by volatility: (Return − Risk-Free Rate) / Volatility. A Sharpe of 1.0 is good. 2.0 is excellent. The market long-term Sharpe is about 0.4–0.5.
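A minimal sketch of the annualized Sharpe calculation on daily returns (252 trading days assumed):

```python
import numpy as np

def sharpe(daily_returns: np.ndarray, risk_free_daily: float = 0.0) -> float:
    """(Return - risk-free rate) / volatility, annualized by sqrt(252)."""
    excess = daily_returns - risk_free_daily
    return float(excess.mean() / excess.std(ddof=1) * np.sqrt(252))
```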

Max drawdown is the worst peak-to-trough decline. The 2020 COVID crash produced a −38% drawdown on Nifty 50 in just 5 weeks. Recovery took 5 months. The “underwater period” — the time spent below the previous peak — is what actually breaks traders psychologically.[4]

The Calmar ratio = Annual Return / Max Drawdown captures this tradeoff directly. A Calmar above 1.0 means your annual return exceeds your worst drawdown. Below 1.0, and you’re in territory where behavioral psychology predicts strategy abandonment.
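Max drawdown and Calmar are a few lines of NumPy. A minimal sketch:

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Worst peak-to-trough decline, as a positive fraction of the peak."""
    peaks = np.maximum.accumulate(equity)
    return float(((peaks - equity) / peaks).max())

def calmar(annual_return: float, mdd: float) -> float:
    """Calmar ratio: annual return divided by max drawdown."""
    return annual_return / mdd
```

An equity curve 100 → 120 → 90 → 130 has a max drawdown of 25% (the 120-to-90 leg), so a 20% annual return gives a Calmar of 0.8 — drawdown-abandonment territory.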

Drawdown Underwater Chart

Toggle between strategies. The red underwater area shows how long each strategy spent below its peak. Buy & Hold had the deepest drawdown; Momentum had the longest underwater period.
V.

What the Model Learns

You train a gradient-boosted model on 20 features. It achieves a decent out-of-sample Sharpe. But which features actually mattered? This is where SHAP values — the same explainability tool used in credit scoring — become essential.

SHAP decomposes each prediction into individual feature contributions. A positive SHAP value pushes toward “Buy,” a negative one toward “Sell.” The surprise: prior day’s return is often the single strongest predictor, more important than any technical indicator. Volume change ranks second. RSI and MACD matter, but less than most traders assume.[5]

This finding is consistent across multiple studies. The most sophisticated features often contribute less than simple lagged returns. This doesn’t mean technical indicators are useless — they help at the margin — but it humbles the engineer who spent weeks engineering the perfect RSI variant.

The deeper insight: SHAP shows you correlation, not causation. A feature with high SHAP importance might be genuinely predictive, or it might be overfitted noise. Feature importance without proper validation is just a fancier way to fool yourself.
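For a linear model, SHAP values have a closed form, which makes the "local accuracy" property easy to verify without the shap library. A toy sketch (the feature names and weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 2))   # toy features: prev-day return, scaled RSI
w = np.array([0.8, 0.1])        # linear weights: lagged return dominates

def shap_linear(x_row: np.ndarray, X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Exact SHAP values for a linear model: phi_i = w_i * (x_i - E[x_i])."""
    return w * (x_row - X.mean(axis=0))

phi = shap_linear(X[0], X, w)
# local accuracy: contributions sum to f(x) minus the average prediction
importance = np.abs(w * (X - X.mean(axis=0))).mean(axis=0)
```

The mean absolute SHAP value per feature is the usual global importance ranking; here the lagged-return column dominates by construction.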

Feature Importance Waterfall

Each bar shows one feature’s contribution. Green pushes toward Buy, red toward Sell. Notice how the same features flip direction on bullish vs bearish days.
VI.

The Efficient Frontier

Harry Markowitz showed in 1952 that you can get the same return with less risk — or more return with the same risk — by combining assets that don’t move together. The efficient frontier is the set of all portfolios that offer the maximum return for each level of risk.

Individual stocks are scattered below the frontier. Only by combining them can you reach it. The key inputs are each stock’s expected return, its volatility, and the correlations between all pairs. The output is the optimal weight for each stock at each risk level.

The catch: Markowitz optimization is notoriously unstable. Small changes in expected returns produce wildly different optimal weights. This is why risk parity — allocating so each asset contributes equally to portfolio risk — has become the practical alternative for many quant funds.[6]
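The simplest risk-parity approximation is inverse-volatility weighting, which is exact only when all pairwise correlations are equal. A minimal sketch (the volatility numbers are illustrative):

```python
import numpy as np

def inverse_vol_weights(vols: np.ndarray) -> np.ndarray:
    """Naive risk parity: weight each asset by 1/vol so each contributes
    roughly equal risk (exact only under equal pairwise correlations)."""
    w = 1.0 / vols
    return w / w.sum()

# illustrative 60/40-style pair: stock vol 15%, bond vol 5%
weights = inverse_vol_weights(np.array([0.15, 0.05]))
# -> stocks 25%, bonds 75%: risk, not capital, is what gets balanced
```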

Bridgewater’s All Weather Fund, one of the most successful hedge funds in history, is built on risk parity. The insight: a traditional 60/40 stock/bond portfolio has ~90% of its risk from stocks. Risk parity equalizes this, producing smoother returns across market regimes.

Efficient Frontier Explorer: 5 Nifty 50 Stocks

Stocks: Reliance, TCS, HDFC Bank, Infosys, ITC
Drag the slider from conservative to aggressive. Watch the portfolio weights shift and the dot move along the efficient frontier. Individual stocks sit below the curve — diversification pushes you above them.

The uncomfortable truth. Most retail ML traders spend 95% of their time on signal generation — finding patterns, engineering features, training models. But the research is clear: portfolio construction and risk management matter more than prediction accuracy. A mediocre signal with great risk management beats a great signal with no risk management, every time.

Based on the work of Marcos López de Prado (Advances in Financial Machine Learning, 2018), Harry Markowitz (Portfolio Selection, 1952), Jegadeesh & Titman (Momentum, 1993), DeBondt & Thaler (Mean Reversion, 1985), and Bessembinder (Stock Returns, 2018). Nifty 50 data from NSE India.