Financial Labels (Trend Scanning)
- Authors: Tails Azimuth
This chapter explains why labeling is a critical step in supervised machine learning for finance. The way the labels (the $y$ variable) are defined determines the exact task the algorithm will learn. Poor labeling can cause an ML model to fail even if the features ($X$) are predictive. The chapter discusses four common labeling strategies.
Fixed-Horizon Method
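This is the most common method in the academic literature, but it is highly flawed. It assigns a label according to whether the price return ($r_{t,t+h}$) crosses a fixed threshold ($\tau$) after a fixed time horizon ($h$).
- Labeling Equation:

$$
y_t = \begin{cases} -1 & \text{if } r_{t,t+h} < -\tau \\ 0 & \text{if } |r_{t,t+h}| \le \tau \\ 1 & \text{if } r_{t,t+h} > \tau \end{cases}
$$
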
- Key Flaws:
- Heteroscedasticity: When used with time bars, the fixed threshold ignores the fact that volatility changes (e.g., higher at the open). This creates a non-stationary (unreliable) label distribution.
- No Path-Dependency: It ignores how the price got to the end point. A real strategy might have been stopped out long before the horizon was reached.
- Impractical: Investors rarely care about a price at a specific future time, but rather if a move will happen within a certain window.
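The fixed-horizon rule can be illustrated with a minimal sketch. The function name `fixed_horizon_labels`, the use of simple (not log) returns, and the sample prices are assumptions made for illustration; they are not taken from RiskLabAI.

```python
from typing import List, Optional


def fixed_horizon_labels(
    prices: List[float], h: int, tau: float
) -> List[Optional[int]]:
    """Label each bar by its h-bar-ahead simple return versus the threshold tau.

    Hypothetical helper for illustration only.
    """
    labels: List[Optional[int]] = []
    for i in range(len(prices)):
        if i + h >= len(prices):
            labels.append(None)  # not enough future data to label this bar
            continue
        ret = prices[i + h] / prices[i] - 1.0
        if ret > tau:
            labels.append(1)
        elif ret < -tau:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


prices = [100.0, 101.0, 103.0, 102.0, 99.0, 100.0]
labels = fixed_horizon_labels(prices, h=2, tau=0.02)
```

Only the price exactly `h` bars ahead enters each label, so anything that happens in between is invisible to this scheme.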
Triple-Barrier Method
This is a more realistic method that simulates an actual trading strategy. A label is assigned based on the first of three barriers to be touched:
- Upper Barrier: A profit-taking target (Label 1).
- Lower Barrier: A stop-loss limit (Label -1).
- Vertical Barrier: A maximum holding period (e.g., $h$ bars). If this is hit first, the label is either 0 or the sign of the return at that time.
- Advantages:
- It is path-dependent, accurately reflecting that a position can be stopped out.
- It directly models the P&L of a trade, making the labels relevant to a real-world investment.
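A minimal sketch of the first-touch logic follows. It assumes a long position, barriers expressed as fixed fractional returns (`pt` and `sl`), and a vertical barrier counted in bars. The function name and parameters are hypothetical, and in practice the barrier widths are often scaled by a volatility estimate.

```python
from typing import List, Optional


def triple_barrier_label(
    prices: List[float],
    start: int,
    pt: float,
    sl: float,
    max_hold: int,
    zero_on_timeout: bool = True,
) -> Optional[int]:
    """Return 1, -1 or 0 depending on which barrier is touched first.

    Hypothetical helper for illustration; assumes a long position.
    """
    entry = prices[start]
    # The vertical barrier is truncated if the series ends early.
    end = min(start + max_hold, len(prices) - 1)
    if end == start:
        return None  # no future bars available
    for j in range(start + 1, end + 1):
        ret = prices[j] / entry - 1.0
        if ret >= pt:
            return 1  # profit-taking barrier touched first
        if ret <= -sl:
            return -1  # stop-loss barrier touched first
    if zero_on_timeout:
        return 0  # vertical barrier touched first
    final_ret = prices[end] / entry - 1.0
    return (final_ret > 0) - (final_ret < 0)  # sign of the return at timeout


path = [100.0, 101.0, 98.5, 104.0, 105.0]
stopped_out = triple_barrier_label(path, start=0, pt=0.03, sl=0.01, max_hold=3)

flat = [100.0, 100.5, 100.2, 100.4]
timeout_zero = triple_barrier_label(flat, 0, pt=0.03, sl=0.03, max_hold=3)
timeout_sign = triple_barrier_label(
    flat, 0, pt=0.03, sl=0.03, max_hold=3, zero_on_timeout=False
)
```

In `path`, the price is up 4% three bars later, yet the label is -1 because the stop-loss is touched at the second bar. This is the path dependency that the fixed-horizon method ignores.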
Trend-Scanning Method
This is a novel method that avoids setting arbitrary barriers. Instead, it labels every point based on the most statistically significant trend that starts after it.
- For an observation $x_t$, it runs multiple linear regressions of the price on time over different look-forward periods $l$, each a number of future bars.
- It calculates the t-value of the trend coefficient ($\hat{\beta}_1$) for each look-forward period $l$.
- The strongest trend (the one with the maximum absolute t-value, `max(|tVal|)`) is selected.
- The observation is labeled with the sign of that trend (e.g., `bin = sgn(tVal)`).
- Advantages:
- It lets the data define the trend's duration, rather than imposing a fixed horizon $h$.
- It produces both a binary label (for classification) and a t-value (for regression or weighting), which captures the strength of the trend.
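The procedure can be sketched in plain Python. This is an illustrative sketch, not the RiskLabAI implementation: the names `slope_t_value` and `trend_scan`, the choice to include the observation itself as the first point of each window, and the sample prices are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def slope_t_value(y: Sequence[float]) -> float:
    """t-value of the slope in an OLS fit of y on the time index 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((v - (intercept + slope * i)) ** 2 for i, v in enumerate(y))
    if sse == 0.0:  # perfect fit: the standard error is zero
        return math.copysign(math.inf, slope) if slope != 0 else 0.0
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_err


def trend_scan(
    prices: List[float], t: int, spans: Sequence[int]
) -> Optional[Tuple[int, float, int]]:
    """Return (label, t_value, span) for the most significant forward trend."""
    best: Optional[Tuple[float, int]] = None
    for span in spans:
        window = prices[t : t + span]
        if span < 3 or len(window) < span:
            continue  # need at least 3 points and a complete window
        t_val = slope_t_value(window)
        if best is None or abs(t_val) > abs(best[0]):
            best = (t_val, span)
    if best is None:
        return None
    t_val, span = best
    label = (t_val > 0) - (t_val < 0)  # sign of the strongest trend
    return label, t_val, span


prices = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.9, 11.0, 10.2, 9.8]
result = trend_scan(prices, t=0, spans=range(3, 9))
```

The returned t-value can be kept alongside the label, for example as a sample weight, and the selected span shows which look-forward period the data favoured.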
Meta-Labeling
This technique is used to determine bet size and reduce false positives, not to determine the side of the trade.
- Process:
- A Primary Model (which can be any model, e.g., trend-scanning) determines the side (buy/sell).
- A Secondary Model (the "meta-model") is trained only on the observations where the primary model issued a signal. Its goal is to predict the probability that the signal succeeds.
- True Positives (wins) are labeled 1.
- False Positives (losses) are labeled 0.
The output of the meta-model is a probability that the primary model's signal is correct. This can then be used to size the bet.
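As a sketch of how the meta-model's training targets could be built, the snippet below marks each primary signal as a win (1) or a loss (0). The function name `meta_labels`, the convention that a side of 0 means "no signal", and the sample data are assumptions for illustration.

```python
from typing import List, Optional


def meta_labels(
    sides: List[int], realized_returns: List[float]
) -> List[Optional[int]]:
    """Return 1 for a correct primary signal, 0 for a wrong one.

    Observations without a signal (side == 0) get None and would be
    excluded from the meta-model's training set. Hypothetical helper.
    """
    out: List[Optional[int]] = []
    for side, ret in zip(sides, realized_returns):
        if side == 0:
            out.append(None)
        else:
            # The signal wins when the return has the same sign as the side.
            out.append(1 if side * ret > 0 else 0)
    return out


sides = [1, -1, 0, 1, -1]  # primary model: buy, sell, no signal, ...
rets = [0.02, 0.01, 0.03, -0.01, -0.04]  # realized returns per observation
targets = meta_labels(sides, rets)
```

A binary classifier fitted on these targets outputs the probability that a given primary signal is a true positive.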
- Bet Sizing from Probability: The probability $p$ is converted into a z-score (or a t-statistic for an ensemble) that measures its confidence relative to a random guess ($p = 1/2$). This z-score is then mapped to a bet size $m \in [-1, 1]$.
- Sharpe Ratio of a Bet: $z = \frac{p - 1/2}{\sqrt{p(1 - p)}}$
- Bet Size: $m = 2Z[z] - 1$, where $Z$ is the standard Gaussian CDF.
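A minimal sketch of the probability-to-size mapping follows. It assumes the symmetric-payoff formulation in which the z-score is $(p - 1/2)/\sqrt{p(1-p)}$ and the bet size is twice the standard Gaussian CDF of that z-score, minus one. The function name `bet_size` is hypothetical.

```python
import math
from statistics import NormalDist


def bet_size(p: float) -> float:
    """Map a success probability p in (0, 1) to a bet size in (-1, 1).

    Assumes symmetric payoffs; hypothetical helper for illustration.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))  # Sharpe ratio of the bet
    return 2.0 * NormalDist().cdf(z) - 1.0  # standard Gaussian CDF


no_edge = bet_size(0.5)
modest = bet_size(0.6)
strong = bet_size(0.9)
```

A probability of 0.5 yields no position, and the size grows with confidence. With meta-labeling the side comes from the primary model, so this value would typically scale that side rather than choose it.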
API reference
RiskLabAI implements these methods in both Python and Julia.