Labeling Financial Data
Author: Tails Azimuth
To train a machine learning model, we usually need a labeled dataset. In finance, this means building a matrix of features, $X$, and an array of labels or values, $y$. In this post, we explore several methods for labeling financial data. Understanding how to label financial data correctly is crucial for model accuracy, because a supervised model can only learn what its labels encode.
Fixed-Time Horizon Method
The Fixed-Time Horizon Method is a widely used labeling technique in the financial machine learning literature (López de Prado, 2018). Consider a features matrix $X$ with $I$ rows, $\{X_i\}_{i=1,\dots,I}$, drawn from a series of bars with index $t = 1,\dots,T$, where $I \le T$. Each observation $X_i$ is assigned a label $y_i \in \{-1, 0, 1\}$.
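The label is defined as

$$
y_i =
\begin{cases}
-1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} < -\tau, \\
0 & \text{if } \left| r_{t_{i,0},\,t_{i,0}+h} \right| \le \tau, \\
1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} > \tau,
\end{cases}
$$

where $\tau$ is a predefined constant threshold, $t_{i,0}$ is the index of the bar immediately after $X_i$ takes place, and $t_{i,0}+h$ is the index of the $h$-th bar after $t_{i,0}$. The price return over this horizon is given by

$$
r_{t_{i,0},\,t_{i,0}+h} = \frac{p_{t_{i,0}+h}}{p_{t_{i,0}}} - 1.
$$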
Despite its popularity, this method has two notable limitations: it is typically applied to time bars, which have poor statistical properties, and it uses the same constant threshold $\tau$ regardless of the observed volatility.
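To make the definition concrete, here is a minimal pandas sketch of fixed-time horizon labels. It assumes each observation is aligned with the bar $t_{i,0}$ at which its return window starts, and that the horizon $h$ is counted in bars; the function name `fixed_horizon_labels` and its arguments are illustrative, not part of any library.

```python
import numpy as np
import pandas as pd


def fixed_horizon_labels(close: pd.Series, h: int, tau: float) -> pd.Series:
    """Hypothetical helper: label each bar by its h-bar-ahead return with a dead zone of width tau."""
    # r_{t, t+h} = p_{t+h} / p_t - 1; the last h bars have no forward price and are dropped
    fwd_return = (close.shift(-h) / close - 1.0).dropna()
    # +1 above tau, -1 below -tau, 0 inside the band [-tau, tau]
    labels = np.where(fwd_return > tau, 1, np.where(fwd_return < -tau, -1, 0))
    return pd.Series(labels, index=fwd_return.index)


close = pd.Series([100.0, 101.0, 100.5, 100.6, 99.0])
print(fixed_horizon_labels(close, h=1, tau=0.005))
```

Note that `tau` is a single constant here, which is exactly the limitation described above.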
Computing Dynamic Thresholds
Instead of a fixed $\tau$, we can use a variable threshold that scales with volatility. A common choice is daily volatility, estimated as an exponentially weighted moving standard deviation of daily returns (López de Prado, 2018).
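A minimal pandas sketch of that daily volatility estimate follows. The function name `daily_volatility`, the default span of 100, and the choice of the last bar at or before $t - 1$ day as the reference price are assumptions for illustration; this is not the RiskLabAI implementation.

```python
import pandas as pd


def daily_volatility(close: pd.Series, span: int = 100) -> pd.Series:
    """Hypothetical helper: EWM standard deviation of ~1-day returns for a DatetimeIndex price series."""
    # For every timestamp t, position of the last bar at or before t - 1 day
    pos = close.index.searchsorted(close.index - pd.Timedelta(days=1), side="right") - 1
    valid = pos >= 0  # the earliest bars have no reference price one day back
    ret = close.values[valid] / close.values[pos[valid]] - 1.0
    returns = pd.Series(ret, index=close.index[valid])
    # Exponentially weighted moving standard deviation of those returns
    return returns.ewm(span=span).std()


close = pd.Series(
    [100.0, 101.0, 102.0, 101.0, 103.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
print(daily_volatility(close, span=3))
```

Multiplying this series by a chosen factor gives a threshold that widens in volatile periods and narrows in calm ones.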
Learning Side and Size
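When the side of a bet is not known in advance, the Triple-Barrier Method lets us learn both the side and the size of the bet. It places two horizontal barriers (profit-taking and stop-loss), typically set as multiples of the estimated volatility, and one vertical barrier that expires the position after a fixed holding period. Each observation is labeled according to the first barrier its price path touches.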
Code for the Events Function
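The following is a minimal, loop-based sketch of the triple-barrier logic, not the RiskLabAI events function. Assumptions: the price index is unique, the vertical barrier is measured in bars (`max_bars`) rather than in time, `target` holds a positive volatility estimate at each event start, and all names are hypothetical.

```python
from typing import Optional

import pandas as pd


def triple_barrier_events(close: pd.Series, target: pd.Series, pt: float = 1.0,
                          sl: float = 1.0, max_bars: int = 10,
                          side: Optional[pd.Series] = None) -> pd.DataFrame:
    """Hypothetical helper: first barrier touched for each event start in target.index.

    pt, sl   : profit-taking / stop-loss widths as multiples of target (assumed > 0)
    max_bars : vertical barrier, in bars after the event start
    side     : optional +1/-1 side from a primary model; a long position if None
    """
    rows = []
    for t0, trgt in target.items():
        i0 = close.index.get_loc(t0)
        path = close.iloc[i0:i0 + max_bars + 1]          # prices up to the vertical barrier
        s = 1.0 if side is None else float(side.loc[t0])
        ret = (path / path.iloc[0] - 1.0) * s            # side-adjusted returns along the path
        after = ret.iloc[1:]                              # ignore the entry bar itself
        hits = after[(after >= pt * trgt) | (after <= -sl * trgt)]
        # First horizontal barrier touched, otherwise the vertical barrier
        t1 = hits.index[0] if len(hits) > 0 else path.index[-1]
        rows.append((t0, t1, ret.loc[t1]))
    events = pd.DataFrame(rows, columns=["t0", "t1", "ret"]).set_index("t0")
    # Label = sign of the return at the first touch, in {-1, 0, 1}
    events["label"] = (events["ret"] > 0).astype(int) - (events["ret"] < 0).astype(int)
    return events


close = pd.Series([100.0, 101.0, 103.0, 102.0, 99.0, 98.0, 97.0])
target = pd.Series([0.02, 0.02], index=[0, 3])
print(triple_barrier_events(close, target, max_bars=3))
```

Here the first event hits the upper barrier at bar 2 and the second hits the lower barrier at bar 4. When `side` is supplied, `ret` becomes the side-adjusted return, which is the quantity meta-labeling builds on.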
Meta-Labeling
In financial machine learning (ML), a strategy must decide how to bet: go long, go short, or not bet at all. Meta-labeling is a technique that answers not "which side to bet on" but "how much to bet". It is particularly useful for practitioners who already have a primary model for predicting the side and want to optimize the size of their stake.
Meta-labeling trains a secondary machine learning model on top of the output of a primary model. While the primary model determines the side of the bet (long or short), the secondary model decides the appropriate stake size, including the option of not betting at all (zero size).
Meta-labeling functions are available in both Python and Julia in the RiskLabAI library.
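A minimal sketch of how meta-labels can be built from a primary model's output. It assumes the primary model emits a side of +1, -1, or 0 (no bet) for each event, and that `ret` is the unsigned return from each event's start to the first barrier touched, computed with symmetric barriers so that it does not depend on the side. The function name `meta_labels` and its inputs are hypothetical, not the RiskLabAI API.

```python
import pandas as pd


def meta_labels(side: pd.Series, ret: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 if betting on the primary model's side was profitable, else 0."""
    side, ret = side.align(ret, join="inner")  # keep events known to both series
    active = side != 0                         # no meta-label when the primary model did not bet
    pnl = side[active] * ret[active]           # side-adjusted return of each bet
    return (pnl > 0).astype(int)


side = pd.Series([1, -1, 1, 0, -1])
ret = pd.Series([0.02, 0.01, -0.03, 0.05, -0.02])
print(meta_labels(side, ret))
```

The secondary model is then trained to predict these 0/1 labels, and its predicted probability can be used to size (or skip) each bet the primary model proposes.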
Why Use Meta-Labeling?
Quantamental Strategy: Meta-labeling allows for the creation of machine learning models that are guided by fundamental, theory-based models. This is particularly useful for organizations aiming to combine quantitative analysis with fundamental insights.
Reduced Overfitting: Since the meta-labeling model doesn't decide the side of the bet, only the size, it helps reduce overfitting.
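Strategic Flexibility: Separating the decision about the side from the decision about the size enables more sophisticated strategies. For example, the features that drive a rally may differ from those that drive a sell-off, so separate models can be developed for long and short positions.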
Optimized Sizing: Bet sizing is often overlooked, yet a strategy that is right on small bets and wrong on large ones can still lose money. Dedicating a machine learning model to the size of the bet addresses this directly.
How to Apply Meta-Labeling
Meta-labeling can improve a model's F1-score, the harmonic mean of precision and recall. The process involves first building a primary model with high recall, even at the cost of low precision, and then training a secondary model on meta-labels to filter out the primary model's false positives. In effect, the secondary model decides whether each 'positive' prediction from the primary model should be acted upon.
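The precision/recall trade-off can be illustrated with a small worked example. The labels and predictions below are hypothetical toy data chosen only to show the arithmetic; they do not come from any real model.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 means 'take the trade'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = acting on the opportunity would have been profitable.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
primary = [1, 1, 1, 1, 1, 1, 0, 1]  # high-recall primary model: flags almost everything
meta = [1, 1, 0, 0, 0, 1, 1, 1]     # hypothetical secondary-model decisions (1 = act)
# Act only when the primary model says "bet" AND the meta model confirms it.
combined = [p * m for p, m in zip(primary, meta)]

print("primary :", precision_recall_f1(y_true, primary))
print("combined:", precision_recall_f1(y_true, combined))
```

With these toy numbers, precision rises from 4/7 to 3/4 while recall falls from 1 to 3/4, so F1 rises from 8/11 (about 0.727) to 0.75. The meta model trades a little recall for a larger gain in precision.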
Advanced Topics
Dropping Unnecessary Labels
Sometimes, machine learning models struggle with classes that are heavily imbalanced. In such cases, you can use the dropLabel function to remove observations associated with infrequent labels.
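A minimal sketch of the idea, following the approach described in López de Prado (2018): repeatedly drop the rarest label while its share falls below a minimum and more than two classes remain. The function name `drop_rare_labels` and its `min_pct` parameter are hypothetical and do not reproduce the signature of RiskLabAI's `dropLabel`.

```python
import pandas as pd


def drop_rare_labels(labels: pd.Series, min_pct: float = 0.05) -> pd.Series:
    """Hypothetical helper: drop observations of the rarest label until every label exceeds min_pct."""
    labels = labels.copy()
    while True:
        freq = labels.value_counts(normalize=True)  # share of each label
        # Stop once all labels are frequent enough, or only two classes are left
        if freq.min() > min_pct or len(freq) < 3:
            break
        labels = labels[labels != freq.idxmin()]    # remove the rarest label's observations
    return labels


labels = pd.Series([1] * 10 + [-1] * 9 + [0] * 1)
print(drop_rare_labels(labels, min_pct=0.1).value_counts())
```

Keeping at least two classes ensures the remaining problem is still a valid classification task.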
References
- López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
- López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.