Ensemble Techniques in Finance
When dealing with a set of training observations $\{x_i\}_{i=1,\dots,n}$ and outcomes $\{y_i\}_{i=1,\dots,n}$, you would want to estimate a function $\hat{f}(x)$ that closely matches the true function $f(x)$.
The relationship is generally modeled as

$$y = f(x) + \varepsilon,$$

where $\varepsilon$ is white noise with $E[\varepsilon] = 0$ and variance $\sigma_{\varepsilon}^{2}$.
The aim is to minimize the mean squared error of the estimate,

$$E\left[\left(\hat{f}(x) - y\right)^{2}\right],$$

which can be decomposed into three terms: the squared bias, the variance, and the noise term $\sigma_{\varepsilon}^{2}$:

$$E\left[\left(\hat{f}(x) - y\right)^{2}\right] = \underbrace{\left(E\left[\hat{f}(x)\right] - f(x)\right)^{2}}_{\text{bias}^{2}} + \underbrace{V\left[\hat{f}(x)\right]}_{\text{variance}} + \underbrace{\sigma_{\varepsilon}^{2}}_{\text{noise}}$$
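As a brief aside (a standard derivation, not spelled out in the original), the decomposition follows by adding and subtracting $E[\hat{f}(x)]$ inside the square; the cross terms vanish because $\varepsilon$ has zero mean and is independent of the fitted model:

$$
\begin{aligned}
E\left[\left(\hat{f}(x) - y\right)^{2}\right]
&= E\left[\left(\hat{f}(x) - E[\hat{f}(x)] + E[\hat{f}(x)] - f(x) - \varepsilon\right)^{2}\right] \\
&= \underbrace{\left(E[\hat{f}(x)] - f(x)\right)^{2}}_{\text{bias}^{2}}
 + \underbrace{E\left[\left(\hat{f}(x) - E[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
 + \underbrace{E\left[\varepsilon^{2}\right]}_{\sigma_{\varepsilon}^{2}}
\end{aligned}
$$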
The Power of Ensemble Methods
Ensemble techniques, like bagging (Bootstrap Aggregation), aim to build a robust model by averaging the predictions of multiple base models. Bagging is particularly effective in reducing the variance of the prediction, hence addressing overfitting.
The variance of a bagged prediction depends on the number of base models $N$, their average variance $\bar{\sigma}^{2}$, and the average correlation $\bar{\rho}$ between their predictions:

$$V\left[\frac{1}{N}\sum_{i=1}^{N}\varphi_{i}(x)\right] = \bar{\sigma}^{2}\left(\bar{\rho} + \frac{1 - \bar{\rho}}{N}\right),$$

where $\varphi_{i}(x)$ is the forecast of the $i$-th base model.

The benefit of bagging is quantifiable: for $N > 1$ it lowers the forecast variance below that of a single base model as long as

$$\bar{\sigma}^{2}\left(\bar{\rho} + \frac{1 - \bar{\rho}}{N}\right) < \bar{\sigma}^{2}, \quad \text{i.e. } \bar{\rho} < 1,$$

so the benefit shrinks as the base models become more correlated.
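As a quick sanity check (a minimal sketch, not from the original; the equicorrelated Gaussian forecasts and all numbers below are illustrative assumptions), the formula can be verified numerically by averaging $N$ correlated base forecasts:

```python
import numpy as np

# Illustrative assumptions: N base forecasts with variance sigma2 and pairwise correlation rho.
N, sigma2, rho = 20, 1.0, 0.5
covariance = sigma2 * (rho * np.ones((N, N)) + (1 - rho) * np.eye(N))  # equicorrelation matrix

rng = np.random.default_rng(0)
base_forecasts = rng.multivariate_normal(np.zeros(N), covariance, size=100_000)

empirical_variance = base_forecasts.mean(axis=1).var()  # variance of the bagged (averaged) forecast
theoretical_variance = sigma2 * (rho + (1 - rho) / N)   # sigma_bar^2 * (rho_bar + (1 - rho_bar) / N)
print(empirical_variance, theoretical_variance)         # the two values should be close
```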
Accuracy Considerations in Bagging Classifiers
For a bagging classifier that predicts one of $k$ classes by majority vote, the accuracy depends on the number of base classifiers $N$ and their individual accuracy $p$. Under certain conditions, bagging classifiers can outperform the individual classifiers.
The mathematical relation that demonstrates this is the probability that more than $N/k$ of the (independent) base classifiers vote correctly, where the number of correct votes $X$ follows a binomial distribution $B(N, p)$:

$$P\left[X > \frac{N}{k}\right] = 1 - \sum_{i=0}^{\lfloor N/k \rfloor}\binom{N}{i}\,p^{i}\,(1 - p)^{N - i}$$

In particular, if $p > \frac{1}{k}$, this probability exceeds $p$ for sufficiently large $N$.
Python and Julia implementations of this bagging classifier accuracy are available in the RiskLabAI library.
from math import comb, floor

def bagging_classifier_accuracy(
    N: int,       # number of base classifiers
    p: float,     # accuracy of each base classifier
    k: int = 2    # number of classes
) -> float:
    # P[X <= floor(N / k)] for X ~ Binomial(N, p): the probability that the majority vote fails.
    probability_sum = sum(
        comb(N, i) * p**i * (1 - p)**(N - i) for i in range(floor(N / k) + 1)
    )
    # Accuracy of the bagged (majority-vote) classifier: P[X > N / k].
    return 1 - probability_sum
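As an illustrative usage (the numbers are assumptions, not taken from the source), with $k = 3$ classes, 100 base classifiers that are each only slightly better than random guessing already push the majority-vote probability well above the individual accuracy $p$:

```python
p = 1 / 3 + 0.02                                     # each base classifier barely beats random guessing
print(p)                                             # individual accuracy, roughly 0.35
print(bagging_classifier_accuracy(N=100, p=p, k=3))  # bagged accuracy, noticeably higher
```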
Handling Dependency in Observations: A Challenge to Bagging
Financial observations often exhibit dependency, challenging the assumption that data points are independent and identically distributed (i.i.d.). This dependency affects bagging in two ways:
- Samples drawn with replacement end up nearly identical to one another, which raises the average correlation $\bar{\rho}$ between base models and reduces bagging's efficiency in lowering prediction variance.
- The 'out-of-bag' accuracy becomes inflated, meaning the model may seem more accurate than it actually is.
Advantages of Using Random Forests
Random Forests (RF) are an extension of bagging that introduces another layer of randomness to combat overfitting and dependency in observations. RF offers the following advantages, illustrated in the sketch after this list:
- Reduced prediction variance
- Feature significance analysis
- Reliable out-of-bag accuracy estimates
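A minimal scikit-learn sketch (an assumed setup; the synthetic dataset and hyperparameters are illustrative, not from the original) showing how these benefits surface in practice:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic data standing in for a financial feature matrix.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=500,     # many de-correlated trees are averaged, reducing prediction variance
    max_features="sqrt",  # extra randomness: each split considers a random subset of features
    oob_score=True,       # score each tree on the observations it did not train on
    random_state=0,
)
forest.fit(X, y)

print(forest.oob_score_)            # out-of-bag accuracy estimate
print(forest.feature_importances_)  # feature significance analysis
```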
Boosting Poor Estimators for Accuracy
Boosting is an iterative method of improving model accuracy by combining poor estimators. It involves adjusting sample weights based on their classification results and produces a weighted average of individual forecasts.
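A small AdaBoost sketch (an assumed setup with scikit-learn >= 1.2, not code from the original) of this idea: deliberately weak, depth-one trees are fit in sequence, misclassified samples are re-weighted, and the final forecast is a weighted vote of the individual estimators:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a deliberately weak base learner (a "stump")
    n_estimators=200,                               # estimators are added one after another
    random_state=0,
)
booster.fit(X, y)

print(booster.estimator_weights_[:5])  # per-estimator weights used in the weighted vote
```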
Bagging vs. Boosting in Finance
While boosting reduces both prediction variance and bias, it is more prone to overfitting, especially in financial applications where data is noisy. In contrast, bagging is less prone to overfitting and can be parallelized, making it generally more effective for financial data.
Leveraging Parallelism for Scalability
For algorithms that don't scale well, like Support Vector Machines (SVMs), bagging can be parallelized to enhance efficiency. This allows you to fit many smaller base models concurrently, each on a subset of the data, making bagging a scalable option for large datasets.
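A hedged sketch (an assumed setup with scikit-learn >= 1.2; the dataset and parameters are illustrative) of this pattern: each base SVM is fit on a small random subset of the observations, and the fits run concurrently across cores:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

bagged_svm = BaggingClassifier(
    estimator=SVC(),   # SVM training scales poorly with sample size, so keep each fit small
    n_estimators=50,
    max_samples=0.1,   # each base model sees only 10% of the observations
    n_jobs=-1,         # fit the base models in parallel on all available cores
    random_state=0,
)
bagged_svm.fit(X, y)
print(bagged_svm.score(X, y))  # in-sample accuracy of the aggregated model
```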
References
- López de Prado, M. (2018). Advances in financial machine learning. John Wiley & Sons.
- López de Prado, M. (2020). Machine learning for asset managers. Cambridge University Press.