Entropy Features
- Author: Tails Azimuth
This chapter explores the use of entropy as a feature for machine learning in finance. The core idea is that entropy measures the amount of information, complexity, or unpredictability in a price series.
- Inefficient Markets (Low Entropy): Price information is redundant and contains predictable patterns (e.g., bubbles, trends). Momentum strategies may be more profitable.
- Efficient Markets (High Entropy): Price information is "decompressed" and non-redundant (i.e., unpredictable, like a random walk). Mean-reversion strategies may be more profitable.
Key Concepts from Information Theory
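Shannon's Entropy ($H[X]$): The average amount of information (in bits) from a data source, $H[X] = -\sum_{x \in A} p[x] \log_2 p[x]$, where $A$ is the alphabet of possible outcomes. It measures uncertainty. An outcome with low probability ($p[x]$) provides more information ($\log_2 \frac{1}{p[x]}$).
- $H[X] = 0$ indicates perfect certainty.
- $H[X] = \log_2 \|A\|$ indicates maximum uncertainty (a uniform distribution).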
Redundancy ($R[X]$): Measures the amount of predictable, redundant information, $R[X] = 1 - \frac{H[X]}{\log_2 \|A\|}$, so that $0 \le R[X] \le 1$.
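As a minimal sketch of the entropy and redundancy definitions above (not the library's own code; the function names are hypothetical), both quantities can be computed from symbol counts. The alphabet size is passed explicitly, which is an assumption made here so that symbols missing from the sample still count toward the maximum entropy.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Entropy in bits per symbol, from the empirical symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # p * log2(1 / p) summed over the observed symbols
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def redundancy(message: str, alphabet_size: int) -> float:
    """1 - H / log2(|A|); alphabet_size is supplied by the caller (assumed >= 2)."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return 1.0 - shannon_entropy(message) / math.log2(alphabet_size)


if __name__ == "__main__":
    for msg in ("0101", "0001", "0000"):
        print(msg, shannon_entropy(msg), redundancy(msg, alphabet_size=2))
```

A balanced binary message reaches the 1-bit maximum (zero redundancy), while a constant message has zero entropy and full redundancy.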
Mutual Information (MI): A generalized measure of (linear or nonlinear) association between two variables, $MI[X, Y] = H[X] + H[Y] - H[X, Y]$.
- For Gaussian variables, it relates to the Pearson correlation ($\rho$): $MI[X, Y] = -\frac{1}{2} \log[1 - \rho^2]$
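For discrete sequences, mutual information can be sketched directly from the identity MI = H[X] + H[Y] - H[X, Y]. The helper names below are hypothetical, and the Gaussian helper uses base-2 logarithms so that both functions report bits (an assumption about units, since the base of the logarithm only rescales the result).

```python
import math
from collections import Counter


def entropy_bits(symbols) -> float:
    """Empirical Shannon entropy (bits) of any iterable of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mutual_information(x, y) -> float:
    """MI[X, Y] = H[X] + H[Y] - H[X, Y] for two equally long discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    joint = list(zip(x, y))  # each pair is one symbol of the joint variable
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(joint)


def gaussian_mutual_information(rho: float) -> float:
    """MI (bits) of a bivariate Gaussian with Pearson correlation rho, |rho| < 1."""
    return -0.5 * math.log2(1.0 - rho ** 2)


if __name__ == "__main__":
    print(mutual_information("0011", "0011"))  # identical sequences share all information
    print(mutual_information("0011", "0101"))  # every pair equally frequent: nothing shared
    print(gaussian_mutual_information(0.9))
```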
Encoding Continuous Data
To calculate entropy, continuous data (like returns, $r_t$) must be converted into a discrete message with a finite alphabet.
- Binary Encoding: Converts data based on sign (e.g., 1 for $r_t > 0$, 0 for $r_t < 0$). Simple, but loses information about the magnitude.
- Quantile Encoding: Bins data into quantiles. Each bin gets a letter, resulting in a uniform distribution of codes in-sample.
- Sigma Encoding: Bins data into fixed-width buckets based on standard deviation ($\sigma$). This results in a non-uniform distribution, but "rare" codes (tail events) will spike entropy.
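The three encodings above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the library's encoder: the function names are hypothetical, zero returns are dropped in the binary case, letters start at "a" (so at most 26 bins are supported), and the sigma buckets are anchored at the sample minimum.

```python
import numpy as np


def binary_encode(returns) -> str:
    """'1' for positive returns, '0' for negative ones; zero returns are dropped."""
    r = np.asarray(returns, dtype=float)
    r = r[r != 0]
    return "".join("1" if value > 0 else "0" for value in r)


def quantile_encode(returns, n_bins: int = 4) -> str:
    """Assign one letter per in-sample quantile bin (roughly equal counts per letter)."""
    r = np.asarray(returns, dtype=float)
    inner_edges = np.quantile(r, np.linspace(0, 1, n_bins + 1)[1:-1])
    codes = np.searchsorted(inner_edges, r, side="right")
    return "".join(chr(ord("a") + int(c)) for c in codes)


def sigma_encode(returns, n_sigmas: float = 1.0) -> str:
    """Assign one letter per fixed-width bucket of n_sigmas standard deviations."""
    r = np.asarray(returns, dtype=float)
    width = n_sigmas * r.std()  # assumes the series is not constant (width > 0)
    codes = np.floor((r - r.min()) / width).astype(int)
    return "".join(chr(ord("a") + int(c)) for c in codes)


if __name__ == "__main__":
    sample = [0.01, -0.02, 0.0, 0.03, -0.01, 0.15]
    print(binary_encode(sample))
    print(quantile_encode(sample, n_bins=3))
    print(sigma_encode(sample))
```

With quantile encoding every letter appears about equally often, whereas with sigma encoding a single large return lands in a bucket of its own.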
Implementation: Entropy Estimators
In our RiskLabAI.features.entropy_features module, we implement several methods for estimating the entropy of a discretized message, which can be used to measure market efficiency or predictability.
Shannon's Entropy
As a baseline, we provide a direct implementation of Shannon's Entropy from shannon.py. This function calculates the entropy based on the frequency of individual characters (symbols) in the message.
Plug-In (PMF) Estimator
The "Plug-In" method calculates entropy based on the Probability Mass Function (PMF) of "words" (n-grams) of a specific length. This is a two-step process in our library:
- PMF Calculation: First, we use `probability_mass_function` from `pmf.py` to calculate the empirical frequency of all n-grams of a given length.
- Plug-In Estimator: The `plug_in_entropy_estimator` from `plug_in.py` then consumes this PMF, calculates the Shannon entropy, and normalizes it by the word length.
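The two steps can be illustrated with a standalone sketch. It does not reproduce the signatures of the library functions named above; `ngram_pmf` and `plug_in_entropy` are hypothetical names, and overlapping words are counted, which is an assumption about how n-grams are extracted.

```python
import math
from collections import Counter


def ngram_pmf(message: str, word_length: int) -> dict:
    """Empirical frequency of every overlapping word of the given length."""
    words = [message[i:i + word_length]
             for i in range(len(message) - word_length + 1)]
    if not words:
        raise ValueError("message is shorter than word_length")
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def plug_in_entropy(message: str, word_length: int = 1) -> float:
    """Shannon entropy of the word PMF, divided by the word length (bits per symbol)."""
    pmf = ngram_pmf(message, word_length)
    word_entropy = sum(p * math.log2(1.0 / p) for p in pmf.values())
    return word_entropy / word_length


if __name__ == "__main__":
    message = "01010101"
    print(plug_in_entropy(message, word_length=1))
    print(plug_in_entropy(message, word_length=2))
```

For the alternating message, single symbols look maximally random, while two-symbol words expose the pattern and roughly halve the estimated entropy rate.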
Lempel-Ziv (LZ) Estimators
We also implement estimators based on Lempel-Ziv compression, which are more robust as they capture the complexity and redundancy of a message.
- Simple LZ Entropy: From `lempel_ziv.py`, we provide a basic implementation that calculates the number of unique substrings encountered during a one-pass traversal of the message, normalized by the message length.
- Kontoyiannis Entropy: From `kontoyiannis.py`, we implement a more sophisticated LZ-based estimator (Kontoyiannis, 1998). This method estimates entropy by analyzing the length of the longest substring match found within a look-back window of length $n$. It can operate with either an expanding window (default) or a fixed rolling window.
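The simple LZ idea can be sketched as a one-pass parse that grows each phrase until it is no longer in the dictionary. This is an illustrative version with hypothetical function names, not the code in `lempel_ziv.py`; in particular, a trailing phrase that is already in the dictionary is simply not counted, which is an assumption of this sketch.

```python
def lz_phrases(message: str) -> list:
    """Parse the message into previously unseen phrases in a single pass."""
    seen = set()
    phrases = []
    start = 0
    while start < len(message):
        end = start + 1
        # Extend the phrase while it is already known and symbols remain.
        while end < len(message) and message[start:end] in seen:
            end += 1
        phrase = message[start:end]
        if phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
        start = end
    return phrases


def lz_complexity_ratio(message: str) -> float:
    """Number of distinct phrases divided by the message length."""
    return len(lz_phrases(message)) / len(message)


if __name__ == "__main__":
    print(lz_phrases("0101011"))
    print(lz_complexity_ratio("0000000000000000"))
```

A highly repetitive message needs few dictionary entries per symbol, so its ratio is low.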
Estimating Entropy
Two main methods are presented for estimating entropy from a data sequence:
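Plug-In (Maximum Likelihood) Estimator: A simple method that calculates the empirical frequency ($\hat{p}_w$) of all "words" (substrings) of length $w$ in the message.
Lempel-Ziv (LZ) Estimators: A more robust method based on data compression. The core idea is that complex (high-entropy) messages are difficult to compress. It works by finding the length of the longest matching substring ($L_i^n$) from the past, defined as one plus the length of the longest substring starting at position $i$ that also starts within the previous $n$ positions.
- Based on the limit: $\lim_{n \to \infty} \frac{L_i^n}{\log_2 n} = \frac{1}{H}$
- A practical, modified estimator (sliding window) is: $\tilde{H}_{n,k} = \frac{1}{k} \sum_{i=1}^{k} \frac{\log_2 n}{L_i^n}$
- And for an expanding window: $\tilde{H}_{n} = \frac{1}{n} \sum_{i=2}^{n} \frac{\log_2 i}{L_i^i}$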
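A match-length estimator of this kind can be sketched as below: for each evaluated position it finds the longest substring that also starts in the look-back window, then averages log2(window size) divided by that match length. This is a simplified illustration, not the code in `kontoyiannis.py`. The function names, the choice of evaluated positions (the first half of the message for the expanding window, so that a full-length match always fits), and the use of `i + 1` as the 1-based index for a 0-based position are all assumptions of this sketch.

```python
import math


def match_length(message: str, i: int, n: int) -> int:
    """1 + length of the longest substring starting at i that also starts
    somewhere in the previous n positions (overlapping matches allowed)."""
    longest = 0
    window_start = max(0, i - n)
    for length in range(1, n + 1):
        candidate = message[i:i + length]
        if len(candidate) < length:
            break  # ran past the end of the message
        if not any(message[j:j + length] == candidate
                   for j in range(window_start, i)):
            break  # longer candidates cannot match if this one does not
        longest = length
    return longest + 1


def match_length_entropy(message: str, window=None) -> float:
    """Average of log2(window size) / match length over the evaluated positions."""
    if window is None:
        points = range(1, len(message) // 2 + 1)  # expanding window
    else:
        if window < 2:
            raise ValueError("window must be at least 2")
        points = range(window, len(message) - window + 1)  # fixed window
    if len(points) == 0:
        raise ValueError("message is too short for this window")
    total = 0.0
    for i in points:
        if window is None:
            total += math.log2(i + 1) / match_length(message, i, i)
        else:
            total += math.log2(window) / match_length(message, i, window)
    return total / len(points)


if __name__ == "__main__":
    print(match_length("0101", 2, 2))
    print(match_length_entropy("0" * 20))
    print(match_length_entropy("0" * 20, window=4))
```

Long matches with the past shrink each term of the average, so repetitive messages receive low entropy estimates.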
Entropy as a Measure of Diversity
Entropy can also be understood as the logarithm of the "effective number" of states, which connects it to the Generalized Mean.
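- Generalized Mean: $M_q[x, p] = \left(\sum_{i=1}^{n} p_i x_i^q\right)^{1/q}$, for weights $p_i$ and order $q \neq 0$.
- Entropy as a Generalized Mean: $H[X] = \log N_1[p]$, where $N_q[p] = \frac{1}{M_{q-1}[p, p]}$ is the effective number of states and $N_1$ is taken as the limit $q \to 1$. This shows that entropy is a specific instance ($q \to 1$) of a broader family of diversity measures.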
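The link between entropy and the effective number of states can be checked numerically. In this sketch (hypothetical function names, natural logarithms, strictly positive probabilities assumed), the order-zero generalized mean is taken as the weighted geometric mean, which is the limit of the general formula as the order goes to zero.

```python
import math


def generalized_mean(x, p, q: float) -> float:
    """Weighted power mean of x with weights p; q = 0 gives the geometric mean."""
    if q == 0:
        return math.exp(sum(pi * math.log(xi) for xi, pi in zip(x, p)))
    return sum(pi * xi ** q for xi, pi in zip(x, p)) ** (1.0 / q)


def effective_number(p, q: float) -> float:
    """Effective number of states of order q: 1 / M_{q-1}[p, p]."""
    return 1.0 / generalized_mean(p, p, q - 1.0)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    entropy_nats = -sum(pi * math.log(pi) for pi in p)
    print(entropy_nats, math.log(effective_number(p, 1.0)))
    print(effective_number([0.25] * 4, 2.0))  # a uniform PMF over 4 states
```

For a uniform distribution the effective number equals the number of states at every order, and at order 1 its logarithm equals the Shannon entropy in nats.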
Financial Applications
- Market Efficiency: Used to estimate the "informational content" of prices. Low entropy suggests an inefficient, predictable market, whereas high entropy suggests an efficient, random market.
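- Portfolio Concentration (Meucci): Measures portfolio diversification by calculating the entropy of risk contributions ($\theta_i$) from each principal component.
  - Risk Contribution: $\theta_i = \frac{[W'\omega]_i^2 \Lambda_{i,i}}{\sum_{n=1}^{N} [W'\omega]_n^2 \Lambda_{n,n}}$, where $V = W \Lambda W'$ is the eigendecomposition of the covariance matrix and $\omega$ is the vector of portfolio weights.
  - Concentration: $H = 1 - \frac{1}{N} e^{-\sum_{i=1}^{N} \theta_i \log \theta_i}$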
- Market Microstructure (VPIN): Helps determine the probability of adverse selection.
  - The Probability of Informed Trading (PIN) model is linked to order flow imbalance using volume bars, resulting in VPIN.
  - Adverse selection occurs when order flow imbalance is both high (high VPIN) and unpredictable. This unpredictability can be measured by calculating the entropy of the quantized order flow imbalance series.
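Returning to the portfolio concentration application in the list above, the calculation can be sketched with NumPy: decompose the covariance matrix into principal components, compute each component's share of portfolio variance, and map the entropy of those shares to a concentration score between 0 and 1 - 1/N. The function names are hypothetical, and treating zero contributions as adding nothing to the entropy is an assumption of this sketch.

```python
import numpy as np


def risk_contributions(cov, weights) -> np.ndarray:
    """Share of portfolio variance attributed to each principal component."""
    cov = np.asarray(cov, dtype=float)
    weights = np.asarray(weights, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # cov = W diag(eigenvalues) W'
    loadings = eigenvectors.T @ weights              # portfolio weights in the PCA basis
    contributions = loadings ** 2 * eigenvalues
    return contributions / contributions.sum()


def concentration(theta) -> float:
    """1 - exp(entropy of theta) / N; zero shares contribute nothing to the entropy."""
    theta = np.asarray(theta, dtype=float)
    positive = theta[theta > 0]
    entropy = -np.sum(positive * np.log(positive))
    return float(1.0 - np.exp(entropy) / theta.size)


if __name__ == "__main__":
    cov = np.diag([1.0, 4.0])
    for w in ([1.0, 0.0], [2.0, 1.0]):
        theta = risk_contributions(cov, w)
        print(w, theta, concentration(theta))
```

A portfolio whose risk comes from a single component scores 1 - 1/N, while one that spreads risk evenly across components scores zero.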
API reference
RiskLabAI implements these estimators in Python and Julia.