Entropy Features

This chapter explores the use of entropy as a feature for machine learning in finance. The core idea is that entropy measures the amount of information, complexity, or unpredictability in a price series.

Inefficient Markets (Low Entropy): Price information is redundant and contains predictable patterns (e.g., bubbles, trends). Momentum strategies may be more profitable.
Efficient Markets (High Entropy): Price information is "decompressed" and non-redundant (i.e., unpredictable, like a random walk). Mean-reversion strategies may be more profitable.

Key Concepts from Information Theory

Shannon's Entropy ( $H[X]$ ): The average amount of information (in bits) produced by a data source; it measures uncertainty. An outcome $x$ with low probability $p[x]$ carries more information, namely $\log_2(1/p[x])$ bits.
$H[X] \equiv-\sum_{x \in A} p[x] \log _{2} p[x]$
- $H[X] = 0$ indicates perfect certainty.
- $H[X] = \log_2(|A|)$ indicates maximum uncertainty (a uniform distribution).
Redundancy ( $R[X]$ ): Measures the amount of predictable, redundant information.
$R[X] \equiv 1-\frac{H[X]}{\log _{2}[|A|]}$
Mutual Information (MI): A generalized measure of (linear or nonlinear) association between two variables.
$M I[X, Y]=H[X]+H[Y]-H[X, Y]$
- For Gaussian variables, it relates to the Pearson correlation ( $\rho$ ): $M I[X, Y]=-\frac{1}{2} \log \left(1-\rho^{2}\right)$
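
As a minimal sketch of the three definitions above, the snippet below estimates $H[X]$, $R[X]$, and $MI[X, Y]$ from observed discrete sequences using empirical frequencies. The function names are illustrative and are not the RiskLabAI API; the alphabet size is passed in explicitly because symbols that never occur cannot be inferred from the data.

```python
import math
from collections import Counter


def shannon_entropy_bits(message):
    """Empirical Shannon entropy, in bits per symbol, of a non-empty sequence."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(message, alphabet_size):
    """R[X] = 1 - H[X] / log2(|A|); requires alphabet_size >= 2."""
    return 1.0 - shannon_entropy_bits(message) / math.log2(alphabet_size)


def mutual_information_bits(xs, ys):
    """MI[X, Y] = H[X] + H[Y] - H[X, Y] from paired discrete observations."""
    joint = list(zip(xs, ys))  # each pair is one symbol of the joint alphabet
    return (shannon_entropy_bits(xs) + shannon_entropy_bits(ys)
            - shannon_entropy_bits(joint))
```

For example, the message `0101` has an entropy of 1 bit and zero redundancy under a binary alphabet, while `0000` has zero entropy and a redundancy of 1.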

Encoding Continuous Data

To calculate entropy, continuous data (like returns, $r_t$ ) must be converted into a discrete message with a finite alphabet.

Binary Encoding: Converts data based on sign (e.g., $r_t > 0 \to 1$ , $r_t < 0 \to 0$ ). Simple, but loses information about the magnitude.
Quantile Encoding: Bins data into $q$ quantiles. Each bin gets a letter, resulting in a uniform distribution of codes in-sample.
Sigma Encoding: Bins data into fixed-width buckets based on standard deviation ( $\sigma$ ). This results in a non-uniform distribution, but "rare" codes (tail events) will spike entropy.
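
The three encodings can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the library's code: zero returns are dropped by the binary encoder (the text does not say how to treat them), codes are letters starting at `a`, and the sigma encoder anchors its first bucket at the minimum return and defaults the bucket width to the population standard deviation.

```python
import bisect
import statistics


def binary_encode(returns):
    """'1' for positive returns, '0' for negative; zeros are dropped (assumption)."""
    return "".join("1" if r > 0 else "0" for r in returns if r != 0)


def quantile_encode(returns, q):
    """One letter per in-sample q-quantile bin (q >= 2)."""
    cuts = statistics.quantiles(returns, n=q)  # q - 1 in-sample cut points
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, r)) for r in returns)


def sigma_encode(returns, step=None):
    """One letter per fixed-width bucket, counted upward from min(returns)."""
    if step is None:
        step = statistics.pstdev(returns)  # assumption: width of one sigma
    low = min(returns)
    # step must be positive; a constant series would need an explicit step.
    return "".join(chr(ord("a") + int((r - low) // step)) for r in returns)
```

With eight evenly spaced returns and `q=4`, the quantile encoder assigns exactly two observations to each letter, which is the in-sample uniformity described above. The sigma encoder makes no such guarantee, so tail observations show up as rarely used letters.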

Implementation: Entropy Estimators

In our RiskLabAI.features.entropy_features module, we implement several methods for estimating the entropy of a discretized message, which can be used to measure market efficiency or predictability.

Shannon's Entropy

As a baseline, we provide a direct implementation of Shannon's Entropy from shannon.py. This function calculates the entropy based on the frequency of individual characters (symbols) in the message.

Plug-In (PMF) Estimator

The "Plug-In" method calculates entropy based on the Probability Mass Function (PMF) of "words" (n-grams) of a specific length. This is a two-step process in our library:

PMF Calculation: First, we use probability_mass_function from pmf.py to calculate the empirical frequency of all n-grams of a given length.
Plug-In Estimator: The plug_in_entropy_estimator from plug_in.py then consumes this PMF, calculates the Shannon entropy, and normalizes it by the word length.
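
The two steps above can be sketched as follows. This is not the source of `pmf.py` or `plug_in.py`; the names `word_pmf` and `plug_in_entropy` are hypothetical, and counting overlapping words is an assumption of this sketch.

```python
import math
from collections import Counter


def word_pmf(message, word_length):
    """Empirical PMF of overlapping words of length `word_length`."""
    n_words = len(message) - word_length + 1  # must be at least 1
    counts = Counter(message[i:i + word_length] for i in range(n_words))
    return {word: c / n_words for word, c in counts.items()}


def plug_in_entropy(message, word_length=1):
    """Shannon entropy of the word PMF, in bits, divided by the word length."""
    pmf = word_pmf(message, word_length)
    return -sum(p * math.log2(p) for p in pmf.values()) / word_length
```

For the alternating message `01010101`, single symbols give 1 bit per symbol, while words of length 2 give roughly 0.49 bits per symbol because only two of the four possible words ever occur. Longer words expose structure that single-symbol frequencies miss.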

Lempel-Ziv (LZ) Estimators

We also implement estimators based on Lempel-Ziv compression, which are more robust as they capture the complexity and redundancy of a message.

Simple LZ Entropy: From lempel_ziv.py, we provide a basic implementation that calculates the number of unique substrings encountered during a one-pass traversal of the message, normalized by the message length.
Kontoyiannis Entropy: From kontoyiannis.py, we implement a more sophisticated LZ-based estimator (Kontoyiannis, 1998). This method estimates entropy from the length $L_i^n$ of the longest substring match found within a look-back window of length $n$. It can operate with either an expanding window (default) or a fixed rolling window.
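
A one-pass dictionary parse of the kind described for the simple LZ estimator might look like the sketch below. It is an assumption about the general technique, not a copy of `lempel_ziv.py`, and the names `lz_dictionary` and `lz_complexity_ratio` are hypothetical.

```python
def lz_dictionary(message):
    """Parse `message` left to right into substrings not seen before."""
    library = []
    seen = set()
    start = 0
    while start < len(message):
        end = start + 1
        # Grow the candidate until it is new or the message runs out.
        while message[start:end] in seen and end < len(message):
            end += 1
        word = message[start:end]
        if word not in seen:  # the final word may repeat an earlier one
            seen.add(word)
            library.append(word)
        start = end
    return library


def lz_complexity_ratio(message):
    """Number of distinct parsed substrings divided by the message length."""
    return len(lz_dictionary(message)) / len(message)  # message must be non-empty
```

A message of ten zeros parses into `0`, `00`, `000`, `0000`, for a ratio of 0.4. Less repetitive messages of the same length yield more dictionary entries and therefore a higher ratio.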

Estimating Entropy

Two main methods are presented for estimating entropy from a data sequence:

Plug-In (Maximum Likelihood) Estimator: A simple method that calculates the empirical frequency ( $\hat{p}_w$ ) of all "words" (substrings) of length $w$ in the message.
$\hat{H}_{n, w}=-\frac{1}{w} \sum_{y_{1}^{w} \in A^{w}} \hat{p}_{w}\left(y_{1}^{w}\right) \log _{2} \hat{p}_{w}\left(y_{1}^{w}\right)$
Lempel-Ziv (LZ) Estimators: A more robust method based on data compression. The core idea is that complex (high-entropy) messages are difficult to compress. It works by finding the length of the longest matching substring ( $L_i^n$ ) from the past.
- Based on the limit: $\lim _{n \rightarrow \infty} \frac{L_{i}^{n}}{\log _{2}[n]}=\frac{1}{H}$
- A practical, modified estimator (sliding window) is: $\tilde{H}_{n, k}=\frac{1}{k} \sum_{i=1}^{k} \frac{\log _{2}[n]}{L_{i}^{n}}$
- And for an expanding window: $\tilde{H}_{n}=\frac{1}{n} \sum_{i=2}^{n} \frac{\log _{2}[i]}{L_{i}^{i}}$
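
The expanding-window formula can be sketched as below. Several choices here are assumptions rather than the behavior of `kontoyiannis.py`: $L_i$ is taken as one plus the longest match length (so it is never zero), positions are 0-based so `log2(i + 1)` plays the role of $\log_2[i]$, only the first half of the message is evaluated so every position has room to match, and the sum is averaged over the number of evaluated positions. The function names are hypothetical.

```python
import math


def match_length(message, i, n):
    """One plus the length of the longest prefix of message[i:] that also
    starts somewhere in the n symbols before position i (requires i >= n).
    Matches may overlap position i."""
    longest = 0
    for length in range(1, n + 1):
        if i + length > len(message):
            break
        target = message[i:i + length]
        if any(message[j:j + length] == target for j in range(i - n, i)):
            longest = length
        else:
            break  # no match at this length rules out any longer match
    return longest + 1


def kontoyiannis_expanding(message):
    """Average of log2(i + 1) / L_i over positions i = 1 .. len(message) // 2."""
    points = range(1, len(message) // 2 + 1)  # needs at least two symbols
    terms = [math.log2(i + 1) / match_length(message, i, i) for i in points]
    return sum(terms) / len(terms)
```

A constant message produces the lowest value attainable for its length, because every position matches as far back as the window allows. Any mismatch shortens some $L_i$ and raises the estimate.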

Entropy as a Measure of Diversity

Entropy can also be understood as the logarithm of the "effective number" of states, which connects it to the Generalized Mean.

Generalized Mean: $M_{q}[x, p]=\left(\sum_{i=1}^{n} p_{i} x_{i}^{q}\right)^{1 / q}$
Entropy as a Generalized Mean: $H[p] = \log \left(\lim _{q \rightarrow 1} N_{q}[p]\right) \quad \text{where} \quad N_{q}[p]=\frac{1}{M_{q-1}[p, p]}$ This shows that entropy is a specific instance ( $q \to 1$ ) of a broader family of diversity measures.
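
The limit can be checked numerically. The sketch below evaluates $N_q[p]$ for $q$ close to 1 and compares its natural logarithm with the entropy in nats; the function names are illustrative, and the probabilities are assumed to be strictly positive.

```python
import math


def generalized_mean(x, p, q):
    """M_q[x, p] = (sum_i p_i * x_i**q) ** (1 / q), for q != 0."""
    return sum(pi * xi ** q for xi, pi in zip(x, p)) ** (1.0 / q)


def effective_number(p, q):
    """N_q[p] = 1 / M_{q-1}[p, p], for q != 1 and strictly positive p."""
    return 1.0 / generalized_mean(p, p, q - 1.0)


def entropy_nats(p):
    """Shannon entropy with natural logarithms."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


p = [0.5, 0.3, 0.2]
near_limit = math.log(effective_number(p, 1.0 + 1e-6))
gap = abs(near_limit - entropy_nats(p))  # shrinks as q approaches 1
```

For a uniform distribution over four states, `effective_number([0.25] * 4, 2)` returns 4, matching the intuition that all four states count equally.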

Financial Applications

Market Efficiency: Used to estimate the "informational content" of prices. Low entropy suggests an inefficient, predictable market, whereas high entropy suggests an efficient, random market.
Portfolio Concentration (Meucci): Measures portfolio diversification by calculating the entropy of risk contributions ( $\theta_i$ ) from each principal component.
- Risk Contribution: $\theta_{i}=\frac{\left(f_{\omega}\right)_{i}^{2} \Lambda_{i, i}}{\sum_{n=1}^{N}\left(f_{\omega}\right)_{n}^{2} \Lambda_{n, n}}$
- Concentration: $H=1-\frac{1}{N} e^{-\sum_{n=1}^{N} \theta_{n} \log \left(\theta_{n}\right)}$
Market Microstructure (VPIN): Helps determine the probability of adverse selection.
- The Probability of Informed Trading (PIN) model is linked to order flow imbalance using volume bars, resulting in VPIN. $P I N=\frac{\alpha \mu}{\alpha \mu+2 \varepsilon} \quad \implies \quad V P I N \approx \mathrm{E}(|2 v_{\tau}^{B}-1|)$
- Adverse selection occurs when order flow imbalance is both high (high VPIN) and unpredictable. This unpredictability can be measured by calculating the entropy of the quantized order flow imbalance series.
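
The concentration and VPIN formulas above can be sketched with plain Python lists. The inputs are assumed to be already available: factor loadings $(f_\omega)_i$ and eigenvalues $\Lambda_{i,i}$ for the portfolio measure, and the buy-volume fraction $v_\tau^B$ of each volume bar for VPIN. Quantizing into equal-width bins and using Shannon entropy for the unpredictability measure are assumptions of this sketch, and all function names are hypothetical.

```python
import math
from collections import Counter


def risk_contributions(factor_loadings, eigenvalues):
    """theta_i = f_i^2 * Lambda_ii, normalized to sum to one."""
    raw = [f * f * lam for f, lam in zip(factor_loadings, eigenvalues)]
    total = sum(raw)
    return [r / total for r in raw]


def concentration(theta):
    """H = 1 - exp(entropy of theta) / N; 0 means risk is evenly spread."""
    entropy = -sum(t * math.log(t) for t in theta if t > 0)
    return 1.0 - math.exp(entropy) / len(theta)


def vpin(buy_fractions):
    """Sample mean of |2 * v_B - 1| across volume bars."""
    return sum(abs(2.0 * v - 1.0) for v in buy_fractions) / len(buy_fractions)


def imbalance_entropy(buy_fractions, n_bins=4):
    """Shannon entropy (bits) of buy fractions in equal-width bins on [0, 1]."""
    codes = [min(int(v * n_bins), n_bins - 1) for v in buy_fractions]
    total = len(codes)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(codes).values())
```

With four principal components carrying equal risk, `concentration` is 0 up to rounding; when a single component carries all the risk it is 0.75, that is, $1 - 1/N$. A series of bars can therefore be screened on both dimensions named above: a high `vpin` value together with a high `imbalance_entropy`.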

API reference

RiskLabAI implements these in Python and Julia (signatures auto-generated from the package source):

Python	Julia
`def shannon_entropy(message: str) -> float:`	`function shannon_entropy(message::AbstractString)`
`def probability_mass_function( message: str, approximate_word_length: int ) -> dict[str, float]:`	`function probability_mass_function(message::AbstractString, approximate_word_length::Integer)`
`def plug_in_entropy_estimator(message: str, approximate_word_length: int = 1) -> float:`	`function plug_in_entropy_estimator(message::AbstractString, approximate_word_length::Integer = 1)`
`def lempel_ziv_entropy(message: str) -> float:`	`function lempel_ziv_entropy(message::AbstractString)`
`def longest_match_length(message: str, i: int, n: int) -> tuple[int, str]:`	`function longest_match_length(message::AbstractString, i::Integer, n::Integer)`
`def kontoyiannis_entropy(message: str, window: Optional[int] = None) -> float:`	`function kontoyiannis_entropy(message::AbstractString; window::Union{Nothing,Integer} = nothing)`

Full source: Python · Julia