Detecting Structural Breaks in Financial Markets

A structural break in a financial market is a shift from one type of market behavior to another, for example from a mean-reverting pattern to a momentum pattern. Such shifts often catch market participants off guard and lead to costly mistakes. By detecting structural breaks, you can make more informed trading decisions. We'll focus on two families of tests for detecting them: CUSUM tests and explosiveness tests.

Types of Structural Break Tests

  • CUSUM tests: These test whether the cumulative forecasting errors deviate significantly from white noise.
  • Explosiveness tests: These test whether the process exhibits exponential growth or collapse, which would be inconsistent with a random walk or a stationary process.

Brown-Durbin-Evans CUSUM Test on Recursive Residuals

The Brown-Durbin-Evans CUSUM test looks for structural breaks using recursive least squares (RLS) estimates of the coefficients in the linear model:

\[ y_{t} = \beta_{t}^{\prime} x_{t} + \varepsilon_{t} \]

We compute the standardized one-step-ahead recursive residuals using:

\[ \hat{\omega}_{t} = \frac{y_{t} - \hat{\beta}_{t-1}^{\prime} x_{t}}{\sqrt{f_{t}}} \]

\[ f_{t} = \hat{\sigma}_{\varepsilon}^{2}\left[1 + x_{t}^{\prime}\left(X_{t}^{\prime} X_{t}\right)^{-1} x_{t}\right] \]

Finally, the CUSUM statistic is calculated as:

\[ S_{t} = \sum_{j=k+1}^{t} \frac{\hat{\omega}_{j}}{\hat{\sigma}_{\omega}} \]

StructuralBreaks.jl
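using LinearAlgebra
using Statistics
using DataFrames
using Dates

# Ordinary least squares: returns the coefficient estimates and their covariance matrix.
# Minimal stand-in so the example runs; see the library source code for the detailed implementation.
function computeBeta(
        X::Matrix,
        y::Vector
    )::Tuple{Vector, Matrix}
    XXInverse = inv(X' * X)
    β = XXInverse * (X' * y)
    residuals = y - X * β
    βVariance = (residuals' * residuals) / (size(X, 1) - size(X, 2)) * XXInverse
    return β, βVariance
end

function brownDurbinEvansTest(
        X::Matrix,
        y::Vector,
        lags::Int,
        k::Int,
        index::Vector
    )::DataFrame

    # Full-sample fit, used only to estimate the residual variance.
    β, _ = computeBeta(X, y)
    residuals = y - X * β
    σSquared = residuals' * residuals / (length(y) - 1 + lags - k)

    startIndex = k - lags + 1
    cumulativeSum = 0.0  # renamed so it does not shadow Base.cumsum
    bdeCumsumStats = zeros(length(y) - startIndex)
    for i in startIndex:length(y) - 1
        # Fit on the first i observations, then forecast observation i + 1.
        X_, y_ = X[1:i, :], y[1:i]
        β, _ = computeBeta(X_, y_)
        xNext = X[i + 1, :]
        ω = (y[i + 1] - xNext' * β) / sqrt(1 + xNext' * inv(X_' * X_) * xNext)
        cumulativeSum += ω
        bdeCumsumStats[i - startIndex + 1] = cumulativeSum / sqrt(σSquared)
    end
    bdeStatsDf = DataFrame(index = index[k:length(y) + lags - 2], bdeStatistics = bdeCumsumStats)
    return bdeStatsDf
end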

Simplified CUSUM Test

A simplified version of the Brown-Durbin-Evans test works directly on price levels, which makes it computationally cheaper. It calculates the standardized departure of the log-price \(y_t\) from a reference log-price \(y_n\), with \(n < t\):

\[ S_{n, t} = \frac{y_{t} - y_{n}}{\hat{\sigma}_{t} \sqrt{t-n}} \]
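
As a rough illustration, the statistic can be sketched in a few lines of Julia. The helper name cusumOnLevels is hypothetical and not part of the RiskLabAI library, and the volatility estimate is an assumption: the text does not define \(\hat{\sigma}_{t}\), so the sketch uses the root mean squared first difference of the log prices up to \(t\).

```julia
# Hypothetical helper: S_{n,t} for a vector of log prices, a reference
# index n and an evaluation index t (1-based, n < t).
function cusumOnLevels(logPrices::AbstractVector{<:Real}, n::Int, t::Int)
    2 <= t <= length(logPrices) || throw(ArgumentError("t must lie in 2:length(logPrices)"))
    1 <= n < t || throw(ArgumentError("n must satisfy 1 <= n < t"))
    # Assumed volatility estimate: root mean squared first difference up to t.
    increments = diff(logPrices[1:t])
    σt = sqrt(sum(abs2, increments) / (t - 1))
    return (logPrices[t] - logPrices[n]) / (σt * sqrt(t - n))
end

logPrices = log.([100.0, 101.0, 100.5, 102.0, 104.0, 107.0])
t = length(logPrices)
# One statistic per candidate reference point n.
stats = [cusumOnLevels(logPrices, n, t) for n in 1:t-1]
println(round.(stats; digits = 3))
```

A large absolute value for some reference point \(n\) indicates that the log price has moved further from \(y_n\) than its recent volatility would suggest.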

For financial time series, log prices are usually a more appropriate input than raw prices. If a raw price series is stationary, its returns cannot be time-invariant: return volatility has to fall as prices rise and rise as prices fall. A model that ignores this is structurally heteroscedastic. Log prices give a more reliable statistical framework for understanding price behavior.

Mathematically, with raw prices the quantity at stake is the simple return:

\[ \frac{y_{t}}{y_{t-1}} - 1 \]

With log prices, the specification relates the change in the log price to its lagged level:

\[ \Delta \log [y_{t}] \propto \log [y_{t-1}] \]

Using log prices also handles changing economic conditions, bubbles, or other economic phases better, especially for data that spans multiple years.
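
A small numerical illustration of why the scale matters over long samples, using made-up prices that rise by a constant 1% per step: the raw price changes grow with the price level, while the log price changes stay the same size.

```julia
# Made-up price path: a constant 1% increase per step.
prices = 100 .* 1.01 .^ (0:4)

rawMoves = diff(prices)        # absolute changes, which grow with the price level
logMoves = diff(log.(prices))  # log changes, all equal to log(1.01)

println("raw changes: ", round.(rawMoves; digits = 4))
println("log changes: ", round.(logMoves; digits = 4))
```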

Computational Complexity

The SADF (supremum augmented Dickey-Fuller) algorithm's computational cost is \(O(n^2)\), which can quickly become a bottleneck for larger datasets. For example, a full SADF time series for a dataset with \((T, N) = (356631, 3)\) requires around 242 PFLOPs. A High-Performance Computing (HPC) cluster may be necessary to finish such a computation within a reasonable time frame.
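
To get a feel for the quadratic growth, the sketch below counts how many ADF regressions a full SADF series would need. It assumes that, for every end point t, one regression is fitted for each start point that leaves at least minSampleLength observations in the window. The helper name countAdfRegressions is hypothetical, and the count ignores the cost of each individual regression.

```julia
# Hypothetical helper: number of ADF regressions behind a full SADF series,
# assuming every window [t0, t] with at least minSampleLength points is fitted.
function countAdfRegressions(T::Int, minSampleLength::Int)
    total = 0
    for t in minSampleLength:T
        total += t - minSampleLength + 1  # admissible start points t0 for this end point
    end
    return total
end

# Doubling the sample length roughly quadruples the number of regressions.
for T in (1_000, 2_000, 4_000)
    println(T, " => ", countAdfRegressions(T, 20))
end
```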

Conditions for Exponential Behavior

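Based on log prices, the system can be in one of three states: steady, unit-root, or explosive. Which one applies is largely determined by the \(\beta\) parameter in the equation:

\[ \Delta \log [y_{t}] = \alpha + \beta \log [y_{t-1}] + \epsilon_{t} \]

  • Steady: \(\beta < 0\), so log prices revert toward a long-run level.
  • Unit-root: \(\beta = 0\), so the process is non-stationary and behaves like a random walk with drift \(\alpha\).
  • Explosive: \(\beta > 0\), so deviations compound and log prices grow or collapse exponentially.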
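
The role of \(\beta\) can be illustrated with a minimal sketch that fits the equation by ordinary least squares on noise-free simulated paths. The helper names fitLogPriceDynamics and simulate are hypothetical. Classifying by the sign of the estimate alone is a simplification: with real, noisy data the decision would rest on the ADF test statistic, not on the point estimate.

```julia
# Hypothetical helper: OLS fit of Δlog[y_t] = α + β log[y_{t-1}] + ε_t.
function fitLogPriceDynamics(logPrices::AbstractVector{<:Real})
    Δ = diff(logPrices)                     # Δ log[y_t]
    lagged = logPrices[1:end-1]             # log[y_{t-1}]
    X = hcat(ones(length(lagged)), lagged)  # intercept and lagged level
    α, β = X \ Δ                            # least-squares solution
    return α, β
end

# Noise-free log-price path generated from the same equation.
function simulate(α, β, z0, steps)
    z = [z0]
    for _ in 1:steps
        push!(z, z[end] + α + β * z[end])
    end
    return z
end

for βTrue in (-0.1, 0.1)
    _, βHat = fitLogPriceDynamics(simulate(0.05, βTrue, 1.0, 20))
    regime = βHat < 0 ? "steady" : "explosive"  # sign-only rule, for illustration
    println("β = ", round(βHat; digits = 3), " => ", regime)
end
```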

Quantile ADF and Conditional ADF

Two more robust alternatives to SADF are Quantile ADF (QADF) and Conditional ADF (CADF). QADF replaces the maximum with a high quantile of the ADF values, and CADF measures the centrality and dispersion of the ADF values above that quantile. Both reduce sensitivity to outliers and to the specifics of the data.
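
The difference between the three aggregates can be sketched on a made-up vector of ADF statistics with one outlier. The helper name summarizeAdf and the input values are hypothetical. The sketch uses the interpolating quantile from Julia's Statistics standard library, which need not match the index-based quantile used in the library code.

```julia
using Statistics

# Hypothetical helper: summarize the ADF statistics of all windows ending at one date.
function summarizeAdf(adfStats::AbstractVector{<:Real}, q::Real)
    threshold = quantile(adfStats, q)        # QADF: q-quantile in place of the maximum
    tail = filter(>=(threshold), adfStats)   # ADF values at or above the quantile
    return (
        sadf = maximum(adfStats),            # supremum, driven by a single window
        qadf = threshold,
        cadfMean = mean(tail),               # CADF centrality
        cadfStd = length(tail) > 1 ? std(tail) : 0.0,  # CADF dispersion
    )
end

adfStats = [-2.1, -1.4, -0.3, 0.2, 0.9, 1.1, 6.5]  # made-up values with one outlier
println(summarizeAdf(adfStats, 0.8))
```

In this toy input the supremum is set entirely by the single outlier, whereas the quantile and the conditional mean also reflect the rest of the upper tail.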

SADF, QADF, and CADF Algorithm Implementation

Below is an example of how these statistics are implemented in the RiskLabAI library.

StructuralBreaks.jl
function adfTestType(
        data::DataFrame,
        minSampleLength::Int,
        constant::String,
        lags,
        type::String;
        quantile::Union{Nothing, Float64} = nothing,
        probability::Vector = ones(size(data, 1))
    )::DataFrame

    # Validate the arguments once, before the loop.
    type in ("SADF", "QADF", "CADF") || throw(ArgumentError("type must be SADF or QADF or CADF"))
    if type != "SADF" && quantile === nothing
        throw(ArgumentError("quantile is required for QADF and CADF"))
    end

    # prepareData and adf are helpers defined elsewhere in the library.
    X, y = prepareData(data.price, constant, lags)
    maxLag = lags isa Integer ? lags : maximum(lags)
    indexRange = (minSampleLength - maxLag) : length(y)
    result = zeros(length(indexRange))
    for (i, index) in enumerate(indexRange)
        X_, y_ = X[1:index, :], y[1:index]
        adfStats = adf(X_, y_, minSampleLength)
        if isempty(adfStats)
            continue
        end
        if type == "SADF"
            result[i] = maximum(adfStats)
        else
            # Position of the requested quantile, clamped to a valid index.
            cutoff = max(1, floor(Int, quantile * length(adfStats)))
            if type == "QADF"
                result[i] = sort(adfStats)[cutoff]
            else  # "CADF"
                # Probability-weighted mean of the ADF values at or above the quantile.
                perm = sortperm(adfStats)[cutoff:end]
                result[i] = sum(adfStats[perm] .* probability[perm]) / sum(probability[perm])
            end
        end
    end
    adfStatistics = DataFrame(index = data.index[minSampleLength:length(y) + maxLag], statistics = result)
    return adfStatistics
end


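This implementation takes seven inputs: the data frame of close prices (data), the minimum sample length (minSampleLength), the type of constant to use (constant), the lag values (lags), the type of ADF test (type), and two optional keyword arguments, the quantile level used by QADF and CADF (quantile) and the weights used by CADF (probability).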

References

  1. López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
  2. López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.