8. Quadrature and Interpolation#
8.1. Overview#
In this section we will explore the related concepts of quadrature, interpolation, and discretization of continuous functions and distributions.
using LinearAlgebra, Statistics, Distributions
using QuadGK, FastGaussQuadrature, SpecialFunctions
using Interpolations, Plots
8.2. Numerical Integration#
Many applications require directly calculating numerical integrals, often in order to calculate expectations.
8.2.1. General Packages#
Julia’s Integrals.jl provides a unified interface to many quadrature backends and is differentiable with common AD systems (e.g. Zygote, ForwardDiff, Enzyme), which can be useful when an integral appears inside an optimization or learning problem.
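As a minimal sketch (Integrals.jl is not among the packages loaded above, and its problem-construction API can differ slightly across versions), the cosine integral from the next subsection could be written through the unified interface as

using Integrals
# f(x, p) signature: p carries parameters, unused here
prob = IntegralProblem((x, p) -> cos(x), (-2π, 2π))
sol = solve(prob, QuadGKJL()) # QuadGK backend through the unified interface
@show sol.u;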
8.2.2. Adaptive Quadrature#
A high-accuracy solution for calculating numerical integrals is QuadGK.jl.
using QuadGK
@show value, tol = quadgk(cos, -2π, 2π);
(value, tol) = quadgk(cos, -2π, 2π) = (-1.8403051655192592e-17, 1.0683776742236108e-23)
This is an adaptive Gauss-Kronrod integration technique that’s relatively accurate for smooth functions.
However, its adaptive implementation makes it slow and not well suited to inner loops.
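For example, since Distributions is loaded above, a simple (if slower) sketch of an expectation \(\mathbb{E}\left[f(X)\right]\) is to integrate \(f\) against the pdf over the support

F = Beta(2, 2)
f(x) = x^2
# E[f(X)] = ∫ f(x) pdf(F, x) dx over [0, 1]
value, tol = quadgk(x -> f(x) * pdf(F, x), 0, 1)
@show value; # analytic value is 0.3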
8.2.3. Gauss Legendre#
Alternatively, many integrals can be done efficiently with (non-adaptive) Gaussian quadrature.
For example, using FastGaussQuadrature.jl
x, w = FastGaussQuadrature.gausslegendre(100_000); # i.e. find 100,000 nodes
# integrates f(x) = x^2 from -1 to 1
f(x) = x^2
@show dot(w, f.(x)); # calculate integral
dot(w, f.(x)) = 0.6666666666666665
With the FastGaussQuadrature package you may need to deal with affine transformations to the non-default domains yourself.
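For example, to integrate over a general interval \([a, b]\) (here \([0, 2]\), purely for illustration), map the \([-1, 1]\) nodes and rescale the weights by the Jacobian of the affine change of variables

a, b = 0.0, 2.0
s, w = FastGaussQuadrature.gausslegendre(50)
# x = (b - a)/2 * s + (a + b)/2, dx = (b - a)/2 ds
x_ab = (b - a) / 2 .* s .+ (a + b) / 2
w_ab = (b - a) / 2 .* w
@show dot(w_ab, x_ab .^ 2); # ∫_0^2 x^2 dx = 8/3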
8.2.4. Gauss Jacobi#
Another commonly used quadrature well suited to random variables with bounded support is Gauss–Jacobi quadrature.
It provides nodes \(s_n\in[-1,1]\) and weights \(\omega_n\) for

\[
\int_{-1}^{1} f(s)\,(1-s)^{a}(1+s)^{b}\, ds \approx \sum_{n=1}^N \omega_n f(s_n)
\]

For \(X\sim\mathrm{Beta}(\alpha,\beta)\),

\[
\mathbb{E}\left[f(X)\right] = \frac{1}{B(\alpha,\beta)} \int_0^1 f(x)\, x^{\alpha-1}(1-x)^{\beta-1}\, dx
\]

with the change of variables \(s=2x-1\) (so \(x=(s+1)/2\), \(dx=ds/2\)). This yields Gauss–Jacobi exponents \(a=\beta-1\), \(b=\alpha-1\) and a factor \(C=2^{-(\alpha+\beta-1)}/B(\alpha,\beta)\):

\[
\mathbb{E}\left[f(X)\right] \approx C \sum_{n=1}^N \omega_n f\left(\frac{s_n+1}{2}\right)
\]
function gauss_jacobi(F::Beta, N)
    s, wj = FastGaussQuadrature.gaussjacobi(N, F.β - 1, F.α - 1)
    x = (s .+ 1) ./ 2
    C = 2.0^(-(F.α + F.β - 1.0)) / SpecialFunctions.beta(F.α, F.β)
    w = C .* wj
    return x, w
end
N = 32
F = Beta(2, 2)
F2 = Beta(0.5, 1.2)
x, w = gauss_jacobi(F, N)
x2, w2 = gauss_jacobi(F2, N)
f(x) = x^2
@show dot(w, f.(x)), mean(f.(rand(F, 1000)))
@show dot(w2, f.(x2)), mean(f.(rand(F2, 1000)));
(dot(w, f.(x)), mean(f.(rand(F, 1000)))) = (0.29999999999999993, 0.29628730561123057)
(dot(w2, f.(x2)), mean(f.(rand(F2, 1000)))) = (0.16339869281045832, 0.1570859472887352)
8.3. Gauss Hermite#
Many expectations take the form \(\mathbb{E}\left[f(X)\right]\approx \sum_{n=1}^N w_n f(x_n)\) where \(X \sim N(0,1)\), or, more generally, integrals of the form \(\int_{-\infty}^{\infty}f(x) \exp(-x^2)\, dx\).
Gauss–Hermite quadrature provides nodes and weights of this form, where the normalize = true argument to gausshermite provides the appropriate rescaling for the standard normal distribution.
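As a quick check of that rescaling, the normalized weights sum to one and recover \(\mathbb{E}\left[Z^2\right] = 1\) for \(Z \sim N(0,1)\)

s, w = FastGaussQuadrature.gausshermite(20; normalize = true)
@show sum(w) # weights sum to one
@show dot(w, s .^ 2); # E[Z^2] = 1 for Z ~ N(0, 1)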
Through the change of variables \(X = \mu + \sigma Z\) with \(Z \sim N(0,1)\), this can also be used to calculate expectations of \(N(\mu,\sigma^2)\) variables
function gauss_hermite_normal(N::Integer, mu::Real, sigma::Real)
    s, w = FastGaussQuadrature.gausshermite(N; normalize = true)
    # X ~ N(mu, sigma^2) is distributed as mu + sigma * Z with Z ~ N(0, 1),
    # so shift and scale the standard-normal nodes
    x = mu .+ sigma .* s
    return x, w
end
N = 32
x, w = gauss_hermite_normal(N, 1.0, 0.1)
x2, w2 = gauss_hermite_normal(N, 0.0, 0.05)
f(x) = x^2
@show dot(w, f.(x)), mean(f.(rand(Normal(1.0, 0.1), 1000)))
@show dot(w2, f.(x2)), mean(f.(rand(Normal(0.0, 0.05), 1000)));
(dot(w, f.(x)), mean(f.(rand(Normal(1.0, 0.1), 1000)))) = (1.01, 1.0184583146572033)
(dot(w2, f.(x2)), mean(f.(rand(Normal(0.0, 0.05), 1000)))) = (0.0024999999999999996, 0.002598192194683179)
8.4. Gauss Laguerre#
Another quadrature scheme appropriate for integrals defined on \([0,\infty)\) is Gauss Laguerre, which approximates integrals of the form \(\int_0^{\infty} f(x) x^{\alpha} \exp(-x)\, dx \approx \sum_{n=1}^N w_n f(x_n)\).
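As a quick sketch of this form, \(\int_0^{\infty} x e^{-x}\, dx = \Gamma(2) = 1\) is recovered up to floating-point error

s, w = FastGaussQuadrature.gausslaguerre(20) # default weight e^{-x}
@show dot(w, s); # ∫_0^∞ x e^{-x} dx = 1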
One application is to calculate expectations of exponential variables. The PDF of an exponential distribution with parameter \(\theta\) is \(f(x;\theta) = \frac{1}{\theta}\exp(-x/\theta)\). With a change of variables we can use Gauss Laguerre quadrature
function gauss_laguerre_exponential(N, theta)
    # E[f(X)] = \int_0^\infty f(x) (1/theta) e^{-x/theta} dx = \int_0^\infty f(theta*y) e^{-y} dy
    s, w = FastGaussQuadrature.gausslaguerre(N) # alpha = 0 (default)
    x = theta .* s
    return x, w
end
N = 64
theta = 0.5
x, w = gauss_laguerre_exponential(N, theta)
f(x) = x^2 + 1
@show dot(w, f.(x)), mean(f.(rand(Exponential(theta), 1_000)))
(dot(w, f.(x)), mean(f.(rand(Exponential(theta), 1000)))) = (1.499999999999965, 1.4453202373746106)
(1.499999999999965, 1.4453202373746106)
Similarly, the Gamma distribution with shape parameter \(\alpha\) and scale \(\theta\) has PDF \(f(x; \alpha, \theta) = \frac{x^{\alpha-1} e^{-x/\theta}}{\Gamma(\alpha) \theta^\alpha}\) for \(x > 0\) with \(\Gamma(\cdot)\) the Gamma special function.
Using a change of variables and Gauss Laguerre quadrature
function gauss_laguerre_gamma(N, alpha, theta)
    # For Gamma(shape=alpha, scale=theta) with pdf
    #   x^{alpha-1} e^{-x/theta} / (Gamma(alpha) theta^alpha)
    # change variable y = x/theta -> x = theta*y, dx = theta dy
    # E[f(X)] = 1/Gamma(alpha) * ∫_0^∞ f(theta*y) y^{alpha-1} e^{-y} dy
    # FastGaussQuadrature.gausslaguerre(N, a) returns nodes/weights for
    # ∫_0^∞ g(y) y^a e^{-y} dy, so pass a = alpha - 1.
    s, w = FastGaussQuadrature.gausslaguerre(N, alpha - 1)
    x = theta .* s
    w = w ./ SpecialFunctions.gamma(alpha)
    return x, w
end
N = 256
alpha = 7.0
theta = 1.1
x, w = gauss_laguerre_gamma(N, alpha, theta)
f(x) = x^2 + 1
@show dot(w, f.(x)), mean(f.(rand(Gamma(alpha, theta), 100_000)))
(dot(w, f.(x)), mean(f.(rand(Gamma(alpha, theta), 100000)))) = (68.76000000000175, 68.92522035582272)
(68.76000000000175, 68.92522035582272)
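As a check, the closed-form second moment of the Gamma distribution gives the same value

# E[X^2] + 1 = alpha*(alpha + 1)*theta^2 + 1 for Gamma(shape = alpha, scale = theta)
@show alpha * (alpha + 1) * theta^2 + 1;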
8.5. Interpolation#
In economics we often wish to interpolate discrete data (i.e., build continuous functions that join discrete sequences of points).
The package we usually turn to for this purpose is Interpolations.jl.
There are a variety of options, but we will only demonstrate the most convenient notation.
8.5.1. Univariate with a Regular Grid#
Let’s start with the univariate case.
We begin by creating some data points, using a sine function
using Interpolations
using Plots
x = -7:7 # x points, coarse grid
y = sin.(x) # corresponding y points
xf = -7:0.1:7 # fine grid
plot(xf, sin.(xf), label = "sin function")
scatter!(x, y, label = "sampled data", markersize = 4)
To implement linear and cubic spline interpolation
li = LinearInterpolation(x, y)
li_spline = CubicSplineInterpolation(x, y)
@show li(0.3) # evaluate at a single point
scatter(x, y, label = "sampled data", markersize = 4)
plot!(xf, li.(xf), label = "linear")
plot!(xf, li_spline.(xf), label = "spline")
li(0.3) = 0.25244129544236954
8.5.2. Univariate with Irregular Grid#
In the above, the LinearInterpolation function uses a specialized method for regular grids since x is a range type.
For an arbitrary, irregular grid
x = log.(range(1, exp(4), length = 10)) .+ 1 # uneven grid
y = log.(x) # corresponding y points
interp = LinearInterpolation(x, y)
xf = log.(range(1, exp(4), length = 100)) .+ 1 # finer grid
plot(xf, interp.(xf), label = "linear")
scatter!(x, y, label = "sampled data", markersize = 4, size = (800, 400))
At this point, Interpolations.jl does not have support for cubic splines with irregular grids, but there are plenty of other packages that do (e.g. Dierckx.jl and GridInterpolations.jl).
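For instance, a minimal sketch with Dierckx.jl (not loaded above; this assumes its Spline1D constructor, which is cubic by default) on the irregular grid from above

using Dierckx
spl = Spline1D(x, y) # cubic spline on the irregular grid
plot(xf, spl.(xf), label = "Dierckx cubic spline")
scatter!(x, y, label = "sampled data", markersize = 4)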
8.5.3. Multivariate Interpolation#
Interpolation on a regular multivariate grid uses the same functions
f(x, y) = log(x + y)
xs = 1:0.2:5
ys = 2:0.1:5
A = [f(x, y) for x in xs, y in ys]
# linear interpolation
interp_linear = LinearInterpolation((xs, ys), A)
@show interp_linear(3, 2) # exactly log(3 + 2)
@show interp_linear(3.1, 2.1) # approximately log(3.1 + 2.1)
# cubic spline interpolation
interp_cubic = CubicSplineInterpolation((xs, ys), A)
@show interp_cubic(3, 2) # exactly log(3 + 2)
@show interp_cubic(3.1, 2.1) # approximately log(3.1 + 2.1);
interp_linear(3, 2) = 1.6094379124341003
interp_linear(3.1, 2.1) = 1.6484736801441782
interp_cubic(3, 2) = 1.6094379124341
interp_cubic(3.1, 2.1) = 1.6486586594237707
1.6486586594237707
See Interpolations.jl documentation for more details on options and settings.
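As one example of those settings, here is a sketch of attaching extrapolation behavior through the extrapolation_bc keyword of the convenience constructors (assuming the current Interpolations.jl API)

x = -7:7
y = sin.(x)
li_extrap = LinearInterpolation(x, y; extrapolation_bc = Line())
@show li_extrap(7.5); # linear extrapolation outside the original grid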
8.6. Discretization of Stochastic Processes#
In many cases a stochastic process is defined on a continuous state space, but many numerical algorithms are much easier to apply if the space is discretized.
For a Markov process in discrete time, this leads to a Markov chain. See Finite Markov Chains for more details.
8.6.1. Discretizing an AR(1) Process#
Markov chains are routinely generated as discrete approximations to AR(1) processes of the form

\[
X_{t+1} = \mu + \rho X_t + \sigma W_{t+1}
\]
Here \(W_{t+1}\) is assumed to be i.i.d. and \(N(0, 1)\).
When \(|\rho| < 1\), the process is stationary with unconditional mean \(\mu_X\) and standard deviation \(\sigma_X\):

\[
\mu_X = \frac{\mu}{1 - \rho}, \qquad \sigma_X = \frac{\sigma}{\sqrt{1 - \rho^2}}
\]
The goal of a discretization is to find a set of discrete states, \(\{x_1, \ldots, x_N\}\), and a stochastic transition matrix \(P\) which best approximates the continuous dynamics.
Typically, these approximations adapt the grid of states to the variance of the stationary distribution to ensure sufficient coverage. This is usually parameterized by choosing \(m > 0\) standard deviations on both sides of the stationary distribution to cover.
8.6.2. Tauchen’s Method (1986)#
Tauchen’s method [Tau86] is the most common method for approximating this process with a finite state Markov chain.
Standard implementations often simplify the calculation by discretizing the zero-mean process \(Z_t = X_t - \mu_X\) first, and then shifting the resulting grid. This is equivalent due to the linearity of the AR(1) process.
The zero-mean dynamics are:

\[
Z_{t+1} = \rho Z_t + \sigma W_{t+1}
\]
The grid for \(Z\), denoted \(z := \{z_1, \ldots, z_N\}\), is constructed to cover \([-m \sigma_X, + m \sigma_X]\). The points are evenly spaced with step size \(d\):

\[
d = \frac{2 m \sigma_X}{N - 1}
\]
We associate each state \(z_j\) with the interval \([z_j - d/2, z_j + d/2]\). To construct the transition matrix \(P\), we calculate the probability of moving from \(z_n\) to the interval associated with \(z_j\).
Let \(F\) be the CDF of the standard normal distribution.
Since \(Z_{t+1} | Z_t = z_n \sim \mathcal{N}(\rho z_n, \sigma^2)\), we standardize with \(\sigma\).
For interior points \(j = 2, \ldots, N-1\):

\[
P_{nj} = F\left(\frac{z_j + d/2 - \rho z_n}{\sigma}\right) - F\left(\frac{z_j - d/2 - \rho z_n}{\sigma}\right)
\]

For the left boundary \(j = 1\):

\[
P_{n1} = F\left(\frac{z_1 + d/2 - \rho z_n}{\sigma}\right)
\]

For the right boundary \(j = N\):

\[
P_{nN} = 1 - F\left(\frac{z_N - d/2 - \rho z_n}{\sigma}\right)
\]
Finally, the state vector for the original process \(X\) is recovered by shifting the grid: \(x_n = z_n + \mu_X\).
The following code implements this procedure using Julia’s vectorization syntax.
function tauchen(N, rho, sigma, mu, m = 3)
    @assert abs(rho) < 1
    mu_X = mu / (1 - rho)
    sigma_X = sigma / sqrt(1 - rho^2)

    # zero-centered grid and midpoints as cutoffs
    z = range(-m * sigma_X, m * sigma_X, length = N)
    d = step(z)
    midpoints = (z[1:(end - 1)] .+ d / 2)'
    means = rho .* z
    Z_scores = (midpoints .- means) ./ sigma

    # Construct P: [Left Tail, Interval Diffs, Right Tail]
    F = cdf.(Normal(), Z_scores)
    P = [F[:, 1] diff(F, dims = 2) (1 .- F[:, end])]
    x = z .+ mu_X
    return (; P, mu_X, sigma_X, x)
end
tauchen (generic function with 2 methods)
As an example, consider the AR(1) process with parameters \(\rho = 0.9\), \(\mu = 0.2\), and \(\sigma = 0.1\).
We can check that all rows of the transition matrix sum to one, and inspect the transition probabilities from a particular state.
rho = 0.9
mu = 0.2
sigma = 0.1
N = 5
(; P, x) = tauchen(N, rho, sigma, mu)
@show x
println("Row sums of P: ", sum(P, dims = 2))
state_index = 3
println("Transition probabilities from state x = $(x[state_index]):")
for j in 1:N
    println(" to x = $(x[j]): P = $(P[state_index, j])")
end
x = 1.3117527983883148:0.34412360080584276:2.688247201611686
Row sums of P:
[1.0; 1.0; 1.0; 1.0; 1.0;;]
Transition probabilities from state x = 2.0000000000000004:
to x = 1.3117527983883148: P = 1.2225797589278506e-7
to x = 1.6558763991941576: P = 0.042659959859755056
to x = 2.0000000000000004: P = 0.914679835764538
to x = 2.344123600805843: P = 0.042659959859755125
to x = 2.688247201611686: P = 1.2225797585418974e-7
Note that the majority of the mass is concentrated around the diagonal, reflecting the high persistence of the AR(1) process with \(\rho = 0.9\).
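As an additional check (an illustrative sketch, not part of the procedure above), we can approximate the stationary distribution of the chain and compare its mean and standard deviation with \(\mu_X\) and \(\sigma_X\); with only \(N = 5\) truncated states the match is approximate

(; P, mu_X, sigma_X, x) = tauchen(N, rho, sigma, mu)
# each row of P^k approaches the stationary distribution as k grows
psi = (P^1000)[1, :]
mean_approx = dot(psi, x)
std_approx = sqrt(dot(psi, (x .- mean_approx) .^ 2))
@show mean_approx, mu_X
@show std_approx, sigma_X;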
We can visualize these sorts of transition matrices as a heatmap to get a better sense of the structure.
Below we add more states to make it closer to the continuous process and increase the persistence.
N = 100
rho = 0.98
mu = 0.1
(; P, x) = tauchen(N, rho, sigma, mu)
heatmap(x, x, P,
        xlabel = "To State", ylabel = "From State",
        title = "Transition Matrix Heatmap", colorbar_title = "Probability")
Note that the transition matrix is highly concentrated close to the diagonal.
The exceptions are the rows near the first and last states, which lie at the boundaries of the truncated state space.
Below we see that the majority of the transition probabilities are close to zero.
threshold = 1E-6
num_nonzero = sum(P .> threshold)
println("Proportion of transitions > $threshold: ", num_nonzero / (N^2))
Proportion of transitions > 1.0e-6: 0.2954