22. Differentiating Models of Economic Dynamics#
22.1. Overview#
This lecture provides an introduction to differentiable simulation of dynamic systems in Julia using Enzyme.jl. See auto-differentiation for background on in-place patterns and Enzyme wrappers.
It builds on the linear state space models we introduced in linear models and the Kalman filter.
Example applications of these methods include:
calibration
simulated method of moments
estimation
Caution: The code in this section is significantly more advanced than that in most other lectures, and requires some experience with auto-differentiation concepts as well as a more detailed understanding of type stability and memory management in Julia.
Enzyme.jl is under active development and, while state-of-the-art, it is often bleeding-edge. Some of the patterns shown here may change in future releases. See auto-differentiation for the latest best practices.
In practice, you may find using an LLM valuable for navigating the perplexing error messages of Enzyme.jl. Compilation times can be very slow, and performance intuition is not always straightforward.
using LinearAlgebra, Random, Plots, Test, Enzyme, Statistics
using Optimization, OptimizationOptimJL, EnzymeTestUtils
22.2. Simulating a Linear State Space Model#
Take the following parameterization of a linear state space model

\[
x_{t+1} = A x_t + C w_{t+1}, \qquad y_t = G x_t + H v_t
\]

where \(w_{t+1}\) and \(v_t\) are i.i.d. standard normal shocks. States \(x_t \in \mathbb R^N\) and observations \(y_t \in \mathbb R^M\) are stored column-wise.
See the auto-differentiation lecture for more on efficient in-place operations and mul!.
function simulate_lss!(x, y, model, x_0, w, v)
(; A, C, G, H) = model
N, T1 = size(x)
M, T1y = size(y)
T = size(w, 2)
@assert T1 == T + 1 == T1y
@assert size(v, 2) == T + 1
@assert size(A) == (N, N)
@assert size(G) == (M, N)
@assert size(C, 1) == N
@assert size(H, 1) == M
@assert length(x_0) == N
# Enzyme has challenges with activity analysis on broadcasting assignments
@inbounds for i in 1:N
x[i, 1] = x_0[i]
end
# Apply evolution and observation equations
@inbounds for t in 1:T
@views mul!(x[:, t + 1], A, x[:, t]) # x_{t+1} = A x_t
@views mul!(x[:, t + 1], C, w[:, t], 1.0, 1.0) # + C w_{t+1}
@views mul!(y[:, t], G, x[:, t]) # y_t = G x_t
@views mul!(y[:, t], H, v[:, t], 1.0, 1.0) # + H v_t
end
# Apply observation equation at T+1
@views mul!(y[:, T + 1], G, x[:, T + 1])
@views mul!(y[:, T + 1], H, v[:, T + 1], 1.0, 1.0)
return nothing
end
simulate_lss! (generic function with 1 method)
Crucially, this function modifies the preallocated x and y arrays in place without any allocations.
We can use this function to simulate from example matrices and a sequence of shocks
Random.seed!(1234)
N, M, K, L = 3, 2, 2, 2
T = 10
A = [0.8 0.1 0.0
0.0 0.7 0.1
0.0 0.0 0.6]
C = 0.1 .* randn(N, K)
G = [1.0 0.0 0.0
0.0 1.0 0.3]
H = 0.05 .* randn(M, L)
model = (; A, C, G, H)
x_0 = randn(N)
w = randn(K, T)
v = randn(L, T + 1)
x = zeros(N, T + 1)
y = zeros(M, T + 1)
simulate_lss!(x, y, model, x_0, w, v)
time = 0:T
plot(time, x', lw = 2, xlabel = "t", ylabel = "state", label = ["x1" "x2" "x3"],
title = "State Paths")
plot(time, y', lw = 2, xlabel = "t", ylabel = "observation",
label = ["y1" "y2"], title = "Observation Paths")
22.2.1. Differentiating the Simulation#
Forward-mode in Enzyme is convenient for impulse-style effects: for example, here we perturb only the first element of \(w\), leaving everything else fixed, and can see the change in the \(x\) and \(y\) paths
# forward-mode on w[1]
x = zeros(N, T + 1)
y = zeros(M, T + 1)
dx = Enzyme.make_zero(x)
dy = Enzyme.make_zero(y)
dw = Enzyme.make_zero(w)
dw[1] = 1.0 # unit perturbation to first shock
autodiff(Forward,
simulate_lss!,
Duplicated(x, dx),
Duplicated(y, dy),
Const(model), # leaving model fixed
Const(x_0), # leaving initial state fixed
Duplicated(w, dw), # perturbing w
Const(v))
dx[:, 1:3] # early-state sensitivities (impulse response flavor)
3×3 Matrix{Float64}:
0.0 -0.0359729 -0.0179062
0.0 0.108721 0.0719087
0.0 -0.041959 -0.0251754
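Since dx has the same shape as x, we can plot the sensitivity paths directly, reusing the plotting pattern above (a minimal sketch; the labels are just illustrative):
plot(time, dx', lw = 2, xlabel = "t", ylabel = "sensitivity",
     label = ["dx1" "dx2" "dx3"], title = "State Sensitivities to w[1]")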
Batch tangents let us reuse one primal evaluation while seeding multiple partials. Below we differentiate with respect to two entries of \(A\) in one call; Enzyme accumulates the tangents into separate shadow arrays.
dx_batch = (Enzyme.make_zero(x), Enzyme.make_zero(x))
dy_batch = (Enzyme.make_zero(y), Enzyme.make_zero(y))
dmodels = (Enzyme.make_zero(model), Enzyme.make_zero(model))
dmodels[1].A[1] = 1.0
dmodels[2].A[2] = 1.0
autodiff(Forward,
simulate_lss!,
BatchDuplicated(x, dx_batch), # batch duplicated to match dmodels
BatchDuplicated(y, dy_batch),
BatchDuplicated(model, dmodels),
Const(x_0),
Const(w),
Const(v))
@show dy_batch[1], dy_batch[2];
(dy_batch[1], dy_batch[2]) = ([0.0 -0.8449577163759787 -1.2511430013904163 -1.3687125396869666 -1.2862527573807614 -1.127182293237571 -0.9065081617463366 -0.7031373354359588 -0.41070713205552856 -0.22893611820925336 0.01811071037541087; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0], [0.0 0.0 -0.08449577163759787 -0.18426134028536015 -0.26585419216844874 -0.3147232102559902 -0.3330244765029502 -0.32376794972669876 -0.29695129835228495 -0.2489366220521523 -0.1971492472574319; 0.0 -0.8449577163759787 -1.1666472297528183 -1.1844511994016063 -1.0203985652123122 -0.8124590829815805 -0.5734836852433861 -0.37936938570925977 -0.1137558337032433 0.02000050384289924 0.21525995763284303])
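As a rough cross-check of the first batch tangent, we can compare it against a one-sided finite difference in A[1, 1] (a sketch; the step size is ad hoc):
eps_fd = 1e-6
A_eps = copy(model.A)
A_eps[1, 1] += eps_fd # perturb only the entry seeded in dmodels[1]
model_eps = (; A = A_eps, C, G, H)
x_base = zeros(N, T + 1); y_base = zeros(M, T + 1)
x_fd = zeros(N, T + 1); y_fd = zeros(M, T + 1)
simulate_lss!(x_base, y_base, model, x_0, w, v)
simulate_lss!(x_fd, y_fd, model_eps, x_0, w, v)
# largest gap between the finite difference and the Enzyme tangent
@show maximum(abs.((y_fd .- y_base) ./ eps_fd .- dy_batch[1]));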
22.2.2. Checking Type Stability and Allocations#
With complicated code, we first need to ensure the code is type-stable. The following call completes without error, which indicates there are no type-stability issues.
@inferred simulate_lss!(x, y, model, x_0, w, v)
Next we check that it does not allocate any memory during execution.
using BenchmarkTools
function count_allocs()
return simulate_lss!(x, y, model, x_0, w, v)
end
@btime count_allocs()
1.428 μs (0 allocations: 0 bytes)
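Alternatively, BenchmarkTools lets us interpolate the non-constant globals with $ and benchmark the call directly, avoiding the wrapper function:
@btime simulate_lss!($x, $y, $model, $x_0, $w, $v)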
Finally, for complicated functions such as simulations, we cannot assume that Enzyme (or any AD system) will necessarily be correct.
To aid in this, EnzymeTestUtils provides utilities that automatically check AD results against finite-difference approximations using the appropriate seeding.
For in-place functions that mutate their arguments, test_forward requires that the mutated arguments be returned as outputs of the function itself, which can be done with a small wrapper.
function test_forward_simulate_lss!(x, y, model, x_0, w, v)
simulate_lss!(x, y, model, x_0, w, v)
return x, y
end
test_forward(test_forward_simulate_lss!,
Duplicated,
(x, Duplicated),
(y, Duplicated),
(model, Const),
(x_0, Duplicated),
(w, Const),
(v, Const))
Test Summary: | Pass Total Time
test_forward: test_forward_simulate_lss! with return activity Duplicated on (::Matrix{Float64}, Duplicated), (::Matrix{Float64}, Duplicated), (::@NamedTuple{A::Matrix{Float64}, C::Matrix{Float64}, G::Matrix{Float64}, H::Matrix{Float64}}, Const), (::Vector{Float64}, Duplicated), (::Matrix{Float64}, Const), (::Matrix{Float64}, Const) | 18 18 4.7s
Test.DefaultTestSet("test_forward: test_forward_simulate_lss! with return activity Duplicated on (::Matrix{Float64}, Duplicated), (::Matrix{Float64}, Duplicated), (::@NamedTuple{A::Matrix{Float64}, C::Matrix{Float64}, G::Matrix{Float64}, H::Matrix{Float64}}, Const), (::Vector{Float64}, Duplicated), (::Matrix{Float64}, Const), (::Matrix{Float64}, Const)", Any[], 18, false, false, true, 1.766025397229417e9, 1.766025401978409e9, false, "/home/runner/.julia/packages/EnzymeTestUtils/yGBt1/src/test_forward.jl", Xoshiro(0x9951797c85a704f1, 0xb9d66be14dfba82b, 0xb170153285fd9556, 0xe90a07f7bdd1fd77, 0x9d4b5ee33e4bd661))
Unlike test_forward, the automatic checks on reverse-mode AD with test_reverse require a scalar output, which we discuss below when differentiating scalar functions of the simulation.
22.2.3. Differentiating Functions of Simulation#
Often we care about scalar functions of the simulated paths. For example, the time average of the first observable is

\[
\bar y_1 = \frac{1}{T+1} \sum_{t=0}^{T} y_{1,t}
\]
Reverse-mode gives the gradient with respect to all shocks in one sweep while holding parameters fixed.
function mean_first_observation(y)
return mean(@view y[1, :]) # view to avoid allocation
end
function g(x, y, model, x_0, w, v)
simulate_lss!(x, y, model, x_0, w, v)
return mean_first_observation(y)
end
x_rev = zeros(N, T + 1)
y_rev = zeros(M, T + 1)
dx_rev = Enzyme.make_zero(x_rev)
dy_rev = Enzyme.make_zero(y_rev)
dw_rev = Enzyme.make_zero(w)
dv_rev = Enzyme.make_zero(v)
autodiff(Reverse,
g,
Duplicated(x_rev, dx_rev), # output/buffer
Duplicated(y_rev, dy_rev), # output/buffer
Const(model),
Const(x_0),
Duplicated(w, dw_rev), # active shocks
Duplicated(v, dv_rev))
@show g(x, y, model, x_0, w, v)
dw_rev # sensitivity wrt evolution shock
g(x, y, model, x_0, w, v) = -0.14416898118888172
2×10 Matrix{Float64}:
-0.00339415 -0.00376167 -0.00416933 … -0.0048981 -0.00327026
0.0323812 0.0313436 0.0300423 0.012146 0.00653554
These examples mirror the larger workflow: write allocation-free, in-place simulations, seed tangents with Duplicated, and use forward or reverse mode depending on whether you want many outputs per input (forward) or many inputs to one scalar (reverse).
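To make that rule of thumb concrete, here is a minimal sketch on a toy function (a hypothetical example, unrelated to the model above):
h(z) = z[1]^2 + 3.0 * z[2] # R^2 -> R: reverse mode is natural for its gradient
z_toy = [1.0, 2.0]
dz_toy = Enzyme.make_zero(z_toy)
autodiff(Reverse, h, Active, Duplicated(z_toy, dz_toy)) # seeds the scalar output
@show dz_toy # the gradient [2 z_1, 3.0] = [2.0, 3.0]
dz_dir = [1.0, 0.0] # forward mode instead propagates one tangent direction per seed
@show autodiff(Forward, h, Duplicated(z_toy, dz_dir));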
As before, for complicated functions we may want to check that the gradients are correct using finite differences.
test_reverse(g,
Active,
(x_rev, Duplicated),
(y_rev, Duplicated),
(model, Const),
(x_0, Const),
(w, Duplicated),
(v, Duplicated))
# Or with a different set of arguments
test_reverse(g,
Active,
(x_rev, Duplicated),
(y_rev, Duplicated),
(model, Duplicated),
(x_0, Duplicated),
(w, Const),
(v, Const))
Test Summary: | Pass Total Time
test_reverse: g with return activity Active on (::Matrix{Float64}, Duplicated), (::Matrix{Float64}, Duplicated), (::@NamedTuple{A::Matrix{Float64}, C::Matrix{Float64}, G::Matrix{Float64}, H::Matrix{Float64}}, Const), (::Vector{Float64}, Const), (::Matrix{Float64}, Duplicated), (::Matrix{Float64}, Duplicated) | 25 25 6.0s
Test Summary: | Pass Total Time
test_reverse: g with return activity Active on (::Matrix{Float64}, Duplicated), (::Matrix{Float64}, Duplicated), (::@NamedTuple{A::Matrix{Float64}, C::Matrix{Float64}, G::Matrix{Float64}, H::Matrix{Float64}}, Duplicated), (::Vector{Float64}, Duplicated), (::Matrix{Float64}, Const), (::Matrix{Float64}, Const) | 29 29 5.5s
Test.DefaultTestSet("test_reverse: g with return activity Active on (::Matrix{Float64}, Duplicated), (::Matrix{Float64}, Duplicated), (::@NamedTuple{A::Matrix{Float64}, C::Matrix{Float64}, G::Matrix{Float64}, H::Matrix{Float64}}, Duplicated), (::Vector{Float64}, Duplicated), (::Matrix{Float64}, Const), (::Matrix{Float64}, Const)", Any[], 29, false, false, true, 1.766025415730158e9, 1.766025421200565e9, false, "/home/runner/.julia/packages/EnzymeTestUtils/yGBt1/src/test_reverse.jl", Xoshiro(0x9951797c85a704f1, 0xb9d66be14dfba82b, 0xb170153285fd9556, 0xe90a07f7bdd1fd77, 0x9d4b5ee33e4bd661))
In all cases we must ensure the mutated arguments are passed as Duplicated.
Functions which internally allocate buffers will need to ensure that the buffers are of the appropriate type (so that they can be duplicated if they are active). This can be achieved with eltype, as in the following example.
function g2(model, x_0, w, v)
x = zeros(eltype(x_0), N, T + 1)
y = zeros(eltype(x_0), M, T + 1)
simulate_lss!(x, y, model, x_0, w, v)
return mean_first_observation(y)
end
g2(model, x_0, w, v)
gradient(Reverse, g2, model, x_0, w, v)
((A = [-0.7372205768312146 1.5196565304240475 -0.3215079246189682; -0.20446946438535596 0.3321736601370444 -0.05152341036087862; -0.03725460586984415 0.052907253636102144 -0.006038474038815886], C = [1.8213798041590912 1.9974233310121958; 0.3400772654166607 0.35938265277705433; 0.047579521263242976 0.05076566679947853], G = [-0.12103807771237807 0.46627565243299396 -0.1434392076888752; 0.0 0.0 0.0], H = [-0.22308685072343123 0.013864981851811497; 0.0 0.0]), [0.415500297309091, 0.1184618935, 0.023935839100000002], [-0.003394145246121948 -0.0037616670331938515 … -0.0048981016639946225 -0.003270262789304379; 0.032381219205478144 0.03134361649886693 … 0.012146023683800623 0.006535544886053993], [0.009339832073018348 0.009339832073018348 … 0.009339832073018348 0.009339832073018348; -0.0013859147963474192 -0.0013859147963474192 … -0.0013859147963474192 -0.0013859147963474192])
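The third element of this tuple is the gradient with respect to w; since g2 computes the same scalar as g, it should agree with the dw_rev computed earlier:
grads = gradient(Reverse, g2, model, x_0, w, v)
@test grads[3] ≈ dw_rev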
22.2.4. Calibration#
Reverse-mode in particular can be useful for calibration with simulated dynamics.
For example, consider the case where our \(A\) matrix is parameterized by a scalar \(a\) in its upper-left corner, and we want to calibrate \(a\) so that the time average of the first observation matches a target value \(y^* = 0\).
For some technical reasons discussed below, we rewrite the simulation to take the model parameters individually rather than as a named tuple. A version that works with the original named-tuple function is given in the next section.
function simulate_lss!(x, y, A, C, G, H, x_0, w, v)
N, T1 = size(x)
M, T1y = size(y)
T = size(w, 2)
@assert T1 == T + 1 == T1y
@assert size(v, 2) == T + 1
@assert size(A) == (N, N)
@assert size(G) == (M, N)
@assert size(C, 1) == N
@assert size(H, 1) == M
@assert length(x_0) == N
# Enzyme has challenges with activity analysis on broadcasting assignments
@inbounds for i in 1:N
x[i, 1] = x_0[i]
end
# Apply evolution and observation equations
@inbounds for t in 1:T
@views mul!(x[:, t + 1], A, x[:, t]) # x_{t+1} = A x_t
@views mul!(x[:, t + 1], C, w[:, t], 1.0, 1.0) # + C w_{t+1}
@views mul!(y[:, t], G, x[:, t]) # y_t = G x_t
@views mul!(y[:, t], H, v[:, t], 1.0, 1.0) # + H v_t
end
# Apply observation equation at T+1
@views mul!(y[:, T + 1], G, x[:, T + 1])
@views mul!(y[:, T + 1], H, v[:, T + 1], 1.0, 1.0)
return nothing
end
function parameterized_A(a)
return [a 0.1 0.0; 0.0 0.7 0.1; 0.0 0.0 0.6]
end
function loss(u, p)
(; x_0, w, v, y_target, C, G, H) = p
T = size(w, 2)
a = u[1]
A = parameterized_A(a)
# Allocate buffers and simulate
x = zeros(eltype(A), length(x_0), T + 1)
y = zeros(eltype(A), size(G, 1), T + 1)
simulate_lss!(x, y, A, C, G, H, x_0, w, v)
y_mean = mean_first_observation(y)
return (y_mean - y_target)^2
end
loss (generic function with 1 method)
There are a few tricks to note here, which work around challenges with using Enzyme in its current state.
With the SciML packages, such as Optimization.jl, AutoEnzyme() automatically determines which variables to mark as Active following certain patterns.
In particular, p holds parameters assumed to be constant during optimization, while u holds the optimization variables.
Enzyme generally avoids allocations, but allocation is necessary here since we need to create a new model with the appropriate a value.
For the buffers, note the use of eltype(A) to ensure the correct type, since the A matrix itself will be differentiable.
Using this setup, we can create the p named tuple of parameters, none of which will be differentiated, and then use the LBFGS optimizer from OptimizationOptimJL to find the optimal a value.
y_target = 0.0
# Bundle parameters into a named tuple
p = (; x_0, w, v, y_target, C, G, H)
# Initial Guess for 'a'
u_0 = [0.8]
println("Initial a=$(u_0[1]), Loss: $(loss(u_0, p))")
# Define the problem type, associate initial condition and parameters
# AutoEnzyme() will handle the AD setup
optf = OptimizationFunction(loss, AutoEnzyme())
prob = OptimizationProblem(optf, u_0, p)
sol = solve(prob, OptimizationOptimJL.LBFGS())
println("Final a=$(sol.u[1]), Loss: $(loss(sol.u, p))")
Initial a=0.8, Loss: 0.02078469513704013
Final a=-0.379357581082952, Loss: 0.0006086293962801088
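We can verify the calibrated value by re-simulating with the optimal a and checking that the time average of the first observable has moved toward the target (any remaining gap reflects the particular draw of shocks):
A_star = parameterized_A(sol.u[1])
x_star = zeros(N, T + 1)
y_star = zeros(M, T + 1)
simulate_lss!(x_star, y_star, A_star, C, G, H, x_0, w, v)
@show mean_first_observation(y_star);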
22.2.5. Calibration with More Complicated Types#
function loss_2(u, p)
# Unpack constants
(; x_0, w, v, y_target, C, G, H) = p
T = size(w, 2)
a = u[1]
A = parameterized_A(a) # A is Active (depends on u)
# Trick: "Launder" the constants by copying them.
# Creates new locals, which Enzyme sees as local active variables
# Now build the named tuple from homogeneous (all-local) variables
model = (; A, C = copy(C), G = copy(G), H = copy(H))
# Allocate buffers and simulate
x = zeros(eltype(A), length(x_0), T + 1)
y = zeros(eltype(A), size(G, 1), T + 1)
simulate_lss!(x, y, model, x_0, w, v)
y_mean = mean_first_observation(y)
return (y_mean - y_target)^2
end
loss_2 (generic function with 1 method)
This version uses our original simulate_lss! function, which takes a named tuple for the model parameters. Note that it contains a trick where the constant parameters C, G, and H are “laundered” by copying them into new local variables before building the named tuple. This ensures that Enzyme can properly analyze which variables are active versus constant when creating the named tuple.
The simulate_lss! approach that splits out the A matrix might be more efficient for large \(C, G, H\) since it avoids the copies, but it may not matter in practice.
The rest of the code is identical, just using this new loss function.
y_target = 0.0
# Bundle parameters into a named tuple
p = (; x_0, w, v, y_target, C, G, H)
# Initial Guess for 'a'
u_0 = [0.8]
println("Initial a=$(u_0[1]), Loss: $(loss_2(u_0, p))")
# Define the problem type, associate initial condition and parameters
# AutoEnzyme() will handle the AD setup
optf = OptimizationFunction(loss_2, AutoEnzyme())
prob = OptimizationProblem(optf, u_0, p)
sol = solve(prob, OptimizationOptimJL.LBFGS())
println("Final a=$(sol.u[1]), Loss: $(loss_2(sol.u, p))")
Initial a=0.8, Loss: 0.02078469513704013
Final a=-0.379357581082952, Loss: 0.0006086293962801088
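As a final check, the two formulations should produce the same loss value and the same Enzyme gradient at the calibrated point (a minimal sketch using explicit autodiff calls):
du = Enzyme.make_zero(sol.u)
du_2 = Enzyme.make_zero(sol.u)
autodiff(Reverse, loss, Active, Duplicated(sol.u, du), Const(p))
autodiff(Reverse, loss_2, Active, Duplicated(sol.u, du_2), Const(p))
@test loss(sol.u, p) ≈ loss_2(sol.u, p)
@test du ≈ du_2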