Gary Sun // 孫健

Dynamic Programming as Discrete Calculus of Variations

Computer science meets classical mechanics

21 Mar 2026 · Technical · 17 min


Preamble

Recently I’ve been enjoying studying physics, and the Principle of Least Action 1 is a very interesting way to rethink mechanics. It reformulates Newton’s laws: instead of $F = ma$, you say nature takes the path of least action. Strangely, it felt like I’d seen this structure before: condition on what you observe, solve for the path you aim to predict. Intuitively, this felt very similar to what a particular dynamic programming algorithm, the Viterbi algorithm, does for a Hidden Markov Model.

Introduction

It turns out dynamic programming can be viewed as a discrete analogue of the calculus of variations - both are frameworks for optimizing over paths, one discrete and one continuous.

The calculus of variations asks: given a cost that depends on a path, which path makes the total cost the smallest? Where ordinary calculus finds the input $x$ that minimises $f(x)$ (via stationary points), the calculus of variations finds the function $q(t)$ that makes a functional $J[q] = \int_0^T L(q, \dot{q}, t)\, dt$ the smallest (similarly through stationary points).

Note a "functional" takes a function as input and returns a scalar, i.e. $J : (\mathbb{R} \to \mathbb{R}) \to \mathbb{R}$ - a definite integral is a common example.
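To make that concrete, here's a toy numerical illustration (my own example, not from the physics literature): approximate the functional $J[q] = \int_0^1 q(t)^2\, dt$ with a left Riemann sum.

```python
def J(q, T: float = 1.0, n: int = 100_000) -> float:
    """Numerically evaluate J[q] = ∫₀ᵀ q(t)² dt via a left Riemann sum."""
    dt = T / n
    return sum(q(k * dt) ** 2 * dt for k in range(n))
```

The input is a whole function and the output is a single number: for instance $q(t) = t$ gives roughly $\int_0^1 t^2\, dt = 1/3$, while $q(t) = 1$ gives $1$. Different input functions, different scalars, which is exactly what makes $J$ a functional rather than an ordinary function.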

To get there, I'll build off a simple CS problem until we've recovered the Principle of Least Action for a particle in one dimension (extending to higher dimensions is left as an exercise for the reader).

Shortest Path on a Grid

Starting off with a classic computer science problem (hoping I don't trigger uni / leetcode PTSD for you) - suppose we have an $n \times m$ grid where each edge has a transition cost. Starting at $(0, 0)$, we want to find the shortest path to all other nodes, moving only right or down.

TikZ Diagram

For a brute force approach, there are $\binom{n+m}{n} = O(2^{n+m})$ monotone paths to check. However, with dynamic programming we can reduce this to $O(nm)$.

Suppose the shortest path from $(0,0)$ to $(i,j)$ goes through an intermediate node $(i',j')$. Then the sub-path from $(0,0)$ to $(i',j')$ must be the shortest because if it weren't, we could swap in a cheaper prefix and get a cheaper overall path, which shouldn't be possible since we were looking at the shortest path from $(0,0)$ to $(i,j)$.

This lets us build up the solution incrementally where rather than considering all possible paths, at each node we just answer:

What’s the cheapest way to reach here?

Since the answer only depends on the cheapest way to reach each predecessor, we can fill the table left-to-right, top-to-bottom. This is the principle of optimal substructure - sub-paths of optimal paths are themselves optimal.

To write the recurrence, we need a cost function for each edge. Since we can only move right or down, each edge can be uniquely identified by its starting node $(i, j)$ and direction $d \in \{\rightarrow, \downarrow\}$. So the minimum number of parameters is:

$$ c(i, j, d) $$

This gives us the recurrence:

$$ S(i, j) = \min \{ S(i-1,j) + c(i-1, j, \downarrow),\ S(i, j-1) + c(i, j-1, \rightarrow) \} $$

where $S(i, j)$ is the minimum cost to reach cell $(i, j)$. This recurrence is known as the Bellman equation.

from typing import Callable

def shortest_path(cost: Callable, n: int, m: int):
    S = init_dp(n, m) # cost table: S[0][0] = 0, everything else inf
    P = init_dp(n, m) # predecessor table
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0):
                continue # the start cell keeps its zero cost
            best = float('inf')
            for (ip, jp) in predecessors(i, j):
                candidate = S[ip][jp] + cost(ip, jp, i, j)
                if candidate < best:
                    best = candidate
                    P[i][j] = (ip, jp)
            S[i][j] = best
    return P

To reconstruct the actual shortest path, we can trace back through the predecessor table from any endpoint:

def trace_path(P, i: int, j: int) -> list[tuple[int, int]]:
    if (i, j) == (0, 0):
        return [(0, 0)]
    return trace_path(P, *P[i][j]) + [(i, j)]
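The snippets above lean on two helpers I haven't defined. A minimal sketch, assuming an $(n+1) \times (m+1)$ table where only the start cell costs zero:

```python
import math

def init_dp(n: int, m: int) -> list[list[float]]:
    """(n+1) x (m+1) table of +inf, with the start cell (0, 0) set to 0."""
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0
    return table

def predecessors(i: int, j: int):
    """Cells that can step into (i, j) when moves go only right or down."""
    if i > 0:
        yield (i - 1, j)  # arrived via a downward move
    if j > 0:
        yield (i, j - 1)  # arrived via a rightward move
```

The same initialiser doubles for the predecessor table P, though any sentinel value would do there since its entries get overwritten with coordinates.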

Reframing to Space and Time

We'll make three changes to our grid problem to connect it to more "practical" physics:

  1. Relabel the dimensions as space and time,
  2. Enforce that time always advances,
  3. Allow higher velocities.

Relabelling Dimensions

  1. columns are now time values $t = 0, 1, ..., T$ (renaming $m \to T$)
  2. rows are now position values $q = 0, 1, ..., Q$ (renaming $n \to Q$)

Nothing mathematically has changed - it's the same code and recurrence, just reinterpreted. The grid is now a discrete $q$-$t$ state space, and a path through it is a trajectory $q(t)$.

Note for my CS friends: it's a physics convention to use $q$ for position, as $p$ was taken for momentum.

Enforcing Time

We now impose a key constraint: time must always advance, so every step must move us to the next time column. This removes the old vertical edges (which advanced $q$ without advancing $t$, suggesting we were teleporting).

In their place, each step goes from $(q, t)$ to either $(q, t+1)$ (stay in place, $\dot{q}=0$) or $(q+1, t+1)$ (move one step, $\dot{q}=1$).

TikZ Diagram

As a side benefit, since time only moves forward, the graph is topologically ordered - we can just sweep left to right in our DP algorithm.

Enabling higher velocity

Our existing edges enforce a "very constraining" velocity $\dot{q} \in \{0, 1\}$. To allow discrete velocities $\dot{q} > 1$, we introduce new edges that allow us to transition from state $(q, t)$ to $(q + \dot{q},\ t + 1)$ for any $\dot{q} \in \{0, 1, \ldots, Q\}$.

TikZ Diagram

To simplify things for our example, we've constrained ourselves to "positive velocities", hence there are no edges going "up and right".

Reformulating our cost function

Since we introduced more edges, we need to update our cost function, as a cost must be defined for every edge.

An edge is now determined by three things:

  1. $q_i$ the starting position
  2. $q_j$ the ending position
  3. $t$ the starting time

$$ cost(q_i \rightarrow q_j, t) $$

However, we can replace the ending position with velocity. Since discrete velocity is defined as

$$ \dot{q} = \frac{q_j - q_i}{\Delta t} $$

we can recover $q_j$ from $q_i$, $\dot{q}$, and $\Delta t$ - so we can equivalently write the cost as $cost(q, \dot{q}, t)$ without losing any information.

Our recurrence becomes:

$$ S(q, t) = \min_{q_p \in \text{pred}(q)} \big[ S(q_p, t-1) + cost(q_p, \dot{q}, t-1) \big] $$

where $\dot{q} = q - q_p$ (since $\Delta t = 1$) and $\text{pred}(q)$ is short for "predecessor of $q$".

from typing import Callable

def shortest_path(cost: Callable, Q: int, T: int):
    S = init_dp(Q, T) # action table: S[0][0] = 0, everything else inf
    P = init_dp(Q, T) # predecessor table
    for t in range(1, T + 1):
        for q in range(Q + 1):
            best = float('inf')
            for qp in predecessors(q):
                dq = q - qp # discrete velocity (Δt = 1)
                candidate = S[qp][t-1] + cost(qp, dq, t-1)
                if candidate < best:
                    best = candidate
                    P[q][t] = qp
            S[q][t] = best
    return P
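Note that `predecessors` changes meaning in this reframed problem: with non-negative velocities, any position at or below $q$ can reach $q$ in one time step. A sketch (my naming, matching the code above):

```python
def predecessors(q: int) -> range:
    """Positions q_p that can reach q in one step with velocity q - q_p ∈ {0, ..., q}."""
    return range(q + 1)
```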

Now you may be wondering why we reformulated to use velocity rather than start and end position. The answer: $cost(q, \dot{q}, t)$ already has the signature of a Lagrangian (which will be very important for the next part).

Taking the Continuous Limit

Cost: sum becomes integral

So far we have total cost of a discrete path as

$$ \sum_{t=0}^{T-1} cost(q_t, \dot{q}_t, t) $$

Now shrink the grid spacing - divide $[0, T]$ into $N$ steps each of width $\Delta t = T/N$, and let $N \to \infty$. For the discrete sum to converge to a finite limit, each per-step cost must shrink proportionally with the step size. Concretely, we require $cost(q, \dot{q}, t) = L(q, \dot{q}, t) \cdot \Delta t$ for some function $L$ that stays bounded as $\Delta t \to 0$. Then the Riemann sum converges to an integral:

$$ \sum_{k=0}^{N-1} L\big(q_k,\ \dot{q}_k,\ t_k\big)\,\Delta t \;\xrightarrow{\Delta t \to 0}\; \int_0^T L\big(q(t),\ \dot{q}(t),\ t\big)\, dt $$

This $L$ is the Lagrangian - the continuous version of the per-step cost. Minimising the discrete total cost now becomes minimising the integral of $L$ - this integral is called the action of the particle.

Optimality: predecessors become perturbations

The two formulations optimise differently but, for the physical systems we care about, arrive at the same path.

  • In DP, we explicitly minimise - at each step we pick the cheapest predecessor, so the optimal path is the one where no single-step swap can reduce the total cost.

    $$ S(q, t) = \min_{q_p} \big[ S(q_p, t-1) + cost(q_p, \dot{q}, t-1) \big] $$

    If the current path isn't optimal, there exists some predecessor $q_p$ that gives a lower cost - so we swap it in.

  • In the continuous case there are no discrete predecessors - instead we nudge the entire path by a small smooth bump $\delta q(t)$ that vanishes at the endpoints, and ask whether the cost changes:

    $$ \delta J = J[q + \delta q] - J[q] $$

    If the path is stationary, $\delta J = 0$ for every such bump. Working through the calculus gives:

    $$ \delta J = \int_0^T \left(\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}}\right) \delta q \, dt = 0 $$

    Since this must hold for every bump $\delta q$, the bracketed term must be zero everywhere - giving the Euler-Lagrange equation:

    $$ \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = 0 $$
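For completeness, the calculus being "worked through" is one first-order expansion of $\delta J$ followed by integration by parts on the $\delta \dot{q}$ term:

$$ \delta J = \int_0^T \left( \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot{q}}\,\delta \dot{q} \right) dt = \int_0^T \frac{\partial L}{\partial q}\,\delta q \, dt + \left[ \frac{\partial L}{\partial \dot{q}}\,\delta q \right]_0^T - \int_0^T \frac{d}{dt}\frac{\partial L}{\partial \dot{q}}\,\delta q \, dt $$

The boundary term vanishes because the bump is pinned at the endpoints, $\delta q(0) = \delta q(T) = 0$, leaving exactly the integral condition stated above.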

The concept of the Lagrangian and Euler-Lagrange can be a lot to unpack - I've chosen not to expand on why they are the way they are as there are resources which are easy to find and can articulate the idea better than I ever can. My focus here is on the connection between DP and CoV.

Summary

|  | Dynamic Programming | Calculus of Variations |
| --- | --- | --- |
| Domain | Discrete states/stages | Continuous paths |
| Objective | $\sum L(q, \dot{q}, t)\,\Delta t$ | $\int L(q, \dot{q}, t)\, dt$ |
| "Is this path optimal?" | No predecessor swap improves it | No perturbation changes it (to first order) |
| Optimality condition | Bellman equation | Euler-Lagrange equation |

Unlike single-source shortest path, physics fixes both endpoints, so we fill the full DP table as before, then fix a destination $(Q, T)$ and trace back.

Simulating a Particle with DP

We've gone from discrete to continuous - now let's come full circle and use DP to simulate a physical particle.

The Lagrangian for a particle

The Lagrangian is defined as $L = T - V$, where $T$ is kinetic energy and $V$ is potential energy. For a particle in one dimension under a constant force field of magnitude $a$ in the positive $q$ direction (and setting $m = 1$ for simplicity), the kinetic energy is $T = \frac{1}{2}\dot{q}^2$ and the potential energy is $V(q) = -aq$ (decreasing in the direction of the force):

$$ \begin{align} L(q, \dot{q}) &= T - V \\ &= \frac{1}{2} \dot{q}^2 - (-aq) \\ &= \frac{1}{2} \dot{q}^2 + aq \end{align} $$

Plugging this into the Euler-Lagrange equation gives back $\ddot{q} = a$ - exactly Newton's second law:

$$ \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = 0 \implies a - \ddot{q} = 0 \implies \ddot{q} = a $$

Discretising the Lagrangian

Now we just need to use this Lagrangian as our cost function. On our discrete grid with $\Delta t = 1$:

$$ cost(q, \dot{q}, t) = \frac{1}{2} \dot{q}^2 + aq $$

where $\dot{q} = q_{t+1} - q_t$ is the discrete velocity.
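In code this per-step cost is a one-liner; note I'm passing the field strength $a$ as a parameter in place of the (unused) time argument:

```python
def cost(q: int, dq: int, a: float) -> float:
    """Discretised Lagrangian with Δt = 1: L = ½q̇² + aq."""
    return 0.5 * dq ** 2 + a * q
```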

Implementation

Let's simulate a particle under a constant force field ($V(q) = -aq$, force in the positive $q$ direction) starting from $q(0) = 0$.

def simulate_particle(a: float, Q: int, T: int):
    S = init_dp(Q, T) # action table: S[0][0] = 0, everything else inf
    P = init_dp(Q, T) # predecessor table
    for t in range(1, T + 1):
        for q in range(Q + 1):
            best = float('inf')
            for qp in predecessors(q):
                dq = q - qp # discrete velocity (Δt = 1)
                candidate = S[qp][t-1] + cost(qp, dq, a) # cost takes a instead of the unused t
                if candidate < best:
                    best = candidate
                    P[q][t] = qp
            S[q][t] = best
    return P

The DP computes $S(q, t)$ - the minimum action from $(0, 0)$ to every point $(q, t)$ in the state space. To extract a specific physical trajectory, we pick a fixed endpoint $(Q, T)$ and trace back through the predecessors. The position gaps grow each time step - the particle accelerates, just as we'd expect in a constant field.
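Putting the whole pipeline together as a self-contained sketch (tables and helpers inlined; $a = 1$, $Q = 10$, $T = 5$ are my example values, chosen so the stationary trajectory lands on grid points):

```python
import math

def simulate_particle(a: float, Q: int, T: int):
    """Minimum-action DP over a (Q+1) x (T+1) position-time grid with Δt = 1."""
    S = [[math.inf] * (T + 1) for _ in range(Q + 1)]  # S[q][t]: min action to reach (q, t)
    P = [[None] * (T + 1) for _ in range(Q + 1)]      # P[q][t]: best predecessor position
    S[0][0] = 0.0
    for t in range(1, T + 1):
        for q in range(Q + 1):
            for qp in range(q + 1):                   # non-negative velocities only
                dq = q - qp                           # discrete velocity
                step = 0.5 * dq ** 2 + a * qp         # discretised Lagrangian L = ½q̇² + aq
                if S[qp][t - 1] + step < S[q][t]:
                    S[q][t] = S[qp][t - 1] + step
                    P[q][t] = qp
    return S, P

def trace(P, q: int, t: int) -> list[int]:
    """Follow predecessors back from (q, t) to t = 0; returns positions q(0), ..., q(t)."""
    positions = [q]
    while t > 0:
        q = P[q][t]
        t -= 1
        positions.append(q)
    return positions[::-1]

S, P = simulate_particle(1.0, 10, 5)
print(trace(P, 10, 5))  # [0, 0, 1, 3, 6, 10]: the gaps grow, i.e. the particle accelerates
```

The traced positions step by 0, 1, 2, 3, 4 - velocity increasing by $a = 1$ each tick, the discrete version of $\ddot{q} = a$.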

TikZ Diagram

The vertex costs above show $S^{\mathrm{in}}(q, t)$ — the minimum action to reach each state from the origin. Running the same DP in reverse from $(10, 5)$ gives $S^{\mathrm{out}}(q, t)$ — the minimum action to continue from each state to the destination. Their sum is the total cost of passing through each point, plotted below — the optimal trajectory is the valley floor, where $S^{\mathrm{in}} + S^{\mathrm{out}}$ is minimised.
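The reverse sweep can be sketched the same way (my own naming; $S^{\mathrm{out}}$ accumulates the same edge costs, just swept right to left from the fixed destination):

```python
import math

def backward_dp(a: float, Q: int, T: int) -> list[list[float]]:
    """S_out[q][t]: minimum action to continue from (q, t) to the endpoint (Q, T)."""
    S_out = [[math.inf] * (T + 1) for _ in range(Q + 1)]
    S_out[Q][T] = 0.0
    for t in range(T - 1, -1, -1):
        for q in range(Q + 1):
            for qn in range(q, Q + 1):            # successors: any position at or above q
                dq = qn - q
                step = 0.5 * dq ** 2 + a * q      # same edge cost as the forward sweep
                S_out[q][t] = min(S_out[q][t], step + S_out[qn][t + 1])
    return S_out
```

Since $S^{\mathrm{in}}(0, 0) = 0$, the value $S^{\mathrm{out}}(0, 0)$ on its own equals the optimal total action, and forward and backward sweeps agree at every point on the optimal trajectory.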

TikZ Diagram

Our particle simulation recovered the correct physics (yay) - but we didn't have to solve the Euler-Lagrange equation or differentiate anything. We just evaluated the Lagrangian and took the min (bigger yay). This hints at DP's benefit: it doesn't require $L$ to be smooth or even differentiable.

Conclusion

So the intuition from the preamble was right - DP and the calculus of variations really are the same idea at different resolutions. Both ask "which path minimises total cost?", one by sweeping a DAG, the other by solving a differential equation. We started with shortest path on a grid and ended up at the same place as the Principle of Least Action.

Whilst the Euler-Lagrange equation needs $L$ to be smooth, DP doesn't care: all it needs is optimal substructure, and therein lies a benefit (we don't talk about runtime and the curse of dimensionality).

1 For most cases nature does take the path of least action, though more accurately it takes the path of stationary action, i.e. modifying the path slightly does not adjust the total action. It's a bit outside the scope of this post, so this footnote is my saving grace from being attacked by more physics-pedantic readers.