Learning Free Energy with Physics and AI: The Story Behind Fokker–Planck Score Learning

Read "Fokker−Planck Score Learning: Efficient Free-Energy Estimation under Periodic Boundary Conditions"

Nov 03, 2025

DOI: 10.1021/acs.jpcb.5c04579

In the world of molecular simulation, we often try to make molecules talk. They won’t speak English, of course — but if we listen carefully to their atomic jostling, they whisper secrets about energy, stability, and motion. One of the clearest ways to translate that whisper into science is through something called the potential of mean force, or PMF.

The Potential of Mean Force — The Map Beneath Molecular Motion

Imagine pushing a small molecule through a lipid membrane. Sometimes it glides smoothly; sometimes it resists as though the membrane were a sticky wall. That invisible resistance — how “energetically hard” or “easy” it is for the molecule to be at each position — is encoded in the PMF.

Formally, the PMF is the free energy landscape projected onto a chosen coordinate, like the molecule’s distance along the membrane normal. Where the PMF dips, the system prefers to be; where it rises, crossing is rare.
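As a minimal sketch of that definition, assuming you already have unbiased equilibrium samples of the coordinate, the PMF is just the negative log of a histogram, U(x) = -kT ln p(x), up to an additive constant. The function name `pmf_from_samples`, the Gaussian toy data, and the binning choices are illustrative assumptions, not anything taken from the paper.

```python
import math
import random

def pmf_from_samples(samples, lo, hi, n_bins, kT=1.0):
    """Estimate U(x) = -kT * ln p(x) on a uniform grid from equilibrium samples."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against a floating-point index equal to n_bins
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Empty bins get +inf: the PMF is undefined where nothing was sampled.
    pmf = [-kT * math.log(c / (total * width)) if c > 0 else math.inf
           for c in counts]
    # Free energies are defined up to a constant; shift the minimum to zero.
    u_min = min(pmf)
    return centers, [u - u_min for u in pmf]

# Toy check: unit-Gaussian samples correspond to U(x) = x^2 / 2 (with kT = 1).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
centers, pmf = pmf_from_samples(samples, -2.0, 2.0, 20)
```

The catch is visible in the empty-bin branch: regions the simulation never visits have no estimate at all, which is why rare transitions make direct histogramming impractical.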

For biomolecular simulations, this is gold: PMFs tell you how ions cross channels, how ligands bind, how proteins fold, and how drugs permeate membranes. In short, the PMF connects structure, motion, and thermodynamics in one function U(x).

But computing it accurately is often painful. Sampling rare transitions means long trajectories and sophisticated biasing schemes.

Old and New Roads to the PMF — Equilibrium and Nonequilibrium Paths

Traditionally, people compute PMFs using umbrella sampling or free-energy perturbation. In umbrella sampling, you restrain the molecule at many positions along the coordinate, run an equilibrium simulation in each window, and then stitch the biased results together by reweighting (e.g., WHAM or MBAR). It works, but it’s expensive: each “umbrella” is its own mini simulation.

Enter nonequilibrium pulling and the Jarzynski equality. Instead of equilibrating at each position, you pull the molecule through — imagine attaching it to a virtual spring and dragging it at constant speed. The work done on the system is random, but Jarzynski showed a remarkable truth:

\(\langle e^{-\beta W}\rangle = e^{-\beta \Delta F}\)

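Here W is the work measured in a single pull, β = 1/(k_B T), and the angle brackets denote an average over many repeated pulls. Average the exponential of the work, and you get the exact free energy difference ΔF.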

It’s beautiful theory — but cruel in practice. The exponential average is dominated by rare low-work trajectories, so convergence is slow. You get speed at the cost of statistical pain.
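A small sketch of the estimator makes the problem concrete. The work values below are drawn from a hypothetical Gaussian distribution (mean 5 kT, standard deviation 2 kT), chosen only because the exact answer is then known in closed form: ΔF = ⟨W⟩ - σ²/(2 kT) = 3 kT. The function name `jarzynski_free_energy` is an assumption for illustration.

```python
import math
import random

def jarzynski_free_energy(work, kT=1.0):
    """Delta F = -kT * ln < exp(-W / kT) >, evaluated with a log-sum-exp shift."""
    a = [-w / kT for w in work]
    a_max = max(a)  # subtract the largest exponent to avoid overflow
    log_mean = a_max + math.log(sum(math.exp(v - a_max) for v in a) / len(a))
    return -kT * log_mean

# Hypothetical Gaussian work distribution: mean 5 kT, standard deviation 2 kT.
rng = random.Random(1)
work = [rng.gauss(5.0, 2.0) for _ in range(100_000)]
delta_f = jarzynski_free_energy(work)
mean_work = sum(work) / len(work)
```

The estimate sits below the mean work, as the second law requires, but it is carried by the few pulls in the low-work tail. Widen the work distribution and the number of pulls needed grows roughly exponentially with the variance.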

This paper’s authors looked at that and thought: what if we learned the underlying free-energy landscape directly from those non-equilibrium trajectories, instead of averaging them?

Periodic Boundary Conditions — The Hidden Structure in the Simulation Box

Most molecular dynamics runs happen in a box with periodic boundary conditions (PBCs). The simulation box repeats infinitely, so nothing has a real “edge.” For something like a membrane, the reaction coordinate (say, the solute’s position z) is also periodic: move one box length, and you’re back where you started.

This periodicity changes how non-equilibrium steady states behave. Under a constant pulling force, your system doesn’t settle into a static Boltzmann distribution; it forms a steady flow around the ring, where probability continuously circulates. Mathematically, that means the probability density p(x) satisfies a Fokker–Planck equation with a constant flux J.

That’s the key insight: under periodic conditions, there’s an analytic formula connecting the steady-state density p(x) to the underlying potential U(x) and the applied force f. In other words, periodicity isn’t a nuisance — it’s a feature we can exploit to learn U(x) more efficiently.
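To see that circulating steady state in miniature, here is a toy sketch (not the paper's membrane system): overdamped Langevin dynamics on a ring, integrated with a plain Euler-Maruyama step. The cosine landscape, the force value, the time step, and the name `simulate_ring` are all illustrative assumptions.

```python
import math
import random

def simulate_ring(force_fn, f, box, n_steps, dt=1e-3, D=1.0, kT=1.0, seed=0):
    """Euler-Maruyama for overdamped Langevin dynamics on a periodic coordinate.

    force_fn(x) returns -dU/dx; f is the constant pulling force.
    Returns the wrapped positions and the total unwrapped displacement.
    """
    rng = random.Random(seed)
    x, travelled, wrapped = 0.0, 0.0, []
    noise = math.sqrt(2.0 * D * dt)
    for _ in range(n_steps):
        step = D / kT * (force_fn(x) + f) * dt + noise * rng.gauss(0.0, 1.0)
        travelled += step          # unwrapped distance, used for the mean drift
        x = (x + step) % box       # wrap the position back into the box
        wrapped.append(x)
    return wrapped, travelled

box = 2.0 * math.pi
n_steps, dt = 200_000, 1e-3
# Hypothetical landscape U(x) = cos(x), so the force is -dU/dx = sin(x).
wrapped, travelled = simulate_ring(math.sin, f=2.0, box=box,
                                   n_steps=n_steps, dt=dt)
mean_velocity = travelled / (n_steps * dt)
```

The wrapped positions stay inside the box and build up a stationary histogram, while the unwrapped displacement keeps growing. That nonzero mean velocity is the constant flux J in trajectory form.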

The Fokker–Planck Nonequilibrium Steady State — A Bridge Between Physics and Data

The Fokker–Planck equation describes how probabilities flow in time under drift (forces) and diffusion (thermal noise). At steady state, the net probability current is constant:

\(J = -D(x)\big[\nabla p(x) + p(x)\beta \nabla U_{\text{eff}}(x)\big]\)

Here

\(U_{\text{eff}}(x)=U(x)-fx\)

includes both the potential and the pulling force.

For periodic systems, this equation has a known steady-state solution: the non-equilibrium steady state (NESS). This NESS contains the fingerprints of both the energy landscape and the diffusion profile. If you can measure p(x) from simulation trajectories, you can, in principle, invert this relation to get U(x).
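Here is a hedged numerical sketch of both directions, assuming a one-dimensional coordinate and a constant diffusion coefficient D. For that case the textbook steady state on a ring of length L is p(x) ∝ e^{-βU_eff(x)} ∫ from x to x+L of e^{βU_eff(y)} dy, and dividing the flux relation above by p gives U'(x) = f - kT [d ln p/dx + J/(D p)], with J fixed by requiring U to be periodic. The function names, the cosine landscape, and the grid size are assumptions for illustration; this is a direct grid inversion, not the paper's neural-network procedure.

```python
import math

def ness_density(U, f, box, n=400, kT=1.0):
    """Steady-state density on a ring for constant D (D cancels on normalizing).

    Uses p(x) ∝ ∫_0^L exp([U(x+s) - U(x) - f*s] / kT) ds with the trapezoid rule.
    """
    h = box / n
    p = []
    for i in range(n):
        x = i * h
        g = [math.exp((U(x + j * h) - U(x) - f * j * h) / kT)
             for j in range(n + 1)]
        p.append(h * (sum(g) - 0.5 * (g[0] + g[-1])))
    norm = h * sum(p)
    return [v / norm for v in p]

def pmf_from_ness(p, f, box, D=1.0, kT=1.0):
    """Invert J = -D [p' + p (U' - f) / kT] for U on a uniform periodic grid."""
    n = len(p)
    h = box / n
    # Periodicity of U and of ln p fixes the flux: J = f L / (kT * ∫ dx / (D p)).
    J = f * box / (kT * h * sum(1.0 / (D * v) for v in p))
    logp = [math.log(v) for v in p]
    dU = []
    for i in range(n):
        # Periodic central difference of ln p; index -1 wraps to the last point.
        score = (logp[(i + 1) % n] - logp[i - 1]) / (2.0 * h)
        dU.append(f - kT * (score + J / (D * p[i])))
    U = [0.0]
    for i in range(1, n):  # cumulative trapezoid integration of U'
        U.append(U[-1] + 0.5 * h * (dU[i - 1] + dU[i]))
    u_min = min(U)
    return [u - u_min for u in U]

box = 2.0 * math.pi
p = ness_density(math.cos, f=1.0, box=box)   # hypothetical U(x) = cos(x)
U_rec = pmf_from_ness(p, f=1.0, box=box)
```

On this noise-free toy density the recovered landscape matches cos(x) up to the additive constant. With a histogram from real trajectories, the finite-difference derivative of ln p becomes the noisy step, which is where a learned, smooth score has room to help.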

The authors realized: this NESS solution could serve as a physics-based prior for machine learning — a constraint that tells the model what shapes of probability distributions are physically consistent.

Learning Free Energy with a Diffusion Model

Now comes the modern twist. The authors used a score-based diffusion model, a type of generative model that learns the score function

\(s(x) = \nabla \log p(x)\)

the gradient of the log-probability density.

In simple terms: the score tells you which way the data density increases. If you know s(x), you know the “force field” of the probability landscape.
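As a short worked case (a derivation from the steady-state flux relation quoted earlier, not a formula copied from the paper), compare the score at equilibrium with the score under driving. At equilibrium, with f = 0 and J = 0, the density is Boltzmann and the score is the physical force in units of kT:

\(p(x)\propto e^{-\beta U(x)} \;\Rightarrow\; s(x) = -\beta\,\nabla U(x)\)

In the driven steady state, dividing the flux relation by \(-D(x)\,p(x)\) and rearranging gives one extra term:

\(s(x) = -\beta\,\nabla U_{\text{eff}}(x) - \frac{J}{D(x)\,p(x)}\)

So the score still carries the gradient of U, but now alongside the pulling force and a flux correction. That is why the steady-state physics has to be built into the model rather than reading the PMF straight off -ln p.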

So instead of learning arbitrary statistics, they trained a neural network

\(s_\theta(x, \tau)\)

whose score follows the Fokker–Planck NESS form; the second argument τ is the noise level (diffusion time) of the generative model. They fed it nonequilibrium pulling trajectories under PBC and enforced periodicity using Fourier features (so the model “wraps around” naturally). The network learned to predict s(x) that’s consistent with both the data and the Fokker–Planck physics.
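The Fourier-feature trick is easy to sketch. Instead of handing the network the raw coordinate, you hand it sines and cosines whose periods divide the box length, so any function of those features is periodic by construction. The function name, the number of modes, and the box length below are illustrative assumptions; the paper's exact featurization may differ.

```python
import math

def fourier_features(x, box, n_modes):
    """Map x to [sin(2*pi*k*x/L), cos(2*pi*k*x/L)] for k = 1..n_modes."""
    feats = []
    for k in range(1, n_modes + 1):
        phase = 2.0 * math.pi * k * x / box
        feats.extend((math.sin(phase), math.cos(phase)))
    return feats

box = 8.0  # hypothetical box length along the reaction coordinate
a = fourier_features(1.3, box, n_modes=4)
b = fourier_features(1.3 + box, box, n_modes=4)  # same point, one box later
```

Because `a` and `b` agree to floating-point precision, a network that only sees these features cannot tell a position from its periodic image, and its output has no seam at the box boundary.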

Once trained, integrating that score gives you the PMF. No need for biased windows or exponential averages — the diffusion model reconstructs the landscape by combining physics and data-driven learning.

For experts: this is a physics-informed score-based diffusion model, where the score field satisfies the NESS constraint analytically, eliminating the need for empirical correction terms.

The Demonstration — Pushing Molecules Through Membranes

They tested the method on a classic benchmark: a coarse-grained solute diffusing through a POPC lipid bilayer. Normally, such PMFs require dozens of umbrella windows to converge. With their Fokker–Planck Score Learning (FPSL) approach, they achieved equivalent accuracy with roughly one-tenth the simulation effort.

They also handled position-dependent diffusion D(z), symmetry constraints (membrane center symmetry U(z)=U(−z)), and produced uncertainty estimates with significantly reduced variance compared to MBAR. The periodic physics and the learned score worked together to stabilize the free-energy reconstruction even where data were sparse.

In short: by combining a closed-form physical law (Fokker–Planck NESS) with a modern learning framework (score-based diffusion), they turned non-equilibrium data — which used to be noisy and inefficient — into a powerful source of free-energy information.

Epilogue — Why This Matters

This paper sits at the crossroads of molecular physics and machine learning. It doesn’t throw away physical understanding; it builds learning around it. The physics provides structure; the neural network provides flexibility. Together, they open a path toward using short, noisy non-equilibrium simulations to extract equilibrium thermodynamics — something previously thought unreliable.

It’s a small revolution in simulation thinking: not more sampling, but smarter inference.

And that’s a principle that reaches far beyond biomolecules — it’s how science itself learns from the restless motion of the world.

First Principal Academy
