Estimating Errors for Sloppy Models

A replacement for wisdom and experience?

Science is filled with important systems that are too complex and subtle to be modeled directly and quantitatively. An ecosystem has many interacting species, a cell has many interacting proteins and genes, and a material has many atoms whose interactions involve difficult quantum calculations. When scientists study these systems, their models often are "empirical" -- simplifications of the real world, with many parameters fit to observed or measured data. A key question is when we can trust the predictions: up until now, only those with wisdom and long experience could judge when a prediction was likely to be correct.

In collaboration with Karsten Jacobsen and Søren Frederiksen in Denmark, we've proposed a new method for estimating one source of prediction errors for these models. We've found, for applications in both biology and physics, that the parameters in these multiparameter models are sloppy -- they can vary in concert by many orders of magnitude without making the fits significantly worse! When we plug these wildly varying, almost-as-good parameters into our models, we get a range of values for predicted properties, which we can use as "sloppy model" error estimates.
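
To make "sloppy" concrete, here is a minimal Python sketch using a toy model that is not taken from the work described on this page: a sum of decaying exponentials with hypothetical decay rates. It computes the eigenvalues of the least-squares Hessian in the Gauss-Newton approximation (J^T J). A large ratio between the largest and smallest eigenvalue suggests that some parameter combinations are tightly pinned down by the data, while others can change a great deal at almost no cost to the fit. The function names, the rates, and the time grid are all assumptions made for illustration.

```python
import numpy as np


def jacobian_log_rates(rates, times):
    """Derivative of y(t) = sum_i exp(-k_i t) with respect to each log k_i."""
    rates = np.asarray(rates, dtype=float)
    kt = np.outer(times, rates)  # shape (n_times, n_rates)
    return -kt * np.exp(-kt)     # d/d(log k) of exp(-k t) is -k t exp(-k t)


def hessian_eigenvalues(rates, times):
    """Eigenvalues of J^T J, returned in descending order."""
    jac = jacobian_log_rates(rates, times)
    # Squared singular values of J are the eigenvalues of J^T J.
    return np.linalg.svd(jac, compute_uv=False) ** 2


# Hypothetical toy setup: four decay rates observed at 50 time points.
times = np.linspace(0.0, 5.0, 50)
eigenvalues = hessian_eigenvalues([1.0, 2.0, 3.0, 4.0], times)
print("eigenvalues:", eigenvalues)
print("stiffest / sloppiest:", eigenvalues[0] / eigenvalues[-1])
```

The parameters are taken as log k so that each one is dimensionless; the eigenvectors with small eigenvalues are the sloppy directions along which the fit barely changes.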

Søren's thesis was on the interaction forces between atoms of the element molybdenum. He fit the many parameters of his force model to training data generated by quantum calculations. These quantum calculations have errors that for this purpose were negligible: the errors in Søren's prediction using the force model were systematic errors (due to inadequacies in the model) rather than statistical errors (due to errors in the fitted data). The figure at left shows the true forces (green arrows) and forces generated using Søren's potential (red), with a sampling of parameters as described below.

Can we get any idea of how big these systematic errors might be? For example, Søren's best-fit value for the energy difference between the fcc and bcc crystal structures (red line) was off from the true value (DFT, blue line) by more than a factor of two, while his predicted values for various elastic constants were often within one or two percent of the quantum calculations. (How accurate an old interatomic potential will be when used for a new purpose is called transferability.)

We suggested that Søren try looking at the whole range of values for his measured properties, allowing the parameters to vary away from the best-fit ones. Clearly, fits that deviate from the best fit by less than the error in the best fit are also reasonable choices! How do we sample these fluctuations? We do statistical mechanics in model space, sampling them at a temperature T0 set to make the fluctuations from the best fit equal to the deviation of the best fit from the quantum calculation. You see in the figure that the range of predictions at T0 (green curve) overlaps the true answer (vertical blue line).
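
The sampling idea can be sketched in Python on a toy problem. This is only an illustration under stated assumptions, not the code used for the molybdenum potentials: the model is a hypothetical straight line fitted to data from a curved function, the cost is a plain sum of squared residuals, and the temperature is chosen as T0 = 2 C_best / N_p. That choice assumes a roughly quadratic cost, for which equipartition gives a mean excess cost of N_p T / 2, so the fluctuations away from the best fit cost about as much as the best fit's own misfit. The names `cost` and `sample_ensemble` are made up for this sketch.

```python
import numpy as np


def cost(theta, design, targets):
    """Least-squares cost C(theta) = 0.5 * sum of squared residuals."""
    residuals = design @ theta - targets
    return 0.5 * float(residuals @ residuals)


def sample_ensemble(design, targets, n_steps=20000, seed=0):
    """Metropolis sampling of parameters with weight exp(-C(theta) / T0)."""
    rng = np.random.default_rng(seed)
    n_params = design.shape[1]

    # Best fit and its nonzero cost: the model cannot reproduce the data.
    theta_best, *_ = np.linalg.lstsq(design, targets, rcond=None)
    c_best = cost(theta_best, design, targets)

    # Assumed temperature: makes the mean excess cost equal to c_best
    # when the cost is quadratic in the parameters.
    t0 = 2.0 * c_best / n_params

    # Propose steps along Hessian eigenvectors: small steps in stiff
    # directions, large steps in sloppy ones.
    eigvals, eigvecs = np.linalg.eigh(design.T @ design)
    step_scale = np.sqrt(t0 / eigvals)

    theta, c_now = theta_best.copy(), c_best
    samples = np.empty((n_steps, n_params))
    for i in range(n_steps):
        trial = theta + eigvecs @ (step_scale * rng.standard_normal(n_params))
        c_trial = cost(trial, design, targets)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if rng.random() < np.exp(-(c_trial - c_now) / t0):
            theta, c_now = trial, c_trial
        samples[i] = theta
    return theta_best, c_best, t0, samples


# Toy data: a straight line fitted to a curved "true" function.
x_train = np.linspace(0.0, 1.0, 11)
design = np.column_stack([np.ones_like(x_train), x_train])
targets = x_train**2

theta_best, c_best, t0, samples = sample_ensemble(design, targets)

# Predict outside the training range; the ensemble spread is the error bar.
features = np.array([1.0, 1.5])
best_prediction = features @ theta_best
ensemble_spread = (samples @ features).std()
actual_error = 1.5**2 - best_prediction
print(f"prediction {best_prediction:.3f} +/- {ensemble_spread:.3f}, "
      f"actual error {actual_error:.3f}")
```

The toy only shows the mechanics. Whether the ensemble spread tracks the actual error depends on the model and on the property being predicted, which is exactly what has to be checked against independent calculations.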

Using these alternative fits, Søren generated alternative predictions for various materials properties of interest in molybdenum. He compared the range of predictions (the "sloppy model" error predictions) to the actual error, found by directly calculating the predicted property using the more expensive quantum calculations and subtracting the best-fit prediction. The error estimates and the actual errors varied over a large range. The actual errors, though, were rather well predicted by the range of alternative predictions! We tested this by dividing the actual errors by the predicted errors: a perfect error prediction would give a Gaussian or normal distribution for this ratio. At right you see a plot of the integrated probability that this ratio is less than r: the Gaussian is shown by the solid line. The jagged curve gives the ratios for all the quantities Søren calculated, using three different potentials: Søren's MEMT potential, the classic Finnis-Sinclair potential, and a rival modern potential called MEAM. (The smooth curves represent predictions for the force data for the three potentials.)
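
The ratio test can be sketched as follows. This is a minimal illustration on synthetic numbers, not Søren's data: it assumes that "perfect" means the ratio follows a standard normal distribution (zero mean, unit variance), and the helper names are hypothetical. The synthetic predicted errors span a few orders of magnitude, and the actual errors are drawn so that the predictions are perfect by construction, so the empirical curve should hug the Gaussian one.

```python
import math

import numpy as np


def normal_cdf(r):
    """Integrated probability that a standard normal variable is below r."""
    return 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))


def ratio_cdf(actual_errors, predicted_errors):
    """Sorted ratios and the fraction of ratios at or below each one."""
    ratios = np.sort(np.asarray(actual_errors) / np.asarray(predicted_errors))
    cumulative = np.arange(1, ratios.size + 1) / ratios.size
    return ratios, cumulative


def max_deviation_from_normal(ratios, cumulative):
    """Largest gap between the empirical curve and the Gaussian curve."""
    reference = np.array([normal_cdf(r) for r in ratios])
    return float(np.max(np.abs(cumulative - reference)))


# Synthetic check: predicted errors ranging over three decades, with
# actual errors drawn so that the predictions are perfect by construction.
rng = np.random.default_rng(0)
predicted = 10.0 ** rng.uniform(-3.0, 0.0, size=2000)
actual = predicted * rng.standard_normal(2000)

ratios, cumulative = ratio_cdf(actual, predicted)
print("max deviation from Gaussian:",
      max_deviation_from_normal(ratios, cumulative))
```

With real data, error estimates that are too small would show up as an empirical curve with heavier tails than the Gaussian.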

The sloppy model error was a good estimate, for molybdenum, of the entire systematic error! It isn't perfect: in the tail of the distribution, Søren finds predictions for which the sloppy model error is perhaps only half of the entire systematic error. But now we can estimate, without wisdom and years of experience, at least one source of error in these calculations...

This research was paid for by the US government through the NSF.


"Bayesian ensemble approach to error estimation of interatomic potentials", Søren L. Frederiksen, Karsten W. Jacobsen, Kevin S. Brown, and James P. Sethna, Phys. Rev. Lett. 93, 165501 (2004).
Related work on density functionals for electronic structure calculations: "Bayesian Error Estimation in Density Functional Theory", J. J. Mortensen, K. Kaasbjerg, S. L. Frederiksen, J. K. Nørskov, James P. Sethna, and K. W. Jacobsen, Phys. Rev. Lett. 95, 216401 (2005).

Last modified: January 25, 2005

James P. Sethna

Statistical Mechanics: Entropy, Order Parameters, and Complexity, now available at Oxford University Press (USA, Europe).