Cost functions

We give an in-depth guide on how to use the builtin cost functions.

The iminuit package comes with a couple of common cost functions that you can import from iminuit.cost for convenience. Of course, you can write your own cost functions to use with iminuit, but most of the cost function is always the same. What really varies is the statistical model which predicts the probability density as a function of the parameter values. This you still have to provide yourself and the iminuit package will not include machinery to build statistical models (that is out of scope).

Using the builtin cost functions is not only convenient, they also have some extra features.

  • Support of fitted weighted histograms.

  • Technical tricks improve numerical stability.

  • Optional numba acceleration (if numba is installed).

  • Cost functions can be added to fit data sets with shared parameters.

  • Temporarily mask data.

We demonstrate each cost function on an standard example from high-energy physics, the fit of a peak over some smooth background (here taken to be constant).

[1]:
from iminuit import cost, Minuit
# faster than scipy.stats functions
from numba_stats import truncnorm, truncexpon, norm, expon
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import multivariate_normal as mvnorm

We generate our data. We sample from a Gaussian peak and from exponential background in the range 0 to 2. We then bin the original data. One can fit the original or the binned data.

[2]:
xr = (0, 2)  # xrange

rng = np.random.default_rng(1)

xdata = rng.normal(1, 0.1, size=1000)
ydata = rng.exponential(size=len(xdata))
xmix = np.append(xdata, ydata)
xmix = xmix[(xr[0] < xmix) & (xmix < xr[1])]

n, xe = np.histogram(xmix, bins=50, range=xr)
cx = 0.5 * (xe[1:] + xe[:-1])
dx = np.diff(xe)

plt.errorbar(cx, n, n ** 0.5, fmt="ok")
plt.plot(xmix, np.zeros_like(xmix), "|", alpha=0.1);
../_images/notebooks_cost_functions_3_0.svg

We also generate some 2D data to demonstrate multivariate fits. In this case, a gaussian along axis 1 and independently an exponential along axis 2. In this case, the distributions are not restricted to some range in x and y.

[3]:
n2, _, ye = np.histogram2d(xdata, ydata, bins=(50, 20), range=(xr, (0, np.max(ydata))))

plt.pcolormesh(xe, ye, n2.T)
plt.scatter(xdata, ydata, marker=".", color="w", s=1);
../_images/notebooks_cost_functions_5_0.svg

Maximum-likelihood fits

Maximum-likelihood fits are the state-of-the-art when it comes to fitting models to data. The can be applied to unbinned and binned data (histograms).

  • Unbinned fits are the easiest to use, because they can be apply directly to the raw sample. They become slow when the sample size is large.

  • Binned fits require you to appropriately bin the data. The binning has to be fine enough to retain all essential information. Binned fits are much faster when the sample size is large.

Unbinned fit

Unbinned fits are ideal when the data samples are not too large or very high dimensional. There is no need to worry about the appropriate binning of the data. Unbinned fits are inefficient when the samples are very large and can become numerically unstable, too. Binned fits are a better choice then.

The cost function for an unbinned maximum-likelihood fit is really simple, it is the sum of the logarithm of the pdf evaluated at each sample point (times -1 to turn maximimization into minimization). You can easily write this yourself, but a naive implementation will suffer from instabilities when the pdf becomes locally zero. Our implementation mitigates the instabilities to some extend.

To perform the unbinned fit you need to provide the pdf of the model, which must be vectorized (a numpy ufunc). The pdf must be normalized, which means that the integral over the sample value range must be a constant for any combination of model parameters.

The model pdf in this case is a linear combination of the normal and the exponential pdfs. The parameters are \(z\) (the weight), \(\mu\) and \(\sigma\) of the normal distribution and \(\tau\) of the exponential. The cost function detects the parameter names.

It is important to put appropriate limits on the parameters, so that the problem does not become mathematically undefined. * \(0 < z < 1\), * \(\sigma > 0\), * \(\tau > 0\).

In addition, it can be beneficial to use \(-1 < \mu < 1\) (optional), but it is not required. We use truncnorm and truncexpon, which are normalised inside the data range (0, 2).

[4]:
def pdf(x, z, mu, sigma, tau):
    return (z * truncnorm.pdf(x, *xr, mu, sigma) +
            (1 - z) * truncexpon.pdf(x, *xr, 0.0, tau))

c = cost.UnbinnedNLL(xmix, pdf)

m = Minuit(c, z=0.4, mu=0.1, sigma=0.2, tau=2)
m.limits["z"] = (0, 1)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[4]:
Migrad
FCN = 768.1 Nfcn = 501
EDM = 6.76e-07 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 z 0.537 0.015 0 1
1 mu 0.996 0.004
2 sigma 0.1006 0.0035 0
3 tau 1.05 0.08 0
z mu sigma tau
z 0.000231 -3.06e-06 (-0.052) 1.9e-05 (0.353) -0.000236 (-0.206)
mu -3.06e-06 (-0.052) 1.5e-05 -9.51e-07 (-0.069) -1.54e-05 (-0.053)
sigma 1.9e-05 (0.353) -9.51e-07 (-0.069) 1.25e-05 -4.33e-05 (-0.163)
tau -0.000236 (-0.206) -1.54e-05 (-0.053) -4.33e-05 (-0.163) 0.00568

We visualize the fit.

[5]:
m.visualize()
../_images/notebooks_cost_functions_9_0.svg

We can also fit a multivariate model to multivariate data. We pass model as a logpdf this time, which works well because the pdfs factorise.

[6]:
def logpdf(xy, mu, sigma, tau):
    x, y = xy
    return (norm.logpdf(x, mu, sigma) + expon.logpdf(y, 0, tau))

c = cost.UnbinnedNLL((xdata, ydata), logpdf, log=True)
m = Minuit(c, mu=1, sigma=2, tau=2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[6]:
Migrad
FCN = 147.6 Nfcn = 134
EDM = 2.1e-06 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 mu 0.9946 0.0031
1 sigma 0.0986 0.0022 0
2 tau 0.972 0.031 0
mu sigma tau
mu 9.73e-06 5.96e-10 -7.7e-14
sigma 5.96e-10 4.86e-06 -4.72e-18
tau -7.7e-14 -4.72e-18 0.000944

Extended unbinned fit

An important variant of the unbinned ML fit is described by Roger Barlow, Nucl.Instrum.Meth.A 297 (1990) 496-506. Use this if both the shape and the integral of the density are of interest. In practice, this is often the case, for example, if you want to estimate a cross-section or yield.

The model in this case has to return the integral of the density and the density itself (which must be vectorized). The parameters in this case are those already discussed in the previous section and in addition \(s\) (integral of the signal density), \(b\) (integral of the uniform density). The additional limits are:

  • \(s > 0\),

  • \(b > 0\).

Compared to the previous case, we have one more parameter to fit.

[7]:
def density(x, s, b, mu, sigma, tau):
    return s + b, (s * truncnorm.pdf(x, *xr, mu, sigma) +
        b * truncexpon.pdf(x, *xr, 0, tau))

c = cost.ExtendedUnbinnedNLL(xmix, density)

m = Minuit(c, s=300, b=1500, mu=0, sigma=0.2, tau=2)
m.limits["s", "b", "sigma", "tau"] = (0, None)
m.migrad()
[7]:
Migrad
FCN = -2.388e+04 Nfcn = 362
EDM = 2.8e-07 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 s 1.01e3 0.04e3 0
1 b 872 35 0
2 mu 0.996 0.004
3 sigma 0.1006 0.0035 0
4 tau 1.05 0.08 0
s b mu sigma tau
s 1.36e+03 -351 (-0.272) -0.00574 (-0.040) 0.0355 (0.272) -0.44 (-0.158)
b -351 (-0.272) 1.22e+03 0.00574 (0.042) -0.0355 (-0.288) 0.44 (0.167)
mu -0.00574 (-0.040) 0.00574 (0.042) 1.5e-05 -9.28e-07 (-0.068) -1.53e-05 (-0.053)
sigma 0.0355 (0.272) -0.0355 (-0.288) -9.28e-07 (-0.068) 1.25e-05 -4.29e-05 (-0.161)
tau -0.44 (-0.158) 0.44 (0.167) -1.53e-05 (-0.053) -4.29e-05 (-0.161) 0.00568

The fitted values and the uncertainty estimates for the shape parameters are identical to the previous fit.

[8]:
m.visualize()
../_images/notebooks_cost_functions_15_0.svg

Once again, we fit 2D data, using the logdensity mode.

[9]:
def logdensity(xy, n, mu, sigma, tau):
    x, y = xy
    return n, np.log(n) + norm.logpdf(x, mu, sigma) + expon.logpdf(y, 0, tau)

c = cost.ExtendedUnbinnedNLL((xdata, ydata), logdensity, log=True)
m = Minuit(c, n=1, mu=1, sigma=2, tau=2)
m.limits["n", "sigma", "tau"] = (0, None)
m.migrad()
[9]:
Migrad
FCN = -1.167e+04 Nfcn = 328
EDM = 4.42e-06 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 n 1000 32 0
1 mu 0.9946 0.0031
2 sigma 0.0986 0.0022 0
3 tau 0.972 0.031 0
n mu sigma tau
n 1e+03 1.33e-13 7.61e-11 -3.48e-19
mu 1.33e-13 9.73e-06 8.51e-09 (0.001) -6.27e-14
sigma 7.61e-11 8.51e-09 (0.001) 4.86e-06 -2.22e-14
tau -3.48e-19 -6.27e-14 -2.22e-14 0.000944

Binned Fit

Binned fits are computationally more efficient and numerically more stable when samples are large. The caveat is that one has to choose an appropriate binning. The binning should be fine enough so that the essential information in the original is retained. Using large bins does not introduce a bias, but the parameters have a larger-than-minimal variance.

In this case, 50 bins are fine enough to retain all information. Using a large number of bins is safe, since the maximum-likelihood method correctly takes poisson statistics into account, which works even if bins have zero entries. Using more bins than necessary just increases the computational cost.

Instead of a pdf, you need to provide a cdf for a binned fit (which must be vectorized). Note that you can approximate the cdf as “bin-width times pdf evaluated at center”, if the cdf is expensive to calculate, but this is an approxmiation and will lead to a bias. Using the cdf avoids this bias.

[10]:
def cdf(xe, z, mu, sigma, tau):
    return (z * truncnorm.cdf(xe, *xr, mu, sigma) +
            (1-z) * truncexpon.cdf(xe, *xr, 0, tau))

c = cost.BinnedNLL(n, xe, cdf)
m = Minuit(c, z=0.4, mu=0, sigma=0.2, tau=2)
m.limits["z"] = (0, 1)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[10]:
Migrad
FCN = 30.11 (chi2/ndof = 0.7) Nfcn = 232
EDM = 4.06e-06 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 z 0.538 0.015 0 1
1 mu 0.997 0.004
2 sigma 0.101 0.004 0
3 tau 1.05 0.08 0
z mu sigma tau
z 0.000232 -3.14e-06 (-0.053) 1.9e-05 (0.350) -0.000237 (-0.207)
mu -3.14e-06 (-0.053) 1.53e-05 -1.01e-06 (-0.072) -1.56e-05 (-0.053)
sigma 1.9e-05 (0.350) -1.01e-06 (-0.072) 1.27e-05 -4.34e-05 (-0.162)
tau -0.000237 (-0.207) -1.56e-05 (-0.053) -4.34e-05 (-0.162) 0.00564

The fitted values and the uncertainty estimates for \(\mu\) and \(\sigma\) are not identical to the unbinned fit, but very close. For practical purposes, the results are equivalent. This shows that the binning is fine enough to retain the essential information in the original data.

Note that iminuit also shows the chi2/ndof goodness-of-fit estimator when the data are binned. It can be calculated for free in the binned case.

[11]:
m.visualize()
../_images/notebooks_cost_functions_21_0.svg

Fitting a multidimensional histogram is equally easy. Since the pdfs in this example factorise, the cdf of the 2D model is the product of the cdfs along each axis.

[12]:
def cdf(xe_ye, mu, sigma, tau):
    xe, ye = xe_ye
    return norm.cdf(xe, mu, sigma) * expon.cdf(ye, 0, tau)

c = cost.BinnedNLL(n2, (xe, ye), cdf)
m = Minuit(c, mu=0.1, sigma=0.2, tau=2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[12]:
Migrad
FCN = 201.1 (chi2/ndof = 0.2) Nfcn = 281
EDM = 4.15e-08 (Goal: 0.0002) time = 0.6 sec
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 mu 0.9952 0.0031
1 sigma 0.0986 0.0022 0
2 tau 0.969 0.031 0
mu sigma tau
mu 9.86e-06 1.09e-09 -2.7e-11
sigma 1.09e-09 5e-06 -1.78e-11
tau -2.7e-11 -1.78e-11 0.000951

Extended binned maximum-likelihood fit

As in the unbinned case, the binned extended maximum-likelihood fit should be used when also the amplitudes of the pdfs are of interest.

Instead of a density, you need to provide the integrated density in this case (which must be vectorized). There is no need to separately return the total integral of the density, like in the unbinned case. The parameters are the same as in the unbinned extended fit.

[13]:
def integral(xe, s, b, mu, sigma, tau):
    return (s * truncnorm.cdf(xe, *xr, mu, sigma) +
            b * truncexpon.cdf(xe, *xr, 0, tau))

c = cost.ExtendedBinnedNLL(n, xe, integral)
m = Minuit(c, s=300, b=1500, mu=0, sigma=0.2, tau=2)
m.limits["s", "b", "sigma", "tau"] = (0, None)
m.migrad()
[13]:
Migrad
FCN = 30.11 (chi2/ndof = 0.7) Nfcn = 353
EDM = 0.00011 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 s 1.01e3 0.04e3 0
1 b 870 40 0
2 mu 0.997 0.004
3 sigma 0.101 0.004 0
4 tau 1.05 0.08 0
s b mu sigma tau
s 1.5e+03 -375 (-0.257) -0.00925 (-0.061) 0.0274 (0.176) -0.63 (-0.198)
b -375 (-0.257) 1.42e+03 0.00591 (0.040) -0.058 (-0.382) 0.546 (0.177)
mu -0.00925 (-0.061) 0.00591 (0.040) 1.54e-05 -7.39e-07 (-0.047) -1.48e-05 (-0.046)
sigma 0.0274 (0.176) -0.058 (-0.382) -7.39e-07 (-0.047) 1.63e-05 -3.29e-05 (-0.099)
tau -0.63 (-0.198) 0.546 (0.177) -1.48e-05 (-0.046) -3.29e-05 (-0.099) 0.00675
[14]:
m.visualize()
../_images/notebooks_cost_functions_26_0.svg

Again, we can also fit multivariate data.

[15]:
def integral(xe_ye, n, mu, sigma, tau):
    xe, ye = xe_ye
    return n * norm.cdf(xe, mu, sigma) * expon.cdf(ye, 0, tau)

c = cost.ExtendedBinnedNLL(n2, (xe, ye), integral)
m = Minuit(c, n=1500, mu=0.1, sigma=0.2, tau=2)
m.limits["n", "sigma", "tau"] = (0, None)
m.migrad()
[15]:
Migrad
FCN = 200.3 (chi2/ndof = 0.2) Nfcn = 201
EDM = 2.6e-06 (Goal: 0.0002) time = 0.3 sec
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 n 1.000e3 0.032e3 0
1 mu 0.9952 0.0031
2 sigma 0.0986 0.0022 0
3 tau 0.972 0.031 0
n mu sigma tau
n 1e+03 1.57e-08 4.04e-08 0.00297
mu 1.57e-08 9.86e-06 6.62e-10 -8.14e-12
sigma 4.04e-08 6.62e-10 5e-06 2.58e-11
tau 0.00297 -8.14e-12 2.58e-11 0.00098

Temporary masking

In complicated binned fits with peak and background, it is sometimes useful to fit in several stages. One typically starts by masking the signal region, to fit only the background region.

The cost functions have a mask attribute to that end. We demonstrate the use of the mask with an extended binned fit.

[16]:
def integral(xe, s, b, mu, sigma, tau):
    return (s * truncnorm.cdf(xe, *xr, mu, sigma) +
            b * truncexpon.cdf(xe, *xr, 0, tau))

c = cost.ExtendedBinnedNLL(n, xe, integral)

# we set the signal amplitude to zero and fix all signal parameters
m = Minuit(c, s=0, b=1500, mu=1, sigma=0.2, tau=2)

m.limits["s", "b", "sigma", "tau"] = (0, None)
m.fixed["s", "mu", "sigma"] = True

# we temporarily mask out the signal
c.mask = (cx < 0.5) | (1.5 < cx)

m.migrad()
[16]:
Migrad
FCN = 13.04 (chi2/ndof = 0.6) Nfcn = 76
EDM = 1.19e-08 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 s 0.0 0.1 0 yes
1 b 860 40 0
2 mu 1.00 0.01 yes
3 sigma 0.200 0.002 0 yes
4 tau 1.01 0.08 0
s b mu sigma tau
s 0 0 0 0 0
b 0 1.77e+03 0 0 1.01 (0.305)
mu 0 0 0 0 0
sigma 0 0 0 0 0
tau 0 1.01 (0.305) 0 0 0.00621

We plot the intermediate result. Points which have been masked out are shown with open markers.

[17]:
for ma, co in ((c.mask, "k"), (~c.mask, "w")):
    plt.errorbar(cx[ma], n[ma], n[ma] ** 0.5, fmt="o", color=co, mec="k", ecolor="k")
plt.stairs(np.diff(integral(xe, *[p.value for p in m.init_params])), xe,
           ls=":", label="init")
plt.stairs(np.diff(integral(xe, *m.values)), xe, label="fit")
plt.legend();
../_images/notebooks_cost_functions_32_0.svg

Now we fix the background and fit only the signal parameters.

[18]:
c.mask = None # remove mask
m.fixed = False # release all parameters
m.fixed["b"] = True # fix background amplitude
m.values["s"] = 100 # do not start at the limit
m.migrad()
[18]:
Migrad
FCN = 30.21 (chi2/ndof = 0.7) Nfcn = 262
EDM = 2.36e-05 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 s 1.02e3 0.04e3 0
1 b 860 40 0 yes
2 mu 0.997 0.004
3 sigma 0.1014 0.0034 0
4 tau 1.05 0.07 0
s b mu sigma tau
s 1.27e+03 0 -0.00423 (-0.030) 0.0255 (0.209) -0.315 (-0.120)
b 0 0 0 0 0
mu -0.00423 (-0.030) 0 1.53e-05 -8.41e-07 (-0.063) -1.76e-05 (-0.061)
sigma 0.0255 (0.209) 0 -8.41e-07 (-0.063) 1.17e-05 -3.03e-05 (-0.120)
tau -0.315 (-0.120) 0 -1.76e-05 (-0.061) -3.03e-05 (-0.120) 0.00543

Finally, we release all parameters and fit again to get the correct uncertainty estimates.

[19]:
m.fixed = None
m.migrad()
[19]:
Migrad
FCN = 30.11 (chi2/ndof = 0.7) Nfcn = 335
EDM = 4.12e-05 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 s 1.01e3 0.04e3 0
1 b 871 35 0
2 mu 0.997 0.004
3 sigma 0.101 0.004 0
4 tau 1.05 0.08 0
s b mu sigma tau
s 1.37e+03 -354 (-0.273) -0.00592 (-0.041) 0.0358 (0.271) -0.446 (-0.161)
b -354 (-0.273) 1.22e+03 0.00592 (0.043) -0.0358 (-0.287) 0.446 (0.170)
mu -0.00592 (-0.041) 0.00592 (0.043) 1.53e-05 -1.01e-06 (-0.072) -1.56e-05 (-0.053)
sigma 0.0358 (0.271) -0.0358 (-0.287) -1.01e-06 (-0.072) 1.27e-05 -4.33e-05 (-0.162)
tau -0.446 (-0.161) 0.446 (0.170) -1.56e-05 (-0.053) -4.33e-05 (-0.162) 0.00564
[20]:
m.visualize()
../_images/notebooks_cost_functions_37_0.svg

We get the same result as before. Since this was an easy problem, we did not need these extra steps, but doing this can be helpful to fit lots of histograms without adjusting each fit manually.

Weighted histograms

The cost functions for binned data also support weighted histograms. Just pass an array with the shape (n, 2) instead of (n,) as the first argument, where the first number of each pair is the sum of weights and the second is the sum of weights squared (an estimate of the variance of that bin value).

Least-squares fits

A cost function for a general weighted least-squares fit (aka chi-square fit) is also included. In statistics this is called non-linear regression.

In this case you need to provide a model that predicts the y-values as a function of the x-values and the parameters. The fit needs estimates of the y-errors. If those are wrong, the fit may be biased. If your data has errors on the x-values as well, checkout the tutorial about automatic differentiation, which includes an application of that to such fits.

[21]:
def model(x, a, b):
    return a + b * x ** 2

rng = np.random.default_rng(4)

truth = 1, 2
x = np.linspace(0, 1, 20)
yt = model(x, *truth)
ye = 0.4 * x**5 + 0.1
y = rng.normal(yt, ye)

plt.plot(x, yt, ls="--", label="truth")
plt.errorbar(x, y, ye, fmt="ok", label="data")
plt.legend();
../_images/notebooks_cost_functions_41_0.svg
[22]:
c = cost.LeastSquares(x, y, ye, model)
m1 = Minuit(c, a=0, b=0)
m1.migrad()
[22]:
Migrad
FCN = 25.29 (chi2/ndof = 1.4) Nfcn = 29
EDM = 2.27e-22 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 a 0.99 0.04
1 b 2.04 0.15
a b
a 0.00139 -0.00369 (-0.658)
b -0.00369 (-0.658) 0.0226
[23]:
m.visualize()
plt.plot(c.x, model(c.x, *truth), ls="--", label="truth")
[23]:
[<matplotlib.lines.Line2D at 0x7f03fd2e7d30>]
../_images/notebooks_cost_functions_43_1.svg

We can also fit a multivariate model, in this case we fit a plane in 2D.

[24]:
def model2(x_y, a, bx, by):
    x, y = x_y
    return a + bx * x + by * y

# generate a regular grid in x and y
x = np.linspace(-1, 1, 10)
y = np.linspace(-1, 1, 10)
X, Y = np.meshgrid(x, y)
x = X.flatten()
y = Y.flatten()

# model truth
Z = model2((x, y), 1, 2, 3)

# add some noise
rng = np.random.default_rng(1)
Zerr = 1
Z = rng.normal(Z, Zerr)

plt.scatter(x, y, c=Z)
plt.colorbar();
../_images/notebooks_cost_functions_45_0.svg
[25]:
c2 = cost.LeastSquares((x, y), Z, Zerr, model2)
m2 = Minuit(c2, 0, 0, 0)
m2.migrad()
[25]:
Migrad
FCN = 71.61 (chi2/ndof = 0.7) Nfcn = 34
EDM = 5.66e-16 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 a 0.93 0.10
1 bx 1.87 0.16
2 by 2.93 0.16
a bx by
a 0.01 1.29e-11 6.43e-12
bx 1.29e-11 0.0245 3.02e-11
by 6.43e-12 3.02e-11 0.0245

Multivarate fits are difficult to check by eye. Here we use color to indicate the function value.

To guarantee that plot of the function and the plot of the data use the same color scale, we use the same normalising function for pyplot.pcolormesh and pyplot.scatter.

[26]:
xm = np.linspace(-1, 1, 100)
ym = np.linspace(-1, 1, 100)
Xm, Ym = np.meshgrid(xm, ym)
xm = Xm.flatten()
ym = Ym.flatten()

qm = plt.pcolormesh(Xm, Ym, model2((xm, ym), *m2.values).reshape(Xm.shape))
plt.scatter(c2.x[0], c2.x[1], c=c2.y, edgecolors="w", norm=qm.norm)
plt.colorbar()
[26]:
<matplotlib.colorbar.Colorbar at 0x7f03fd4c2440>
../_images/notebooks_cost_functions_48_1.svg

Robust least-squares

The builtin least-squares function also supports robust fitting with an alternative loss functions. See the documentation of iminuit.cost.LeastSquares for details. Users can pass their own loss functions. Builtin loss functions are:

  • linear (default): gives ordinary weighted least-squares

  • soft_l1: quadratic ordinary loss for small deviations (\(\ll 1\sigma\)), linear loss for large deviations (\(\gg 1\sigma\)), and smooth interpolation in between

Let’s create one outlier and see what happens with ordinary loss.

[27]:
c.y[3] = 3 # generate an outlier

m3 = Minuit(c, a=0, b=0)
m3.migrad()
[27]:
Migrad
FCN = 364.9 (chi2/ndof = 20.3) Nfcn = 29
EDM = 9.85e-22 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 a 1.23 0.04
1 b 1.45 0.15
a b
a 0.00139 -0.00369 (-0.658)
b -0.00369 (-0.658) 0.0226
[28]:
m3.visualize()
plt.plot(c.x, model(c.x, 1, 2), ls="--", label="truth")
[28]:
[<matplotlib.lines.Line2D at 0x7f03fd63c670>]
../_images/notebooks_cost_functions_51_1.svg

The result is distorted by the outlier. Note that the error did not increase! The size of the error computed by Minuit does not include mismodelling.

We can repair this with by switching to “soft_l1” loss.

[29]:
c.loss = "soft_l1"
m3.migrad()
[29]:
Migrad
FCN = 54.09 (chi2/ndof = 3.0) Nfcn = 69
EDM = 4.31e-06 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 a 1.00 0.05
1 b 2.04 0.23
a b
a 0.00285 -0.00864 (-0.707)
b -0.00864 (-0.707) 0.0524
[30]:
m3.visualize()
plt.plot(c.x, model(c.x, *truth), ls="--", label="truth")
[30]:
[<matplotlib.lines.Line2D at 0x7f03fcdc7250>]
../_images/notebooks_cost_functions_54_1.svg

The result is almost identical as in the previous case without an outlier.

Robust fitting is very useful if the data are contaminated with small amounts of outliers. It comes with a price, however, the uncertainties are in general larger and the errors computed by Minuit are not correct anymore.

Calculating the parameter uncertainty properly for this case requires a so-called sandwich estimator, which is currently not implemented. As an alternative, one can use the bootstrap to compute parameter uncertaintes. We use the resample library to do this.

[31]:
from resample.bootstrap import variance as bvar

def fit(x, y, ye):
    c = cost.LeastSquares(x, y, ye, model, loss="soft_l1")
    m = Minuit(c, a=0, b=0)
    m.migrad()
    return m.values

berr = bvar(fit, c.x, c.y, c.yerror, size=1000, random_state=1) ** 0.5

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
for i, axi in enumerate(ax):
    axi.errorbar(0, m1.values[i], m1.errors[i], fmt="o")
    axi.errorbar(1, m3.values[i], m3.errors[i], fmt="o")
    axi.errorbar(2, m3.values[i], berr[i], fmt="o")
    axi.set_xticks(np.arange(3), ("no outlier", "Minuit, soft_l1", "bootstrap"));
../_images/notebooks_cost_functions_56_0.svg

In this case, Minuit’s estimate is similar to the bootstrap estimate, but that is not generally true when the “soft_l1” loss is used.

Robust fits are very powerful when the outliers cannot be removed by other means. If one can identify outliers by other means, it is better to remove them. We manually remove the point (using the mask attribute) and switch back to ordinary loss.

[32]:
c.mask = np.arange(len(c.x)) != 3
c.loss = "linear"
m4 = Minuit(c, a=0, b=0)
m4.migrad()
[32]:
Migrad
FCN = 24.67 (chi2/ndof = 1.5) Nfcn = 29
EDM = 1.37e-22 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 a 0.98 0.04
1 b 2.07 0.15
a b
a 0.00158 -0.00415 (-0.678)
b -0.00415 (-0.678) 0.0238

Now the uncertainties are essentially the same as before adding the outlier.

[33]:
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
for i, axi in enumerate(ax):
    axi.errorbar(0, m1.values[i], m1.errors[i], fmt="o")
    axi.errorbar(1, m3.values[i], m3.errors[i], fmt="o")
    axi.errorbar(2, m3.values[i], berr[i], fmt="o")
    axi.errorbar(3, m4.values[i], m4.errors[i], fmt="o")
    axi.set_xticks(np.arange(4), ("no outlier", "Minuit, soft_l1", "bootstrap", "outlier removed"));
../_images/notebooks_cost_functions_60_0.svg