Theory

In this section we give some background on the theory behind the methods used in multinorm. For each method we give the formulae used, and a reference where a derivation and discussion of the formulae can be found.

Note

The multivariate normal distribution has very nice mathematical properties. Every derived quantity again follows either a multivariate normal distribution or a chi-squared distribution.

Marginal distribution

The marginal distribution can be obtained with the marginal() method.

You can think of the marginal distribution as the distribution obtained by simply ignoring some of the parameters, or by “projecting” the \(N\)-dimensional distribution onto the lower-dimensional subspace of parameters of interest.

The marginal distribution of the multivariate normal is again a multivariate normal distribution.

It can be obtained simply by keeping only the parameters of interest in the mean vector and covariance matrix (drop the parameters that are being marginalised out).
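For illustration, here is this recipe written out with NumPy (the mean vector, covariance matrix and index list are made-up values for this sketch):

>>> import numpy as np
>>> mean = np.array([1.0, 2.0, 3.0])
>>> covariance = np.array([[1.0, 0.5, 0.2],
...                        [0.5, 2.0, 0.3],
...                        [0.2, 0.3, 3.0]])
>>> keep = [0, 2]  # indices of the parameters of interest
>>> mean[keep]  # marginal mean: keep only those entries
array([1., 3.])
>>> covariance[np.ix_(keep, keep)]  # marginal covariance: keep rows and columns
array([[1. , 0.2],
       [0.2, 3. ]])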

See the marginal distributions section of the Wikipedia multivariate normal distribution page.

Conditional distribution

The conditional distribution can be obtained with the conditional() method.

The conditional distribution is given by the “slice” in the \(N\)-dimensional distribution when fixing some of the parameters.

The conditional distribution of the multivariate normal is again a multivariate normal distribution.

It can be obtained by partitioning the mean \(\mu\) and covariance \(\Sigma\) of the \(N\)-dimensional distribution into two parts, corresponding to the parameters that are fixed and those that are kept free.
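Concretely, writing the partition as

\[\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},\]

where index 1 labels the free parameters and index 2 the parameters fixed at values \(a\), the conditional distribution is the multivariate normal with mean and covariance

\[\bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2), \qquad \bar{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.\]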

A derivation and discussion of these formulae can be found in the conditional distributions section of the Wikipedia multivariate normal distribution page.
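As a minimal sketch of these formulae with NumPy, for two parameters where the second one is fixed (all numbers are made up for this example):

>>> import numpy as np
>>> mu = np.array([1.0, 2.0])
>>> Sigma = np.array([[2.0, 0.5],
...                   [0.5, 1.0]])
>>> a = 3.0  # value at which the second parameter is fixed
>>> mu_conditional = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (a - mu[1])
>>> Sigma_conditional = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
>>> print(mu_conditional, Sigma_conditional)
1.5 1.75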

Fix parameters

This method is used e.g. in MINUIT; see Section 1.3.1 of the MINUIT manual: http://lmu.web.psi.ch/docu/manuals/software_manuals/minuit2/mnerror.pdf

As far as I can tell, it gives the same results as conditional (see test_conditional_vs_fix).
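Here is a quick numerical check of that equivalence, assuming the fix recipe is to delete the fixed parameter’s row and column from the precision matrix (the inverse covariance) and invert back; the covariance matrix is a made-up example:

>>> import numpy as np
>>> Sigma = np.array([[2.0, 0.3, 0.5],
...                   [0.3, 1.0, 0.2],
...                   [0.5, 0.2, 1.5]])
>>> free, fixed = [0, 1], [2]
>>> # conditional covariance: Sigma_11 - Sigma_12 Sigma_22^-1 Sigma_21
>>> s11 = Sigma[np.ix_(free, free)]
>>> s12 = Sigma[np.ix_(free, fixed)]
>>> s22 = Sigma[np.ix_(fixed, fixed)]
>>> cov_conditional = s11 - s12 @ np.linalg.inv(s22) @ s12.T
>>> # fix recipe: invert, drop the fixed row and column, invert back
>>> precision = np.linalg.inv(Sigma)
>>> cov_fix = np.linalg.inv(precision[np.ix_(free, free)])
>>> bool(np.allclose(cov_conditional, cov_fix))
True

That the two agree is an instance of the block matrix inversion (Schur complement) identity.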

TODO: work out the math of why that is the case and document it here.

TODO: add a note that for the multivariate normal the conditional covariance matrix does not depend on the parameter values.

TODO: document this and add an example in the analyse section using iminuit.

Product distribution

TODO: improve this section: https://github.com/cdeil/multinorm/issues/13

We should give the full equations; the ones below are the special case of distributions without correlations.

The approximation we will use can be found in many textbooks, e.g. Section 5.6.1 of the stats book. Given \(n\) Gaussian likelihood estimates with parameter estimates \(x_i\) and known parameter errors \(\sigma_i\), we want the combined distribution

\[p(\mu \,|\, \{x_i\}, \{\sigma_i\}).\]

If we define “weights” as the inverse square of the errors,

\[w_i = 1 / \sigma_i^2, \qquad \sigma_i = 1 / \sqrt{w_i},\]

then the from_product maximum likelihood estimate is given by (Equation 5.50)

\[\mu_0 = \frac{\sum{w_i x_i}}{\sum{w_i}},\]

and the from_product parameter error follows from the combined weight:

\[w = \sum{w_i}, \qquad \sigma_0 = 1 / \sqrt{w}.\]
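As a small sketch of these formulae in NumPy (the estimates and errors are made-up numbers):

>>> import numpy as np
>>> x = np.array([10.0, 12.0])    # parameter estimates x_i
>>> sigma = np.array([1.0, 2.0])  # parameter errors sigma_i
>>> w = 1 / sigma ** 2            # weights w_i
>>> mu_0 = np.sum(w * x) / np.sum(w)
>>> sigma_0 = 1 / np.sqrt(np.sum(w))
>>> print(f"{mu_0:.2f} +/- {sigma_0:.2f}")
10.40 +/- 0.89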

Sigmas

For the one-dimensional normal distribution \(N(\mu, \sigma)\), the probability content within \(n * \sigma\) of the mean is roughly 68% for \(n=1\), 95% for \(n=2\) and 99.7% for \(n=3\).

What’s the equivalent for the \(N\)-dimensional normal distribution?

For a given mean \(\mu\) and covariance \(\Sigma\) and point \(p\) one can define a distance \(d\) via

\[d = \sqrt{(p - \mu)^T \Sigma^{-1} (p - \mu)}.\]

The set of points at equal distance \(d\) is an ellipsoidal surface, and all points on it have equal probability density. This distance is the \(N\)-dimensional equivalent of the one-dimensional \(d = |p - \mu| / \sigma\), i.e. the number \(n\) of standard deviations from the mean, \(d = n\).
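For illustration, the distance can be computed with NumPy like this (the numbers are made up; np.linalg.solve is used to avoid explicitly inverting \(\Sigma\)):

>>> import numpy as np
>>> mu = np.array([0.0, 0.0])
>>> Sigma = np.array([[1.0, 0.5],
...                   [0.5, 2.0]])
>>> p = np.array([1.0, 1.0])
>>> delta = p - mu
>>> d = np.sqrt(delta @ np.linalg.solve(Sigma, delta))
>>> print(f"{d:.4f}")
1.0690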

However, the probability content for a given \(n * \sigma\) is lower for the \(N\)-dimensional distribution. It turns out that \(d^2\) follows a \(\chi^2\) distribution with \(N\) degrees of freedom, so the probability content within distance \(d\) is given by the \(\chi^2\) cumulative distribution function:

\[P(\mathrm{distance} \le d) = \chi^2_{\mathrm{CDF}}(d^2, N)\]

That means you can compute the probability content using scipy.stats.chi2 like this:

>>> import numpy as np
>>> from scipy.stats import chi2
>>> n_sigma = np.array([1, 2, 3])
>>> chi2.cdf(n_sigma ** 2, 1)
array([0.68268949, 0.95449974, 0.9973002 ])
>>> chi2.cdf(n_sigma ** 2, 2)
array([0.39346934, 0.86466472, 0.988891  ])
>>> chi2.cdf(n_sigma ** 2, 3)
array([0.19874804, 0.73853587, 0.97070911])

Note that the 1 sigma ellipse in 2D has a probability content of only 39%, the 1 sigma ellipsoid in 3D only 20%, and the content gets smaller and smaller for higher dimensions.

Also see https://stats.stackexchange.com/questions/331283

For further information see the Wikipedia Mahalanobis distance page.

The MultiNorm.to_matplotlib_ellipse method takes an n_sigma option, and returns the ellipse tracing the points at Mahalanobis distance \(d = n\).

See also sigma in the corner.py docs.