LOG#231. Statistical tools.

Today's subject: errors, and the formulae used to handle them in experimental data.

Generally speaking, errors fall into two classes:

1st. Random. Due to imperfections of measurements or intrinsically random sources.

2nd. Systematic. Due to the procedures used to measure or uncalibrated apparatus.

There is also a distinction between accuracy and precision:

1st. Accuracy is closeness to the true value of a parameter or magnitude. Under this definition, it is a measure of systematic bias or error. However, accuracy is sometimes defined (the ISO definition) as the combination of systematic and random errors, i.e., accuracy would combine the two observational errors above. High accuracy would then require both high trueness and high precision.

2nd. Precision is a measure of random errors. These can be reduced with further measurements, since they reflect statistical variability. Precision also requires repeatability and reproducibility.

1. Statistical estimators.

Arithmetic mean:

(1)   \begin{equation*}\boxed{\overline{X}=\dfrac{\displaystyle{\sum_{i=1}^n x_i}}{n}=\dfrac{\left(\mbox{Sum of measurements}\right)}{\left(\mbox{Number of measurements}\right)}}\end{equation*}

Absolute error:

(2)   \begin{equation*}\boxed{ \varepsilon_{a}=\vert x_i-\overline{x}\vert}\end{equation*}

Relative error:

(3)   \begin{equation*}\boxed{\varepsilon_r=\dfrac{\varepsilon_a}{\overline{x}}\cdot 100}\end{equation*}

Average deviation or error:

(4)   \begin{equation*}\boxed{\delta_m=\dfrac{\sum_i\vert x_i-\overline{x}\vert}{n}}\end{equation*}

Variance or average quadratic error or mean squared error:

(5)   \begin{equation*}\boxed{\sigma_x^2=s^2=\dfrac{\displaystyle{\sum_{i=1}^n}\left(x_i-\overline{x}\right)^2}{n-1}}\end{equation*}

This is the unbiased (sample) variance; the use of n-1 instead of n is the Bessel correction. When the sample is the whole population, n-1 must be replaced by n. The unbiased formula is the correct one as long as the data are a sample from a larger population.

Standard deviation (mean squared error, mean quadratic error):

(6)   \begin{equation*}\boxed{\sigma\equiv\sqrt{\sigma_x^2}=s=\sqrt{\dfrac{\displaystyle{\sum_{i=1}^n}\left(x_i-\overline{x}\right)^2}{n-1}}}\end{equation*}

This is the unbiased estimator of the mean quadratic error, i.e., the standard deviation of the sample. The Bessel correction is assumed whenever our sample is smaller than the total population. For the total population, the standard deviation reads, after shifting n-1\rightarrow n:

(7)   \begin{equation*}\boxed{\sigma_n\equiv\sqrt{\sigma_{x,n}^2}=\sqrt{\dfrac{\displaystyle{\sum_{i=1}^n}\left(x_i-\overline{x}\right)^2}{n}}=s_n}\end{equation*}

Mean error or standard error of the mean:

(8)   \begin{equation*}\boxed{\varepsilon_{\overline{x}}=\dfrac{\sigma_x}{\sqrt{n}}=\sqrt{\dfrac{\displaystyle{\sum_{i=1}^n}\left(x_i-\overline{x}\right)^2}{n\left(n-1\right)}}}\end{equation*}

If, instead of the unbiased mean quadratic error, we use the total population error, the corrected standard error reads

(9)   \begin{equation*}\boxed{\varepsilon_{\overline{x},n}=\dfrac{\sigma_{x,n}}{\sqrt{n}}=\sqrt{\dfrac{\displaystyle{\sum_{i=1}^n}\left(x_i-\overline{x}\right)^2}{n^2}}=\dfrac{\sqrt{\displaystyle{\sum_{i=1}^n}\left(x_i-\overline{x}\right)^2}}{n}}\end{equation*}
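As a quick illustration, the estimators (1) and (5)-(9) can be computed directly in Python. The sample below is made-up data for the sketch, not from any real experiment:

```python
import math

# Hypothetical sample of five repeated measurements (made-up numbers).
x = [9.8, 10.1, 9.9, 10.2, 10.0]
n = len(x)

mean = sum(x) / n                                        # Eq. (1)
s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)         # Eq. (5): unbiased variance
s = math.sqrt(s2)                                        # Eq. (6): sample standard deviation
s_n = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)   # Eq. (7): population form
sem = s / math.sqrt(n)                                   # Eq. (8): standard error of the mean

print(mean, s2, s, s_n, sem)
```

Note that the population value s_n is always smaller than the unbiased s for the same data, since it divides by n rather than n-1.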

Variance of the mean quadratic error (variance of the variance):

(10)   \begin{equation*}\boxed{\sigma^2\left(s^2\right)=\sigma^2_{\sigma^2}=\sigma^2\left(\sigma^2\right)=\dfrac{2\sigma^4}{n-1}}\end{equation*}

Standard error of the mean quadratic error (error of the variance):

(11)   \begin{equation*}\boxed{\sigma\left(s^2\right)=\sqrt{\sigma^2_{\sigma^2}}=\sigma\left(\sigma^2\right)=\sigma_{\sigma^2}=\sigma^2\sqrt{\dfrac{2}{n-1}}}\end{equation*}
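Continuing the sketch, equations (10)-(11) evaluated for a hypothetical sample variance (these expressions hold for normally distributed data):

```python
import math

# Hypothetical input: sample variance s2 from n = 5 measurements (made-up).
s2 = 0.025
n = 5

var_of_var = 2 * s2 ** 2 / (n - 1)          # Eq. (10): variance of the variance
err_of_var = s2 * math.sqrt(2 / (n - 1))    # Eq. (11): its standard error

print(var_of_var, err_of_var)
```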

2. Gaussian/normal distribution intervals for a given confidence level (interval width equal to a whole number of sigmas)

Here we provide the probability that a random variable X following a normal distribution takes a value inside an interval of width n\sigma around the mean. The fraction quoted after each percentage is the approximate probability of falling outside the interval.

1 sigma amplitude (1\sigma).

(12)   \begin{equation*}x\in\left[\overline{x}-\sigma,\overline{x}+\sigma\right]\longrightarrow P\approx 68.3\%\sim\dfrac{1}{3}\end{equation*}

2 sigma amplitude (2\sigma).

(13)   \begin{equation*}x\in\left[\overline{x}-2\sigma,\overline{x}+2\sigma\right]\longrightarrow P\approx 95.4\%\sim\dfrac{1}{22}\end{equation*}

3 sigma amplitude (3\sigma).

(14)   \begin{equation*}x\in\left[\overline{x}-3\sigma,\overline{x}+3\sigma\right]\longrightarrow P\approx 99.7\%\sim\dfrac{1}{370}\end{equation*}

4 sigma amplitude (4\sigma).

(15)   \begin{equation*}x\in\left[\overline{x}-4\sigma,\overline{x}+4\sigma\right]\longrightarrow P\approx 99.994\%\sim\dfrac{1}{15787}\end{equation*}

5 sigma amplitude (5\sigma).

(16)   \begin{equation*}x\in\left[\overline{x}-5\sigma,\overline{x}+5\sigma\right]\longrightarrow P\approx 99.99994\%\sim\dfrac{1}{1744278}\end{equation*}

6 sigma amplitude (6\sigma).

(17)   \begin{equation*}x\in\left[\overline{x}-6\sigma,\overline{x}+6\sigma\right]\longrightarrow P\approx 99.9999998\%\sim\dfrac{1}{506797346}\end{equation*}
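These coverage probabilities follow from the normal cumulative distribution: P(\vert X-\overline{x}\vert\leq k\sigma)=\mathrm{erf}(k/\sqrt{2}). A short Python check reproduces the figures quoted above:

```python
import math

# Coverage of a +/- k sigma interval for a normal variable:
# P(inside) = erf(k / sqrt(2)); the complement is the chance of falling outside.
for k in range(1, 7):
    p_in = math.erf(k / math.sqrt(2))
    p_out = 1.0 - p_in
    print(f"{k} sigma: P = {100 * p_in:.7f}%  (outside ~ 1 in {1 / p_out:.0f})")
```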

3. Error propagation.

In indirect (derived) measurements, the errors of the directly measured quantities propagate to the result.

3A. Sums and subtractions.

Let us define x\pm \delta x and y\pm \delta y, and the derived variable q=x\pm y. The (maximum) error in q would be:

(18)   \begin{equation*}\boxed{\varepsilon (q)=\delta x+\delta y}\end{equation*}

Example. M_1=540\pm 10 g and M_2=940\pm 20 g, where M_1=m_1+\mbox{liquid} with m_1=72\pm 1 g, and M_2=m_2+\mbox{liquid} with m_2=97\pm 1 g. Then, we have:

M=M_1-m_1+M_2-m_2=1311 g as the liquid mass, and

\delta M=\delta M_1+\delta m_1+\delta M_2+\delta m_2=32 g as its total error.

M_0=1311\pm 32 g is the liquid mass quoted together with its error.
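The worked example can be checked with a few lines of Python:

```python
# Masses and errors from the example above, in grams.
M1, dM1 = 540, 10
m1, dm1 = 72, 1
M2, dM2 = 940, 20
m2, dm2 = 97, 1

M = (M1 - m1) + (M2 - m2)      # liquid mass
dM = dM1 + dm1 + dM2 + dm2     # Eq. (18): absolute errors add for sums/differences
print(M, dM)                   # 1311 32
```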

3B. Products and quotients (errors).

If

    \[x\pm \delta x=x\left(1\pm \dfrac{\delta x}{x}\right)\]

    \[y\pm \delta y=y\left(1\pm \dfrac{\delta y}{y}\right)\]

then, with q=xy you get

(19)   \begin{equation*}\boxed{\dfrac{\delta q}{\vert q\vert}=\dfrac{\delta x}{\vert x\vert}+\dfrac{\delta y}{\vert y\vert}\,,\qquad \delta q=\vert y\vert\delta x+\vert x\vert\delta y}\end{equation*}

If q=x/y, you obtain the same relative error:

(20)   \begin{equation*}\boxed{\dfrac{\delta q}{\vert q\vert}=\dfrac{\delta x}{\vert x\vert}+\dfrac{\delta y}{\vert y\vert}\,,\qquad \delta q=\dfrac{\vert y\vert\delta x+\vert x\vert\delta y}{y^2}}\end{equation*}
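A minimal sketch of the product rule (19), with made-up input values:

```python
# Hypothetical measurements x and y with their absolute errors.
x, dx = 4.0, 0.1
y, dy = 2.0, 0.05

q = x * y
dq_rel = dx / abs(x) + dy / abs(y)   # relative errors add, Eq. (19)
dq = abs(q) * dq_rel                 # absolute error: |y| dx + |x| dy
print(q, dq)                         # 8.0 0.4
```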

3C. Error in powers.

With x\pm \delta x and q=x^n, you derive

(21)   \begin{equation*}\dfrac{\delta q}{\vert q\vert}=\vert n\vert \dfrac{\delta x}{\vert x\vert}\,,\qquad \delta q=\vert n\vert \vert x^{n-1}\vert \delta x\end{equation*}

and if q=f(x), with the error of x being \delta x, you get

(22)   \begin{equation*}\boxed{\delta f=\left\vert\dfrac{df}{dx}\right\vert\delta x}\end{equation*}
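Equation (22) in action for a hypothetical f(x)=x^3, which also reproduces the power rule (21) with n=3:

```python
# Hypothetical measurement and its error.
x, dx = 2.0, 0.01

f = x ** 3
df = abs(3 * x ** 2) * dx   # |df/dx| * dx, Eq. (22); same as |n||x^(n-1)| dx with n = 3
print(f, df)
```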

In the case of a function of several variables, you apply a generalized Pythagorean theorem to get

(23)   \begin{equation*}\boxed{\delta q=\delta f(x_i)=\sqrt{\displaystyle{\sum_{i=1}^n}\left(\dfrac{\partial f}{\partial x_i}\delta x_i\right)^2}=\sqrt{\left(\dfrac{\partial f}{\partial x_1}\delta x_1\right)^2+\cdots+\left(\dfrac{\partial f}{\partial x_n}\delta x_n\right)^2}}\end{equation*}

or, equivalently, the errors are combined in quadrature (via standard deviations):

(24)   \begin{equation*}\boxed{\delta q=\delta f (x_1,\ldots,x_n)=\sqrt{\left(\dfrac{\partial f}{\partial x_1}\right)^2\delta^2 x_1+\cdots+\left(\dfrac{\partial f}{\partial x_n}\right)^2\delta^2 x_n}}\end{equation*}

since, for a sum X=x_1+\cdots+x_n,

(25)   \begin{equation*}\sigma (X)=\sqrt{\displaystyle{\sum_{i=1}^n}\sigma_i^2}=\sqrt{\sigma_1^2+\cdots+\sigma_n^2}\end{equation*}

for independent random errors (no correlations). Some simple examples are provided:

1st. q=kx, with x\pm \delta x, implies \boxed{\delta q=\vert k\vert\delta x}.

2nd. q=\pm x\pm y\pm \cdots, with x_i\pm \delta x_i, implies \boxed{\delta q=\delta x+\delta y+\cdots}.

3rd. q=kx_1^{\alpha_1}\cdots x_n^{\alpha_n} would imply

    \[\boxed{\dfrac{\delta q}{\vert q\vert}=\vert\alpha_1\vert\dfrac{\delta x_1}{\vert x_1\vert}+\cdots +\vert\alpha_n\vert\dfrac{\delta x_n}{\vert x_n\vert}}\]
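For comparison with the worst-case rule (18)-(19), here is a sketch of the quadrature combination (24) for q=xy, where \partial q/\partial x=y and \partial q/\partial y=x, with made-up inputs:

```python
import math

# Hypothetical independent measurements.
x, dx = 4.0, 0.1
y, dy = 2.0, 0.05

dq_quad = math.sqrt((y * dx) ** 2 + (x * dy) ** 2)   # Eq. (24): quadrature
dq_worst = abs(y) * dx + abs(x) * dy                 # Eq. (19): worst case
print(dq_quad, dq_worst)
```

Quadrature gives a smaller error than the worst-case sum (about 0.283 versus 0.4 here), because independent random errors partially cancel.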

When different experiments with measurements \overline{x}_i\pm\sigma_i are available, the best estimator of the combined mean is the variance-weighted mean, i.e.,

(26)   \begin{equation*}\overline{X}_{best}=\dfrac{\displaystyle{\sum_{i=1}^n}\dfrac{x_i}{\sigma^2_i}}{\displaystyle{\sum_{i=1}^n}\dfrac{1}{\sigma^2_i}}\end{equation*}

This is also the maximum likelihood estimator of the mean, assuming the measurements are independent AND normally distributed. Then, the standard error of the weighted mean would be

(27)   \begin{equation*}\sigma_{\overline{X}_{best}}=\sqrt{\dfrac{1}{\displaystyle{\sum_{i=1}^n}\dfrac{1}{\sigma^2_i}}}\end{equation*}
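A sketch of the weighted combination (26)-(27) for three hypothetical experiments:

```python
# Made-up results from three independent experiments: mean +/- sigma.
means = [10.1, 9.9, 10.3]
sigmas = [0.2, 0.1, 0.4]

weights = [1 / s ** 2 for s in sigmas]                              # inverse-variance weights
x_best = sum(w * m for w, m in zip(weights, means)) / sum(weights)  # Eq. (26)
err_best = (1 / sum(weights)) ** 0.5                                # Eq. (27)
print(x_best, err_best)
```

As expected, the most precise measurement (sigma = 0.1) dominates the combination.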

Least squares. A linear fit to data points by the least-squares procedure goes as follows. Let (X_i, Y_i), i=1,\ldots,n, be experimental data. Then, the linear function Y=AX+B that best fits the data can be written as Y-Y_0=\overline{A}(X-X_0), where

    \[X_0=\overline{X}=\dfrac{\sum X_i}{n}\]

    \[Y_0=\overline{Y}=\dfrac{\sum Y_i}{n}\]

    \[\overline{A}=A=\dfrac{\sum (X_i-\overline{X})(Y_i-\overline{Y})}{\sum (X_i-\overline{X})^2}\]
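The least-squares formulas above, sketched in Python with made-up data points:

```python
# Hypothetical data points (X_i, Y_i).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 7.8]
n = len(X)

Xbar = sum(X) / n
Ybar = sum(Y) / n
A = sum((xi - Xbar) * (yi - Ybar) for xi, yi in zip(X, Y)) \
    / sum((xi - Xbar) ** 2 for xi in X)   # best-fit slope
B = Ybar - A * Xbar                       # intercept: the line passes through (Xbar, Ybar)
print(A, B)
```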

Remark: for non-homogeneous samples, the best estimate of the average is not the arithmetic mean, but the median.

See you in another blog post!
