# LOG#231. Statistical tools.

Subject today: errors. We will review the formulae used to handle them in experimental data.

Generally speaking, errors can be:

1st. Random. Due to imperfections of measurements or intrinsically random sources.

2nd. Systematic. Due to the procedures used to measure or to uncalibrated apparatus.

There is also a distinction between accuracy and precision:

1st. Accuracy is closeness to the true value of a parameter or magnitude. Under this definition, it is a measure of systematic bias or error. However, accuracy is sometimes defined (ISO definition) as the combination of systematic and random errors, i.e., accuracy would be the combination of the two observational errors above. High accuracy would require, in this case, both high trueness and high precision.

2nd. Precision. It is a measure of random errors. Random errors reflect statistical variability, and their effect can be reduced with further measurements. Precision also requires repeatability and reproducibility.

1. Statistical estimators.

Arithmetic mean of $n$ measurements $x_1, \dots, x_n$:

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \tag{1}$$

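Absolute error of a measurement $x_i$, taking the mean $\bar{x}$ as the best estimate of the true value:

$$\varepsilon_a=\vert x_i-\bar{x}\vert \tag{2}$$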

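Relative error (absolute error $\varepsilon_a$ divided by the mean $\bar{x}$):

$$\varepsilon_r=\frac{\varepsilon_a}{\vert\bar{x}\vert} \tag{3}$$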

Average deviation or error:

$$\delta_m=\frac{1}{n}\sum_{i=1}^{n}\vert x_i-\bar{x}\vert \tag{4}$$

Variance or average quadratic error or mean squared error:

$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \tag{5}$$

This is the unbiased variance. When the sample is the total population, the denominator must be shifted from $n-1$ to $n$ (that is, the Bessel correction is dropped). The unbiased formula is correct as long as the data are a sample from a larger population.

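Standard deviation (root mean squared error, mean quadratic error):

$$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2} \tag{6}$$

This is the square root of the unbiased variance, i.e., the standard deviation of the sample (strictly speaking, $s$ itself is still a slightly biased estimator of the population standard deviation). The Bessel correction is assumed whenever our sample is smaller in size than the total population. For the total population, the standard deviation reads, after shifting $n-1\rightarrow n$:

$$\sigma=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} \tag{7}$$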

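Mean error or standard error of the mean:

$$\varepsilon_{\bar{x}}=\frac{s}{\sqrt{n}}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n(n-1)}} \tag{8}$$

If, instead of the unbiased quadratic mean error, we use the total population error, the corrected standard error reads

$$\varepsilon_{\bar{x}}=\frac{\sigma}{\sqrt{n}}=\frac{1}{n}\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2} \tag{9}$$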

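Variance of the mean quadratic error (variance of the variance), for normally distributed data with population standard deviation $\sigma$:

$$\sigma^2(s^2)=\frac{2\sigma^4}{n-1} \tag{10}$$

Standard error of the mean quadratic error (error of the variance):

$$\sigma(s^2)=\sigma^2\sqrt{\frac{2}{n-1}} \tag{11}$$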
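As a quick illustration of the estimators in this section, here is a minimal Python sketch. The function name `summary_statistics` and the five sample values are made up for the example (they are not data from this post), and only the standard library is used.

```python
import math


def summary_statistics(samples):
    """Return the basic estimators for a list of repeated measurements."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(samples) / n
    # Sum of squared deviations from the mean, shared by both variances.
    sum_sq = sum((x - mean) ** 2 for x in samples)
    var_sample = sum_sq / (n - 1)   # Bessel-corrected (sample) variance
    var_population = sum_sq / n     # variance when the sample is the whole population
    std_sample = math.sqrt(var_sample)
    return {
        "mean": mean,
        "average_deviation": sum(abs(x - mean) for x in samples) / n,
        "variance_sample": var_sample,
        "variance_population": var_population,
        "std_sample": std_sample,
        "std_population": math.sqrt(var_population),
        "standard_error": std_sample / math.sqrt(n),
    }


# Hypothetical repeated measurements of the same quantity.
data = [9.8, 10.1, 10.0, 9.9, 10.2]
stats = summary_statistics(data)
for name, value in stats.items():
    print(f"{name}: {value:.5f}")
```

The two variances differ only in the denominator, so the gap between them shrinks as the number of measurements grows.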

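2. Gaussian/normal distribution intervals for a given confidence level (interval half-width a whole number of sigmas)

Here we provide the probability that a random variable $X$ following a normal distribution with mean $\mu$ and standard deviation $\sigma$ takes a value inside the interval $[\mu-k\sigma,\mu+k\sigma]$, of width $2k\sigma$.

1 sigma amplitude ($\pm 1\sigma$).

$$P(\mu-\sigma\leq X\leq \mu+\sigma)\approx 0.6827 \tag{12}$$

2 sigma amplitude ($\pm 2\sigma$).

$$P(\mu-2\sigma\leq X\leq \mu+2\sigma)\approx 0.9545 \tag{13}$$

3 sigma amplitude ($\pm 3\sigma$).

$$P(\mu-3\sigma\leq X\leq \mu+3\sigma)\approx 0.9973 \tag{14}$$

4 sigma amplitude ($\pm 4\sigma$).

$$P(\mu-4\sigma\leq X\leq \mu+4\sigma)\approx 0.999937 \tag{15}$$

5 sigma amplitude ($\pm 5\sigma$).

$$P(\mu-5\sigma\leq X\leq \mu+5\sigma)\approx 0.99999943 \tag{16}$$

6 sigma amplitude ($\pm 6\sigma$).

$$P(\mu-6\sigma\leq X\leq \mu+6\sigma)\approx 0.999999998 \tag{17}$$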
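The probabilities of this section can be recomputed with the error function, since for a normal variable the probability of falling within $k$ sigmas of the mean is $\mathrm{erf}(k/\sqrt{2})$. A minimal Python sketch follows; the helper name `normal_coverage` is just an illustrative choice.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k sigmas of its mean."""
    return math.erf(k / math.sqrt(2.0))


# Print the coverage for 1 to 6 sigmas.
for k in range(1, 7):
    print(f"{k} sigma: {normal_coverage(k):.10f}")
```

The same function also accepts non-integer values of `k`, for instance `normal_coverage(1.96)` for the usual interval of roughly 95%.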

3. Error propagation.

Usually, a quantity is not measured directly but computed from measured ones, so the errors of the measurements propagate to the result.

3A. Sum and subtraction.

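Let us define $x\pm\delta x$ and $y\pm\delta y$. Furthermore, define the variable $q=x\pm y$. For independent random errors, the error in $q$ would be:

$$\delta q=\sqrt{(\delta x)^2+(\delta y)^2}\leq \delta x+\delta y \tag{18}$$

where the plain sum $\delta x+\delta y$ is the conservative upper bound.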

Example. , . , with   and , with . Then, we have:

as liquid mass.

, as total liquid error.

is the liquid mass and its error, together, with 3 significant digits or figures.

3B. Products and quotients (errors).

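If

$$q=x\,y$$

then, with $x\pm\delta x$ and $y\pm\delta y$, you get for independent random errors

$$\frac{\delta q}{\vert q\vert}=\sqrt{\left(\frac{\delta x}{x}\right)^2+\left(\frac{\delta y}{y}\right)^2}\leq \frac{\delta x}{\vert x\vert}+\frac{\delta y}{\vert y\vert} \tag{19}$$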

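If $q=x/y$, you obtain essentially the same result for the relative error:

$$\frac{\delta q}{\vert q\vert}=\sqrt{\left(\frac{\delta x}{x}\right)^2+\left(\frac{\delta y}{y}\right)^2}\leq \frac{\delta x}{\vert x\vert}+\frac{\delta y}{\vert y\vert} \tag{20}$$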

3C. Error in powers.

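With $x\pm\delta x$ and $q=x^p$, then you derive

$$\frac{\delta q}{\vert q\vert}=\vert p\vert\frac{\delta x}{\vert x\vert} \tag{21}$$

and if $q=f(x)$, with the error of $x$ being $\delta x$, you get

$$\delta q=\left\vert\frac{df}{dx}\right\vert\delta x \tag{22}$$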

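In the case of a function of several variables, $q=f(x_1,\dots,x_m)$, you apply a generalized Pythagorean theorem to get

$$\delta q=\sqrt{\sum_{i=1}^{m}\left(\frac{\partial f}{\partial x_i}\,\delta x_i\right)^2} \tag{23}$$

or, equivalently, the errors are combined in quadrature (via standard deviations):

$$\sigma_q^2=\sum_{i=1}^{m}\left(\frac{\partial f}{\partial x_i}\right)^2\sigma_{x_i}^2 \tag{24}$$

since

$$\mathrm{cov}(x_i,x_j)=0,\qquad i\neq j \tag{25}$$

for independent random errors (no correlations). Some simple examples are provided: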

1st. , with , implies .

2nd. , with , implies .

3rd. would imply
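The quadrature rule for a function of several variables can be sketched numerically. The Python snippet below is only an illustration under stated assumptions: the errors are independent, the partial derivatives are approximated with central finite differences, and the name `propagate_error`, the step size and the input numbers are all hypothetical.

```python
import math


def propagate_error(func, values, errors, rel_step=1e-6):
    """Combine independent errors in quadrature for q = func(*values).

    Partial derivatives are estimated with central finite differences,
    so the result is an approximation for nonlinear functions.
    """
    total = 0.0
    for i, (value, error) in enumerate(zip(values, errors)):
        step = rel_step * (abs(value) if value != 0 else 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = value + step
        lower[i] = value - step
        partial = (func(*upper) - func(*lower)) / (2.0 * step)
        total += (partial * error) ** 2
    return math.sqrt(total)


# Hypothetical measurements: x = 10.0 +/- 0.3 and y = 20.0 +/- 0.4.
sum_error = propagate_error(lambda x, y: x + y, [10.0, 20.0], [0.3, 0.4])
product_error = propagate_error(lambda x, y: x * y, [10.0, 20.0], [0.3, 0.4])
print(f"error of x + y: {sum_error:.4f}")
print(f"error of x * y: {product_error:.4f}")
```

For the sum, the absolute errors add in quadrature; for the product, it is the relative errors that add in quadrature, which the numerical derivative reproduces without any special casing.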

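When different experiments provide measurements $x_i\pm\sigma_i$ of the same quantity, the best estimator for the combined mean is a mean weighted with the inverse variances, i.e.,

$$\bar{x}_w=\frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} \tag{26}$$

The best standard deviation from the different combined measurements would be:

$$\frac{1}{\sigma_{\bar{x}_w}^2}=\sum_i\frac{1}{\sigma_i^2} \tag{27}$$

The weighted mean is also the maximum likelihood estimator of the mean, assuming the measurements are independent AND normally distributed. There, the standard error of the weighted mean would be

$$\sigma_{\bar{x}_w}=\frac{1}{\sqrt{\sum_i 1/\sigma_i^2}} \tag{28}$$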
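A weighted mean with inverse-variance weights, together with its standard error, takes only a few lines. This is a minimal Python sketch, assuming independent measurements; the function name `weighted_mean` and the two input measurements are invented for the example.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    if len(values) != len(sigmas) or not values:
        raise ValueError("values and sigmas must be non-empty and of equal length")
    weights = [1.0 / sigma ** 2 for sigma in sigmas]
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    return mean, 1.0 / math.sqrt(total_weight)


# Hypothetical results from two experiments: 10.0 +/- 1.0 and 12.0 +/- 2.0.
combined, combined_error = weighted_mean([10.0, 12.0], [1.0, 2.0])
print(f"combined: {combined:.3f} +/- {combined_error:.3f}")
```

Note how the combined value sits closer to the more precise measurement, and how the combined error is smaller than either individual error.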

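Least squares. A linear fit to a set of data points with the least squares procedure proceeds as follows. Let $(x_i,y_i)$, with $i=1,\dots,n$, be pairs of numbers from experimental data. Then, the linear function $y=A+Bx$ that is the best fit to the data can be calculated with

$$A=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{\Delta},\qquad B=\frac{n\sum x_iy_i-\sum x_i\sum y_i}{\Delta},$$

where

$$\Delta=n\sum x_i^2-\left(\sum x_i\right)^2.$$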
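As a sketch of the least squares procedure for a straight line, the Python snippet below uses the standard closed-form expressions for an unweighted fit $y=A+Bx$, assuming all points carry the same uncertainty. The function name `linear_fit` and the data points are hypothetical.

```python
def linear_fit(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical, the slope is undefined")
    intercept = (sum_xx * sum_y - sum_x * sum_xy) / delta
    slope = (n * sum_xy - sum_x * sum_y) / delta
    return intercept, slope


# Hypothetical points lying exactly on y = 1 + 2x.
intercept, slope = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"y = {intercept:.3f} + {slope:.3f} x")
```

With real data the points scatter around the line, and the same sums give the line that minimizes the sum of squared vertical distances.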

Remark: for non-homogeneous samples (for instance, samples with outliers or strongly skewed data), the median is often a better estimate of the central value than the arithmetic mean.
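A tiny Python illustration of this remark, with invented numbers: a single outlier drags the arithmetic mean away from the bulk of the data, while the median barely notices it. Only the standard `statistics` module is used.

```python
import statistics

# Hypothetical sample: four consistent readings and one outlier.
readings = [9.9, 10.0, 10.1, 10.0, 25.0]

sample_mean = statistics.mean(readings)
sample_median = statistics.median(readings)
print(f"mean:   {sample_mean:.2f}")
print(f"median: {sample_median:.2f}")
```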

See you in another blog post!
