Section 1.3 Characterizing a Set of Measurements: Numerical Methods

The quantities we define are numerical descriptive measures of a set of data. We seek some numbers that have meaningful interpretations and that can be used to describe the frequency distribution for any set of measurements. We will confine our attention to two types of descriptive numbers: measures of central tendency and measures of dispersion or variation.

Definition 1.3.1.

The mean of a sample of \(n\) measured responses \(y_1, y_2, \ldots, y_n\) is given by
\begin{equation*} \overline{y} = \frac{1}{n} \sum_{i = 1}^n y_i\text{.} \end{equation*}
The corresponding population mean is denoted \(\mu\text{.}\)

Definition 1.3.2.

The variance of a sample of measurements \(y_1, y_2, \ldots, y_n\) is the sum of the squares of the differences between the measurements and their mean, divided by \(n - 1\text{.}\) Symbolically, the sample variance is
\begin{equation*} s^2 = \frac{1}{n - 1} \sum_{i = 1}^n (y_i - \overline{y})^2\text{.} \end{equation*}
The corresponding population variance is denoted by the symbol \(\sigma^2\text{.}\)

Definition 1.3.3.

The standard deviation of a sample of measurements is the positive square root of the variance; that is,
\begin{equation*} s = \sqrt{s^2}\text{.} \end{equation*}
The corresponding population standard deviation is denoted by \(\sigma = \sqrt{\sigma^2}\text{.}\)
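The three definitions above translate directly into a few lines of Python. The following is a minimal sketch, not a reference implementation: the function names `sample_mean`, `sample_variance`, and `sample_std` and the data values are made up for illustration, and the standard-library `statistics` module is used only as a cross-check.

```python
import math
import statistics


def sample_mean(ys):
    """Sample mean: (1/n) * sum of y_i (Definition 1.3.1)."""
    return sum(ys) / len(ys)


def sample_variance(ys):
    """Sample variance: squared deviations from the mean, divided by n - 1 (Definition 1.3.2)."""
    n = len(ys)
    if n < 2:
        raise ValueError("the sample variance needs at least two measurements")
    ybar = sample_mean(ys)
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)


def sample_std(ys):
    """Sample standard deviation: positive square root of the variance (Definition 1.3.3)."""
    return math.sqrt(sample_variance(ys))


# Hypothetical measurements, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean     :", sample_mean(data))
print("variance :", sample_variance(data))
print("std. dev.:", sample_std(data))

# Cross-check against the standard library, which also divides by n - 1.
print("statistics.variance:", statistics.variance(data))
print("statistics.stdev   :", statistics.stdev(data))
```

For these eight values the mean is 5 and the squared deviations sum to 32, so \(s^2 = 32/7 \approx 4.57\) and \(s \approx 2.14\text{.}\) Note that the divisor is \(n - 1 = 7\text{,}\) not \(n = 8\text{.}\)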

Remark 1.3.4. Empirical Rule.

For a distribution of measurements that is approximately normal (bell-shaped), it follows that the interval with endpoints
\(\mu \pm \sigma\) contains approximately 68% of the measurements.
\(\mu \pm 2 \sigma\) contains approximately 95% of the measurements.
\(\mu \pm 3 \sigma\) contains almost all of the measurements.
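The empirical rule can be checked informally by simulation. The sketch below makes several assumptions that are not part of the text: the measurements are drawn from a normal distribution with mean 50 and standard deviation 10 (arbitrary values), the sample mean and sample standard deviation stand in for \(\mu\) and \(\sigma\text{,}\) and the helper name `fraction_within` is made up for illustration.

```python
import random
import statistics


def fraction_within(ys, k):
    """Fraction of measurements lying within k sample standard deviations of the sample mean."""
    ybar = statistics.mean(ys)
    s = statistics.stdev(ys)
    inside = sum(1 for y in ys if abs(y - ybar) <= k * s)
    return inside / len(ys)


# Simulated, approximately normal measurements (assumed parameters: mean 50, sd 10).
rng = random.Random(0)
sample = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(sample, k):.3f}")
```

With a sample this large, the three printed fractions should land close to 0.68, 0.95, and 1, respectively. For data that are far from bell-shaped, the fractions can differ noticeably.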
Exercise 1.17 defines the range of a set of measurements as the difference between the largest value and the smallest value. It then states that the empirical rule suggests that the standard deviation of a set of measurements may be roughly approximated by one-fourth of the range, i.e., \(s \approx \text{range}/4\text{.}\) Later, Exercise 1.26 (at the end of Section 1.6) points out that the larger the data set, the greater its tendency to contain a few extreme values that inflate the range while having relatively little effect on the sample standard deviation \(s\text{.}\) The divisor 4 ignores this phenomenon: it treats the largest and smallest measurements as lying about two standard deviations above and below the mean, so that
\begin{equation*} \text{range} = \text{max} - \text{min} \approx (\overline{y} + 2s) - (\overline{y} - 2s) = 4s \implies \text{range}/4 \approx s \text{.} \end{equation*}
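How well the divisor 4 works depends on the sample size. The following sketch is an illustration under stated assumptions, not a result from the text: it draws repeated samples from a standard normal distribution, and the function name `mean_range_to_s_ratio`, the sample sizes, and the number of repetitions are all arbitrary choices. It averages the ratio range/\(s\) over the repetitions for each sample size.

```python
import random
import statistics


def mean_range_to_s_ratio(n, reps, rng):
    """Average of (max - min) / s over `reps` simulated normal samples of size n."""
    ratios = []
    for _ in range(reps):
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ratios.append((max(ys) - min(ys)) / statistics.stdev(ys))
    return statistics.mean(ratios)


rng = random.Random(1)
ratios = {n: mean_range_to_s_ratio(n, reps=200, rng=rng) for n in (25, 100, 1000)}

for n, ratio in ratios.items():
    print(f"n = {n:4d}: average range / s = {ratio:.2f}")
```

In runs of this kind the average ratio is near 4 for samples of a few dozen measurements and grows as \(n\) increases, which is the inflation of the range that Exercise 1.26 describes. The approximation \(s \approx \text{range}/4\) is therefore best read as a rough check for moderate sample sizes.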

Exercises