Section 1.3 Characterizing a Set of Measurements: Numerical Methods
The quantities defined in this section are numerical descriptive measures of a set of data: numbers that have meaningful interpretations and that can be used to describe the frequency distribution of any set of measurements. We confine our attention to two types of descriptive numbers: measures of central tendency and measures of dispersion (variation).
Definition 1.3.1.
The mean of a sample of \(n\) measured responses \(y_1, y_2, \ldots, y_n\) is given by
\begin{equation*}
\overline{y} = \frac{1}{n} \sum_{i = 1}^n y_i\text{.}
\end{equation*}
The corresponding population mean is denoted \(\mu\text{.}\)
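As a minimal sketch of Definition 1.3.1, the mean can be computed directly from the formula. The function name `sample_mean` and the five measurements are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
def sample_mean(ys):
    """Return (1/n) * sum(y_i) for a non-empty sequence of measurements."""
    n = len(ys)
    if n == 0:
        raise ValueError("the mean requires at least one measurement")
    return sum(ys) / n


# Hypothetical measurements, used for illustration only.
measurements = [2.0, 4.0, 4.0, 5.0, 10.0]
y_bar = sample_mean(measurements)  # (2 + 4 + 4 + 5 + 10) / 5
print(y_bar)
```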
Definition 1.3.2.
The variance of a sample of measurements \(y_1, y_2, \ldots, y_n\) is the sum of the squares of the differences between the measurements and their mean, divided by \(n - 1\text{.}\) Symbolically, the sample variance is
\begin{equation*}
s^2 = \frac{1}{n - 1} \sum_{i = 1}^n (y_i - \overline{y})^2\text{.}
\end{equation*}
The corresponding population variance is denoted by the symbol \(\sigma^2\text{.}\)
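The following sketch applies Definition 1.3.2 term by term: it computes the mean, sums the squared deviations from it, and divides by \(n - 1\text{.}\) The name `sample_variance` and the data are assumptions made for illustration, not part of the text.

```python
def sample_variance(ys):
    """Return the sum of squared deviations from the mean, divided by n - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("the sample variance requires at least two measurements")
    y_bar = sum(ys) / n
    squared_deviations = [(y - y_bar) ** 2 for y in ys]
    return sum(squared_deviations) / (n - 1)


# Hypothetical measurements with mean 5, so the deviations are -3, -1, -1, 0, 5.
measurements = [2.0, 4.0, 4.0, 5.0, 10.0]
s_squared = sample_variance(measurements)  # (9 + 1 + 1 + 0 + 25) / 4
print(s_squared)
```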
Definition 1.3.3.
The standard deviation of a sample of measurements is the positive square root of the variance; that is,
\begin{equation*}
s = \sqrt{s^2}\text{.}
\end{equation*}
The corresponding population standard deviation is denoted by \(\sigma = \sqrt{\sigma^2}\text{.}\)
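As a quick check of Definition 1.3.3, the sketch below takes the positive square root of a variance worked out by hand and compares it with Python's standard library, whose `statistics.variance` and `statistics.stdev` also use the \(n - 1\) divisor. The data are hypothetical.

```python
import math
import statistics

# Hypothetical measurements, used for illustration only.
measurements = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(measurements)
y_bar = sum(measurements) / n
s_squared = sum((y - y_bar) ** 2 for y in measurements) / (n - 1)

# The standard deviation is the positive square root of the variance.
s = math.sqrt(s_squared)

# Cross-check against the standard library's sample statistics.
library_variance = statistics.variance(measurements)
library_s = statistics.stdev(measurements)
print(s, library_s)
```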
Exercise 1.17 defines the range of a set of measurements as the difference between the largest value and the smallest value. It then states that the empirical rule suggests that the standard deviation of a set of measurements may be roughly approximated by one-fourth of the range, i.e., \(s \approx\) range/4. Later on, Exercise 1.26 (at the end of Section 1.6) states that the greater the amount of data, the greater will be their tendency to contain a few extreme values that inflate the range while having relatively little effect on the sample standard deviation \(s\text{.}\) The divisor 4 suggested in Exercise 1.17 ignores this phenomenon. It rests on the empirical rule, under which most of the measurements (roughly 95%) lie within two standard deviations of the mean, so that
\begin{equation*}
\text{range} = \text{max} - \text{min} \approx (\overline{y} + 2s) - (\overline{y} - 2s) = 4s \implies \text{range}/4 \approx s \text{.}
\end{equation*}
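The effect described in Exercise 1.26 can be explored numerically. The sketch below is an illustration under stated assumptions rather than anything taken from the exercises: it draws simulated standard normal samples of increasing size (the sample sizes, the seed, and the helper name `range_over_s` are all hypothetical choices) and reports the ratio range/\(s\) for each.

```python
import random
import statistics


def range_over_s(ys):
    """Return (max - min) / s for a sample with at least two distinct values."""
    return (max(ys) - min(ys)) / statistics.stdev(ys)


rng = random.Random(0)  # fixed seed so the run is repeatable

ratios = {}
for n in (10, 100, 10_000):
    # Simulated standard normal measurements (an assumption of this sketch).
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ratios[n] = range_over_s(sample)

for n, ratio in ratios.items():
    print(f"n = {n:>6}: range / s = {ratio:.2f}")
```

If the ratio printed for the largest sample is well above 4, then range/4 overstates \(s\) for that sample, which is the phenomenon Exercise 1.26 describes.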