Descriptive Statistics

Symbols and Notation

$n$ : Total number of observations

$m$ : Number of classes (for grouped data)

$x_i$ : i-th observation (ungrouped data)

$f_i$ : Absolute frequency of the i-th class

$F_i$ : Cumulative frequency up to the i-th class

$L_{\text{inf}}$ : Lower limit of a class

$c_i$ : Class width (length)

$\overline{X}$ : Arithmetic Mean

$\tilde{X}$ : Median

$Mo$ : Mode

For Ungrouped Data

Measures of Central Tendency

Arithmetic Mean

\overline{X} = \frac{1}{n} \sum_{i=1}^{n} x_i

Median

\tilde{X} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[10pt] \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}

Mode

Mo = \text{value(s) with the highest absolute frequency}
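The three measures above can be sketched in plain Python as follows. The function names and the sample `data` are illustrative assumptions, not part of the formulas; `modes` returns every value tied for the highest frequency, matching the "value(s)" wording.

```python
from collections import Counter


def mean(xs):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # 0-based position of x_((n+1)/2)
    return (s[mid - 1] + s[mid]) / 2  # x_(n/2) and x_(n/2 + 1)


def modes(xs):
    """All values tied for the highest absolute frequency, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


data = [2, 4, 4, 5, 7, 8]  # made-up sample
print(mean(data), median(data), modes(data))  # 5.0 4.5 [4]
```

Python's standard `statistics` module provides `mean`, `median`, and `multimode` for the same purpose; the hand-written versions are shown only to mirror the formulas.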

Measures of Dispersion

Range

R = x_{\max} - x_{\min}

Variance (population)

S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{X})^2

Practical formula:

S^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \overline{X}^2

Standard Deviation

S = \sqrt{S^2}

Coefficient of Variation

CV = \frac{S}{\overline{X}}
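As a quick check that the definition and the practical formula agree, the sketch below computes the dispersion measures for a small made-up sample. The function names are assumptions chosen for illustration.

```python
import math


def population_variance(xs):
    """Definition: mean of the squared deviations from the arithmetic mean."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def population_variance_shortcut(xs):
    """Practical formula: mean of the squares minus the square of the mean."""
    n = len(xs)
    m = sum(xs) / n
    return sum(x * x for x in xs) / n - m * m


data = [2, 4, 4, 5, 7, 8]  # made-up sample
value_range = max(data) - min(data)
s2 = population_variance(data)
s = math.sqrt(s2)
cv = s / (sum(data) / len(data))
print(value_range, s2, s, cv)  # 6 4.0 2.0 0.4
```

Both variance functions give the same result here. In floating-point arithmetic the practical formula can lose precision when the values are large relative to their spread, so the definition is usually the safer choice in code.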

Measures of Shape

Skewness Coefficient

\alpha_3 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{X})^3}{S^3}

$\alpha_3 = 0$ : No skew (e.g., a symmetric distribution)
$\alpha_3 > 0$ : Positive skew (tail to the right)
$\alpha_3 < 0$ : Negative skew (tail to the left)

Kurtosis Coefficient

\alpha_4 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{X})^4}{S^4}

$\alpha_4 = 3$ : Mesokurtic distribution (like the normal)
$\alpha_4 > 3$ : Leptokurtic (heavier tails, usually described as more peaked)
$\alpha_4 < 3$ : Platykurtic (lighter tails, usually described as flatter)
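Both shape coefficients are ratios of central moments, which the following sketch makes explicit. The helper `central_moment` and the sample `data` are illustrative assumptions.

```python
import math


def central_moment(xs, r):
    """r-th central moment: mean of (x_i - mean)^r, dividing by n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** r for x in xs) / n


def skewness(xs):
    """alpha_3: third central moment divided by S^3."""
    s = math.sqrt(central_moment(xs, 2))
    return central_moment(xs, 3) / s ** 3


def kurtosis(xs):
    """alpha_4: fourth central moment divided by S^4 (not the excess kurtosis)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


data = [2, 4, 4, 5, 7, 8]  # made-up sample
print(skewness(data), kurtosis(data))  # 0.125 1.875
```

For this sample the skew is slightly positive and the kurtosis is below 3. Some libraries report excess kurtosis ($\alpha_4 - 3$) by default, so the convention is worth checking before comparing results.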

For Grouped Data (Frequency Tables)

Measures of Central Tendency

Arithmetic Mean

\overline{X} = \frac{1}{n} \sum_{i=1}^{m} f_i \cdot x_i

where $x_i$ is the class mark (midpoint).
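A minimal sketch of the grouped mean, assuming the frequency table is stored as `(lower limit, upper limit, frequency)` tuples. The table itself is made up for illustration.

```python
# Hypothetical frequency table: (lower limit, upper limit, absolute frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 9), (40, 50, 6)]


def grouped_mean(table):
    """Frequency-weighted mean of the class marks (midpoints)."""
    n = sum(f for _, _, f in table)
    return sum(f * (lo + hi) / 2 for lo, hi, f in table) / n


print(grouped_mean(table))  # 25.75
```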

Median

\tilde{X} = L_{\text{med}} + \left[ \frac{\frac{n}{2} - F_{i-1}}{f_{\text{med}}} \right] \cdot c_{\text{med}}

$L_{\text{med}}$ : Lower limit of the median class
$f_{\text{med}}$ : Frequency of the median class
$F_{i-1}$ : Cumulative frequency preceding the median class
$c_{\text{med}}$ : Width of the median class
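The interpolation can be sketched as below, taking the median class to be the first class whose cumulative frequency reaches $n/2$. The table layout and the function name are assumptions for illustration.

```python
# Hypothetical frequency table: (lower limit, upper limit, absolute frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 9), (40, 50, 6)]


def grouped_median(table):
    """Interpolated median: L_med + ((n/2 - F_prev) / f_med) * c_med."""
    n = sum(f for _, _, f in table)
    if n == 0:
        raise ValueError("frequency table has no observations")
    cumulative = 0  # cumulative frequency preceding the current class
    for lo, hi, f in table:
        if cumulative + f >= n / 2:  # first class that reaches n/2
            return lo + (n / 2 - cumulative) / f * (hi - lo)
        cumulative += f


print(round(grouped_median(table), 2))  # 25.83
```

Here $n/2 = 20$ falls in the class from 20 to 30, which has 13 observations before it and 12 inside it, so the median is $20 + \frac{20 - 13}{12} \cdot 10$.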

Mode

Mo = L_{\text{Mo}} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] \cdot c_{\text{Mo}}

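$\Delta_1 = f_{\text{Mo}} - f_{\text{Mo}-1}$ : Excess of the modal class frequency over the preceding class
$\Delta_2 = f_{\text{Mo}} - f_{\text{Mo}+1}$ : Excess of the modal class frequency over the following class
$L_{\text{Mo}}$ : Lower limit of the modal class
$c_{\text{Mo}}$ : Width of the modal class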
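A sketch of the modal interpolation follows. It makes three assumptions that the formula does not state: all classes have equal width, the first class with the highest frequency is taken as the modal class when there is a tie, and a missing neighbour (modal class at either end of the table) counts as frequency 0.

```python
# Hypothetical frequency table: (lower limit, upper limit, absolute frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 9), (40, 50, 6)]


def grouped_mode(table):
    """Interpolated mode: L_Mo + (d1 / (d1 + d2)) * c_Mo."""
    freqs = [f for _, _, f in table]
    if not freqs or max(freqs) == 0:
        raise ValueError("frequency table has no observations")
    i = freqs.index(max(freqs))  # first modal class if several tie
    lo, hi, f_mo = table[i]
    f_prev = freqs[i - 1] if i > 0 else 0  # assumption: no neighbour means 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f_mo - f_prev
    d2 = f_mo - f_next
    return lo + d1 / (d1 + d2) * (hi - lo)


print(round(grouped_mode(table), 2))  # 25.71
```

With $\Delta_1 = 12 - 8 = 4$ and $\Delta_2 = 12 - 9 = 3$, the mode is $20 + \frac{4}{7} \cdot 10$, pulled toward the neighbour with the higher frequency.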

Measures of Dispersion

Range

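R = L_{\text{sup, last}} - L_{\text{inf, first}}

where $L_{\text{sup, last}}$ is the upper limit of the last class and $L_{\text{inf, first}}$ is the lower limit of the first class.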

Variance

S^2 = \frac{1}{n} \sum_{i=1}^{m} f_i (x_i - \overline{X})^2 = \frac{\sum_{i=1}^{m} f_i x_i^2}{n} - \overline{X}^2

Standard Deviation

S = \sqrt{S^2}

Coefficient of Variation

CV = \frac{S}{\overline{X}}
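The grouped dispersion measures can be sketched the same way, with each class mark weighted by its frequency. The table and the function name are illustrative assumptions.

```python
import math

# Hypothetical frequency table: (lower limit, upper limit, absolute frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 9), (40, 50, 6)]


def grouped_mean_variance(table):
    """Return (mean, variance) computed from class marks and frequencies."""
    n = sum(f for _, _, f in table)
    marks = [(lo + hi) / 2 for lo, hi, _ in table]
    freqs = [f for _, _, f in table]
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n
    return mean, variance


mean, s2 = grouped_mean_variance(table)
s = math.sqrt(s2)
cv = s / mean
value_range = table[-1][1] - table[0][0]  # upper limit of last class minus lower limit of first
print(value_range, s2, round(s, 2), round(cv, 2))  # 50 151.9375 12.33 0.48
```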

Measures of Shape

Skewness Coefficient

\alpha_3 = \frac{\frac{1}{n} \sum_{i=1}^{m} f_i (x_i - \overline{X})^3}{S^3}

Kurtosis Coefficient

\alpha_4 = \frac{\frac{1}{n} \sum_{i=1}^{m} f_i (x_i - \overline{X})^4}{S^4}
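Both grouped shape coefficients reduce to frequency-weighted central moments of the class marks, as in this sketch. The helper names and the small symmetric table are assumptions chosen so the result is easy to verify by hand.

```python
import math


def grouped_central_moment(table, r):
    """r-th central moment of the class marks, weighted by frequency."""
    n = sum(f for _, _, f in table)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in table) / n
    return sum(f * ((lo + hi) / 2 - mean) ** r for lo, hi, f in table) / n


def grouped_skewness(table):
    """alpha_3: third central moment divided by S^3."""
    s = math.sqrt(grouped_central_moment(table, 2))
    return grouped_central_moment(table, 3) / s ** 3


def grouped_kurtosis(table):
    """alpha_4: fourth central moment divided by S^4."""
    return grouped_central_moment(table, 4) / grouped_central_moment(table, 2) ** 2


# Hypothetical table, symmetric around 3: (lower limit, upper limit, frequency)
symmetric = [(0, 2, 1), (2, 4, 2), (4, 6, 1)]
print(grouped_skewness(symmetric), grouped_kurtosis(symmetric))  # 0.0 2.0
```

The symmetric table gives zero skew, and its kurtosis of 2 is below 3, so it is platykurtic.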

Fractiles (Quantiles) for Grouped Data

General Formula

P_k = L_k + \left[ \frac{n \cdot k - F_{i-1}}{f_k} \right] \cdot c_k

where:

$P_k$ : Desired fractile
$L_k$ : Lower limit of the fractile class
$k$ : Corresponding proportion (e.g., 0.25 for $Q_1$)
$f_k$ : Frequency of the fractile class
$F_{i-1}$ : Preceding cumulative frequency
$c_k$ : Width of the fractile class
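The general formula can be sketched as a single function that takes the proportion $k$. The fractile class is taken to be the first class whose cumulative frequency reaches $n \cdot k$. The table layout and function name are assumptions for illustration.

```python
# Hypothetical frequency table: (lower limit, upper limit, absolute frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 9), (40, 50, 6)]


def grouped_fractile(table, k):
    """Interpolated fractile: L_k + ((n*k - F_prev) / f_k) * c_k."""
    if not 0 < k < 1:
        raise ValueError("k must be a proportion strictly between 0 and 1")
    n = sum(f for _, _, f in table)
    if n == 0:
        raise ValueError("frequency table has no observations")
    target = n * k
    cumulative = 0  # cumulative frequency preceding the current class
    for lo, hi, f in table:
        if cumulative + f >= target:  # first class that reaches n*k
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f


q1 = grouped_fractile(table, 0.25)
q2 = grouped_fractile(table, 0.50)
q3 = grouped_fractile(table, 0.75)
print(q1, round(q2, 2), round(q3, 2))  # 16.25 25.83 35.56
```

With $k = 0.50$ the function reproduces the grouped median, since the median formula is this one with $n \cdot k = n/2$.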

Types of Fractiles

Type	Symbol	Proportion (k)	Example
Quartiles	$Q_1, Q_2, Q_3$	0.25, 0.50, 0.75	$Q_3$ : $k = 0.75$
Deciles	$D_1, D_2, \ldots, D_9$	0.10, 0.20, $\ldots$ , 0.90	$D_5$ = Median
Percentiles	$P_1, P_2, \ldots, P_{99}$	0.01, 0.02, $\ldots$ , 0.99	$P_{90}$ : $k = 0.90$

Important relationships:

$Q_2 = D_5 = P_{50}$ = Median
$D_1 = P_{10}$, $D_9 = P_{90}$

Practical Considerations

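Grouped data: The mean, variance, skewness, and kurtosis formulas use the class mark (midpoint) $x_i$ as the representative value of each interval; the median, mode, and fractiles instead interpolate within a class from its lower limit and width.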
Median and fractiles: Their calculation first requires identifying the corresponding class by analyzing cumulative frequencies.
Mode: A distribution can be unimodal, bimodal, or multimodal. For grouped data, interpolation within the modal class is used.
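Interpreting the coefficient of variation (expressed as a percentage, $CV \times 100\%$; the cutoffs below are a common rule of thumb rather than a fixed standard):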
- $CV < 15\%$ : Low relative dispersion
- $15\% \leq CV \leq 30\%$ : Moderate dispersion
- $CV > 30\%$ : High relative dispersion
Units of measurement:
- Variance is expressed in the square of the original units
- Standard deviation is expressed in the original units
- The coefficients of variation, skewness, and kurtosis are dimensionless