Descriptive Statistics

Learn descriptive statistics with the essential formulas for grouped and ungrouped data. Includes measures of central tendency and dispersion, plus practical examples for data analysis.

Symbols and Notation

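$n$: Total number of observations

$m$: Number of classes (for grouped data)

$x_i$: $i$-th observation (ungrouped data) or class mark of the $i$-th class (grouped data)

$f_i$: Absolute frequency of the $i$-th class

$F_i$: Cumulative frequency up to the $i$-th class

$L_{\text{inf}}$: Lower limit of a class

$L_{\text{sup}}$: Upper limit of a class

$c_i$: Class width (length)

$\overline{X}$: Arithmetic mean

$\tilde{X}$: Median

$Mo$: Mode

$R$: Range

$S^2$: Variance

$S$: Standard deviation

$CV$: Coefficient of variation

$\alpha_3$: Coefficient of skewness

$\alpha_4$: Coefficient of kurtosis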

For Ungrouped Data

$$ \overline{X} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

$$ \tilde{X} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[10pt] \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases} $$

$$ Mo = \text{value(s) with the highest absolute frequency} $$
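
As an illustration of the three measures of central tendency, here is a minimal Python sketch. The function names `mean`, `median`, and `modes` and the sample values are assumptions made for this example, not part of the formula sheet. The median indexes into the sorted data, since the subscripts in parentheses refer to order statistics.

```python
from collections import Counter


def mean(xs):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle order statistic (n odd) or average of the two middle ones (n even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # 0-based position of x_((n+1)/2)
    return (s[mid - 1] + s[mid]) / 2  # x_(n/2) and x_(n/2 + 1)


def modes(xs):
    """All values sharing the highest absolute frequency, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


data = [2, 4, 4, 5, 7, 9]  # made-up sample for illustration
print(mean(data), median(data), modes(data))
```

For this sample, n is even, so the median is the average of the third and fourth sorted values (4.5), and the only mode is 4. Returning a list from `modes` keeps bimodal and multimodal samples representable.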

$$ R = x_{\max} - x_{\min} $$

$$ S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{X})^2 $$

Practical formula:

$$ S^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \overline{X}^2 $$
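
The practical formula can be obtained from the definition of the variance by expanding the square and using $\sum_{i=1}^{n} x_i = n\overline{X}$. A short derivation, with all sums running over $i = 1, \ldots, n$:

$$
\begin{aligned}
S^2 &= \frac{1}{n} \sum (x_i - \overline{X})^2 \\
&= \frac{1}{n} \sum x_i^2 - 2\overline{X} \cdot \frac{1}{n} \sum x_i + \overline{X}^2 \\
&= \frac{\sum x_i^2}{n} - 2\overline{X}^2 + \overline{X}^2 \\
&= \frac{\sum x_i^2}{n} - \overline{X}^2
\end{aligned}
$$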

$$ S = \sqrt{S^2} $$

$$ CV = \frac{S}{\overline{X}} $$
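
The dispersion measures for ungrouped data can be checked numerically. The sketch below is only an illustration: the function name `dispersion`, the dictionary keys, and the sample are assumptions. It computes the variance both from the definition and from the practical formula so the two can be compared.

```python
import math


def dispersion(xs):
    """Range, variance (two equivalent forms), standard deviation, and CV."""
    n = len(xs)
    mean = sum(xs) / n
    value_range = max(xs) - min(xs)
    # Definition: mean squared deviation from the mean (divides by n, not n - 1).
    var_definition = sum((x - mean) ** 2 for x in xs) / n
    # Practical formula: mean of the squares minus the square of the mean.
    var_practical = sum(x * x for x in xs) / n - mean ** 2
    std = math.sqrt(var_definition)
    cv = std / mean  # undefined when the mean is 0
    return {
        "range": value_range,
        "variance": var_definition,
        "variance_practical": var_practical,
        "std": std,
        "cv": cv,
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5
stats = dispersion(data)
print(stats)
```

For this sample both variance formulas give 4, so the standard deviation is 2 and the coefficient of variation is 0.4. Python's `statistics.pvariance` and `statistics.pstdev` follow the same divide-by-$n$ convention.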

$$ \alpha_3 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{X})^3}{S^3} $$

  • $\alpha_3 = 0$: Symmetric distribution
  • $\alpha_3 > 0$: Positive skew (tail to the right)
  • $\alpha_3 < 0$: Negative skew (tail to the left)

$$ \alpha_4 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{X})^4}{S^4} $$

  • $\alpha_4 = 3$: Mesokurtic distribution (like the normal)
  • $\alpha_4 > 3$: Leptokurtic (more peaked)
  • $\alpha_4 < 3$: Platykurtic (less peaked)
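
The skewness and kurtosis coefficients are the third and fourth central moments divided by the matching power of $S$. A minimal sketch of that computation follows; the function name `shape_coefficients` and both samples are assumptions chosen for illustration.

```python
def shape_coefficients(xs):
    """Return (alpha_3, alpha_4) using moments that divide by n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance S^2
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    s = m2 ** 0.5
    return m3 / s ** 3, m4 / s ** 4


symmetric = [1, 2, 3, 4, 5]      # made-up symmetric sample
right_tailed = [1, 1, 1, 2, 10]  # made-up sample with one large value

print(shape_coefficients(symmetric))
print(shape_coefficients(right_tailed))
```

The symmetric sample gives $\alpha_3 = 0$ and $\alpha_4 = 1.7$ (platykurtic), while the single large value in the second sample pulls $\alpha_3$ above zero. The function fails with a division by zero if all observations are equal, since then $S = 0$.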

For Grouped Data (Frequency Tables)

$$ \overline{X} = \frac{1}{n} \sum_{i=1}^{m} f_i \cdot x_i $$

where $x_i$ is the class mark (midpoint).
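
As a small worked example of the grouped mean, the sketch below builds the class marks from lower limits and widths and then weights them by frequency. The frequency table and the name `grouped_mean` are assumptions made up for this illustration.

```python
def grouped_mean(lower_limits, widths, freqs):
    """Frequency-weighted mean of the class marks."""
    n = sum(freqs)
    # Class mark = lower limit plus half the class width.
    marks = [low + c / 2 for low, c in zip(lower_limits, widths)]
    return sum(f * x for f, x in zip(freqs, marks)) / n


# Hypothetical table: classes [0,10), [10,20), [20,30), [30,40), [40,50)
lower_limits = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 12, 10, 5]

print(grouped_mean(lower_limits, widths, freqs))
```

With these values $n = 40$ and $\sum f_i x_i = 1020$, so the mean is 25.5.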

$$ \tilde{X} = L_{\text{med}} + \left[ \frac{\frac{n}{2} - F_{i-1}}{f_{\text{med}}} \right] \cdot c_{\text{med}} $$

  • $L_{\text{med}}$: Lower limit of the median class
  • $f_{\text{med}}$: Frequency of the median class
  • $F_{i-1}$: Cumulative frequency preceding the median class
  • $c_{\text{med}}$: Width of the median class
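
The interpolation can be sketched in Python as follows. This is an illustration under stated assumptions: the median class is taken to be the first class whose cumulative frequency reaches $n/2$, and the table and the name `grouped_median` are made up for the example.

```python
from itertools import accumulate


def grouped_median(lower_limits, widths, freqs):
    """Interpolated median of a frequency table."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency reaches n/2.
    i = next(idx for idx, F in enumerate(cumulative) if F >= n / 2)
    F_prev = cumulative[i - 1] if i > 0 else 0  # cumulative frequency before it
    return lower_limits[i] + (n / 2 - F_prev) / freqs[i] * widths[i]


# Hypothetical table: classes [0,10), [10,20), [20,30), [30,40), [40,50)
lower_limits = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 12, 10, 5]

print(grouped_median(lower_limits, widths, freqs))
```

Here $n/2 = 20$ and the cumulative frequencies are 5, 13, 25, 35, 40, so the median class is $[20, 30)$ and the median is $20 + \frac{20 - 13}{12} \cdot 10 \approx 25.83$.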

$$ Mo = L_{\text{Mo}} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] \cdot c_{\text{Mo}} $$

  • $\Delta_1 = f_{\text{Mo}} - f_{\text{Mo}-1}$ (modal frequency minus the frequency of the preceding class)
  • $\Delta_2 = f_{\text{Mo}} - f_{\text{Mo}+1}$ (modal frequency minus the frequency of the following class)
  • $L_{\text{Mo}}$: Lower limit of the modal class
  • $c_{\text{Mo}}$: Width of the modal class
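
A minimal sketch of the modal interpolation is shown below. Two choices in it are assumptions of this example rather than rules stated in the formula sheet: when the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and when several classes tie for the highest frequency the first one is used. The name `grouped_mode` and the table are also made up.

```python
def grouped_mode(lower_limits, widths, freqs):
    """Interpolated mode within the modal class of a frequency table."""
    # Modal class: first class with the highest absolute frequency.
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f_prev = freqs[i - 1] if i > 0 else 0               # assumed 0 at the edge
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumed 0 at the edge
    d1 = freqs[i] - f_prev
    d2 = freqs[i] - f_next
    # Raises ZeroDivisionError if both neighbours equal the modal frequency.
    return lower_limits[i] + d1 / (d1 + d2) * widths[i]


# Hypothetical table: classes [0,10), [10,20), [20,30), [30,40), [40,50)
lower_limits = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 12, 10, 5]

print(grouped_mode(lower_limits, widths, freqs))
```

The modal class is $[20, 30)$ with $\Delta_1 = 12 - 8 = 4$ and $\Delta_2 = 12 - 10 = 2$, so the mode is $20 + \frac{4}{6} \cdot 10 \approx 26.67$.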

$$ R = L_{\text{sup, last}} - L_{\text{inf, first}} $$

$$ S^2 = \frac{1}{n} \sum_{i=1}^{m} f_i (x_i - \overline{X})^2 = \frac{\sum_{i=1}^{m} f_i x_i^2}{n} - \overline{X}^2 $$

$$ S = \sqrt{S^2} $$

$$ CV = \frac{S}{\overline{X}} $$
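
The grouped dispersion measures can be verified on a small table. In the sketch below, the table, the function name `grouped_dispersion`, and the order of the returned tuple are all assumptions for illustration; the variance is computed in both of its forms.

```python
import math


def grouped_dispersion(lower_limits, widths, freqs):
    """Return (range, variance, practical variance, std, CV) for a frequency table."""
    n = sum(freqs)
    marks = [low + c / 2 for low, c in zip(lower_limits, widths)]
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    # Range: upper limit of the last class minus lower limit of the first.
    value_range = (lower_limits[-1] + widths[-1]) - lower_limits[0]
    var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n
    var_practical = sum(f * x * x for f, x in zip(freqs, marks)) / n - mean ** 2
    std = math.sqrt(var_definition)
    return value_range, var_definition, var_practical, std, std / mean


# Hypothetical table: classes [0,10), [10,20), [20,30), [30,40), [40,50)
lower_limits = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 12, 10, 5]

print(grouped_dispersion(lower_limits, widths, freqs))
```

For this table the range is 50 and both variance expressions give 144.75, so $S \approx 12.03$ and $CV \approx 0.47$.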

$$ \alpha_3 = \frac{\frac{1}{n} \sum_{i=1}^{m} f_i (x_i - \overline{X})^3}{S^3} $$

$$ \alpha_4 = \frac{\frac{1}{n} \sum_{i=1}^{m} f_i (x_i - \overline{X})^4}{S^4} $$
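
For grouped data the same shape coefficients are obtained by weighting each power of the deviation by the class frequency. A minimal sketch, assuming a made-up table and a hypothetical helper named `grouped_shape`:

```python
def grouped_shape(lower_limits, widths, freqs):
    """Return (alpha_3, alpha_4) computed from class marks and frequencies."""
    n = sum(freqs)
    marks = [low + c / 2 for low, c in zip(lower_limits, widths)]
    mean = sum(f * x for f, x in zip(freqs, marks)) / n

    def central_moment(order):
        # Frequency-weighted central moment of the given order.
        return sum(f * (x - mean) ** order for f, x in zip(freqs, marks)) / n

    s = central_moment(2) ** 0.5
    return central_moment(3) / s ** 3, central_moment(4) / s ** 4


# Hypothetical table: classes [0,10), [10,20), [20,30), [30,40), [40,50)
lower_limits = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 12, 10, 5]

print(grouped_shape(lower_limits, widths, freqs))
```

This table yields $\alpha_3 \approx -0.096$ (a slight negative skew) and $\alpha_4 \approx 2.13$ (platykurtic).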


Fractiles (Quantiles) for Grouped Data

$$ P_k = L_k + \left[ \frac{n \cdot k - F_{i-1}}{f_k} \right] \cdot c_k $$

where:

  • $P_k$: Desired fractile
  • $L_k$: Lower limit of the fractile class
  • $k$: Corresponding proportion (e.g., 0.25 for $Q_1$)
  • $f_k$: Frequency of the fractile class
  • $F_{i-1}$: Preceding cumulative frequency
  • $c_k$: Width of the fractile class

| Type | Symbol | Proportion ($k$) | Example |
| --- | --- | --- | --- |
| Quartiles | $Q_1, Q_2, Q_3$ | 0.25, 0.50, 0.75 | $Q_3$: $k = 0.75$ |
| Deciles | $D_1, D_2, \ldots, D_9$ | 0.10, 0.20, ..., 0.90 | $D_5$ = Median |
| Percentiles | $P_1, P_2, \ldots, P_{99}$ | 0.01, 0.02, ..., 0.99 | $P_{90}$: $k = 0.90$ |

Important relationships:

  • $Q_2 = D_5 = P_{50}$ = Median
  • $D_1 = P_{10}$, $D_9 = P_{90}$
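
The general fractile formula can be sketched as a single function that takes the proportion $k$. This is an illustration under assumptions: the fractile class is taken to be the first class whose cumulative frequency reaches $n \cdot k$, $k$ is assumed to lie strictly between 0 and 1, and the table and the name `grouped_fractile` are made up.

```python
from itertools import accumulate


def grouped_fractile(lower_limits, widths, freqs, k):
    """Interpolated fractile for proportion k (0 < k < 1) of a frequency table."""
    n = sum(freqs)
    target = n * k
    cumulative = list(accumulate(freqs))
    # Fractile class: first class whose cumulative frequency reaches n * k.
    i = next(idx for idx, F in enumerate(cumulative) if F >= target)
    F_prev = cumulative[i - 1] if i > 0 else 0
    return lower_limits[i] + (target - F_prev) / freqs[i] * widths[i]


# Hypothetical table: classes [0,10), [10,20), [20,30), [30,40), [40,50)
lower_limits = [0, 10, 20, 30, 40]
widths = [10, 10, 10, 10, 10]
freqs = [5, 8, 12, 10, 5]

q1 = grouped_fractile(lower_limits, widths, freqs, 0.25)
q2 = grouped_fractile(lower_limits, widths, freqs, 0.50)
q3 = grouped_fractile(lower_limits, widths, freqs, 0.75)
p90 = grouped_fractile(lower_limits, widths, freqs, 0.90)
print(q1, q2, q3, p90)
```

For this table $Q_1 = 16.25$, $Q_3 = 35$, and $P_{90} = 42$. Calling the function with $k = 0.50$ reproduces the grouped median, which is the relationship $Q_2 = D_5 = P_{50}$ in numerical form.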

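  1. Grouped data: The mean, variance, skewness, and kurtosis formulas use the class mark (midpoint) $x_i$ as the representative value of each interval. The median, mode, fractiles, and range are computed from class limits and frequencies instead.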

  2. Median and fractiles: Their calculation first requires identifying the corresponding class by analyzing cumulative frequencies.

  3. Mode: A distribution can be unimodal, bimodal, or multimodal. For grouped data, interpolation within the modal class is used.

  4. Interpreting the coefficient of variation (with $CV$ expressed as a percentage, $CV \times 100\%$):

    • $CV < 15\%$: Low relative dispersion
    • $15\% \leq CV \leq 30\%$: Moderate dispersion
    • $CV > 30\%$: High relative dispersion
  5. Units of measurement:

    • Variance is expressed in the square of the original units
    • Standard deviation is expressed in the original units
    • The coefficients of variation, skewness, and kurtosis are dimensionless
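
The interpretation bands for the coefficient of variation in note 4 can be written as a small helper. This is only a sketch: the function name `interpret_cv` and the returned labels are assumptions, and the input is taken as a ratio ($S / \overline{X}$) rather than a percentage.

```python
def interpret_cv(cv):
    """Classify a coefficient of variation given as a ratio (0.15 means 15%)."""
    if cv < 0.15:
        return "low"
    if cv <= 0.30:
        return "moderate"
    return "high"


# Example: S = 2 and mean = 5 give CV = 0.4, i.e. 40%.
print(interpret_cv(2 / 5))
```

Both boundary values, 15% and 30%, fall in the moderate band, matching the inequalities in the note.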