Central Tendency - Finding the Middle
Central tendency describes the center of a dataset. The key measures are the mean, median, and mode, which respond differently to the shape of the distribution and to outliers.
| Measure | Definition | Formula / Method | Sensitivity to Outliers |
|---|---|---|---|
| Mean | The arithmetic average | $\frac{\sum x}{n}$ | High |
| Median | The middle value | Sort the data and take the middle value (average the two middle values when $n$ is even) | Low |
| Mode | Most frequent value | Find most common value | Low |
📌 For skewed distributions, the mean is pulled towards the long tail. The median 'hides' from outliers.
⭐ The median is the most robust measure of central tendency for skewed distributions with outliers.
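To make the outlier sensitivity in the table concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values and the helper name `summarize` are made up for illustration; they are not from the lesson.

```python
import statistics

# Hypothetical sample: five typical values, then the same values plus one extreme outlier.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(data), statistics.median(data), statistics.mode(data))

base = summarize(values)           # mean 3.4,  median 3,   mode 3
shifted = summarize(with_outlier)  # mean 19.5, median 3.5, mode 3

print("without outlier:", base)
print("with outlier:   ", shifted)
```

A single extreme value drags the mean from 3.4 to 19.5, while the median moves only from 3 to 3.5 and the mode does not move at all.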
Dispersion - The Spread of Scores
Dispersion (or variability) measures how spread out a set of data is from its central tendency.
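- Standard Deviation (SD): The typical distance of data points from the mean, calculated as the square root of the variance. A larger SD signifies greater data spread.
- Variance: The average of the squared deviations from the mean; equal to the square of the standard deviation ($SD^2$).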
- Range: The difference between the highest and lowest values. It is simple but highly affected by outliers.
- Interquartile Range (IQR): The spread of the middle 50% of the data (Q3 − Q1); robust against outliers.
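The four measures above can be computed side by side, as in the sketch below. The dataset is hypothetical, and the population formulas (`pstdev`, `pvariance`) are an assumption, since the lesson does not say whether it means population or sample SD. Quartile conventions also vary between tools, so the IQR from `statistics.quantiles` may differ slightly from a hand calculation.

```python
import statistics

# Hypothetical dataset chosen so the arithmetic is easy to follow (mean = 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.pstdev(data)           # population standard deviation
variance = statistics.pvariance(data)  # population variance, equal to sd ** 2
value_range = max(data) - min(data)    # highest minus lowest value

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, _q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                          # spread of the middle 50% of the data

print(f"SD={sd}, variance={variance}, range={value_range}, IQR={iqr}")
```

Swapping in `statistics.stdev` and `statistics.variance` gives the sample versions, which divide by $n - 1$ instead of $n$.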
📌 The Empirical Rule for normal distributions is key: 68-95-99.7.
- ~68% of data lies within 1 SD of the mean.
- ~95% of data lies within 2 SDs of the mean.
- ~99.7% of data lies within 3 SDs of the mean.
⭐ In a normal distribution, approximately 95% of the data lies within 2 standard deviations of the mean. This is the basis of the 95% confidence interval, where the exact multiplier is 1.96.
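The 68-95-99.7 figures can be checked numerically from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage_within` is introduced here only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage_within(k):
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

coverage = {k: coverage_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")

# Multiplier that captures exactly the central 95% (about 1.96, not 2).
z_95 = std_normal.inv_cdf(0.975)
print(f"95% multiplier: {z_95:.2f}")
```

The rule rounds these values: 2 SDs actually cover slightly more than 95%, which is why confidence intervals use 1.96.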

Distributions - The Shape of Data
A distribution describes how data points are spread across values. The relative positions of the mean, median, and mode indicate its shape.

| Distribution Type | Relationship of Central Tendency |
|---|---|
| Normal (Gaussian) | $Mean = Median = Mode$ |
| Positively Skewed | $Mean > Median > Mode$ |
| Negatively Skewed | $Mean < Median < Mode$ |
⭐ In a positively skewed distribution (e.g., household income), the mean is greater than the median, which is greater than the mode. Conversely, in a negatively skewed distribution (e.g., age at death), the mean is less than the median, which is less than the mode.
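The orderings in the table can be seen on a small example. The two datasets below are invented for illustration: one has a long right tail, and the other is its mirror image. This ordering is the typical pattern for unimodal skewed data rather than a guarantee for every dataset.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same sample, which puts the long tail on the left.
left_skewed = [16 - x for x in right_skewed]

def center(data):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(data), statistics.median(data), statistics.mode(data))

r_mean, r_median, r_mode = center(right_skewed)  # 4.6, 3.0, 2
l_mean, l_median, l_mode = center(left_skewed)   # 11.4, 13.0, 14

print("positive skew:", r_mean, ">", r_median, ">", r_mode)
print("negative skew:", l_mean, "<", l_median, "<", l_mode)
```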
High‑Yield Points - ⚡ Biggest Takeaways
- The mean is the average value but is highly sensitive to outliers.
- The median is the middle value of the sorted data and is barely affected by outliers, making it the preferred measure for skewed distributions.
- The mode is the most frequently occurring value in a dataset.
- Standard deviation (SD) quantifies data dispersion; a larger SD signifies greater variability.
- In a normal distribution, ~68%, ~95%, and ~99.7% of data fall within 1, 2, and 3 SDs of the mean, respectively.
- Positive skew: Mean > Median > Mode. Negative skew: Mean < Median < Mode.