Descriptive statistics (central tendency, dispersion)

Central Tendency - Finding the Middle

Central tendency describes the center of a dataset. The key measures are the mean, median, and mode, which respond differently to the shape of the distribution and to outliers.

| Measure | Definition | Formula / Method | Sensitivity to Outliers |
|---|---|---|---|
| Mean | The average | $\frac{\Sigma x}{n}$ | High |
| Median | The middle value | Arrange data, find middle | Low |
| Mode | Most frequent value | Find most common value | Low |

📌 For skewed distributions, the mean is pulled towards the long tail, whereas the median is largely unaffected by extreme values.

⭐ The median is the most robust measure of central tendency for skewed distributions with outliers.
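
To make the outlier effect concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values (`stays`, lengths of stay in days) are invented for illustration and do not come from the lesson.

```python
import statistics

# Hypothetical sample: lengths of stay in days (invented for illustration).
stays = [2, 3, 3, 4, 5, 7, 11]

# The same sample with one extreme value added.
stays_with_outlier = stays + [60]

for label, data in [("original", stays), ("with outlier", stays_with_outlier)]:
    print(
        label,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

With this sample, the single outlier moves the mean from 5 to 11.875, while the median only shifts from 4 to 4.5 and the mode stays at 3.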

Dispersion - The Spread of Scores

Dispersion (or variability) describes how widely the values in a dataset are spread around its center.

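  • Standard Deviation (SD): The typical distance of data points from the mean, calculated as the square root of the variance. A larger SD signifies greater data spread.
  • Variance: The average of the squared deviations from the mean; equal to the square of the standard deviation ($SD^2$).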
  • Range: The difference between the highest and lowest values. It is simple but highly affected by outliers.
  • Interquartile Range (IQR): The spread of the middle 50% of the data (Q3 − Q1); robust against outliers.
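
The four measures above can be sketched in Python with the standard `statistics` module. This is an illustrative sketch, not a prescribed method: the helper name `spread_summary` and the sample `scores` are hypothetical, the population formulas (`pstdev`, `pvariance`) are an assumption, and quartiles are computed with the module's default method (other conventions give slightly different values).

```python
import statistics

def spread_summary(data):
    """Return SD, variance, range, and IQR for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "sd": statistics.pstdev(data),           # population SD (divides by n)
        "variance": statistics.pvariance(data),  # population variance = SD squared
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

base = spread_summary(scores)
with_outlier = spread_summary(scores + [40])

print(base)
print(with_outlier)
```

For this sample, adding the single value 40 raises the range from 7 to 38, while the IQR only moves from 2.5 to 4.0. For sample estimates that divide by n − 1, `statistics.stdev` and `statistics.variance` would be used instead.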

📌 The Empirical Rule for normal distributions is key: 68-95-99.7.

  • ~68% of data lies within 1 SD of the mean.
  • ~95% of data lies within 2 SDs of the mean.
  • ~99.7% of data lies within 3 SDs of the mean.

⭐ In a normal distribution, approximately 95% of the data lies within 2 standard deviations of the mean (more precisely, within 1.96). This is the basis of the 95% confidence interval.
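
The 68-95-99.7 figures can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library; the helper name `share_within` is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

# Number of SDs that captures exactly the central 95% of the distribution.
z95 = std_normal.inv_cdf(0.975)
print(f"central 95% lies within {z95:.2f} SDs")
```

The proportions come out close to 0.68, 0.95, and 0.997, and the exact cutoff for the central 95% is about 1.96 SDs, which is why "2 SDs" is treated as an approximation.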

[Figure: Normal distribution and the 68-95-99.7 rule]

Distributions - The Shape of Data

A distribution describes how data points are spread across their possible values. Its shape is reflected in the relative positions of the mean, median, and mode.

[Figure: Normal, positively skewed, and negatively skewed distributions]

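| Distribution Type | Relationship of Central Tendency |
|---|---|
| Normal (Gaussian) | $\text{Mean} = \text{Median} = \text{Mode}$ |
| Positively Skewed | $\text{Mean} > \text{Median} > \text{Mode}$ |
| Negatively Skewed | $\text{Mean} < \text{Median} < \text{Mode}$ |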

⭐ In a positively skewed distribution (e.g., household income), the mean is greater than the median, which is greater than the mode. Conversely, in a negatively skewed distribution (e.g., age at death), the mean is less than the median, which is less than the mode.
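
As a small illustration of these orderings, the sketch below compares the three measures on two invented samples. The data (`incomes`, `ages_at_death`) and the helper names are hypothetical, and comparing mean with median is only a rule of thumb for typical single-peaked data, not a formal skewness statistic.

```python
import statistics

def center_measures(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

def skew_direction(data):
    """Rule-of-thumb skew direction based on mean versus median."""
    mean, median, _ = center_measures(data)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "roughly symmetric"

# Hypothetical household incomes (thousands): long tail to the right.
incomes = [20, 25, 25, 25, 30, 30, 35, 40, 60, 150]

# Hypothetical ages at death (years): long tail to the left.
ages_at_death = [25, 60, 70, 75, 80, 80, 85, 85, 85, 90]

for label, data in [("incomes", incomes), ("ages at death", ages_at_death)]:
    mean, median, mode = center_measures(data)
    print(label, mean, median, mode, skew_direction(data))
```

For the income sample the mean (44) exceeds the median (30), which exceeds the mode (25). For the age sample the order reverses: mean 73.5, median 80, mode 85.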

High‑Yield Points - ⚡ Biggest Takeaways

  • The mean is the average value but is highly sensitive to outliers.
  • The median is the middle value of the ordered data, making it the preferred measure of center for skewed distributions or data with outliers.
  • The mode is the most frequently occurring value in a dataset.
  • Standard deviation (SD) quantifies data dispersion; a larger SD signifies greater variability.
  • In a normal distribution, ~68%, ~95%, and ~99.7% of data fall within 1, 2, and 3 SDs of the mean, respectively.
  • Positive skew: Mean > Median > Mode. Negative skew: Mean < Median < Mode.
