Correlation Basics - Pattern Spotting
- Correlation: Measures strength & direction of a linear relationship between two quantitative variables.
- Pattern Spotting: Use Scatter Plots.
  - Each dot is one (X, Y) pair. The slope of the cloud shows direction (uphill/downhill); its spread shows strength (tight/loose cluster).
- Types of Linear Correlation:
  - Positive: Variables change in the same direction (X↑, Y↑). E.g., study hours & exam score.
  - Negative: Variables change in opposite directions (X↑, Y↓). E.g., TV hours & exam score.
  - Zero: No discernible linear pattern (a curved relationship may still exist).
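The three patterns can be checked numerically with Pearson's $r$. This is a minimal sketch: the datasets are made up for illustration, and the `tol` cut-off used to call an $r$ "near zero" is an arbitrary, hypothetical choice rather than a standard value.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = S_xy / sqrt(S_xx * S_yy) for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

def direction(r, tol=0.1):
    """Name the direction; `tol` is a hypothetical cut-off for 'near zero'."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "zero (no linear pattern)"

# Made-up illustrative data, not from any real study.
hours = [1, 2, 3, 4, 5]
datasets = {
    "study hours vs exam score": (hours, [52, 60, 61, 70, 78]),
    "TV hours vs exam score": (hours, [80, 74, 70, 61, 55]),
    "x vs x squared (curved)": ([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]),
}
for name, (xs, ys) in datasets.items():
    r = pearson_r(xs, ys)
    print(f"{name}: r = {r:+.3f} ({direction(r)})")
```

The last dataset follows a perfect curve ($Y = X^2$), yet its $r$ is exactly 0: "zero correlation" rules out only a *linear* pattern.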

⭐ Correlation does not imply causation.
Measuring Correlation - Strength Signs
- Correlation coefficient ($r$): measures linear relationship strength & direction.
- Range: -1 (perfect negative) to +1 (perfect positive); $r = 0$ means no linear correlation.
- Sign (Direction):
  - Positive ($r > 0$): X↑, Y↑ (direct).
  - Negative ($r < 0$): X↑, Y↓ (inverse).
- Magnitude ($|r|$) (Strength), one common rule of thumb (cut-offs vary between textbooks):
  - 0.0 - 0.2: Very weak
  - 0.2 - 0.4: Weak
  - 0.4 - 0.7: Moderate
  - 0.7 - 0.9: Strong
  - 0.9 - 1.0: Very strong
- Types: Pearson's $r$ (for quantitative data), Spearman's $ρ$ (for ranked/ordinal data).
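Sign and magnitude can be read off separately. The sketch below maps an $r$ value to the verbal bands listed above. The function names are hypothetical, and assigning a shared boundary (e.g. exactly 0.4) to the higher band is an assumption, since the bands overlap at their edges.

```python
def strength(r):
    """Map |r| to the verbal strength bands (one common rule of thumb)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    m = abs(r)
    # Shared boundaries (0.2, 0.4, 0.7, 0.9) go to the higher band here;
    # this tie-breaking choice is an assumption, not part of the notes.
    if m < 0.2:
        return "very weak"
    if m < 0.4:
        return "weak"
    if m < 0.7:
        return "moderate"
    if m < 0.9:
        return "strong"
    return "very strong"

def describe(r):
    """Combine sign (direction) and magnitude (strength) into one label."""
    if r > 0:
        sign = "positive"
    elif r < 0:
        sign = "negative"
    else:
        return "no linear correlation"
    return f"{strength(r)} {sign}"

for r in (0.15, -0.55, 0.82, -0.95):
    print(r, "->", describe(r))
```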
⭐ $r^2$ (Coefficient of Determination) = proportion of variance in Y explained by X.
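For a simple linear regression fitted by least squares (with an intercept), $r^2$ equals $1 - SS_{res}/SS_{tot}$, the share of Y's total variation that the fitted line accounts for. This sketch computes it both ways on made-up data; the helper name is hypothetical.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SS_res / SS_tot) for a least-squares line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y (SS_tot)
    r = s_xy / math.sqrt(s_xx * s_yy)
    b = s_xy / s_xx                 # least-squares slope
    a = mean_y - b * mean_x         # least-squares intercept
    # Unexplained variation: squared distances from each point to the line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - ss_res / s_yy

# Made-up illustrative data (study hours vs exam score).
via_r, via_residuals = r_squared_two_ways([1, 2, 3, 4, 5], [52, 60, 61, 70, 78])
print(f"r^2 = {via_r:.4f}, 1 - SS_res/SS_tot = {via_residuals:.4f}")
```

Both routes give about 0.959 here, i.e. roughly 96% of the variation in these (invented) scores is explained by study hours.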
Regression Fundamentals - Outcome Prediction
- Predicts dependent variable ($Y$) value based on independent variable ($X$).
- Simple linear regression equation: $Y = a + bX$.
  - $a$: Y-intercept (predicted value of $Y$ when $X = 0$).
  - $b$: Regression coefficient (slope); expected change in $Y$ per one-unit increase in $X$.
- Quantifies relationship for prediction.
- Unlike correlation (association strength), regression predicts specific values.
⭐ The sign of the regression coefficient ($b$) indicates if the relationship is positive ($b > 0$) or negative ($b < 0$).
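The usual way to estimate $a$ and $b$ is ordinary least squares: $b = S_{xy}/S_{xx}$ and $a = \bar{Y} - b\bar{X}$. Below is a minimal sketch on made-up data; `fit_line` and `predict` are hypothetical helper names, not a library API.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = s_xy / s_xx          # slope: expected change in Y per one-unit increase in X
    a = mean_y - b * mean_x  # intercept: predicted Y when X = 0
    return a, b

def predict(a, b, x):
    """Predicted Y for a given X on the fitted line."""
    return a + b * x

# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
a, b = fit_line(hours, [52, 60, 61, 70, 78])
print(f"score = {a:.1f} + {b:.1f} * hours; predicted at 3.5 h: {predict(a, b, 3.5):.1f}")
_, b_tv = fit_line(hours, [80, 74, 70, 61, 55])
print(f"TV slope b = {b_tv:.1f} (negative relationship)")
```

The prediction is made inside the observed range of X (1 to 5 hours); extrapolating far outside it assumes the line keeps holding there.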
Regression In-Depth - Lines & Limits
- Regression Equation: $Y = a + bX$
  - $a$: Y-intercept ($Y$ when $X = 0$)
  - $b$: Slope ($\Delta Y$ per unit $\Delta X$)
- Regression Coefficients ($b$):
  - $b_{YX}$: Y on X; $b_{XY}$: X on Y
  - $r = \pm\sqrt{b_{YX} \cdot b_{XY}}$, taking the common sign: $r$, $b_{YX}$, and $b_{XY}$ always have the same sign.
- Coefficient of Determination ($R^2$ or $r^2$):
  - Proportion of Y's variance explained by X.
  - Range: 0 to 1. E.g., $r = 0.8 \implies R^2 = 0.64$ (64% of variance explained).
- Assumptions (L.I.N.E.) 📌:
  - Linearity, Independence (errors), Normality (errors), Equal variance (Homoscedasticity).
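Because $b_{YX} = S_{xy}/S_{xx}$ and $b_{XY} = S_{xy}/S_{yy}$, their product is $S_{xy}^2/(S_{xx}S_{yy}) = r^2$, so $r$ is the square root of that product carrying the slopes' shared sign. A minimal sketch on made-up data with a negative relationship, where dropping the sign would give the wrong answer (helper names are hypothetical):

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_YX, b_XY): slope of Y on X and slope of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / s_yy

def r_from_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_YX * b_XY), taking the sign the two slopes share."""
    if b_yx * b_xy < 0:
        raise ValueError("b_YX and b_XY must share a sign")
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Made-up illustrative data (TV hours vs exam score, negative relationship).
b_yx, b_xy = regression_coefficients([1, 2, 3, 4, 5], [80, 74, 70, 61, 55])
r = r_from_coefficients(b_yx, b_xy)
print(f"b_YX = {b_yx:.3f}, b_XY = {b_xy:.4f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```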
⭐ The two regression lines intersect at the mean of X and the mean of Y.
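Each least-squares line passes through $(\bar{X}, \bar{Y})$, which is why the two lines meet there. Solving the pair of line equations directly confirms this; substituting one into the other leaves a factor of $1 - b_{YX}b_{XY} = 1 - r^2$, so when $|r| = 1$ the lines coincide instead of crossing. A sketch on made-up data (the function name is hypothetical):

```python
def regression_lines_intersection(xs, ys):
    """Intersect the Y-on-X and X-on-Y least-squares lines; return (point, means)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = s_xy / s_xx
    b_xy = s_xy / s_yy
    a_yx = mean_y - b_yx * mean_x   # Y on X:  y = a_yx + b_yx * x
    a_xy = mean_x - b_xy * mean_y   # X on Y:  x = a_xy + b_xy * y
    # Substitute the Y-on-X line into the X-on-Y line and solve for x.
    denom = 1 - b_yx * b_xy         # equals 1 - r^2
    if abs(denom) < 1e-12:
        raise ValueError("|r| = 1: the two lines coincide")
    x_star = (a_xy + b_xy * a_yx) / denom
    y_star = a_yx + b_yx * x_star
    return (x_star, y_star), (mean_x, mean_y)

point, means = regression_lines_intersection([1, 2, 3, 4, 5], [52, 60, 61, 70, 78])
print(f"intersection: ({point[0]:.3f}, {point[1]:.3f}); means: ({means[0]:.3f}, {means[1]:.3f})")
```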
Correlation vs. Regression - Compare & Contrast
| Feature | Correlation | Regression |
|---|---|---|
| Purpose | Strength, direction of linear association | Predicts Y (dependent) from X (independent) |
| Variables | X, Y both random variables | Y random; X may be fixed/random |
| Output | $r$ (coefficient); range -1 to +1 | Equation: $Y = a + bX$; $R^2$ (coeff. of determination) |
| Symmetry | Symmetric: $r(X,Y) = r(Y,X)$ | Asymmetric: in general $b_{YX} \neq b_{XY}$ |
| Causation | No direct causation implied | May suggest, not prove, causation |
| Focus | Degree of association | Nature of relationship & prediction |
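The Symmetry row can be demonstrated directly: swapping the roles of X and Y leaves $r$ unchanged but changes the slope, because the slope divides by the spread of whichever variable is treated as the predictor. A sketch on made-up data (helper names are hypothetical):

```python
import math

def centered_sums(xs, ys):
    """Return (S_xy, S_xx, S_yy), the sums of centered cross-products and squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy, s_xx, s_yy

def pearson_r(xs, ys):
    s_xy, s_xx, s_yy = centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    s_xy, s_xx, _ = centered_sums(xs, ys)
    return s_xy / s_xx

hours = [1, 2, 3, 4, 5]          # made-up illustrative data
score = [52, 60, 61, 70, 78]
print(f"r(X,Y) = {pearson_r(hours, score):.4f}, r(Y,X) = {pearson_r(score, hours):.4f}")
print(f"b(Y on X) = {slope(hours, score):.4f}, b(X on Y) = {slope(score, hours):.4f}")
```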
High‑Yield Points - ⚡ Biggest Takeaways
- Correlation coefficient (r) measures strength & direction of a linear relationship (-1 to +1).
- Coefficient of determination (r²) is the proportion of variance in one variable (dependent) that is predictable from the other variable (independent).
- Regression analysis describes the mathematical relationship between variables for prediction (e.g., Y = a + bX).
- The regression coefficient ('b') or slope indicates the change in the dependent variable for a one-unit change in the independent variable.
- Differentiate Pearson's correlation (for linear relationships, quantitative, normally distributed data) from Spearman's rank correlation (for ordinal data or non-linear monotonic relationships).
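Spearman's $\rho$ is Pearson's $r$ applied to the ranks of the data, so it scores any strictly monotonic relationship as perfect, even a curved one. (With no ties this matches the textbook shortcut $1 - 6\sum d^2 / (n(n^2-1))$; ranking with averaged ties also handles tied values.) A minimal sketch on made-up data, with hypothetical helper names:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but curved
print(f"Pearson r = {pearson_r(xs, ys):.3f}, Spearman rho = {spearman_rho(xs, ys):.3f}")
```

Pearson's $r$ falls below 1 because the points do not lie on a straight line, while Spearman's $\rho$ is exactly 1 because the ranks agree perfectly.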