Correlation and Regression

Correlation Basics - Pattern Spotting

  • Correlation: Measures strength & direction of a linear relationship between two quantitative variables.
  • Pattern Spotting: Use Scatter Plots.
    • Dots show relationship: direction (uphill/downhill) & strength (tight/loose cluster).
  • Types of Linear Correlation:
    • Positive: Variables change in same direction (X↑, Y↑). E.g., study hours & exam score.
    • Negative: Variables change in opposite directions (X↑, Y↓). E.g., TV hours & exam score.
    • Zero: No discernible linear pattern.

[Figure: scatter plots showing positive, negative, and no correlation]

⭐ Correlation does not imply causation.

Measuring Correlation - Strength Signs

  • Correlation coefficient ($r$): measures linear relationship strength & direction.
  • Range: -1 (perfect negative) to +1 (perfect positive); $r$=0 means no linear correlation.
  • Sign (Direction):
    • Positive ($r$>0): X↑, Y↑ (direct).
    • Negative ($r$<0): X↑, Y↓ (inverse).
  • Magnitude ($|r|$) (Strength):
    • 0.0 - 0.2: Very weak
    • 0.2 - 0.4: Weak
    • 0.4 - 0.7: Moderate
    • 0.7 - 0.9: Strong
    • 0.9 - 1.0: Very strong
  • Types: Pearson's $r$ (for quantitative data), Spearman's $ρ$ (for ranked/ordinal data).

⭐ $r^2$ (Coefficient of Determination) = proportion of variance in Y explained by X.
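
As a rough illustration of the points above, the sketch below computes Pearson's $r$ from its definition (sum of cross-deviations divided by the square root of the product of the two sums of squared deviations), then reports $r^2$ and a strength label. The data are hypothetical study hours and exam scores, the function names `pearson_r` and `strength_label` are made up for this example, and the handling of band boundaries (e.g. whether exactly 0.4 counts as weak or moderate) is an assumption, since the bands listed above share their endpoints.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def strength_label(r):
    """Map |r| to the strength bands above.

    Assumption: each band includes its lower bound and excludes its upper
    bound, except the top band, which includes 1.0.
    """
    size = abs(r)
    if size < 0.2:
        return "very weak"
    if size < 0.4:
        return "weak"
    if size < 0.7:
        return "moderate"
    if size < 0.9:
        return "strong"
    return "very strong"

# Hypothetical data: study hours (X) and exam scores (Y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, strength: {strength_label(r)}")
# r is about 0.981, so r^2 is about 0.963: roughly 96% of the variance
# in scores is explained by hours in this made-up sample.
```

The sign of `r` gives the direction and `abs(r)` the strength; swapping the two arguments leaves the result unchanged.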

Regression Fundamentals - Outcome Prediction

  • Predicts dependent variable ($Y$) value based on independent variable ($X$).
  • Simple linear regression equation: $Y = a + bX$.
    • $a$: Y-intercept (value of $Y$ if $X=0$).
    • $b$: Regression coefficient (slope); change in $Y$ for one unit change in $X$.
  • Quantifies relationship for prediction.
  • Unlike correlation (association strength), regression predicts specific values.

⭐ The sign of the regression coefficient ($b$) indicates if the relationship is positive ($b > 0$) or negative ($b < 0$).
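
A minimal sketch of fitting $Y = a + bX$ and using it for prediction follows. It assumes ordinary least squares as the fitting method (the usual choice, though these notes do not name one), so that $b = S_{XY} / S_{XX}$ and $a = \bar{Y} - b\bar{X}$. The data and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when X is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x    # intercept: predicted Y when X = 0
    return a, b

def predict(a, b, x):
    """Predicted Y for a given X on the fitted line."""
    return a + b * x

# Hypothetical data: study hours (X) and exam scores (Y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
a, b = fit_line(hours, scores)
print(f"Y = {a:.1f} + {b:.1f}X")                      # Y = 46.0 + 6.0X
print(f"predicted score for 6 hours: {predict(a, b, 6):.1f}")

# Hypothetical data with an inverse relationship: TV hours and exam scores.
tv_hours = [1, 2, 3, 4]
tv_scores = [80, 70, 65, 50]
_, b_tv = fit_line(tv_hours, tv_scores)
print(f"slope for TV hours: {b_tv:.1f}")              # negative slope
```

Predicting at 6 hours extrapolates beyond the observed range of 1 to 5 hours, so such a value should be read with caution.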

Regression In-Depth - Lines & Limits

  • Regression Equation: $Y = a + bX$
    • a: Y-intercept (Y if X=0)
    • b: Slope (ΔY per unit ΔX)
  • Regression Coefficients (b):
    • $b_{YX}$: Y on X; $b_{XY}$: X on Y
    • $r = \pm\sqrt{b_{YX} \cdot b_{XY}}$, taking the sign shared by the two coefficients: $r$, $b_{YX}$ and $b_{XY}$ always have the same sign.
  • Coefficient of Determination ($R^2$ or $r^2$):
    • Proportion of Y's variance explained by X.
    • Values: 0-1. E.g., $r=0.8 \implies R^2=0.64$ (64% variance explained).
  • Assumptions (L.I.N.E.) 📌:
    • Linearity, Independence (errors), Normality (errors), Equal variance (Homoscedasticity).

⭐ The two regression lines (Y on X and X on Y) intersect at the point $(\bar{X}, \bar{Y})$, i.e. at the means of X and Y.
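
The relationships in this section can be checked numerically. The sketch below assumes least-squares slopes, $b_{YX} = S_{XY}/S_{XX}$ and $b_{XY} = S_{XY}/S_{YY}$, recovers $r$ from their product, and solves for the point where the Y-on-X and X-on-Y lines cross. The data and all function names are hypothetical.

```python
import math

def slopes_and_means(xs, ys):
    """Return (b_yx, b_xy, mean_x, mean_y) using least-squares slopes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, mean_x, mean_y

def r_from_slopes(b_yx, b_xy):
    """r is the square root of the product of the slopes, carrying their common sign."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

def intersection(b_yx, b_xy, mean_x, mean_y):
    """Crossing point of Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    a_yx = mean_y - b_yx * mean_x
    a_xy = mean_x - b_xy * mean_y
    denom = 1 - b_yx * b_xy          # equals 1 - r^2
    if denom == 0:
        raise ValueError("the two lines coincide when r^2 = 1")
    x = (a_xy + b_xy * a_yx) / denom
    y = a_yx + b_yx * x
    return x, y

# Hypothetical data: study hours (X) and exam scores (Y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

b_yx, b_xy, mean_x, mean_y = slopes_and_means(hours, scores)
r = r_from_slopes(b_yx, b_xy)
x0, y0 = intersection(b_yx, b_xy, mean_x, mean_y)

print(f"b_YX = {b_yx:.3f}, b_XY = {b_xy:.3f}, r = {r:.3f}, R^2 = {r * r:.3f}")
print(f"lines cross at ({x0:.3f}, {y0:.3f}); means are ({mean_x}, {mean_y})")
```

Note that the two slopes differ (regression is asymmetric) even though a single $r$ describes both directions.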

Correlation vs. Regression - Compare & Contrast

| Feature | Correlation | Regression |
| --- | --- | --- |
| Purpose | Strength, direction of linear association | Predicts Y (dependent) from X (independent) |
| Variables | X, Y both random variables | Y random; X may be fixed/random |
| Output | $r$ (coefficient); range -1 to +1 | Equation: $Y = a + bX$; $R^2$ (coeff. of determination) |
| Symmetry | Symmetric: $r(X,Y) = r(Y,X)$ | Asymmetric: $b(Y \text{ on } X) \neq b(X \text{ on } Y)$ |
| Causation | No direct causation implied | May suggest, not prove, causation |
| Focus | Degree of association | Nature of relationship & prediction |

High‑Yield Points - ⚡ Biggest Takeaways

  • Correlation coefficient (r) measures strength & direction of a linear relationship (-1 to +1).
  • Coefficient of determination (r²) is the proportion of variance in one variable (dependent) that is predictable from the other variable (independent).
  • Regression analysis describes the mathematical relationship between variables for prediction (e.g., Y = a + bX).
  • The regression coefficient ('b') or slope indicates the change in the dependent variable for a one-unit change in the independent variable.
  • Differentiate Pearson's correlation (for linear relationships, quantitative, normally distributed data) from Spearman's rank correlation (for ordinal data or non-linear monotonic relationships).
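
To illustrate the Pearson versus Spearman distinction, the sketch below computes Spearman's $ρ$ as Pearson's $r$ applied to ranks (tied values share the average of their ranks) and compares the two on a monotonic but non-linear relationship, $Y = X^3$. The data and function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values each receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical monotonic, non-linear data: Y = X cubed.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
# rho is 1 because the ordering is perfectly preserved; r is below 1
# (about 0.94) because the points do not lie on a straight line.
```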
