P-value controversy and alternatives

On this page

The P-Value Problem - Heresy or Hype?

  • Core Issues with P-values:

    • The p < 0.05 threshold is arbitrary and can lead to "p-hacking" (manipulating data to achieve significance).
    • They don't measure effect size or clinical importance; a tiny, trivial effect can be statistically significant with a large sample size.
    • A non-significant result (e.g., p = 0.06) doesn't prove the null hypothesis is true.
  • Key Alternatives & Complements:

    • Confidence Intervals (CIs): Provide a range of plausible values for the true effect size, giving a better sense of clinical meaning.
    • Effect Sizes: (e.g., Cohen's d, Odds Ratio) directly quantify the magnitude of an observed phenomenon.
    • Bayesian Statistics: Updates the probability of a hypothesis as more evidence becomes available.

⭐ The American Statistical Association (ASA) advises against making scientific conclusions or policy decisions based solely on whether a p-value crosses a specific threshold. Context and effect size are critical.

Confidence Intervals - The Better Story

CIs provide a more complete picture than a p-value by estimating a range of plausible values for the true population effect.

  • What It Is: A range of values that likely contains the true population parameter. A 95% CI means we are 95% confident the true value lies within this range.

  • Precision & Effect Size:

    • Narrow CI → High precision (more certain estimate).
    • Wide CI → Low precision (less certain, often from small samples).
    • The CI's position and width help assess both statistical and clinical significance.

⭐ If a 95% CI for a mean difference includes 0, or for a ratio (e.g., RR, OR) includes 1, the result is not statistically significant at the p < 0.05 level.

Confidence intervals for average TV hours in UK vs. USA

Bayesian Alternatives - A Different Logic

  • Shifts focus from the p-value's $P(\text{data}|\text{hypothesis})$ to the more intuitive question: $P(\text{hypothesis}|\text{data})$.
  • Combines prior beliefs ($P(\text{Hypothesis})$) with study data (likelihood) to generate an updated posterior probability.
    • Bayes' Theorem: $P(H|D) = \frac{P(D|H) \times P(H)}{P(D)}$

Bayesian vs. Frequentist Inference Comparison

  • Interpretation: Avoids the binary "significant/non-significant" threshold.
    • Results are often given as a Bayes Factor (BF), which quantifies the strength of evidence for one hypothesis over another.
    • A BF of 5 means the data are 5 times more likely under the alternative hypothesis than the null.

⭐ A key advantage is the ability to provide evidence for the null hypothesis (e.g., confirming no effect), which a p-value can never do-it can only fail to reject it.

Effect Sizes & Pre-Registration - Practical Steps

  • Effect Size: Measures the magnitude of an intervention's effect, moving beyond the p-value's binary significance. It quantifies the clinical importance of a finding.

    • Examples: Cohen's d, odds ratio (OR), relative risk (RR).
    • A small p-value with a small effect size may be statistically significant but clinically meaningless.
  • Pre-registration: Publicly registering your study's hypothesis and analysis plan before data collection.

    • Goal: Increases transparency and combats publication bias and p-hacking.

⭐ Pre-registering a study protocol on platforms like ClinicalTrials.gov is often a prerequisite for publication in major medical journals, enhancing research credibility.

High-Yield Points - ⚡ Biggest Takeaways

  • P-values are often misinterpreted as the probability the null hypothesis is true.
  • They don't measure the effect size or clinical importance.
  • A non-significant p-value isn't proof for the null hypothesis; it could just be low power.
  • P-hacking (data dredging) leads to spurious findings and inflated Type I error.
  • Confidence intervals are preferred; they provide the effect size and precision.
  • Bayesian methods offer an alternative by directly testing the hypothesis probability.

Practice Questions: P-value controversy and alternatives

Test your understanding with these related questions

Group of 100 medical students took an end of the year exam. The mean score on the exam was 70%, with a standard deviation of 25%. The professor states that a student's score must be within the 95% confidence interval of the mean to pass the exam. Which of the following is the minimum score a student can have to pass the exam?

1 of 5

Flashcards: P-value controversy and alternatives

1/10

_____ are a range of values within which the true mean of the population is expected to fall, with a specified probability (usually 95%)

TAP TO REVEAL ANSWER

_____ are a range of values within which the true mean of the population is expected to fall, with a specified probability (usually 95%)

Confidence intervals

browseSpaceflip

Enjoying this lesson?

Get full access to all lessons, practice questions, and more.

Start Your Free Trial