Study X examined the relationship between coffee consumption and lung cancer. The authors of Study X retrospectively reviewed patients' reported coffee consumption and found that drinking more than 6 cups of coffee per day was associated with an increased risk of developing lung cancer. However, Study X was criticized by the authors of Study Y. Study Y showed that increased coffee consumption was associated with smoking. What type of bias affected Study X, and what study design is geared to reduce the chance of that bias?

Observer bias; double blind analysis

A research team has data from three completed studies on statin use and Alzheimer's disease: Study A (case-control, OR=0.6, n=500), Study B (retrospective cohort, RR=0.7, n=10,000), and Study C (RCT with cognitive decline as secondary endpoint, RR=0.9, n=2,000). The case-control study used prevalent cases, the cohort study had significant loss to follow-up in the unexposed group, and the RCT was underpowered for cognitive outcomes. Synthesize the evidence to determine the most reliable conclusion about the association.

The retrospective cohort study offers the best balance of validity and precision

The RCT provides the strongest evidence despite being underpowered

The case-control study is most reliable due to efficient rare outcome assessment

Evidence is contradictory and no conclusion can be drawn

Meta-analysis of all three studies provides the most accurate estimate

A public health department needs to determine whether a cluster of birth defects in a county is associated with industrial pollution. They have limited resources, the suspected exposure occurred 3-5 years ago, and the outcome is rare (15 cases identified). Multiple potential confounders exist including maternal age, socioeconomic status, and prenatal care access. The community demands rapid answers. Evaluate the most appropriate initial study design considering feasibility, ethics, and scientific validity.

Case-control study with multiple control groups and confounder adjustment

Prospective cohort study of pregnant women with exposure monitoring

Randomized controlled trial comparing exposed and unexposed areas

Ecologic study comparing county-level pollution and birth defect rates

Cross-sectional survey of current pollution levels and birth outcomes

Sampling methods — USMLE Step 2 CK Lesson

Sampling Fundamentals - The Right Slice

Population (N): The entire group a study aims to understand.
Sample (n): A subset of the population from which data are collected; ideally it is representative of that population.
- The goal is to make inferences about the population.
Sampling Frame: The specific list of individuals from which the sample is drawn (e.g., a clinic's patient list).
Sampling Bias: A systematic error where the sample is not representative of the population, threatening the study's external validity.

[Figure: Population, sampling method, and sample relationship]

⭐ Generalizability (External Validity): The degree to which findings can be applied to the broader population. This is highly dependent on how well the sample represents the population.

Probability Sampling - Truly Random Acts

Ensures every member of the population has a known, non-zero chance of being selected, minimizing selection bias. Essential for generalizability (external validity).

Simple Random Sampling (SRS)
- Every individual has an equal chance of selection.
- Like a lottery; requires a full population list (sampling frame).
Systematic Sampling
- Select individuals at a regular interval (every k-th person) from a list after a random start.
- Efficient, but can be biased if the list has a periodic pattern.
Stratified Sampling
- Divide population into homogeneous subgroups (strata), e.g., by age or race.
- Perform SRS within each stratum.
- Guarantees representation of key subgroups.
Cluster Sampling
- Divide population into heterogeneous groups (clusters), e.g., hospitals or zip codes.
- Randomly select entire clusters to sample.
- 📌 Mnemonic: "Clusters are mini-populations."
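
The four probability methods above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the sampling frame, the hospital and age-group labels, and all function names are hypothetical, and the systematic interval is assumed to be k = N // n.

```python
import random


def simple_random_sample(frame, n, rng):
    """SRS: every individual in the frame has an equal chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Systematic: every k-th individual after a random start (k = N // n)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Stratified: split into strata, then run SRS inside each stratum."""
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Cluster: randomly pick whole clusters and keep everyone in them."""
    clusters = {}
    for person in frame:
        clusters.setdefault(cluster_of(person), []).append(person)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for c in chosen for person in clusters[c]]


# Hypothetical sampling frame: 120 patients, 4 hospitals, 3 age groups.
rng = random.Random(0)
frame = [
    {"id": i, "hospital": f"H{i % 4}", "age_group": ("<40", "40-64", "65+")[i % 3]}
    for i in range(120)
]

srs = simple_random_sample(frame, 12, rng)
systematic = systematic_sample(frame, 12, rng)
stratified = stratified_sample(frame, lambda p: p["age_group"], 4, rng)
clustered = cluster_sample(frame, lambda p: p["hospital"], 2, rng)
```

Note the structural difference: the stratified sample contains a few people from every age group, while the cluster sample contains everyone from only two of the four hospitals.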

⭐ Stratified sampling increases precision and ensures minority subgroups are adequately represented, boosting statistical power for subgroup analyses.
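
The precision gain can be seen in a small simulation. The sketch below assumes a made-up population (900 majority members with values near 10, 100 minority members with values near 50) and proportional allocation; the numbers and function names are illustrative only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population with two very different strata (made-up values).
majority = [10 + (i % 5) for i in range(900)]
minority = [50 + (i % 5) for i in range(100)]
population = majority + minority


def srs_mean(n):
    """Mean of a simple random sample of size n from the whole population."""
    return statistics.mean(rng.sample(population, n))


def stratified_mean(n_major, n_minor):
    """Mean of a stratified sample: SRS within each stratum, then pooled.

    With proportional allocation (45 of 900 and 5 of 100 are both 5%),
    the pooled mean needs no extra weighting.
    """
    sample = rng.sample(majority, n_major) + rng.sample(minority, n_minor)
    return statistics.mean(sample)


# Repeat each design many times and compare how much the estimates vary.
srs_estimates = [srs_mean(50) for _ in range(500)]
stratified_estimates = [stratified_mean(45, 5) for _ in range(500)]

srs_sd = statistics.stdev(srs_estimates)
stratified_sd = statistics.stdev(stratified_estimates)
print(f"SD of SRS estimates:        {srs_sd:.2f}")
print(f"SD of stratified estimates: {stratified_sd:.2f}")
```

Both designs use 50 people per sample, but the stratified estimates vary less because the number of minority members in each sample is fixed rather than left to chance.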

Non-Probability Sampling - Conveniently Biased

Selection isn't random; it relies on the researcher's judgment or convenience. This introduces selection bias, limiting the generalizability of findings to the broader population.

Types of Non-Probability Sampling:
- Convenience Sampling: Choosing easily accessible subjects (e.g., patients in a single clinic). Very prone to selection bias.
- Quota Sampling: Filling pre-set quotas for subgroups (e.g., 50 men, 50 women) in a non-random way.
- Purposive (Judgmental) Sampling: Researcher handpicks subjects based on specific criteria or expertise.
- Snowball Sampling: Participants recruit other eligible participants. Useful for hard-to-reach or hidden populations.

[Figure: Sampling methods, probability vs. non-probability]

⭐ Key limitation: Because the sample is unlikely to be representative, findings from non-probability sampling cannot be reliably generalized to the entire population. The study has low external validity.
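
A toy comparison shows how a convenience sample can distort an estimate. Everything here is an assumption made for illustration: a hypothetical town of 1,000 adults with 10% disease prevalence, and a clinic attended by everyone with the condition but only one in five people without it.

```python
import random

rng = random.Random(7)

# Hypothetical town: 1,000 adults, 10% have the condition of interest.
population = [{"id": i, "diseased": i % 10 == 0} for i in range(1000)]


def attends_clinic(person):
    # Assumption: all diseased people attend the clinic,
    # but only one in five non-diseased people does.
    return person["diseased"] or person["id"] % 5 == 1


def prevalence(people):
    """Proportion of the given people who have the condition."""
    return sum(p["diseased"] for p in people) / len(people)


# Convenience sample: whoever happens to be in the clinic.
convenience_sample = [p for p in population if attends_clinic(p)]
# Probability sample: SRS of 500 people from the full frame.
random_sample = rng.sample(population, 500)

true_prev = prevalence(population)
conv_prev = prevalence(convenience_sample)
srs_prev = prevalence(random_sample)

print(f"True prevalence:        {true_prev:.2f}")
print(f"Convenience estimate:   {conv_prev:.2f}")
print(f"Simple random estimate: {srs_prev:.2f}")
```

Under these assumptions the clinic sample contains 100 diseased and 200 non-diseased people, so it reports a prevalence of about 33% against a true value of 10%, while the random sample lands close to the truth.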

Sampling Biases - Dodging Disasters

Selection Bias: Sample is not representative of the target population, limiting external validity.
- Ascertainment Bias: Nonrandom sampling creates a skewed sample (e.g., using only hospitalized patients).
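- Nonresponse Bias: Participants differ systematically from non-participants.
- Berkson Bias: Hospital-based samples differ systematically from the general population (e.g., higher disease prevalence), which can create spurious exposure-disease associations.
- Healthy Worker Effect: Working populations tend to be healthier than the general population, so comparing workers with the general population can underestimate risk.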

⭐ Neyman (Prevalence-Incidence) Bias: In case-control studies, missing severe or rapidly fatal cases leads to a non-representative sample.
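
Berkson bias lends itself to a worked example with expected counts. The sketch below assumes two conditions, A and B, that are independent in a hypothetical population of 10,000, and assumed admission probabilities that rise when either condition is present; none of these numbers come from real data.

```python
# Expected counts in a hypothetical population of 10,000 people where
# conditions A and B are independent, each with 10% prevalence.
population = {
    ("A", "B"): 100.0,
    ("A", "no B"): 900.0,
    ("no A", "B"): 900.0,
    ("no A", "no B"): 8100.0,
}

# Assumed probability of hospital admission for each group (illustrative).
admission_prob = {
    ("A", "B"): 0.51,
    ("A", "no B"): 0.30,
    ("no A", "B"): 0.30,
    ("no A", "no B"): 0.02,
}

# Expected counts among hospitalized patients only.
hospital = {group: count * admission_prob[group] for group, count in population.items()}


def prevalence_of_a(table):
    """Proportion of people in the table who have condition A."""
    return (table[("A", "B")] + table[("A", "no B")]) / sum(table.values())


def odds_ratio(table):
    """Odds ratio for the association between A and B in a 2x2 table."""
    return (table[("A", "B")] * table[("no A", "no B")]) / (
        table[("A", "no B")] * table[("no A", "B")]
    )


print(f"Prevalence of A, population: {prevalence_of_a(population):.2f}")
print(f"Prevalence of A, hospital:   {prevalence_of_a(hospital):.2f}")
print(f"Odds ratio, population:      {odds_ratio(population):.2f}")
print(f"Odds ratio, hospital:        {odds_ratio(hospital):.2f}")
```

In the full population the odds ratio is 1 (no association), but among hospitalized patients condition A is far more common and the two conditions appear inversely associated, purely because of how patients entered the sample.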

High‑Yield Points - ⚡ Biggest Takeaways

Random sampling is crucial for generalizability (external validity), allowing inferences about a larger population.

Stratified sampling ensures specific subgroups are adequately represented, improving precision for those groups.

Cluster sampling randomly selects natural groups (e.g., hospital wards), offering convenience at the cost of lower precision than a simple random sample of the same size.

Convenience sampling is highly susceptible to selection bias, severely limiting external validity.

Random sampling minimizes selection bias; randomization in trials minimizes confounding.
