Study designs US Medical PG Practice Questions and MCQs
Question 1: A research team has data from three completed studies on statin use and Alzheimer's disease: Study A (case-control, OR=0.6, n=500), Study B (retrospective cohort, RR=0.7, n=10,000), and Study C (RCT with cognitive decline as secondary endpoint, RR=0.9, n=2,000). The case-control study used prevalent cases, the cohort study had significant loss to follow-up in the unexposed group, and the RCT was underpowered for cognitive outcomes. Synthesize the evidence to determine the most reliable conclusion about the association.
A. The RCT provides the strongest evidence despite being underpowered
B. The retrospective cohort study offers the best balance of validity and precision (Correct Answer)
C. Case-control study is most reliable due to efficient rare outcome assessment
D. Evidence is contradictory and no conclusion can be drawn
E. Meta-analysis of all three studies provides the most accurate estimate
Explanation: ***The retrospective cohort study offers the best balance of validity and precision***
- Despite **differential loss to follow-up** in the unexposed group, the very large **sample size (n=10,000)** provides higher **statistical power** and precision than either of the other studies.
- Cohort studies establish a **temporal sequence** (statin use before disease onset), which is superior to cross-sectional or prevalent case-control designs for causal inference.
*The RCT provides the strongest evidence despite being underpowered*
- While **Randomized Controlled Trials (RCTs)** are higher in the evidence hierarchy, an **underpowered** study lacks the precision to detect a true effect, increasing the risk of a **Type II error**.
- Since cognitive decline was only a **secondary endpoint**, the study may not have been designed or followed long enough to assess Alzheimer's disease progression accurately.
*Case-control study is most reliable due to efficient rare outcome assessment*
- The use of **prevalent cases** instead of incident cases introduces **Neyman bias (survival bias)**, as it only includes patients who survived long enough to be studied.
- Case-control studies are prone to **recall bias** and cannot definitively prove that statin use preceded the onset of cognitive decline.
*Evidence is contradictory and no conclusion can be drawn*
- While the results vary, researchers can still synthesize evidence by weighing studies based on their **methodological quality**, **sample size**, and **bias profile**.
- A conclusion can be drawn by acknowledging the **conservative estimate** (bias toward the null) in the cohort study which still suggested a protective effect.
*Meta-analysis of all three studies provides the most accurate estimate*
- **Meta-analysis** of studies with fundamentally different designs (RCT, Cohort, Case-Control) and significant **methodological limitations** can lead to inaccurate pooled results ("garbage in, garbage out").
- Combining studies with different measures of association (OR vs RR) and distinct biases like **survival bias** and **differential attrition** increases **heterogeneity**.
Question 2: A public health department needs to determine whether a cluster of birth defects in a county is associated with industrial pollution. They have limited resources, the suspected exposure occurred 3-5 years ago, and the outcome is rare (15 cases identified). Multiple potential confounders exist including maternal age, socioeconomic status, and prenatal care access. The community demands rapid answers. Evaluate the most appropriate initial study design considering feasibility, ethics, and scientific validity.
A. Prospective cohort study of pregnant women with exposure monitoring
B. Randomized controlled trial comparing exposed and unexposed areas
C. Case-control study with multiple control groups and confounder adjustment (Correct Answer)
D. Ecologic study comparing county-level pollution and birth defect rates
E. Cross-sectional survey of current pollution levels and birth outcomes
Explanation: ***Case-control study with multiple control groups and confounder adjustment***
- A **case-control study** is the preferred design for investigating **rare outcomes** (only 15 cases identified) because it selects subjects by disease status rather than waiting for the disease to develop.
- This design is highly efficient for **retrospective exposures** that occurred years ago, allows for **rapid results** with limited resources, and can adjust for multiple **confounders** like maternal age and SES through statistical modeling.
*Prospective cohort study of pregnant women with exposure monitoring*
- This design is inappropriate for **rare outcomes** because it would require a massive sample size and many years of follow-up before enough cases accrued to support a meaningful analysis.
- It cannot address a **past exposure** (3-5 years ago) as it follows subjects forward in time from the point of current exposure.
*Randomized controlled trial comparing exposed and unexposed areas*
- It is fundamentally **unethical** to intentionally expose human populations to potentially harmful industrial pollutants in an experimental setting.
- RCTs are used for **interventions** (like new drugs) rather than investigating the etiology of environmental health hazards.
*Ecologic study comparing county-level pollution and birth defect rates*
- This design is prone to the **ecologic fallacy**, where associations found at the population level may not hold true for individuals within that population.
- It cannot adjust for **individual-level confounders** such as personal prenatal care access or specific maternal age, making the results scientifically weaker for the community's needs.
*Cross-sectional survey of current pollution levels and birth outcomes*
- This design suffers from **temporal ambiguity**: it measures exposure and outcome simultaneously, so it cannot confirm that pollution exposure preceded the birth defects.
- Current pollution levels may not accurately reflect the **historical exposure** that occurred during the critical window of embryogenesis 3-5 years ago.
Question 3: A pharmaceutical company wants to evaluate a new anticoagulant's effectiveness in preventing stroke in atrial fibrillation patients. They have limited funding and need results within 2 years. The drug has promising phase 2 data. Concurrent medications and comorbidities vary widely in the target population. The company must choose between a pragmatic trial in 50 community hospitals or an explanatory trial at 3 academic centers with strict protocols. Evaluate which design best serves both scientific and practical objectives.
A. Cluster randomized trial across academic and community sites
B. Explanatory trial provides definitive efficacy data with maximum internal validity
C. Pragmatic trial in community settings with broad inclusion criteria (Correct Answer)
D. Sequential design starting with explanatory trial then pragmatic trial
E. Adaptive trial design with interim efficacy monitoring
Explanation: ***Pragmatic trial in community settings with broad inclusion criteria***
- A **pragmatic trial** evaluates **effectiveness** in real-world clinical practice, making it ideal for a target population with diverse comorbidities and concurrent medications.
- It offers higher **external validity** (generalizability) and is often more feasible and cost-effective within a short timeline compared to highly controlled designs.
*Cluster randomized trial across academic and community sites*
- Although it randomizes whole sites rather than individual patients, this design is more complex to coordinate and may not be necessary if the goal is individual-level **stroke prevention** results.
- It does not specifically address the need for **broad inclusion criteria** or the limited budget as effectively as a standard pragmatic design.
*Explanatory trial provides definitive efficacy data with maximum internal validity*
- **Explanatory trials** test **efficacy** under ideal, highly controlled conditions, which may not reflect the actual benefit in the general population with comorbidities.
- Strict protocols at only 3 centers lead to low **generalizability** and slower recruitment, potentially exceeding the 2-year deadline.
*Sequential design starting with explanatory trial then pragmatic trial*
- A **sequential design** would require significantly more time and **funding** than the two-year timeline and limited budget allow.
- Since the drug already has promising **Phase 2 data**, moving directly to an effectiveness-focused pragmatic trial is a more strategic use of resources.
*Adaptive trial design with interim efficacy monitoring*
- **Adaptive designs** allow for modifications based on interim data but are statistically complex and often more **expensive** to manage.
- While scientifically rigorous, it does not prioritize the need for **real-world effectiveness** data in community settings requested by the scenario.
Question 4: A randomized controlled trial of a new diabetes medication shows significant reduction in HbA1c levels (p<0.001). However, 40% of participants in the treatment group and 15% in the placebo group dropped out before study completion, primarily due to gastrointestinal side effects in the treatment group. The analysis includes only participants who completed the study. Analyze the impact on study conclusions.
A. Hawthorne effect has contaminated both study arms equally
B. Regression to the mean explains the observed treatment effect
C. Intention-to-treat analysis has been violated, likely overestimating treatment benefit (Correct Answer)
D. Per-protocol analysis provides the most accurate efficacy estimate
E. Selection bias from randomization failure invalidates the results
Explanation: ***Intention-to-treat analysis has been violated, likely overestimating treatment benefit***
- By including only participants who completed the study, the researchers performed a **per-protocol analysis**, which ignores **attrition bias** caused by the high dropout rate in the treatment arm.
- **Intention-to-treat (ITT)** analysis is designed to prevent such bias by analyzing participants based on their original group assignment, regardless of completion or compliance.
*Hawthorne effect has contaminated both study arms equally*
- The **Hawthorne effect** refers to subjects changing their behavior because they know they are being observed, not loss of participants due to side effects.
- This effect would typically affect the **observation process** rather than causing differential dropout rates between treatment and placebo groups.
*Regression to the mean explains the observed treatment effect*
- **Regression to the mean** is a statistical phenomenon where extreme initial measurements naturally move toward the average upon repeated testing.
- While it can influence longitudinal studies, it does not account for the **differential attrition** and side-effect profile described in this trial.
*Per-protocol analysis provides the most accurate efficacy estimate*
- **Per-protocol analysis** often leads to an optimistic or **overestimated efficacy** because it only evaluates the "healthiest" or most compliant patients who tolerated the drug.
- It fails to reflect **real-world effectiveness**, as it disregards the significance of treatment-limiting side effects (like the GI issues seen here).
*Selection bias from randomization failure invalidates the results*
- **Selection bias** usually occurs at the recruitment or **assignment phase**, whereas this scenario describes **attrition bias** occurring after the study has begun.
- Initial **randomization** may have been successful, but the failure to account for dropouts at the analysis stage is what introduces the bias.
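The gap between a completers-only (per-protocol) analysis and an intention-to-treat analysis can be illustrated with a small sketch. All numbers below are invented for illustration and do not come from the trial in the question: 100 participants per arm, a 1.0-point HbA1c improvement among treatment completers, a 0.2-point improvement among placebo completers, and no change for anyone who dropped out (a baseline-carried-forward assumption, which is only one of several ways to handle missing data).

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Hypothetical HbA1c changes in percentage points (negative = improvement).
treat_completers = [-1.0] * 60    # 60% of the treatment arm completed
treat_dropouts = [0.0] * 40       # 40% dropped out; assumed no change
placebo_completers = [-0.2] * 85  # 85% of the placebo arm completed
placebo_dropouts = [0.0] * 15     # 15% dropped out; assumed no change

# Per-protocol: compare completers only.
per_protocol_effect = mean(treat_completers) - mean(placebo_completers)

# Intention-to-treat: compare everyone as randomized.
itt_effect = (mean(treat_completers + treat_dropouts)
              - mean(placebo_completers + placebo_dropouts))

print("Per-protocol difference:", round(per_protocol_effect, 2))
print("Intention-to-treat difference:", round(itt_effect, 2))
```

Under these assumptions the completers-only comparison shows a larger apparent benefit than the as-randomized comparison, which is the direction of bias the correct answer describes.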
Question 5: A study follows 5,000 healthcare workers from 1995 to 2015, tracking their exposure to bloodborne pathogens and development of chronic infections. The researchers use employment records to determine exposure history rather than relying on participant recall. In 2010, the hospital implemented a new electronic medical record system that improved documentation of needle-stick injuries. Analyze how this change affects the validity of the exposure assessment.
A. Introduces surveillance bias affecting the exposure-outcome relationship (Correct Answer)
B. Creates selection bias by preferentially identifying exposed individuals
C. Represents non-differential misclassification reducing study power
D. Introduces lead-time bias affecting survival analysis
E. Creates volunteer bias affecting external validity
Explanation: ***Introduces surveillance bias affecting the exposure-outcome relationship***
- The shift to an **electronic medical record system** in 2010 leads to more intensive or accurate monitoring of exposure (needle-sticks) specifically in the later period, creating **surveillance bias**.
- This reflects **differential measurement** of exposure over time, which can artifactually distort the association between bloodborne pathogens and infection rates during different intervals of the study.
*Creates selection bias by preferentially identifying exposed individuals*
- **Selection bias** occurs during the enrollment phase of a study when participants are selected based on factors related to both exposure and outcome.
- This scenario describes a change in **information gathering** and documentation after the cohort has already been established and followed, not a flaw in the initial selection.
*Represents non-differential misclassification reducing study power*
- **Non-differential misclassification** occurs when the degree of error in recording exposure is the same across all groups, usually biasing results toward the **null hypothesis**.
- In this study, the systematic improvement in documentation mid-study creates a **differential** change in data quality over time rather than a random, uniform error.
*Introduces lead-time bias affecting survival analysis*
- **Lead-time bias** is a phenomenon where early detection (e.g., through screening) makes it appear as though survival time has increased, even if the actual disease course is unchanged.
- The scenario focuses on the **documentation of exposure** (needle-sticks), not the early detection of the chronic infection outcome or its impact on survival duration.
*Creates volunteer bias affecting external validity*
- **Volunteer bias** occurs when individuals who choose to participate in a study have different characteristics than those who do not, affecting **generalizability**.
- Since the researchers are using **employment records** of a defined cohort rather than self-selection by the workers, volunteer bias is not the primary issue here.
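The statement that non-differential misclassification usually biases results toward the null can be checked with simple arithmetic. The sketch below uses a made-up cohort (not the healthcare-worker study in the question) in which 30% of truly exposed people are recorded as unexposed, with the same error rate whether or not they later develop the outcome.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Hypothetical true cohort: 1,000 exposed (100 cases), 1,000 unexposed (50 cases).
true_rr = risk_ratio(100, 1000, 50, 1000)

# Assume only 70% of exposed people are recorded as exposed, regardless of outcome.
recorded_pct = 70
exp_n_recorded = 1000 * recorded_pct // 100     # 700 people still classified as exposed
exp_cases_recorded = 100 * recorded_pct // 100  # 70 cases still classified as exposed

# The missed exposed people (and their cases) are counted as unexposed.
unexp_n_observed = 1000 + (1000 - exp_n_recorded)
unexp_cases_observed = 50 + (100 - exp_cases_recorded)

observed_rr = risk_ratio(exp_cases_recorded, exp_n_recorded,
                         unexp_cases_observed, unexp_n_observed)

print("True RR:", round(true_rr, 3))
print("Observed RR:", round(observed_rr, 3))
```

In this example the observed risk ratio falls between 1 and the true value, that is, it is attenuated toward the null rather than exaggerated.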
Question 6: A researcher studies 300 patients with pancreatic cancer and 300 matched controls, collecting information about coffee consumption over their lifetime. The odds ratio for pancreatic cancer in heavy coffee drinkers is 3.2 (95% CI: 2.1-4.8). However, the researcher later discovers that cases were more likely to accurately recall their coffee intake because they had been contemplating possible causes of their illness. Analyze how this affects the study results.
A. Selection bias has inflated the odds ratio
B. Confounding bias has created a spurious association
C. Recall bias has likely overestimated the true association (Correct Answer)
D. Information bias from the interviewer has affected results
E. Loss to follow-up has introduced attrition bias
Explanation: ***Recall bias has likely overestimated the true association***
- **Recall bias** is a type of **information bias** common in **case-control studies**: cases tend to ruminate on and search for potential causes of their illness, so they report past exposures differently from controls.
- This produces **differential misclassification**. Here, cases report their exposure more completely than controls do, which inflates the **odds ratio (OR)** and overestimates the link between coffee and cancer.
*Selection bias has inflated the odds ratio*
- **Selection bias** occurs during the recruitment phase when the study population is not representative of the target population; it is not related to how participants report data.
- Examples include **Berkson bias** or the **healthy worker effect**, which do not apply to the differential reporting of coffee intake mentioned.
*Confounding bias has created a spurious association*
- **Confounding** occurs when an external factor (e.g., smoking) is associated with both the exposure and the outcome, distorting the effect of the primary exposure.
- While coffee drinkers often smoke, the scenario specifically identifies **differential recall** by the cases as the source of error, not an unmeasured third variable.
*Information bias from the interviewer has affected results*
- **Interviewer bias** occurs when the researcher’s knowledge of the participant's disease status influences how they ask questions or record data.
- In this case, the error stems from the **participant's internal contemplation** and memory retrieval rather than the researcher's approach.
*Loss to follow-up has introduced attrition bias*
- **Attrition bias** is typically seen in **prospective cohort studies** or **Randomized Controlled Trials (RCTs)** when participants drop out over time.
- Since this is a **case-control study** looking back at historical data, loss to follow-up during a monitoring period is not a relevant factor.
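A short calculation shows how differential recall alone can create an association. The counts below are hypothetical and are not taken from the pancreatic cancer study in the question: they assume that 90 of 300 cases and 90 of 300 controls were truly heavy coffee drinkers (so the true odds ratio is 1), that cases report their intake accurately, and that controls under-report, with only 60 of the 90 heavy drinkers identifying themselves as such.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio from a 2x2 case-control table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

# Hypothetical truth: identical exposure in cases and controls.
true_or = odds_ratio(90, 210, 90, 210)

# Reported data: cases recall accurately, controls under-report heavy use.
reported_exposed_controls = 60
reported_unexposed_controls = 300 - reported_exposed_controls
observed_or = odds_ratio(90, 210,
                         reported_exposed_controls,
                         reported_unexposed_controls)

print("True OR:", round(true_or, 2))
print("Observed OR:", round(observed_or, 2))
```

Even with no real association, the reported data in this sketch yield an odds ratio well above 1, which is why the observed OR of 3.2 should be read as a likely overestimate.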
Question 7: A medical student conducts a study by surveying 800 adults at a community health fair about their current alcohol consumption and simultaneously measuring their liver enzyme levels. She finds that 15% of participants have elevated liver enzymes and 40% report heavy alcohol use. Apply your understanding to identify the primary limitation of this study design.
A. Cannot establish temporality between exposure and outcome (Correct Answer)
B. High cost and long duration of data collection
C. Requires large sample size for adequate power
D. Subject to recall bias from retrospective data collection
E. Cannot calculate prevalence of disease
Explanation: ***Cannot establish temporality between exposure and outcome***
- This study is a **cross-sectional study** because it measures both exposure (alcohol use) and outcome (liver enzymes) at a **single point in time**.
- Because the data are collected simultaneously, the study cannot determine whether the exposure preceded the outcome, so it cannot establish a **causal relationship**.
*High cost and long duration of data collection*
- Cross-sectional studies are generally **low cost** and relatively **quick** to perform compared to longitudinal designs.
- Unlike **prospective cohort studies**, there is no long-term follow-up required, which saves considerable resources.
*Requires large sample size for adequate power*
- While larger samples improve precision, cross-sectional studies are often used for **prevalence** and can be effectively conducted with smaller groups compared to studies of rare diseases.
- The primary limitation here is **structural (design-based)** rather than related to the statistical power of the sample size.
*Subject to recall bias from retrospective data collection*
- **Recall bias** is a classic limitation of **case-control studies**, where patients with a disease are more likely to remember exposures differently than healthy controls.
- While survey data can have inaccuracies, the lack of **temporality** is a more fundamental limitation of the cross-sectional design utilized here.
*Cannot calculate prevalence of disease*
- Cross-sectional studies are actually the **gold standard** design for calculating **prevalence** within a population.
- This study specifically allows the researcher to determine that **15%** of the sampled population currently has elevated liver enzymes.
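The prevalence figures in this question translate directly into counts, as the sketch below shows. The 800 participants and the 15% and 40% figures come from the question; the cross-tabulation (how many heavy drinkers have elevated enzymes) is not given, so the value of 80 used here is purely hypothetical.

```python
n_participants = 800
n_elevated = n_participants * 15 // 100  # participants with elevated liver enzymes
n_heavy = n_participants * 40 // 100     # participants reporting heavy alcohol use

# Hypothetical cross-tabulation (not stated in the question).
elevated_heavy = 80
elevated_not_heavy = n_elevated - elevated_heavy
n_not_heavy = n_participants - n_heavy

# Prevalence of elevated enzymes within each exposure group.
prev_heavy = elevated_heavy / n_heavy
prev_not_heavy = elevated_not_heavy / n_not_heavy
prevalence_ratio = prev_heavy / prev_not_heavy

print("Elevated enzymes:", n_elevated, "of", n_participants)
print("Heavy alcohol use:", n_heavy, "of", n_participants)
print("Prevalence ratio (hypothetical table):", round(prevalence_ratio, 2))
```

A prevalence ratio like this describes how the two measurements co-occur at one point in time; it says nothing about which came first.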
Question 8: A physician researcher enrolls 2,000 healthy nurses aged 25-55 years and records their dietary habits, exercise patterns, and medication use. She plans to follow them for 20 years to determine who develops cardiovascular disease. Apply this information to determine what measure of association can be directly calculated from this study.
A. Odds ratio only
B. Prevalence and incidence
C. Number needed to treat
D. Sensitivity and specificity
E. Relative risk and attributable risk (Correct Answer)
Explanation: ***Relative risk and attributable risk***
- This study design is a **prospective cohort study**, as it tracks healthy individuals over time to monitor the development of a specific outcome.
- Because the **incidence** in both exposed and unexposed groups can be measured, researchers can directly calculate the **relative risk** and **attributable risk**.
*Odds ratio only*
- The **odds ratio** is the primary measure of association used in **case-control studies**, which compare cases with a disease to controls without it.
- While an odds ratio can be calculated from a cohort study, it is not the primary or only measure available in this design.
*Prevalence and incidence*
- While **incidence** can be determined in a prospective study, **prevalence** is typically measured in **cross-sectional studies** as it represents a snapshot in time.
- This option reflects measures of disease frequency rather than the specific **measures of association** between exposure and disease requested.
*Number needed to treat*
- **Number needed to treat (NNT)** is a calculation derived from **randomized controlled trials (RCTs)** to assess the effectiveness of an intervention.
- As this is an **observational cohort study** rather than an experimental trial, NNT is not a standard measure calculated here.
*Sensitivity and specificity*
- These terms refer to the quality and accuracy of a **diagnostic test**, rather than the association between risk factors and disease development.
- They are determined by comparing a new test against a **gold standard** in a specific study population.
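As a worked illustration of the two measures named in the correct answer, the sketch below computes relative risk and attributable risk from cohort incidence data. The exposure (a sedentary lifestyle) and all counts are invented for the example; the question gives only the cohort size of 2,000.

```python
def incidence(cases, n_at_risk):
    """Cumulative incidence: new cases divided by people at risk at baseline."""
    return cases / n_at_risk

# Hypothetical 20-year results for 2,000 nurses.
exposed_n, exposed_cases = 800, 80        # sedentary nurses
unexposed_n, unexposed_cases = 1200, 60   # active nurses

risk_exposed = incidence(exposed_cases, exposed_n)
risk_unexposed = incidence(unexposed_cases, unexposed_n)

relative_risk = risk_exposed / risk_unexposed      # ratio of incidences
attributable_risk = risk_exposed - risk_unexposed  # difference of incidences

print("Relative risk:", round(relative_risk, 2))
print("Attributable risk:", round(attributable_risk, 3))
```

Both measures need the incidence in each exposure group, which is exactly what following a disease-free cohort forward in time provides.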
Question 9: A pharmaceutical company conducts a study where 1,000 patients with hypertension are randomly assigned to receive either a new antihypertensive medication or placebo. Participants are followed for 5 years to assess the incidence of myocardial infarction. Neither the patients nor the investigators know who receives the active drug. Apply your knowledge to identify the key advantage of this study design.
A. Eliminates selection bias and confounding variables (Correct Answer)
B. Most cost-effective method for rare outcomes
C. Provides the highest level of evidence with minimal loss to follow-up
D. Best suited for studying multiple outcomes simultaneously
E. Allows calculation of prevalence rates
Explanation: ***Eliminates selection bias and confounding variables***
- **Randomization** distributes both known and unknown **confounding variables** equally, on average, between the control and treatment groups.
- This process removes **selection bias**, allowing the study to establish a clear **cause-and-effect** relationship between the intervention and the outcome.
*Most cost-effective method for rare outcomes*
- **Randomized Controlled Trials (RCTs)** are typically expensive and time-consuming; **case-control studies** are the preferred method for studying **rare outcomes**.
- Rare outcomes in an RCT would require a massive sample size and many years of follow-up, significantly increasing **costs**.
*Provides the highest level of evidence with minimal loss to follow-up*
- While RCTs provide the **highest level of evidence**, they are highly susceptible to **loss to follow-up** (attrition bias) due to their longitudinal nature.
- A randomized design does not inherently protect against participants dropping out over a **5-year period**.
*Best suited for studying multiple outcomes simultaneously*
- **Cohort studies** are generally better suited for observing the natural history of a condition and multiple different **outcomes** from a single exposure.
- RCTs are usually designed with a specific **primary end-point** (e.g., incidence of myocardial infarction) to maintain statistical power.
*Allows calculation of prevalence rates*
- **Cross-sectional studies** are the specific design used to measure **prevalence** at a single point in time.
- Longitudinal studies such as RCTs and cohort studies allow for the calculation of **incidence**, not prevalence.
Question 10: A researcher wants to investigate the relationship between smoking and lung cancer. She identifies 500 patients newly diagnosed with lung cancer and 500 age-matched controls without lung cancer from hospital records. She then reviews their medical histories to determine their smoking status over the past 20 years. What type of study design is being used?
A. Case-control study (Correct Answer)
B. Cross-sectional study
C. Retrospective cohort study
D. Prospective cohort study
E. Randomized controlled trial
Explanation: ***Case-control study***
- This study design begins by identifying participants based on their **outcome status** (lung cancer cases vs. healthy controls) and then looks back in time to assess **prior exposure** (smoking).
- It is highly efficient for studying diseases with **long latency periods**, such as cancer, and uses the **odds ratio** as the primary measure of association.
*Cross-sectional study*
- This design measures both **exposure and outcome simultaneously** at a single point in time, providing a "snapshot" of a population.
- It cannot establish a **temporal relationship** or determine if the exposure preceded the development of the disease.
*Retrospective cohort study*
- This design identifies participants based on their **exposure status** in the past and follows them forward through existing records to see if they developed the disease.
- Unlike case-control studies, the starting point of the selection is **exposure**, not the presence of the disease itself.
*Prospective cohort study*
- This involves selecting a group of **disease-free individuals**, classifying them by exposure, and following them into the **future** to see who develops the outcome.
- This design is expensive and time-consuming but is superior for calculating **relative risk** and incidence.
*Randomized controlled trial*
- This is an **experimental study** where participants are randomly assigned to either an intervention or a control group to evaluate efficacy.
- It is generally considered unethical or impossible to **randomly assign** harmful exposures like smoking to human subjects.
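The odds ratio mentioned for the case-control design can be computed from a 2x2 table of exposure by case status. The sketch below uses invented smoking counts (the question gives only 500 cases and 500 controls) and adds an approximate 95% confidence interval using the standard log-odds (Woolf) method.

```python
import math

# Hypothetical exposure counts among 500 cases and 500 controls.
smokers_cases, nonsmokers_cases = 400, 100
smokers_controls, nonsmokers_controls = 200, 300

# Cross-product odds ratio.
odds_ratio = (smokers_cases * nonsmokers_controls) / (nonsmokers_cases * smokers_controls)

# Standard error of log(OR): square root of the summed reciprocals of the four cells.
se_log_or = math.sqrt(1 / smokers_cases + 1 / nonsmokers_cases
                      + 1 / smokers_controls + 1 / nonsmokers_controls)

# Approximate 95% confidence interval on the log scale, then back-transformed.
ci_lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print("Odds ratio:", round(odds_ratio, 2))
print("Approximate 95% CI:", round(ci_lower, 2), "to", round(ci_upper, 2))
```

Because participants are sampled by outcome status, incidence cannot be computed from this table, so the odds ratio stands in for the relative risk.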