Study Design US Medical PG Practice Questions and MCQs
Question 21: A neuro-oncology investigator has recently conducted a randomized controlled trial in which the addition of a novel alkylating agent to radiotherapy was found to prolong survival in comparison to radiotherapy alone (HR = 0.7, p < 0.01). A number of surviving participants who took the alkylating agent reported that they had experienced significant nausea from the medication. The investigator surveyed all participants in both the treatment and the control group on their nausea symptoms by self-report rated mild, moderate, or severe. The investigator subsequently compared the two treatment groups with regard to nausea level.
| | Mild nausea | Moderate nausea | Severe nausea |
|---|---|---|---|
| Treatment group (%) | 20 | 30 | 50 |
| Control group (%) | 35 | 35 | 30 |
Which of the following statistical methods would be most appropriate to assess the statistical significance of these results?
A. Chi-square test (Correct Answer)
B. Pearson correlation coefficient
C. Multiple logistic regression
D. Unpaired t-test
E. Paired t-test
Explanation: **Chi-square test**
- The **Chi-square test** is appropriate for comparing **categorical data** (mild, moderate, severe) between two or more independent groups (treatment vs. control).
- It assesses whether there is a statistically significant association between the two categorical variables (treatment group and nausea severity).
*Pearson correlation coefficient*
- The **Pearson correlation coefficient** is used to measure the **linear relationship** between two **continuous variables**.
- Nausea severity (mild, moderate, severe) is an **ordinal categorical variable**, not a continuous one.
*Multiple logistic regression*
- **Multiple logistic regression** is used to predict a **binary outcome** (e.g., presence or absence of nausea) based on one or more independent variables, which can be continuous or categorical.
- The outcome here is **ordinal categorical** (mild, moderate, severe nausea), not binary. While logistic regression can be adapted for ordinal outcomes, a simpler Chi-square test is more direct for comparing distributions without prediction.
*Unpaired t-test*
- An **unpaired t-test** is used to compare the **mean of a continuous variable between two independent groups**.
- Nausea levels are categorical, and we are interested in comparing proportions within categories, not means.
*Paired t-test*
- A **paired t-test** is used to compare the **mean of a continuous variable between two related (paired) sets of measurements**.
- The study involves independent treatment and control groups, and the nausea data is categorical, making the paired t-test unsuitable.
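As a rough illustration of the chi-square comparison described above, the sketch below computes the test statistic by hand. The table reports percentages only, so the counts assume a hypothetical 100 participants per group; the function name `chi_square_statistic` is likewise an illustrative choice, not something from the question.

```python
import math

# Hypothetical counts: the table gives percentages only, so this assumes
# 100 participants per arm (an illustration, not the trial's real size).
observed = [
    [20, 30, 50],  # treatment: mild, moderate, severe
    [35, 35, 30],  # control:   mild, moderate, severe
]


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of group and nausea level.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


chi2, dof = chi_square_statistic(observed)
# With exactly 2 degrees of freedom the chi-square upper tail has the
# closed form exp(-x / 2), so no statistics library is needed here.
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

The p-value depends on the assumed group sizes: the same percentages with fewer participants per arm would give a smaller statistic and a larger p-value.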
Question 22: A group of investigators seeks to compare the non-inferiority of a new angiotensin receptor blocker, salisartan, with losartan for reduction of blood pressure. 2,000 patients newly diagnosed with hypertension are recruited for the trial; the first 1,000 recruited patients are administered losartan, and the other half are administered salisartan. Patients with a baseline systolic blood pressure less than 100 mmHg are excluded from the study. Blood pressure is measured every week for four weeks, with the primary outcome being a reduction in systolic blood pressure by salisartan within 10% of that of the control. Secondary outcomes include incidence of subjective improvement in symptoms, improvement of ejection fraction, and incidence of cough. 500 patients withdraw from the study due to symptomatic side effects. In an intention-to-treat analysis, salisartan is deemed to be non-inferior to losartan for the primary outcome but inferior for all secondary outcomes. As the investigators launch a national advertising campaign for salisartan, independent groups report that the drug is inferior for its primary outcome compared to losartan and associated with respiratory failure among patients with pulmonary hypertension. How could this study have been improved?
A. Increased study duration
B. Posthoc analysis of primary outcome among patients who withdrew from study
C. Randomization (Correct Answer)
D. Increased sample size
E. Retrial of primary outcome for clinical effectiveness instead of non-inferiority
Explanation: ***Randomization***
- The study allocated patients **sequentially** (first 1,000 to losartan, next 1,000 to salisartan), introducing **selection bias** as the two groups may not be comparable at baseline for unmeasured confounders.
- **Randomization** ensures that both known and unknown confounding factors are evenly distributed between treatment groups, making the groups comparable and increasing the reliability of the observed treatment effects.
- The lack of randomization explains why independent groups found **different results**—the study's internal validity was compromised by systematic differences between groups that were not due to the intervention itself.
- Sequential allocation is particularly problematic because patient characteristics may **change over time** (e.g., seasonal variations, changes in referral patterns, or evolution in diagnostic criteria).
*Increased study duration*
- While a longer study duration might reveal long-term effects or adverse events, the primary issue of **baseline incomparability** due to the lack of randomization would persist.
- Increasing duration would not address the fundamental flaw in the **patient allocation method** that led to potential bias.
*Posthoc analysis of primary outcome among patients who withdrew from study*
- A **post-hoc analysis** of withdrawn patients would be useful for understanding reasons for withdrawal but cannot correct for the initial lack of randomization or the **attrition bias** caused by the large number of withdrawals (500/2,000 = 25%).
- This approach would also be susceptible to **selection bias** because the reasons for withdrawal might differ between the two groups.
- While **intention-to-treat analysis** was performed, the fundamental allocation bias remains.
*Increased sample size*
- A larger sample size generally increases statistical power and precision, but it does not correct for **systematic errors** introduced by a flawed study design, such as lack of randomization.
- Increasing the sample size would simply replicate the biased allocation across more participants, yielding a more **precise** estimate of a biased effect rather than a more accurate one.
*Retrial of primary outcome for clinical effectiveness instead of non-inferiority*
- Changing the trial design from **non-inferiority** to **superiority** would alter the hypothesis being tested but would not address the underlying methodological flaws.
- The mode of patient allocation (sequential assignment) remains the critical weakness, invalidating any conclusions regarding either non-inferiority or superiority.
- The discrepancy between this study's findings and independent reports highlights that the **study design** (not the research question) was flawed.
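A minimal sketch of simple randomization, in contrast to the sequential allocation used in the trial, might look like the following. The function name `randomize`, the patient ID scheme, and the fixed seed are assumptions made for illustration; real trials typically add allocation concealment and often use block or stratified schemes.

```python
import random


def randomize(patient_ids, seed=None):
    """Shuffle patients and split them evenly into two treatment arms."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"losartan": shuffled[:half], "salisartan": shuffled[half:]}


# Hypothetical IDs 1..2000 standing in for the 2,000 recruited patients.
arms = randomize(range(1, 2001), seed=42)
print(len(arms["losartan"]), len(arms["salisartan"]))
```

Because assignment no longer depends on recruitment order, time-related differences between early and late recruits are expected to balance out across the two arms.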
Question 23: A survey was conducted in a US midwestern town in an effort to assess maternal mortality over the past year. The data from the survey are given in the table below:
| Measure | Count |
|---|---|
| Women of childbearing age | 250,000 |
| Maternal deaths | 2,500 |
| Number of live births | 100,000 |
| Number of deaths of women of childbearing age | 7,500 |
Maternal death is defined as the death of a woman while pregnant or within 42 days of termination of pregnancy from any cause related to, or aggravated by, the pregnancy. Which of the following is the maternal mortality rate in this midwestern town?
A. 1,000 per 100,000 live births
B. 33 per 100,000 live births
C. 3,000 per 100,000 live births
D. 33,300 per 100,000 live births
E. 2,500 per 100,000 live births (Correct Answer)
Explanation: ***2,500 per 100,000 live births***
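- The maternal mortality rate, as used in this question, is the number of **maternal deaths** per 100,000 **live births** (strictly, this measure is the maternal mortality *ratio*; a rate in the strict sense uses women of reproductive age as the denominator). The given data directly provide both values.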
- Calculation: (2,500 maternal deaths / 100,000 live births) × 100,000 = **2,500 per 100,000 live births**.
*1,000 per 100,000 live births*
- This figure results from dividing maternal deaths by the number of **women of childbearing age**: (2,500 / 250,000) × 100,000 = 1,000.
- The denominator is wrong: the measure asked for here is expressed per 100,000 **live births**, not per 100,000 women of childbearing age.
*33 per 100,000 live births*
- This value is far lower than the correct rate, and no combination of the given figures yields it per 100,000 live births.
- It most plausibly comes from expressing maternal deaths as a percentage of all deaths of women of childbearing age (2,500 / 7,500 ≈ 33%), which uses both the wrong denominator and the wrong multiplier.
*3,000 per 100,000 live births*
- This figure results from dividing all deaths of women of childbearing age by the number of women of childbearing age: (7,500 / 250,000) × 100,000 = 3,000.
- That is the overall mortality rate among women of childbearing age; it counts deaths unrelated to pregnancy and does not use live births as the denominator.
*33,300 per 100,000 live births*
- This figure results from incorrectly calculating the proportion of maternal deaths among all deaths of women of childbearing age: (2,500 / 7,500) × 100,000 = 33,333.
- This is a conceptual error as the maternal mortality rate should use live births as the denominator, not total deaths of women of childbearing age.
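The arithmetic behind the answer options can be checked with a short sketch. The figures come from the table in the question; the helper name `per_100k` and the variable names are illustrative assumptions.

```python
# Figures taken from the survey table in the question.
maternal_deaths = 2_500
live_births = 100_000
women_childbearing_age = 250_000
deaths_women_childbearing_age = 7_500


def per_100k(numerator, denominator):
    """Express numerator/denominator as a count per 100,000."""
    return numerator * 100_000 / denominator


# Correct answer: maternal deaths per 100,000 live births.
per_live_births = per_100k(maternal_deaths, live_births)

# Wrong denominators that reproduce three of the distractors.
per_women = per_100k(maternal_deaths, women_childbearing_age)
all_deaths_per_women = per_100k(deaths_women_childbearing_age,
                                women_childbearing_age)
share_of_all_deaths = per_100k(maternal_deaths, deaths_women_childbearing_age)

print(per_live_births, per_women, all_deaths_per_women,
      round(share_of_all_deaths))
```

Only the first calculation pairs the maternal-death numerator with the live-birth denominator that the answer options are expressed in.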
Question 24: The height of American adults is expected to follow a normal distribution, with a typical male adult having an average height of 69 inches with a standard deviation of 0.1 inches. An investigator has been informed about a community in the American Midwest with a history of heavy air and water pollution in which a lower mean height has been reported. The investigator plans to sample 30 male residents to test the claim that heights in this town differ significantly from the national average based on heights assumed to be normally distributed. The significance level is set at 10% and the probability of a type 2 error is assumed to be 15%. Based on this information, which of the following is the power of the proposed study?
A. 0.10
B. 0.85 (Correct Answer)
C. 0.90
D. 0.15
E. 0.05
Explanation: ***0.85***
- **Power** is defined as **1 - β**, where β is the **probability of a Type II error**.
- Given that the probability of a **Type II error (β)** is 15% or 0.15, the power of the study is 1 - 0.15 = **0.85**.
*0.10*
- This value represents the **significance level (α)**, which is the probability of committing a **Type I error** (rejecting a true null hypothesis).
- The significance level is distinct from the **power of the study**, which relates to Type II errors.
*0.90*
- This value would be the power if the **Type II error rate (β)** was 0.10 (1 - 0.10 = 0.90), but the question specifies a β of 0.15.
- It is also the complement of the significance level (1 - α), which is not the definition of power.
*0.15*
- This value is the **probability of a Type II error (β)**, not the power of the study.
- **Power** is the probability of correctly rejecting a false null hypothesis, which is 1 - β.
*0.05*
- While 0.05 is a common significance level (α), it is not given as the significance level in this question (which is 0.10).
- This value also does not represent the power of the study, which would be calculated using the **Type II error rate**.
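The relationship between the Type II error rate and power is simple enough to state as a one-line function. The sketch below uses the values from the question; the function name `power_from_beta` is an assumption for illustration.

```python
def power_from_beta(beta):
    """Power is the complement of the Type II error probability."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be a probability between 0 and 1")
    return 1.0 - beta


alpha = 0.10  # significance level from the question; not used for power here
beta = 0.15   # assumed probability of a Type II error
power = power_from_beta(beta)
print(f"power = {power:.2f}")
```

The sample size, mean, and standard deviation in the stem do not enter the calculation once β is given directly.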
Question 25: Which of the following study designs would be most appropriate to investigate the association between electronic cigarette use and the subsequent development of lung cancer?
A. Subjects with lung cancer who smoke and subjects with lung cancer who did not smoke
B. Subjects who smoke electronic cigarettes and subjects who smoke normal cigarettes
C. Subjects with lung cancer who smoke and subjects without lung cancer who smoke
D. Subjects with lung cancer and subjects without lung cancer
E. Subjects who smoke electronic cigarettes and subjects who do not smoke (Correct Answer)
Explanation: ***Subjects who smoke electronic cigarettes and subjects who do not smoke***
- This design represents a **cohort study**, which is ideal for investigating the **incidence** of a disease (lung cancer) in groups exposed and unexposed to a risk factor (electronic cigarette use).
- By following these two groups over time, researchers can directly compare the **risk of developing lung cancer** in e-cigarette users versus non-smokers.
*Subjects with lung cancer who smoke and subjects with lung cancer who did not smoke*
- This option compares two groups that both have lung cancer and refers to smoking in general, without distinguishing **electronic from traditional cigarettes**; with no group free of lung cancer, the association cannot be assessed.
- This design would not allow for the calculation of an **incidence rate** or a **relative risk** of lung cancer development specific to electronic cigarette use.
*Subjects who smoke electronic cigarettes and subjects who smoke normal cigarettes*
- This design compares two different types of smoking, which might be useful for comparing their relative risks but doesn't include a **non-smoking control group** to establish the absolute association with electronic cigarettes.
- While it could show if e-cigarettes are "safer" than traditional cigarettes, it wouldn't directly answer whether e-cigarettes themselves **cause lung cancer**.
*Subjects with lung cancer who smoke and subjects without lung cancer who smoke*
- This describes a **case-control study** but focuses on smoking in general rather than specifically electronic cigarettes, which is the independent variable of interest.
- While valuable for identifying risk factors, it would need to specifically differentiate between **electronic cigarette smokers** and other smokers to answer the question adequately.
*Subjects with lung cancer and subjects without lung cancer*
- This general description of a **case-control study** is too broad; it does not specify the exposure of interest, which is electronic cigarette use.
- To be relevant, the study would need to gather data on **electronic cigarette use** in both the lung cancer group and the non-lung cancer control group.
Question 26: An academic medical center in the United States is approached by a pharmaceutical company to run a small clinical trial to test the effectiveness of its new drug, compound X. The company wants to know if the measured hemoglobin A1c (HbA1c) of patients with type 2 diabetes receiving metformin and compound X would be lower than that of control subjects receiving only metformin. After a year of study and data analysis, researchers conclude that the control and treatment groups did not differ significantly in their HbA1c levels.
However, parallel clinical trials in several other countries found that compound X led to a significant decrease in HbA1c. Interested in the discrepancy between these findings, the company funded a larger study in the United States, which confirmed that compound X decreased HbA1c levels. After compound X was approved by the FDA, and after several years of use in the general population, outcomes data confirmed that it effectively lowered HbA1c levels and increased overall survival. What term best describes the discrepant findings in the initial clinical trial run by institution A?
A. Type I error
B. Hawthorne effect
C. Type II error (Correct Answer)
D. Publication bias
E. Confirmation bias
Explanation: ***Type II error***
- A **Type II error** occurs when a study fails to **reject a false null hypothesis**, meaning it concludes there is no significant difference or effect when one actually exists.
- In this case, the initial US trial incorrectly concluded that compound X had no significant effect on HbA1c, while the subsequent larger study and real-world data showed that it did.
*Type I error*
- A **Type I error** (alpha error) occurs when a study incorrectly **rejects a true null hypothesis**, concluding there is a significant difference or effect when there isn't.
- This scenario describes the opposite: the initial study failed to find an effect that genuinely existed, indicating a Type II error, not a Type I error.
*Hawthorne effect*
- The **Hawthorne effect** is a type of reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed.
- This effect does not explain the initial trial's failure to detect a real drug effect; rather, it relates to participants changing behavior due to study participation itself.
*Publication bias*
- **Publication bias** occurs when studies with positive or statistically significant results are more likely to be published than those with negative or non-significant results.
- While relevant to the literature as a whole, it doesn't explain the discrepancy in findings within a single drug's development where a real effect was initially missed.
*Confirmation bias*
- **Confirmation bias** is the tendency to search for, interpret, favor, and recall information in a way that confirms one's preexisting beliefs or hypotheses.
- This bias would likely lead researchers to *find* an effect if they expected one, or to disregard data that contradicts their beliefs, which is not what happened in the initial trial.
Question 27: A group of 100 medical students took an end-of-year exam. The mean score on the exam was 70%, with a standard deviation of 25%. The professor states that a student's score must be within the 95% confidence interval of the mean to pass the exam. Which of the following is the minimum score a student can have to pass the exam?
A. 45%
B. 63.75%
C. 67.5%
D. 20%
E. 65% (Correct Answer)
Explanation: ***65%***
- To find the **95% confidence interval (CI) of the mean**, we use the formula: Mean ± (Z-score × Standard Error). For a 95% CI, the Z-score is approximately **1.96**.
- The **Standard Error (SE)** is calculated as SD/√n, where n is the sample size (100 students). So, SE = 25%/√100 = 25%/10 = **2.5%**.
- The 95% CI is 70% ± (1.96 × 2.5%) = 70% ± 4.9%. The lower bound is 70% - 4.9% = **65.1%**, which rounds to **65%** as the minimum passing score.
*45%*
- This value is the mean minus one **standard deviation** (70% - 25%), which describes the spread of individual scores rather than the confidence interval of the mean.
- It lies well below the calculated lower bound of the 95% confidence interval (approximately 65.1%).
*63.75%*
- This value lies 2.5 standard errors below the mean (70% - 2.5 × 2.5%), which is below the calculated lower bound of the 95% confidence interval (approximately 65.1%).
- While close, this score would not meet the professor's criterion for passing.
*67.5%*
- This value is within the 95% confidence interval (65.1% to 74.9%) but is **not the minimum score**.
- Lower scores within the interval would still qualify as passing.
*20%*
- This value is the mean minus two **standard deviations** (70% - 2 × 25%), which mistakenly uses the standard deviation in place of the standard error.
- It falls far outside the 95% confidence interval of the mean and well below the defined passing threshold.
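The confidence-interval arithmetic above can be reproduced in a few lines. The sketch uses the values stated in the question and the conventional z of 1.96; the variable names are illustrative.

```python
import math

mean_score = 70.0  # mean exam score, in percent
sd = 25.0          # standard deviation, in percent
n = 100            # number of students
z = 1.96           # conventional z-score for a two-sided 95% interval

standard_error = sd / math.sqrt(n)
margin = z * standard_error
lower = mean_score - margin
upper = mean_score + margin
print(f"SE = {standard_error}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

Using the common approximation z ≈ 2 instead of 1.96 gives a lower bound of exactly 65%, which matches the listed answer without rounding.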
Question 28: A researcher is investigating the relationship between interleukin-1 (IL-1) levels and mortality in patients with end-stage renal disease (ESRD) on hemodialysis. In 2017, 10 patients (patients 1–10) with ESRD on hemodialysis were recruited for a pilot study in which IL-1 levels were measured (mean = 88.1 pg/mL). In 2018, 5 additional patients (patients 11–15) were recruited. Results are shown:
| Patient | IL-1 level (pg/mL) | Patient | IL-1 level (pg/mL) |
|---|---|---|---|
| Patient 1 (2017) | 84 | Patient 11 (2018) | 91 |
| Patient 2 (2017) | 87 | Patient 12 (2018) | 32 |
| Patient 3 (2017) | 95 | Patient 13 (2018) | 86 |
| Patient 4 (2017) | 93 | Patient 14 (2018) | 90 |
| Patient 5 (2017) | 99 | Patient 15 (2018) | 81 |
| Patient 6 (2017) | 77 | | |
| Patient 7 (2017) | 82 | | |
| Patient 8 (2017) | 90 | | |
| Patient 9 (2017) | 85 | | |
| Patient 10 (2017) | 89 | | |
Which of the following statements about the results of the study is most accurate?
A. The mean of IL-1 measurements is now larger than the mode.
B. The standard deviation was decreased by the five new patients who joined the study in 2018.
C. The median of IL-1 measurements is now larger than the mean. (Correct Answer)
D. Systematic error was introduced by the five new patients who joined the study in 2018.
E. The range of the data set is unaffected by the addition of five new patients in 2018.
Explanation: ***The median of IL-1 measurements is now larger than the mean.***
- The new mean is 84.07 (the sum of all 15 IL-1 levels, 1,261, divided by 15). The sorted data set is 32, 77, 81, 82, 84, 85, 86, **87**, 89, 90, 90, 91, 93, 95, 99; the median is the 8th value, which is 87. Thus, the new median (87) is larger than the new mean (84.07).
- This conclusion requires calculation of both the **mean** and **median** for the combined dataset of 15 patients.
*The mean of IL-1 measurements is now larger than the mode.*
- The new mean is 84.07. The mode is 90 (it appears twice, while all other values appear once). Therefore, the mean (84.07) is *not* larger than the mode (90).
- Calculation of the **mean** and identification of the **mode** for the combined dataset negates this statement.
*The range of the data set is unaffected by the addition of five new patients in 2018.*
- In 2017, the range was 99 (max) - 77 (min) = 22. With the addition of patient 12 (IL-1 level of 32), the new minimum changed from 77 to 32.
- The new range is 99 (max) - 32 (min) = 67, which is a significant increase from the original range of 22.
*The standard deviation was decreased by the five new patients who joined the study in 2018.*
- The addition of patient 12 with an IL-1 level of 32, which is an **outlier**, significantly increased the **spread of the data**.
- A larger spread of data, especially due to an outlier, typically **increases the standard deviation**, not decreases it.
*Systematic error was introduced by the five new patients who joined the study in 2018.*
- **Systematic error** refers to a consistent, repeatable error in measurement or experimental design that biases results in a particular direction.
- The information provided describes individual patient data and does not indicate any **consistent bias** in data collection or measurement methods for the new patients.
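The summary statistics discussed above can be recomputed from the table with Python's standard `statistics` module. The data values are those given in the question; the variable names are illustrative.

```python
import statistics

# IL-1 levels (pg/mL) as listed in the question.
levels_2017 = [84, 87, 95, 93, 99, 77, 82, 90, 85, 89]
levels_2018 = [91, 32, 86, 90, 81]
combined = levels_2017 + levels_2018

mean_all = statistics.mean(combined)
median_all = statistics.median(combined)
mode_all = statistics.mode(combined)  # 90 is the only repeated value

range_2017 = max(levels_2017) - min(levels_2017)
range_all = max(combined) - min(combined)

# Sample standard deviation before and after adding the 2018 patients.
sd_2017 = statistics.stdev(levels_2017)
sd_all = statistics.stdev(combined)

print(f"mean = {mean_all:.2f}, median = {median_all}, mode = {mode_all}")
print(f"range: {range_2017} -> {range_all}")
print(f"SD: {sd_2017:.2f} -> {sd_all:.2f}")
```

The single low value from patient 12 pulls the mean down and widens both the range and the standard deviation, while the median barely moves.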
Question 29: A group of researchers is trying to create a new drug that more effectively decreases systolic blood pressure levels, and the drug has entered the clinical trial phase of its development. If, during their trial, the scientists wanted to examine a mutual or linear relationship between 2 continuous variables, which of the following statistical models would be most appropriate for them to use?
A. Chi-square test
B. Correlation (Correct Answer)
C. Analysis of variance
D. Paired t-test
E. Independent t-test
Explanation: ***Correlation***
- **Correlation** is used to assess the strength and direction of a **linear relationship** between two **continuous variables**.
- In this scenario, researchers would use it to determine if there's a relationship between drug dosage and systolic blood pressure, as both are continuous.
*Chi-square test*
- The **chi-square test** is used to examine the relationship between two **categorical variables**.
- It is not appropriate for understanding linear relationships between continuous variables like drug dosage and blood pressure.
*Analysis of variance*
- **Analysis of variance (ANOVA)** is used to compare the means of **three or more groups** or treatments.
- It identifies if there are statistically significant differences between group means, rather than analyzing the mutual relationship between two continuous variables.
*Paired t-test*
- A **paired t-test** is used to compare the means of **two related groups** or repeated measurements from the same subjects.
- It is often used to assess the effect of an intervention by comparing measurements before and after the intervention, not for observing a relationship between two continuous variables.
*Independent t-test*
- An **independent t-test** compares the means of **two independent groups**.
- This test is not suitable for exploring a mutual or linear relationship between two continuous variables within a single group or dataset.
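A Pearson correlation coefficient for two continuous variables can be computed directly from its definition, as in the sketch below. The dose and blood pressure values are hypothetical data invented for illustration, and `pearson_r` is an illustrative function name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: drug dose (mg) and drop in systolic pressure (mmHg).
dose_mg = [10, 20, 30, 40, 50]
sbp_drop_mmhg = [4, 9, 11, 17, 19]

r = pearson_r(dose_mg, sbp_drop_mmhg)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.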
Question 30: A gastroenterology fellow is interested in the relationship between smoking and incidence of Barrett esophagus. At a departmental grand rounds she recently attended, one of the presenters claimed that smokers are only at increased risk for Barrett esophagus in the presence of acid reflux. She decides to design a retrospective cohort study to investigate the association between smoking and Barrett esophagus. After comparing 400 smokers to 400 non-smokers identified via chart review, she finds that smokers were at increased risk of Barrett esophagus at the end of a 10-year follow-up period (RR = 1.82, p < 0.001). Among patients with a history of acid reflux, there was no relationship between smoking and Barrett esophagus (p = 0.52). Likewise, no relationship was found between smoking and Barrett esophagus among patients without a history of acid reflux (p = 0.48). The results of this study are best explained by which of the following?
A. Random error
B. Matching
C. Effect modification
D. Stratification
E. Confounding (Correct Answer)
Explanation: ***Confounding***
- The initial finding of an increased risk (RR = 1.82) between smoking and Barrett esophagus disappears when the population is **stratified by acid reflux**. This suggests that acid reflux was **confounding** the observed association.
- A confounder is an **extraneous variable** that is related to both the exposure (smoking) and the outcome (Barrett esophagus) but is not part of the causal pathway, thereby distorting the true association.
*Random error*
- Random error leads to **imprecise results** due to natural variability and is unlikely to fully explain the disappearance of a statistically significant association (p < 0.001) after stratification.
- While it can affect the p-values, it typically wouldn't completely nullify a strong original finding across all stratified groups.
*Matching*
- Matching is a technique used in study design (e.g., case-control studies) to **control for confounding** by ensuring similar distribution of confounding variables between groups.
- The problem describes a **retrospective cohort study** where stratification was performed *after* data collection, not matching during the design phase.
*Effect modification*
- Effect modification occurs when the **effect of an exposure on an outcome differs across strata** of another variable. If there were effect modification, we would expect to see varying relationships (e.g., a strong association in one stratum and a weak/absent one in another).
- In this scenario, the association between smoking and Barrett esophagus becomes **non-significant in *both*** reflux and non-reflux strata (p=0.52 and p=0.48), indicating no differential effect but rather the removal of a spurious association.
*Stratification*
- Stratification is a **method of analysis** used to assess for confounding or effect modification by examining the association within subgroups (strata) based on a third variable.
- While stratification was *performed* in the study, it is the *result* (the disappearance of the association) that best explains the phenomenon, indicating **confounding** by acid reflux.
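To make the pattern concrete, the sketch below builds a hypothetical data set in which smoking has no effect within either reflux stratum, yet the crude comparison shows an elevated risk. All counts are invented for illustration (they do not reproduce the study's RR of 1.82), and `risk_ratio` is an illustrative function name.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical (cases, group size) pairs. Reflux is more common among
# smokers and raises the risk of Barrett esophagus regardless of smoking.
strata = {
    "reflux":    {"smokers": (75, 300), "non_smokers": (25, 100)},
    "no_reflux": {"smokers": (5, 100),  "non_smokers": (15, 300)},
}

# Stratum-specific risk ratios: smoking vs. non-smoking within each stratum.
stratum_rr = {
    name: risk_ratio(*groups["smokers"], *groups["non_smokers"])
    for name, groups in strata.items()
}

# Crude risk ratio: pool the strata and ignore reflux status.
smoker_cases = sum(g["smokers"][0] for g in strata.values())
smoker_n = sum(g["smokers"][1] for g in strata.values())
non_smoker_cases = sum(g["non_smokers"][0] for g in strata.values())
non_smoker_n = sum(g["non_smokers"][1] for g in strata.values())
crude_rr = risk_ratio(smoker_cases, smoker_n, non_smoker_cases, non_smoker_n)

print(f"crude RR = {crude_rr:.2f}, stratum RRs = {stratum_rr}")
```

In this constructed example the crude association arises entirely because reflux is unevenly distributed between smokers and non-smokers, which is the signature of confounding rather than effect modification.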