Study Design US Medical PG Practice Questions and MCQs
Question 1: A statistician wants to study the effects of a medicine in three groups: humans, animals, and plants. He then selects randomly from these three groups. Which type of sampling is being performed?
A. Simple random sampling
B. Systematic sampling
C. Stratified random sampling (Correct Answer)
D. Cluster sampling
E. Convenience sampling
Explanation: ***Stratified random sampling***
- This method involves dividing the population into **distinct subgroups (strata)** based on shared characteristics (in this case, humans, animals, and plants), and then performing a simple random sample within each stratum.
- This guarantees that every subgroup is represented in the sample (proportionally, if the sample size in each stratum is set in proportion to the stratum's size), which is appropriate when studying effects across different biological categories.
*Simple random sampling*
- This method involves selecting individuals from the entire population **purely by chance**, without first dividing them into subgroups.
- It would not guarantee representation from all three distinct groups (humans, animals, and plants), which is essential for studying differential effects.
*Systematic sampling*
- This involves selecting samples at **regular intervals** from an ordered list or sequence.
- This method is not suitable here because the population is divided into distinct, non-ordered groups rather than a continuous sequence.
*Cluster sampling*
- This method involves dividing the population into **clusters**, then randomly selecting some clusters and sampling all individuals within those selected clusters.
- In this scenario, the initial groups (humans, animals, plants) are strata, not clusters, as the intent is to sample from within each group, not to treat the groups themselves as primary sampling units.
*Convenience sampling*
- This is a **non-probability sampling method** where subjects are selected based on ease of access rather than random selection.
- The question explicitly states that random selection is performed from each group, ruling out convenience sampling.
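The stratified procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the sampling frames, the unit labels, the per-stratum sample size, and the helper name `stratified_sample` are all hypothetical and do not come from the question.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

# Hypothetical sampling frames: 100 labelled units in each group.
strata = {
    "humans": [f"human_{i}" for i in range(100)],
    "animals": [f"animal_{i}" for i in range(100)],
    "plants": [f"plant_{i}" for i in range(100)],
}

sample = stratified_sample(strata, n_per_stratum=5)
for name, units in sample.items():
    print(name, units)
```

Because the random draw happens inside each stratum, every group is guaranteed to appear in the sample, which a single simple random draw from the pooled population would not ensure.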
Question 2: A study was undertaken to establish the relationship between the consumption of a vegetarian or non-vegetarian diet and the presence of diseases. Which statistical test should be used?
A. Chi-square test (Correct Answer)
B. T-test
C. ANOVA
D. Fisher's exact test
E. Mann-Whitney U test
Explanation: ***Chi-square test***
- The **chi-square test** is appropriate when analyzing the relationship between two **categorical variables**. In this scenario, "diet type" (vegetarian/non-vegetarian) and "presence of disease" (yes/no) are both categorical variables.
- This test determines if there is a statistically significant association between the frequency counts of these two variables in a contingency table.
*T-test*
- A **t-test** is used to compare the **means** of two groups, typically when the dependent variable is continuous.
- This test is unsuitable here because the presence of disease and diet type are categorical, not continuous, variables.
*ANOVA*
- **ANOVA** (Analysis of Variance) is used to compare the **means** of three or more groups, often with a continuous dependent variable.
- Similar to the t-test, ANOVA is not applicable as the study involves categorical variables, not the comparison of means across multiple groups.
*Fisher's exact test*
- **Fisher's exact test** is similar to the chi-square test but specifically used for **small sample sizes** where the expected frequencies in any cell of the contingency table are less than 5.
- While it analyzes categorical data, the chi-square test is the more general and commonly preferred test for larger sample sizes, which is generally assumed unless otherwise specified.
*Mann-Whitney U test*
- The **Mann-Whitney U test** is a non-parametric test used to compare differences between two independent groups when the dependent variable is **ordinal or continuous** but not normally distributed.
- This test is not appropriate for analyzing the association between two categorical variables, as it requires at least one variable to have ranked or continuous data.
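As a rough illustration of how the chi-square statistic is computed from a contingency table, the sketch below uses made-up counts (the question gives no data) and a hypothetical helper named `chi_square_statistic`. It uses only the standard library and returns the statistic, not a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts. Rows: vegetarian, non-vegetarian.
# Columns: disease present, disease absent.
observed = [[20, 80],
            [40, 60]]

chi2 = chi_square_statistic(observed)
print(round(chi2, 2))  # 9.52
```

For a 2×2 table there is 1 degree of freedom, and the critical value at α = 0.05 is 3.84, so these invented counts would indicate a statistically significant association.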
Question 3: A group of 80 people is being studied to determine the effect of diet modification on cholesterol levels. To compare the mean cholesterol levels before and after the diet modification in this group, which statistical test should be used?
A. Paired t-test (Correct Answer)
B. McNemar test
C. Chi-square test
D. Wilcoxon signed-rank test
E. Independent t-test
Explanation: ***Paired t-test***
- A **paired t-test** is appropriate for comparing means from two related samples, such as "before" and "after" measurements on the **same individuals**.
- It assesses whether there is a statistically significant difference between these **dependent observations**.
*Independent t-test*
- The independent t-test compares means between **two separate groups** (unrelated samples).
- It is inappropriate here because we have **paired data** from the same individuals measured twice, not two independent groups.
*McNemar test*
- The McNemar test is used for comparing **paired nominal data**, typically in a 2×2 table, for example, before-after changes in a proportion or categorical outcome.
- It is not suitable for **continuous data** like cholesterol levels.
*Chi-square test*
- The chi-square test is used to assess the association between **two categorical variables** or to compare observed frequencies with expected frequencies.
- It is not designed for comparing means of **continuous variables** in paired samples.
*Wilcoxon signed-rank test*
- The Wilcoxon signed-rank test is a **non-parametric alternative to the paired t-test**, used when the data are not normally distributed or when the sample size is small.
- While it's used for paired data, the paired t-test is generally preferred when parametric assumptions (like **normality**) can be met, especially with a sample size of 80.
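A minimal sketch of the paired t statistic follows. The cholesterol values are invented for six participants purely to keep the example short (the study in the question has 80), and the helper name `paired_t_statistic` is hypothetical. The key point is that the test works on the within-person differences, not on the two columns separately.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired before/after measurements."""
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD, n - 1 denominator
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))   # mean difference / its standard error
    return t, n - 1

# Hypothetical cholesterol levels (mg/dL) for the same six people.
before = [220, 245, 210, 260, 230, 250]
after = [205, 230, 212, 240, 221, 235]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f}, df = {df}")
```

An independent t-test on the same numbers would ignore the pairing and typically give a less sensitive comparison, because between-person variability would not be removed.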
Question 4: A study recorded the survival times (in months) of 8 patients diagnosed with pancreatic cancer who received a new chemotherapy regimen. The survival times were: 2, 3, 4, 4, 5, 6, 7, 8 months. What is the median survival time for these patients?
A. 4.0
B. 4.5 (Correct Answer)
C. 5.0
D. 5.5
E. 3.5
Explanation: ***4.5***
- The given survival times are already ordered: 2, 3, 4, 4, 5, 6, 7, 8.
- Since there is an **even number of observations (n=8)**, the median is the average of the two middle values, which are the 4th and 5th values. (4 + 5) / 2 = **4.5**.
*3.5*
- This value would result from incorrectly averaging the 2nd and 3rd observations: (3 + 4) / 2 = 3.5.
- This error occurs when miscounting the middle positions in an even-numbered dataset.
*4.0*
- This value represents the **fourth observation** in the ordered list, not the true median for an even number of data points.
- While it is one of the middle values, the median for an even dataset requires averaging the two middle-most values.
*5.0*
- This value represents the **fifth observation** in the ordered list, not the true median for an even number of data points.
- It would be the median if the dataset contained an odd number of observations and 5 was the middle term.
*5.5*
- This value would be the mean of 5 and 6, which are the 5th and 6th values, not the correct middle values.
- This calculation does not represent the correct methodology for finding the median in this dataset.
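The calculation can be checked directly. The sketch below averages the two middle values by hand and compares the result with `statistics.median` from the Python standard library; the variable names are illustrative.

```python
import statistics

survival_months = [2, 3, 4, 4, 5, 6, 7, 8]

ordered = sorted(survival_months)
n = len(ordered)                      # 8, an even number of observations
lower_middle = ordered[n // 2 - 1]    # 4th value in the ordered list
upper_middle = ordered[n // 2]        # 5th value in the ordered list
median_by_hand = (lower_middle + upper_middle) / 2

# The standard library applies the same rule for even-sized data.
assert median_by_hand == statistics.median(survival_months)
print(median_by_hand)  # 4.5
```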
Question 5: An investigator has conducted a prospective study to evaluate the relationship between asthma and the risk of myocardial infarction (MI). She stratifies her analyses by biological sex and observes that among female patients, asthma was a significant predictor of MI risk (hazard ratio = 1.32, p < 0.001). However, among male patients, no relationship was found between asthma and MI risk (p = 0.23). Which of the following best explains the difference observed between male and female patients?
A. Effect modification (Correct Answer)
B. Measurement bias
C. Stratified sampling
D. Confounding
E. Random error
Explanation: ***Effect modification***
- **Effect modification** occurs when the relationship between an exposure (asthma) and an outcome (MI) differs across various levels of a third variable (biological sex).
- In this scenario, sex alters the effect of asthma on MI risk, showing a significant relationship in females but not in males, which is the definition of effect modification.
*Measurement bias*
- **Measurement bias** refers to systematic errors in the collection of data, leading to inaccurate assessment of exposure, outcome, or confounders.
- There is no indication in the question that the methods of measuring asthma or MI differed systematically between males and females, or that the measurements themselves were flawed.
*Stratified sampling*
- **Stratified sampling** is a technique used in study design where a population is divided into subgroups (strata) and then samples are randomly selected from each stratum.
- While the analysis was stratified by sex, this choice was made during data analysis to understand differences, not necessarily during the initial sampling process to ensure representation.
*Confounding*
- **Confounding** occurs when a third variable is associated with both the exposure and the outcome, and it distorts the true relationship between them.
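- With pure confounding, the stratum-specific estimates would be similar to each other and differ only from the crude (unstratified) estimate. Here the estimates differ between the strata themselves, so sex is not merely a confounder to be controlled but a variable that modifies the effect.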
*Random error*
- **Random error** is unsystematic variation in data that can lead to imprecise measurements or findings due to chance.
- While random error can contribute to non-significant findings, the significant p-value (<0.001) in females and the clear difference in effect between sexes suggest a systematic phenomenon rather than mere random chance.
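To make the idea of stratum-specific estimates concrete, the sketch below computes a risk ratio separately within each sex. All counts are hypothetical, and a simple risk ratio is used as a stand-in for the hazard ratio reported in the question, which would require time-to-event data to estimate.

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

# Hypothetical counts: (MI events, group size) for asthma vs no asthma.
strata = {
    "female": {"asthma": (66, 1000), "no_asthma": (50, 1000)},
    "male": {"asthma": (60, 1000), "no_asthma": (60, 1000)},
}

stratum_rr = {
    sex: risk_ratio(*groups["asthma"], *groups["no_asthma"])
    for sex, groups in strata.items()
}

for sex, rr in stratum_rr.items():
    print(f"{sex}: risk ratio = {rr:.2f}")
```

When the estimates differ meaningfully across strata, as in these invented numbers, the stratifying variable is acting as an effect modifier, and the stratum-specific results are reported separately rather than pooled.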
Question 6: An investigator studying the effects of dietary salt restriction on atrial fibrillation compares two published studies, A and B. In study A, nursing home patients without atrial fibrillation were randomly assigned to a treatment group receiving a low-salt diet or a control group without dietary salt restriction. When study B began, dietary sodium intake was estimated among elderly outpatients without atrial fibrillation using 24-hour dietary recall. In both studies, patients were reevaluated at the end of one year for atrial fibrillation. Which of the following statements about the two studies is true?
A. Study A results can be analyzed using a t-test
B. Study B results can be analyzed using a chi-square test
C. Study A allows for better control of confounding variables (Correct Answer)
D. Study B allows for better control over selection bias
E. Study B is better at inferring causality
Explanation: ***Study A allows for better control of confounding variables***
- **Random assignment** in Study A helps distribute both known and unknown confounding variables equally between the treatment and control groups, thereby minimizing their impact on the observed outcome.
- Unlike Study B, which is observational, Study A's experimental design creates comparable groups, allowing for a more accurate assessment of the direct effect of the intervention.
*Study A results can be analyzed using a t-test*
- A **t-test** is typically used to compare the means of two groups for a **continuous outcome variable**.
- The outcome variable in this study, the presence or absence of **atrial fibrillation**, is a **dichotomous (categorical) variable**, making a t-test inappropriate.
- The correct statistical test would be a **chi-square test** or **Fisher's exact test**.
*Study B results can be analyzed using a chi-square test*
- A **chi-square test** could be applied to Study B only after dietary sodium intake, which is estimated as a continuous quantity, is grouped into categories and cross-tabulated against atrial fibrillation (present/absent).
- Even then, the test applies equally well to Study A, whose exposure and outcome are both dichotomous, so this statement does not distinguish the two designs. The defining difference between them is that randomization in Study A controls for confounding.
- Additionally, cohort studies typically report **relative risk** or **hazard ratios** rather than simple chi-square associations.
*Study B allows for better control over selection bias*
- Study B is an **observational cohort study** that relies on existing groups of outpatients, making it susceptible to **selection bias** as participants are not randomly assigned.
- The method of recruiting outpatients without randomization can introduce differences between groups that are not accounted for, leading to biased results.
*Study B is better at inferring causality*
- Study B, being an **observational cohort study**, can only identify **associations** between dietary salt intake and atrial fibrillation, not establish a **causal relationship**.
- The lack of **randomization** means that other unmeasured factors might be responsible for any observed association, making causal inference unreliable.
Question 7: A doctor is interested in developing a new over-the-counter medication that can decrease the symptomatic interval of upper respiratory infections from viral etiologies. The doctor wants one group of affected patients to receive the new treatment, but he wants another group of affected patients to not be given the treatment. Of the following clinical trial subtypes, which would be most appropriate in comparing the differences in outcome between the two groups?
A. Randomized controlled trial (Correct Answer)
B. Case-control study
C. Cohort study
D. Historical cohort study
E. Cross-sectional study
Explanation: ***Randomized controlled trial***
- This design is ideal for evaluating the **efficacy of an intervention** (new medication) by randomly assigning participants to either a treatment group or a control group.
- **Randomization minimizes bias** by balancing known and unknown confounders between the groups, so that observed differences in outcomes can more confidently be attributed to the intervention.
*Case-control study*
- This study design is retrospective and compares individuals with a **disease (cases)** to individuals without the disease (controls) to identify **risk factors** or exposures.
- It would not be suitable for testing the effectiveness of a new treatment as it starts with outcomes and looks backward at exposures, not forward at intervention effects.
*Cohort study*
- A cohort study observes a group of individuals (a cohort) over time to see who develops a disease or outcome, often starting with individuals exposed and unexposed to a **risk factor**.
- While it tracks outcomes, it usually doesn't involve an active intervention or random assignment, making it less suitable for directly comparing a new treatment's efficacy against a control.
*Historical cohort study*
- This is a type of cohort study that uses **past data or records** to identify the cohort and their exposures, then follows them forward in time using existing data to determine outcomes.
- It would not be appropriate for testing a *new* medication because it relies on historical exposures and outcomes, not a prospective, controlled intervention.
*Cross-sectional study*
- This study measures the **prevalence of a disease or condition** and related factors at a single point in time, essentially taking a "snapshot."
- It cannot establish causality or evaluate the effectiveness of an intervention over time due to its lack of follow-up and inability to determine the temporal sequence of events.
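The random assignment step that defines a randomized controlled trial can be sketched as follows. The participant identifiers, the equal two-arm split, and the helper name `randomize` are assumptions made for illustration; real trials often use more elaborate schemes such as block or stratified randomization.

```python
import random

def randomize(participants, seed=0):
    """Shuffle participants and split them into two equally sized arms."""
    rng = random.Random(seed)      # fixed seed so the allocation is reproducible
    shuffled = list(participants)  # copy, so the input list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical list of 20 enrolled patients with a viral upper respiratory infection.
participants = [f"patient_{i:02d}" for i in range(1, 21)]

arms = randomize(participants)
print("treatment:", arms["treatment"])
print("control:  ", arms["control"])
```

Because allocation depends only on chance, neither the doctor nor the patient influences who receives the new medication, which is what distinguishes this design from the observational designs listed above.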
Question 8: A 23-year-old woman presents to her primary care physician because she has been having difficulty seeing despite previously having perfect vision all her life. Specifically, she notes that reading, driving, and recognizing faces have become difficult, and she feels that her vision has become fuzzy. She is worried because both of her older brothers have had visual loss with a similar presentation. Visual exam reveals bilateral loss of central vision with decreased visual acuity and color perception. Pathological examination of this patient's retinas reveals degeneration of retinal ganglion cells bilaterally. She is then referred to a geneticist because she wants to know the probability that her son and daughter will also be affected by this disorder. Her husband's family has no history of this disease. Ignoring the effects of incomplete penetrance, which of the following are the chances that this patient's children will be affected by this disease?
A. Daughter: 50% and son: 50%
B. Daughter: ~0% and son: ~0%
C. Daughter: 25% and son: 25%
D. Daughter: ~0% and son: 50%
E. Daughter: 100% and son: 100% (Correct Answer)
Explanation: ***Daughter: 100% and son: 100%***
- This scenario describes **Leber Hereditary Optic Neuropathy (LHON)**, characterized by **bilateral central vision loss** and **degeneration of retinal ganglion cells**, with a maternal inheritance pattern.
- LHON is caused by a **mitochondrial DNA mutation**, meaning the disease is transmitted exclusively from the mother to **all her children, regardless of sex**.
- Since mitochondrial DNA is inherited entirely from the maternal lineage, **100% of offspring will inherit the mutation**.
- The question specifies "ignoring incomplete penetrance," meaning we focus on mutation inheritance rather than symptom development.
*Daughter: 50% and son: 50%*
- This inheritance pattern is characteristic of an **autosomal dominant** trait, where there is a 50% chance of passing the allele to each child.
- This does not fit the described pattern of maternal inheritance where all children inherit the mutation from an affected mother.
*Daughter: ~0% and son: ~0%*
- This would only be true if neither parent was a carrier or affected, or if the disease had a very complex, non-Mendelian inheritance with low penetrance.
- Given the mother's affected status and the mitochondrial inheritance pattern, the children will definitely inherit the mutation.
*Daughter: 25% and son: 25%*
- This ratio is typical for an **autosomal recessive** inheritance pattern where both parents are heterozygotes (carriers).
- This does not align with the exclusively maternal transmission observed in LHON.
*Daughter: ~0% and son: 50%*
- This inheritance pattern is typical for an **X-linked recessive** disorder, where daughters of an affected father are unaffected carriers and sons have a 50% chance of being affected if the mother is a carrier.
- This is incorrect because LHON is mitochondrially inherited from the mother to all children, not X-linked.
Question 9: A group of researchers recently conducted a meta-analysis of twenty clinical trials encompassing 10,000 women with estrogen receptor-positive breast cancer who were disease-free following adjuvant radiotherapy. After an observation period of 15 years, the relationship between tumor grade and distant recurrence of cancer was evaluated. The results show:
| Tumor grade | Distant recurrence | No distant recurrence |
| --- | --- | --- |
| Well differentiated | 500 | 4500 |
| Moderately differentiated | 375 | 2125 |
| Poorly differentiated | 550 | 1950 |
Based on this information, which of the following is the 15-year risk for distant recurrence in patients with high-grade breast cancer?
A. 500/5000
B. 1950/8575
C. 550/2500 (Correct Answer)
D. 2500/10000
E. 550/1425
Explanation: ***550/2500***
- The question asks for the 15-year risk for distant recurrence in patients with **high-grade breast cancer**, which corresponds to **poorly differentiated** tumors in the provided data.
- For poorly differentiated tumors, there were 550 cases of distant recurrence out of a total of 550 + 1950 = **2500 patients**. Therefore, the risk is 550/2500 (22%).
*500/5000*
- This calculation represents the risk for distant recurrence in **well-differentiated** tumors (500 recurrences out of 500 + 4500 = 5000 total well-differentiated cases), not high-grade (poorly differentiated) tumors.
*1950/8575*
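- This calculation incorrectly uses 1950 (number of poorly differentiated patients *without* recurrence) as the numerator. The denominator, 8575, is the total number of patients without distant recurrence across all grades (4500 + 2125 + 1950), so the fraction is the proportion of recurrence-free patients who had poorly differentiated tumors, not a risk of recurrence.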
*2500/10000*
- This calculation represents the **total number of poorly differentiated patients** (2500) divided by the total number of patients in the study (10000), which is the proportion of patients with poorly differentiated cancer, not the risk of recurrence within that group.
*550/1425*
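- This calculation incorrectly uses 1425 as the denominator, which is the total number of distant recurrences across all grades (500 + 375 + 550). The fraction is therefore the proportion of all recurrences that occurred in poorly differentiated tumors. The correct denominator is the 2500 patients with poorly differentiated tumors (550 with recurrence + 1950 without recurrence).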
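The risk calculation for each tumor grade can be reproduced from the table in the question. The sketch below is a small illustration; the dictionary layout and variable names are arbitrary choices, while the counts are taken from the question.

```python
# (distant recurrence, no distant recurrence) for each tumor grade.
counts = {
    "well differentiated": (500, 4500),
    "moderately differentiated": (375, 2125),
    "poorly differentiated": (550, 1950),
}

# Risk = events in the group / everyone in that group.
risk = {
    grade: recurrence / (recurrence + no_recurrence)
    for grade, (recurrence, no_recurrence) in counts.items()
}

for grade, value in risk.items():
    print(f"{grade}: {value:.0%}")
```

The denominator is always the total number of patients in the grade of interest, which is why 550/2500 is the risk for high-grade (poorly differentiated) tumors.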
Question 10: A clinical trial is conducted to determine the role of cerebrospinal fluid (CSF) beta-amyloid levels as a biomarker in the early detection and prognosis of Alzheimer disease. A total of 100 participants are enrolled and separated into three groups according to their Mini-Mental State Examination (MMSE) score: mild dementia (20–24 points), moderate dementia (13–20 points), and severe dementia (< 13 points). Participants' CSF level of beta-amyloid 42 is measured using an immunoassay. It is found that participants with severe dementia have a statistically significantly lower mean CSF level of beta-amyloid 42 compared to the other two groups. Which of the following statistical tests was most likely used to compare measurements between the study groups?
A. Chi-square test
B. Pearson correlation analysis
C. Analysis of variance (Correct Answer)
D. Two-sample t-test
E. Fisher's exact test
Explanation: ***Analysis of variance (ANOVA)***
- This statistical test is used to compare the means of **three or more independent groups**. In this scenario, it would be appropriate for comparing the mean CSF beta-amyloid levels across the mild, moderate, and severe dementia groups.
- ANOVA determines if there is a statistically significant difference between the means of these groups, and if so, post-hoc tests can identify which specific groups differ.
*Chi-square test*
- The chi-square test is used for **categorical data** to determine if there is a significant association between two variables.
- This scenario involves comparing **continuous numerical data** (CSF beta-amyloid levels) across groups, not categorical frequencies.
*Pearson correlation analysis*
- Pearson correlation measures the **linear relationship** and strength of association between **two continuous numerical variables**.
- Here, the goal is to compare means across multiple groups, not to assess the correlation between two continuous variables.
*Fisher's exact test*
- Fisher's exact test is used for analyzing the association between two **categorical variables** in a **2×2 contingency table**, especially with small sample sizes.
- This test is not suitable for comparing the means of a continuous variable across multiple groups.
*Two-sample t-test*
- A two-sample t-test is used to compare the means of **exactly two independent groups**.
- Since this study involves **three distinct groups** (mild, moderate, and severe dementia), a single two-sample t-test cannot compare all groups at once. Running multiple pairwise t-tests instead would inflate the risk of Type I error.
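A one-way ANOVA F statistic can be computed by hand as the ratio of between-group to within-group variability. The sketch below uses invented beta-amyloid 42 values with four participants per group (the study in the question enrolled 100), and the helper name `one_way_anova_f` is hypothetical. It returns the F statistic and degrees of freedom only, not a p-value.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical CSF beta-amyloid 42 levels for each dementia severity group.
mild = [620, 640, 600, 660]
moderate = [580, 600, 560, 620]
severe = [420, 440, 400, 460]

f_stat, df_between, df_within = one_way_anova_f([mild, moderate, severe])
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

A large F value only indicates that at least one group mean differs. Identifying that the severe group differs from the other two, as described in the question, would require a post-hoc comparison.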