Study Design US Medical PG Practice Questions and MCQs
Question 11: A 4th grade class in Salem, Massachusetts has 20 students. Due to recent media coverage of the fallacious association between vaccines and autism, none of the students have been immunized against influenza this year. Fortunately, up to this point none of the students has come down with the flu. During the first week of flu season, however, 2 students contract influenza. In the second week, 3 more students contract influenza. And in the third week, 5 more students contract influenza. The other students remained healthy throughout the rest of the flu season. In this class, what was the risk of contracting influenza during the second week of the flu season?
A. 0.1
B. 0.5
C. 0.15
D. 0.25
E. 0.17 (Correct Answer)
Explanation: ***0.17***
- To calculate the risk during the second week, we need the number of new cases in that week (3 students) and the number of **at-risk individuals** at the beginning of that week.
- At the start of the second week, 18 students were at risk (20 total - 2 who contracted flu in the first week). Therefore, risk = 3/18 ≈ **0.167**, which rounds to **0.17**.
- This correctly applies the formula: **Risk = (new cases) / (population at risk at start of period)**.
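The week-by-week calculation can be checked with a short Python sketch. It uses only the counts given in the question (20 students; 2, 3 and 5 new cases) and shows how the at-risk denominator shrinks each week. The function name `weekly_risks` is illustrative, not from any library.

```python
from fractions import Fraction

def weekly_risks(class_size, new_cases_by_week):
    """Risk for each week = new cases / students still at risk at the start of that week."""
    at_risk = class_size
    risks = []
    for new_cases in new_cases_by_week:
        risks.append(Fraction(new_cases, at_risk))
        at_risk -= new_cases  # students who caught flu leave the at-risk pool
    return risks

CLASS_SIZE = 20
NEW_CASES = [2, 3, 5]  # weeks 1, 2 and 3, from the question

risks = weekly_risks(CLASS_SIZE, NEW_CASES)
for week, risk in enumerate(risks, start=1):
    print(f"Week {week}: {risk} = {float(risk):.3f}")

season_risk = Fraction(sum(NEW_CASES), CLASS_SIZE)
print(f"Whole season: {season_risk} = {float(season_risk):.2f}")
```

The weekly risks come out as 1/10, 1/6 (≈ 0.167) and 1/3, and the whole-season risk is 1/2, which is why 0.1 and 0.5 appear among the distractors.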
*0.1*
- This equals 2/20, the risk during the **first** week, not the second.
- It uses the wrong numerator and ignores the **shrinking population at risk** after the first week's cases.
*0.15*
- This incorrectly uses 20 as the denominator (3/20 = 0.15), failing to exclude the 2 students who already had influenza.
- The **population at risk must exclude those already diseased** at the start of the time period.
*0.25*
- This fraction could represent 5 new cases out of 20 total students, or 3 new cases out of 12 students.
- This answer does not reflect the **specific incidence** during the second week with the correct denominator.
*0.5*
- This equals 10/20, the **cumulative risk over the entire flu season** (2 + 3 + 5 = 10 cases among 20 students), not the risk during the second week alone.
- Reporting it as the second week's risk would be a gross **overestimation**.
Question 12: You are interested in tracking the disease burden of a highly contagious viral disease over a time period of 5 years. The virus appears to be indigenous to rural parts of northern Africa. Which of the following research study designs would be optimal for your analysis?
A. Case series
B. Case-control
C. Cohort study (Correct Answer)
D. Cross-sectional
E. Randomized controlled trial
Explanation: ***Cohort study***
- A **cohort study** follows a defined group of individuals (a cohort) living in the affected rural areas of northern Africa over the 5-year period, allowing new cases (incidence) and disease progression to be tracked over time. This makes it optimal for assessing disease burden.
- This design is ideal for investigating the natural history of a disease and identifying risk factors within a specific population.
*Case series*
- A **case series** describes characteristics of a group of patients with a particular disease and is useful for hypothesis generation rather than tracking disease burden over time.
- It has no comparison group and no defined population at risk (denominator), so it cannot measure incidence or prevalence.
*Case-control*
- A **case-control study** compares individuals with a disease (cases) to individuals without the disease (controls) and looks retrospectively for exposure differences to identify risk factors.
- This design is efficient for rare diseases but less suitable for tracking overall disease burden or incidence trends over a long period.
*Cross-sectional*
- A **cross-sectional study** measures the prevalence of disease and exposure at a single point in time, providing a snapshot of the population.
- While useful for prevalence, it cannot establish temporality or track changes in disease burden over a 5-year period.
*Randomized controlled trial*
- A **randomized controlled trial (RCT)** is designed to evaluate the effectiveness of an intervention by randomly assigning participants to treatment or control groups.
- The question involves no intervention to test, so an RCT does not fit; randomizing people to exposure to a contagious virus would also be unethical and impractical.
Question 13: A researcher faces the task of calculating the mean height of male students in an undergraduate class containing a total of 2,000 male students and 1,750 female students. The mean height of a sample of male students is computed as 176 cm (69.3 in), with a standard deviation of 7 cm (2.8 in). The researcher now tries to calculate the confidence interval for the mean height of the male students in the undergraduate class. Which additional data will be needed for this calculation?
A. The mean height of all the male students in the undergraduate class
B. The given data are adequate, and no more data are needed.
C. Total sample size of the study (Correct Answer)
D. Total number of male students in the undergraduate class who did not take part in the study
E. A sampling frame of all of the male students in the undergraduate class
Explanation: ***Total sample size of the study***
- To calculate the **confidence interval**, one needs the **sample mean**, **standard deviation**, and critically, the **sample size (n)**.
- The sample size is crucial because it influences the **standard error of the mean** and thus the width of the confidence interval.
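A minimal Python sketch of why n is indispensable. It assumes a normal approximation with z = 1.96. The mean (176 cm) and SD (7 cm) are from the question, but the sample sizes 49 and 196 are hypothetical, since the question does not give n. For small samples a t critical value would replace 1.96.

```python
import math

def mean_ci(sample_mean, sample_sd, n, z=1.96):
    """Approximate 95% CI for a population mean (normal approximation)."""
    standard_error = sample_sd / math.sqrt(n)  # cannot be computed without n
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Mean and SD come from the question; these sample sizes are hypothetical.
for n in (49, 196):
    low, high = mean_ci(176, 7, n)
    print(f"n = {n}: {low:.2f} cm to {high:.2f} cm")
```

Quadrupling n from 49 to 196 halves the standard error (1 cm to 0.5 cm) and therefore halves the width of the interval (174.04 to 177.96 cm versus 175.02 to 176.98 cm).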
*The mean height of all the male students in the undergraduate class*
- This value represents the **true population mean**, which is precisely what the confidence interval is trying to **estimate**.
- If this value were known, there would be no need to calculate a confidence interval for it.
*The given data are adequate, and no more data are needed.*
- While the **sample mean** and **standard deviation** are provided, the problem statement does not explicitly state the **sample size (n)** of male students from which these statistics were derived.
- The number of all male students (2,000) is the **population size**, not the sample size used for the calculation.
*Total number of male students in the undergraduate class who did not take part in the study*
- This information is not directly used in the calculation of a **confidence interval** for the mean.
- It relates to the part of the population that was not sampled, which doesn't impact the formula for the confidence interval itself.
*A sampling frame of all of the male students in the undergraduate class*
- A **sampling frame** is a list of all individuals in the population from which a sample can be drawn; it's essential for the **sampling process** itself.
- However, once the sample mean and standard deviation are obtained, the sampling frame is not directly needed for the *calculation* of the confidence interval.
Question 14: You are conducting a systematic review on the effect of a new sulfonylurea for the treatment of type II diabetes. For your systematic review you would like to include 95% confidence intervals for the mean of blood glucose levels in the treatment groups. What further information is necessary to abstract from each of the original papers in order to calculate a 95% confidence interval for each study?
A. Power, standard deviation, mean
B. Power, mean, sample size
C. Standard deviation, mean, sample size (Correct Answer)
D. Standard deviation, mean, sample size, power
E. Power, standard deviation, sample size
Explanation: ***Standard deviation, mean, sample size***
- To calculate a **95% confidence interval** for the mean, you need the **sample mean**, the **standard deviation** (which quantifies data variability), and the **sample size** (the number of observations).
- The formula for a confidence interval for the mean involves these three components and a z-score or t-score corresponding to the desired confidence level.
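Written out, the interval uses exactly the three quantities in the correct option, with $\bar{x}$ the sample mean, $s$ the standard deviation and $n$ the sample size; power appears nowhere:

$$\bar{x} \pm z_{0.975}\,\frac{s}{\sqrt{n}}, \qquad z_{0.975} \approx 1.96$$

or, for small samples, with a t critical value on $n - 1$ degrees of freedom:

$$\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}$$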
*Power, standard deviation, mean*
- **Power** is related to the probability of correctly rejecting a false null hypothesis and is not directly used in the calculation of a confidence interval for a single mean.
- While **standard deviation** and **mean** are necessary, **sample size** is also crucial for the calculation, which is missing from this option.
*Power, mean, sample size*
- **Power** is a concept relevant to study design and hypothesis testing, not for calculating a confidence interval for an observed mean.
- While **mean** and **sample size** are correctly identified, the **standard deviation** is a critical missing component needed to quantify the variability around the mean.
*Standard deviation, mean, sample size, power*
- While **standard deviation**, **mean**, and **sample size** are all needed for calculating the confidence interval, **power** is not required for this specific calculation.
- Including **power** as a necessary piece of information is incorrect because it relates to the study's ability to detect an effect, not the precision of an estimated mean.
*Power, standard deviation, sample size*
- This option incorrectly includes **power**, which is not needed for calculating a confidence interval for the mean.
- It also omits the **mean** itself, which is a fundamental component of the confidence interval formula as the central estimate.
Question 15: A study looking to examine the utility of colorectal cancer screening in patients younger than 50 is currently seeking subjects to enroll. A 49-year-old man with a family history of colorectal cancer is very interested in enrolling in the study, due to his own personal concerns about developing cancer. If enrolled in this study, which of the following types of biases will this represent?
A. Recall bias
B. Measurement bias
C. Selection bias (Correct Answer)
D. Length bias
E. Lead-time bias
Explanation: ***Selection bias***
- This scenario exemplifies **selection bias** because the individual actively seeks to participate in the study due to personal concerns and a **family history of colorectal cancer**. This means the study participants may not be representative of the general population younger than 50, potentially skewing the results to show a higher prevalence or different screening utility than would be found in a genuinely random sample.
- **Selection bias** occurs when the selection of subjects for a study (or their retention in the study) results in a sample that is not truly representative of the target population.
*Recall bias*
- **Recall bias** occurs when subjects with a particular condition (e.g., colorectal cancer) are more likely to remember exposures or risk factors than healthy controls.
- This bias is typically a problem in **retrospective studies** where subjects are asked to recall past events.
*Measurement bias*
- **Measurement bias** arises from flaws in the way data is collected or measured, leading to systematically inaccurate results.
- Examples include using **faulty equipment** or inconsistent methods for assessing outcomes or exposures, leading to misclassification.
*Length bias*
- **Length bias** in screening refers to the fact that screening tests are more likely to detect cases of disease that are **slower-growing** and have a longer preclinical phase.
- This can make screened populations appear to have a better prognosis, as the more aggressive, fast-growing cases are often missed between screening intervals.
*Lead-time bias*
- **Lead-time bias** refers to the apparent increase in survival time among screened individuals due to the **earlier detection of disease** by screening, rather than an actual prolongation of life.
- It occurs when the time from diagnosis to death is artificially lengthened because the disease was found earlier, even if the actual date of death remains unchanged.
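A toy Python calculation of lead-time bias, using hypothetical ages for a single patient whose disease course is identical with or without screening:

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Survival is measured from diagnosis, so earlier diagnosis lengthens it."""
    return age_at_death - age_at_diagnosis

# Hypothetical patient: the tumor behaves identically with or without screening.
AGE_AT_DEATH = 67
symptomatic = survival_from_diagnosis(62, AGE_AT_DEATH)  # diagnosed when symptoms appear
screened = survival_from_diagnosis(59, AGE_AT_DEATH)     # same tumor found 3 years earlier

print(f"Symptom-detected survival: {symptomatic} years")
print(f"Screen-detected survival:  {screened} years")
print(f"Apparent gain (lead time): {screened - symptomatic} years; age at death unchanged")
```

Survival rises from 5 to 8 years purely because the clock starts earlier; the patient still dies at 67.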
Question 16: A new antihypertensive medication is studied in 3,000 Caucasian men with coronary heart disease who are over age 65. The results show benefits in terms of improved morbidity and mortality as well as a decreased rate of acute coronary events with minimal side effects. After hearing about this new medication and supporting study at a recent continuing education course, a family physician elects to prescribe this medication to a 39-year-old Hispanic female who presents with primary hypertension. After a one month trial and appropriate adjustments in the dosing, the patient's blood pressure is not well controlled by this medication. Which of the following statistical concepts could explain this patient's poor response to the medication?
A. Effect modification
B. Observer bias
C. Selection bias
D. Confounding
E. Generalizability (Correct Answer)
Explanation: ***Generalizability***
- The study population was very specific (**Caucasian men over 65 with coronary heart disease**), making it difficult to **generalize** the findings to a 39-year-old Hispanic female with primary hypertension.
- **External validity** is limited when study results from one population are applied to a different population with distinct demographic and clinical characteristics.
- The medication's efficacy might vary significantly across different **demographic groups** (age, sex, ethnicity) and clinical presentations not represented in the original study.
*Effect modification*
- **Effect modification** (also called interaction) occurs when the magnitude of a treatment effect differs across subgroups *within a study* that included those subgroups.
- The original study only enrolled elderly Caucasian men, so it couldn't assess whether the drug works differently in women, younger patients, or other ethnicities.
- The poor response here reflects applying results **beyond the study population** (a generalizability issue), not effect modification identified *within* the study data.
*Observer bias*
- **Observer bias** occurs when the researcher's expectations or preconceptions influence the observation or measurement of outcomes, leading to systematic errors.
- This is not relevant here as the patient's poor response is an objective clinical outcome (uncontrolled blood pressure), not an observation influenced by the physician's expectations.
*Selection bias*
- **Selection bias** occurs when the way participants are chosen for a study leads to a sample that is not representative of the target population, or when comparison groups are not comparable.
- This concept describes flaws in the *original study design* or participant recruitment, not the applicability of valid study results to a *different, external patient*.
*Confounding*
- **Confounding** occurs when a third variable that is associated with both the exposure and the outcome (and is not on the causal pathway) distorts the apparent exposure-outcome relationship.
- This is a problem *within* a study designed to establish causality, not an explanation for why a medication effective in one population might not work in another due to inherent differences in patient characteristics.
Question 17: You are trying to design a randomized controlled trial to evaluate the effectiveness of metoprolol in patients with heart failure. In preparing for the statistical analysis, you review some common types of statistical errors. Which of the following is true regarding a type 1 error in a clinical study?
A. A type 1 error is a beta (β) error and is usually 0.1 or 0.2.
B. A type 1 error occurs when the null hypothesis is true but is rejected in error. (Correct Answer)
C. A type 1 error is dependent on the confidence interval of a study.
D. A type 1 error means the study is not significantly powered to detect a true difference between study groups.
E. A type 1 error occurs when the null hypothesis is false, yet is accepted in error.
Explanation: **A type 1 error occurs when the null hypothesis is true but is rejected in error.**
- A **Type I error**, also known as an **alpha (α) error**, occurs when a study concludes there is a significant effect or difference when, in reality, there isn't one. The **null hypothesis (H0)**, which states there is no effect or no difference, is **incorrectly rejected**.
- This error represents a **false positive** result, meaning the researchers incorrectly found a treatment to be effective when it is not. The probability of making a Type I error is set by the **significance level (α)**, typically 0.05.
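The meaning of α can be illustrated by simulation. This Python sketch assumes a two-arm trial (e.g., metoprolol versus control) with normally distributed outcomes, a known SD of 1 and a two-sided z-test; none of these details come from the question. Both arms are drawn from the same distribution, so the null hypothesis is true and every rejection is a Type I error. The long-run rejection rate should sit close to the chosen α.

```python
import random
import statistics

def simulate_type1_rate(n_trials=2000, n_per_arm=50, alpha=0.05, seed=1):
    """Estimate how often a two-sided z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for alpha = 0.05
    se = (2 / n_per_arm) ** 0.5  # SE of a difference in means, known SD 1 per arm (assumption)
    false_positives = 0
    for _ in range(n_trials):
        # Both arms come from the SAME distribution, so H0 (no difference) is true.
        treatment = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        z = (statistics.fmean(treatment) - statistics.fmean(control)) / se
        if abs(z) > z_crit:
            false_positives += 1  # H0 rejected although true: a Type I error
    return false_positives / n_trials

print(f"Estimated Type I error rate: {simulate_type1_rate():.3f}")
```

Raising α (say, to 0.10) makes rejections of a true null more frequent, which is why α is fixed before the trial starts.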
*A type 1 error is a beta (β) error and is usually 0.1 or 0.2.*
- A **Type 1 error** is denoted by **alpha (α)**, not beta (β).
- **Beta (β)** represents the probability of a **Type II error**, in which a false null hypothesis is *not rejected* (often phrased as "accepted").
*A type 1 error is dependent on the confidence interval of a study.*
- The **confidence interval** and the **significance level (α)** are related, but the Type I error rate is fixed in advance by the chosen α; it does not depend on the confidence interval a study happens to produce.
- A 95% confidence interval corresponds to an alpha of 0.05, meaning if the null value falls outside this interval, the null hypothesis is rejected at the 0.05 significance level.
*A type 1 error means the study is not significantly powered to detect a true difference between study groups.*
- This statement describes a **Type II error (β error)**, not a Type I error.
- **Statistical power** is the probability of correctly rejecting a false null hypothesis (1 - β). Low power increases the risk of a Type II error.
*A type 1 error occurs when the null hypothesis is false, yet is accepted in error.*
- This describes a **Type II error (β error)**.
- In a **Type II error**, a study fails to detect a true effect or difference, leading to a **false negative** conclusion.
Question 18: A data analyst is putting systolic blood pressure values into a spreadsheet for a research study on hypertension during pregnancy. The majority of systolic blood pressure values fall between 130 and 145. For one of the study participants, she accidentally types “1400” instead of “140”. Which of the following statements is most likely to be correct?
A. The mode is now greater than the mean
B. The median is now smaller than the mean (Correct Answer)
C. The range of the data set is unaffected
D. This is a systematic error
E. The standard deviation of the data set is decreased
Explanation: ***The median is now smaller than the mean***
- A single, exceptionally high value like 1400 (**outlier**) will **inflate the mean** significantly, as the mean is sensitive to extreme values.
- The median, being the middle value in a sorted dataset, is **resistant to outliers** and will remain relatively unchanged, thus becoming smaller relative to the inflated mean.
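The effect of the typo can be reproduced with Python's `statistics` module. The eight readings below are hypothetical values in the 130-145 mmHg range described in the question; the single 140 is replaced by 1400.

```python
import statistics

def summarize(values):
    """Return the summary statistics discussed in the question."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }

# Hypothetical systolic readings (mmHg), not data from the study in the question.
correct = [130, 135, 138, 138, 138, 140, 142, 145]
typo = [1400 if v == 140 else v for v in correct]  # "1400" typed instead of "140"

before, after = summarize(correct), summarize(typo)
for stat in before:
    print(f"{stat:>6}: {before[stat]:8.2f} -> {after[stat]:8.2f}")
```

With these numbers the median and mode stay at 138 while the mean jumps from 138.25 to 295.75, the range grows from 15 to 1270, and the standard deviation rises sharply.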
*The mode is now greater than the mean*
- The **mode** (most frequent value) is driven by the majority of readings, so it stays in the 130-145 range and remains far below the inflated mean.
*The range of the data set is unaffected*
- The **range** is the maximum minus the minimum. Replacing 140 with 1400 creates a new maximum, so the range **increases** dramatically.
*This is a systematic error*
- A **systematic error** is a consistent, repeatable error that biases measurements in a predictable way (e.g., a consistently miscalibrated instrument).
- Typing "1400" instead of "140" is a **random transcription error** or a **gross error**, not a consistent bias and therefore is not a systematic error.
*The standard deviation of the data set is decreased*
- **Standard deviation** measures the spread of data around the mean. A single value as far from the mean as 1400 **increases** the standard deviation substantially rather than decreasing it.
Question 19: A 34-year-old woman, gravida 1, para 0, at 18 weeks' gestation, comes to the physician for a prenatal visit. She recently read about a genetic disorder that manifests with gait ataxia, kyphoscoliosis, and arrhythmia and is concerned about the possibility of her child inheriting the disease. There is no personal or family history of this disorder. The frequency of unaffected carriers in the general population is 1/100. Assuming the population is in a steady state without selection, what is the probability that her child will develop this disease?
A. 1/10,000
B. 1/20,000
C. 1/40,000 (Correct Answer)
D. 1/200
E. 1/400
Explanation: ***1/40,000***
- This disorder (Friedreich's ataxia) follows **autosomal recessive** inheritance, meaning both parents must be carriers for the child to be affected.
- Since there is no family history, we treat both parents as random individuals from the general population with carrier frequency 1/100.
- **Calculation**: Probability mother is carrier (1/100) × Probability father is carrier (1/100) × Probability child is affected given both parents are carriers (1/4) = **1/40,000**.
- This applies Hardy-Weinberg equilibrium principles for a steady-state population.
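The arithmetic, done with exact fractions in Python. The second part is a Hardy-Weinberg cross-check that assumes p ≈ 1, so that the carrier frequency 2pq ≈ 2q:

```python
from fractions import Fraction

carrier_freq = Fraction(1, 100)                      # given: 2pq ≈ 1/100
p_both_carriers = carrier_freq * carrier_freq        # mother AND father are carriers
p_affected_child = p_both_carriers * Fraction(1, 4)  # Aa x Aa -> 1/4 aa
print(p_affected_child)

# Hardy-Weinberg cross-check: with p ≈ 1, 2q ≈ 1/100, so q ≈ 1/200 and q² is the disease frequency.
q = carrier_freq / 2
print(q ** 2)
```

Both lines print 1/40000: the parental-carrier route and the population allele-frequency route agree.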
*1/10,000*
- This calculation (1/100 × 1/100 = 1/10,000) represents only the probability that both parents are carriers.
- It fails to account for the **1/4 chance** of an affected child when two carriers of an **autosomal recessive** condition conceive.
- This would be the answer if both parents being carriers automatically meant the child would be affected, which is incorrect.
*1/20,000*
- This result would occur if the probability of the child inheriting the disease from carrier parents was 1/2 instead of 1/4 (1/100 × 1/100 × 1/2 = 1/20,000).
- A 1/2 probability would apply to **autosomal dominant** conditions where one affected parent passes the disease, not for **autosomal recessive** inheritance.
- For autosomal recessive disorders, two carrier parents have a 1/4 (not 1/2) chance of an affected child.
*1/200*
- This probability (1/100 × 1/2 = 1/200) would suggest only one parent needed to be a carrier with a 1/2 transmission probability.
- This does not account for the requirement that **both parents must be carriers** for an **autosomal recessive** disorder.
- It represents a fundamental misunderstanding of recessive inheritance patterns.
*1/400*
- This calculation (1/100 × 1/4 = 1/400) incorrectly assumes only one parent needs to be a carrier.
- For **autosomal recessive** inheritance, **both parents must be carriers**, so both their carrier probabilities (1/100 each) must be included in the calculation.
- It omits the second parent's carrier probability entirely.
Question 20: Study X examined the relationship between coffee consumption and lung cancer. The authors of Study X retrospectively reviewed patients' reported coffee consumption and found that drinking greater than 6 cups of coffee per day was associated with an increased risk of developing lung cancer. However, Study X was criticized by the authors of Study Y. Study Y showed that increased coffee consumption was associated with smoking. What type of bias affected Study X, and what study design is geared to reduce the chance of that bias?
A. Observer bias; double blind analysis
B. Selection bias; randomization
C. Lead time bias; placebo
D. Measurement bias; blinding
E. Confounding; randomization (Correct Answer)
Explanation: ***Confounding; randomization***
- Study Y suggests that **smoking** is a **confounding variable** because it is associated with both increased coffee consumption (exposure) and increased risk of lung cancer (outcome), distorting the apparent relationship between coffee and lung cancer.
- **Randomization** in experimental studies (such as randomized controlled trials) reduces confounding because, on average, it distributes both known and unknown confounders evenly across study groups.
- In observational studies where randomization is not possible, confounding can be addressed through **stratification**, **matching**, or **multivariable adjustment** during analysis.
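Stratification can be illustrated with a short Python sketch. The counts are invented so that smoking is the only link between coffee and lung cancer: heavy coffee drinkers are mostly smokers, but within each smoking stratum coffee has no effect. `risk_ratio` and `pooled` are illustrative helper names.

```python
from fractions import Fraction

# Hypothetical counts (invented for illustration): (lung cancer cases, people)
data = {
    "smokers":    {"heavy_coffee": (40, 400), "light_coffee": (10, 100)},
    "nonsmokers": {"heavy_coffee": (1, 100),  "light_coffee": (4, 400)},
}

def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (a, n1), (c, n0) = exposed, unexposed
    return Fraction(a, n1) / Fraction(c, n0)

def pooled(group):
    """Collapse both smoking strata together, i.e. ignore smoking entirely."""
    cases = sum(stratum[group][0] for stratum in data.values())
    people = sum(stratum[group][1] for stratum in data.values())
    return cases, people

crude_rr = risk_ratio(pooled("heavy_coffee"), pooled("light_coffee"))
print(f"Crude RR: {float(crude_rr):.2f}")

for name, stratum in data.items():
    rr = risk_ratio(stratum["heavy_coffee"], stratum["light_coffee"])
    print(f"RR among {name}: {float(rr):.2f}")
```

The crude risk ratio is about 2.93, yet it is 1.00 among smokers and among nonsmokers, showing how an unadjusted analysis like Study X's can produce a spurious association. Stratification only handles confounders that were measured; randomization also balances unmeasured ones.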
*Observer bias; double blind analysis*
- **Observer bias** occurs when researchers' beliefs or expectations influence the study outcome, which is not the primary issue described here regarding the relationship between coffee, smoking, and lung cancer.
- **Double-blind analysis** is a method to mitigate observer bias by ensuring neither participants nor researchers know who is in the control or experimental groups.
*Selection bias; randomization*
- **Selection bias** happens when the study population is not representative of the target population. Study Y's finding (coffee drinkers smoke more) points to a third variable, not to how participants were selected.
- While **randomization** is used to reduce selection bias by creating comparable groups, the core problem identified in Study X is confounding, not flawed participant selection.
*Lead time bias; placebo*
- **Lead time bias** occurs in screening programs when early detection without improved outcomes makes survival appear longer, an issue unrelated to the described association between coffee, smoking, and lung cancer.
- A **placebo** is an inactive treatment used in clinical trials to control for psychological effects, and its relevance here is limited to treatment intervention studies.
*Measurement bias; blinding*
- **Measurement bias** arises from systematic errors in data collection, such as inaccurate patient reporting of coffee consumption, but the main criticism from Study Y points to a third variable (smoking) affecting the association, not just flawed measurement.
- **Blinding** helps reduce measurement bias by preventing participants or researchers from knowing group assignments, thus minimizing conscious or unconscious influences on data collection.