Study Design US Medical PG Practice Questions and MCQs
Question 101: A study is performed to determine the prevalence of a particular rare fungal pneumonia. A sample population of 100 subjects is monitored for 4 months. Every month, the entire population is screened and the number of new cases is recorded for the group. The data from the study are given in the table below:
| Time point | New cases of fungal pneumonia |
| --- | --- |
| t = 0 months | 10 |
| t = 1 month | 4 |
| t = 2 months | 2 |
| t = 3 months | 5 |
| t = 4 months | 4 |
Which of the following is correct regarding the prevalence of this rare fungal pneumonia in this sample population?
A. The prevalence at time point 2 months is 2%.
B. The prevalence at time point 3 months is 11%.
C. The prevalence at the conclusion of the study is 15%.
D. The prevalence and the incidence at time point 2 months are equal.
E. The prevalence at the conclusion of the study is 25%. (Correct Answer)
Explanation: ***The prevalence at the conclusion of the study is 25%***
- Prevalence is calculated by dividing the **total number of existing cases** by the total population at a specific point in time. Assuming no case resolves or leaves the cohort during follow-up, the existing cases at the conclusion of the study (t=4 months) equal the cumulative number of new cases: 10 + 4 + 2 + 5 + 4 = 25.
- The prevalence is therefore 25 cases / 100 subjects = **25%**.
*The prevalence at time point 2 months is 2%*
- At time point 2 months, the **cumulative number of new cases** is 10 (at t=0) + 4 (at t=1) + 2 (at t=2) = 16 cases.
- The prevalence at 2 months would be 16 cases / 100 subjects = **16%**, not 2%.
*The prevalence at time point 3 months is 11%*
- The cumulative number of new cases at time point 3 months is 10 (at t=0) + 4 (at t=1) + 2 (at t=2) + 5 (at t=3) = 21 cases.
- The prevalence at 3 months would be 21 cases / 100 subjects = **21%**, not 11%.
*The prevalence at the conclusion of the study is 15%*
- The cumulative number of new cases at the conclusion of the study (t=4 months) is 10 + 4 + 2 + 5 + 4 = **25 cases**.
- Therefore, the prevalence is 25 cases / 100 subjects = **25%**, not 15%.
*The prevalence and the incidence at time point 2 months are equal*
- **Incidence** counts only the *new* cases arising within a specified period among those still at risk: 2 new cases in the month ending at t=2, among the 86 subjects who were disease-free at t=1 (about 2%).
- **Prevalence** at t=2 months is the cumulative number of cases (10+4+2 = 16 cases), so incidence (about 2%) and prevalence (16%) are **not equal**.
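The running totals used in the explanation above can be reproduced with a short sketch. It assumes, as the explanation does, that no case resolves, dies, or leaves the cohort, so the number of existing cases at each time point is simply the cumulative count of new cases. The variable names are illustrative.

```python
from itertools import accumulate

population = 100
new_cases = [10, 4, 2, 5, 4]  # t = 0..4 months, taken from the table

# Assumption: cases never resolve or drop out, so existing = cumulative new cases.
existing_cases = list(accumulate(new_cases))
prevalence = [cases / population for cases in existing_cases]

for month, (cases, p) in enumerate(zip(existing_cases, prevalence)):
    print(f"t = {month} months: {cases} existing cases, prevalence {p:.0%}")
```

If cases could recover during follow-up, the cumulative count would overstate prevalence and the final figure would be lower than 25%.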
Question 102: A 25-year-old man with a genetic disorder presents for genetic counseling because he is concerned about the risk that any children he has will have the same disease as himself. Specifically, since childhood he has had difficulty breathing requiring bronchodilators, inhaled corticosteroids, and chest physiotherapy. He has also had diarrhea and malabsorption requiring enzyme replacement therapy. If his wife comes from a population where 1 in 10,000 people are affected by this same disorder, which of the following best represents the likelihood a child would be affected as well?
A. 0.01%
B. 2%
C. 0.5%
D. 1% (Correct Answer)
E. 50%
Explanation: ***Correct Option: 1%***
- The patient's symptoms (difficulty breathing requiring bronchodilators, inhaled corticosteroids, and chest physiotherapy; diarrhea and malabsorption requiring enzyme replacement therapy) are classic for **cystic fibrosis (CF)**, an **autosomal recessive disorder**.
- For an autosomal recessive disorder with a prevalence of 1 in 10,000 in the general population, **q² = 1/10,000**, so **q = 1/100 = 0.01**. The carrier frequency **(2pq)** is approximately **2q = 2 × (1/100) = 1/50 = 0.02**.
- The affected man is **homozygous recessive (aa)** and will always pass on the recessive allele. His wife has a **1/50 chance of being a carrier (Aa)**. If she is a carrier, she has a **1/2 chance of passing on the recessive allele**.
- Therefore, the probability of an affected child = **(Probability wife is a carrier) × (Probability wife passes recessive allele) = 1/50 × 1/2 = 1/100 = 1%**.
*Incorrect Option: 0.01%*
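- This is the disease prevalence itself (1/10,000 = 0.01%), i.e., the chance that a randomly chosen member of the wife's population is affected. It ignores that the father is affected and always transmits a recessive allele, so only the wife's carrier probability (about 1/50) and her 1/2 chance of transmission determine the child's risk.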
*Incorrect Option: 2%*
- This represents approximately the carrier frequency (1/50 ≈ 2%), but does not account for the additional 1/2 probability that a carrier mother would pass on the recessive allele.
*Incorrect Option: 0.5%*
- This value would be correct if the carrier frequency were 1/100 instead of 1/50, which does not match the given population prevalence.
*Incorrect Option: 50%*
- **50%** would be the risk if the wife were a *known* carrier: the affected father always transmits a recessive allele, and a carrier mother transmits hers with probability 1/2. It would also apply to an **autosomal dominant** disorder with one heterozygous affected parent. Neither matches this scenario, where the wife's carrier status is unknown and must be estimated from the population frequency (about 1/50).
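The Hardy-Weinberg arithmetic behind the 1% answer can be checked with a minimal sketch. It assumes the population is in Hardy-Weinberg equilibrium and uses the same 2q approximation for carrier frequency as the explanation; the variable names are illustrative.

```python
from math import sqrt

disease_prevalence = 1 / 10_000        # q^2, from the question stem

q = sqrt(disease_prevalence)           # recessive allele frequency, about 0.01
p = 1 - q                              # normal allele frequency, about 0.99

carrier_exact = 2 * p * q              # about 0.0198
carrier_approx = 2 * q                 # about 0.02 (1/50), treating p as 1

p_father_transmits = 1.0               # affected father is homozygous recessive
p_carrier_transmits = 0.5              # a carrier passes the allele half the time

risk = p_father_transmits * carrier_approx * p_carrier_transmits

print(f"carrier frequency: exact {carrier_exact:.4f}, approx {carrier_approx:.4f}")
print(f"risk to child: {risk:.2%}")
```

The exact and approximate carrier frequencies differ only in the fourth decimal place, which is why the 1/50 shortcut is acceptable here.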
Question 103: A study on cholesterol levels is performed. There are 1000 participants. It is determined that in this population, the mean LDL is 200 mg/dL with a standard deviation of 50 mg/dL. If the population has a normal distribution, how many people have a cholesterol less than 300 mg/dL?
A. 975 (Correct Answer)
B. 950
C. 680
D. 997
E. 840
Explanation: ***975***
- This value corresponds to **two standard deviations** above the mean in a normal distribution, as per the **empirical rule (68-95-99.7 rule)**.
- With a mean of 200 mg/dL and a standard deviation of 50 mg/dL, 300 mg/dL is (300-200)/50 = 2 standard deviations above the mean. Approximately 97.5% of data falls below +2 standard deviations in a normal distribution. Therefore, 0.975 × 1000 = 975 people.
*950*
- This number would correspond to 95% of the population, the share lying within about **±2 standard deviations** of the mean (±1.96 for an exact 95% interval), i.e., between 100 and 300 mg/dL.
- However, the question asks for everyone *less than* 300 mg/dL, which also includes the 2.5% in the lower tail below 100 mg/dL, giving 97.5%, not 95%.
*680*
- This represents the percentage of data (68%) that falls within **one standard deviation (±1 SD)** of the mean in a normal distribution.
- In this scenario, one standard deviation above the mean is 250 mg/dL, not 300 mg/dL. This option incorrectly applies the 68% rule.
*997*
- This corresponds to the percentage of data (99.7%) that falls within **three standard deviations (±3 SD)** of the mean in a normal distribution.
- Three standard deviations *above* the mean would be 350 mg/dL (200 + 3×50), which is beyond the target value of 300 mg/dL. The question asks about 300 mg/dL, which is only 2 SD above the mean.
*840*
- This number represents the people (84% of 1000) who fall below **one standard deviation (1 SD)** above the mean in a normal distribution.
- Using the empirical rule: 50% (below mean) + 34% (between mean and +1 SD) = 84%. Thus, 0.84 × 1000 = 840 people. However, 300 mg/dL is two standard deviations above the mean (250 mg/dL = +1 SD), not one.
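As a cross-check on the empirical-rule reasoning above, the following sketch computes the z-score for 300 mg/dL and compares the rule-of-thumb count with the count from the exact normal CDF, using the standard library's `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

n_participants = 1000
ldl = NormalDist(mu=200, sigma=50)     # mean and SD from the question stem

z = (300 - ldl.mean) / ldl.stdev       # number of SDs above the mean

# Empirical rule: 95% lie within +/-2 SD, so 50% + 47.5% lie below +2 SD.
empirical_fraction = 0.5 + 0.95 / 2
# Exact normal CDF at 300 mg/dL.
exact_fraction = ldl.cdf(300)

print(f"z = {z}")
print(f"empirical rule: {round(n_participants * empirical_fraction)} people")
print(f"exact CDF:      {round(n_participants * exact_fraction)} people")
```

The exact CDF gives a slightly higher count than the empirical rule because 95% coverage corresponds to ±1.96 SD rather than exactly ±2 SD; among the listed options, 975 remains the intended answer.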
Question 104: You are reviewing the protocol for a retrospective case-control study investigating risk factors for mesothelioma among retired factory workers. 100 cases of mesothelioma and 100 age and sex matched controls are to be recruited and interviewed about their exposure to industrial grade fiberglass by blinded interviewers. The investigators' primary hypothesis is that cases of mesothelioma will be more likely to have been exposed to industrial grade fiberglass. The design of this study is most concerning for which type of bias?
A. This study design is free of potential bias
B. Observer bias
C. Interviewer bias
D. Lead-time bias
E. Recall bias (Correct Answer)
Explanation: ***Recall bias***
- In a retrospective **case-control study**, individuals with mesothelioma (cases) may be more likely to **recall and report past exposures** to industrial-grade fiberglass than controls, due to their diagnosis and their search for an explanation for their illness.
- This differential recall of past exposures between cases and controls can distort the true association between the exposure and the disease, leading to a biased estimate of risk.
- Cases do not necessarily remember more accurately; rather, they may over-report or selectively remember exposures they believe might be causally related to their disease.
*This study design is free of potential bias*
- This statement is incorrect because **no study design is completely free of potential biases**, especially in observational studies like this case-control design.
- While efforts like blinded interviewers are made, inherent limitations of retrospective data collection can introduce other forms of bias.
*Observer bias*
- **Observer bias** typically refers to situations where the researcher's expectations or beliefs influence the recording of data, but the study description states **blinded interviewers** are used, which aims to mitigate this type of bias.
- This bias is less likely here due to the blinding, and the primary concern relates to the participants' memory of past events.
*Interviewer bias*
- **Interviewer bias** can occur when the interviewer's behavior or questioning influences the participant's responses.
- However, the protocol mitigates this by using **blinded interviewers**, meaning they are unaware of the case/control status of the participants, reducing the risk of differential questioning.
*Lead-time bias*
- **Lead-time bias** is primarily a concern in screening studies where early detection of a disease might artificially prolong the survival time without actually changing the course of the disease.
- This study is investigating risk factors for mesothelioma, not evaluating the effectiveness of a screening program, rendering lead-time bias irrelevant to this design.
Question 105: A surgeon is interested in studying how different surgical techniques impact the healing of tendon injuries. In particular, he will compare 3 different types of suture repairs biomechanically in order to determine the maximum load before failure of the tendon 2 weeks after repair. He collects data on maximum load for 90 different repaired tendons from an animal model. Thirty tendons were repaired using each of the different suture techniques. Which of the following statistical measures is most appropriate for analyzing the results of this study?
A. Chi-squared
B. Wilcoxon rank sum
C. Pearson r coefficient
D. Student t-test
E. ANOVA (Correct Answer)
Explanation: ***ANOVA***
- **ANOVA (Analysis of Variance)** is appropriate here because it compares the means of **three or more independent groups** (the three different suture techniques) on a continuous dependent variable (maximum load before failure).
- The study has three distinct repair techniques, each with 30 tendons, making ANOVA suitable for determining if there are statistically significant differences among their mean failure loads.
*Chi-squared*
- The **Chi-squared test** is used for analyzing **categorical data** (frequencies or proportions) to determine if there is an association between two nominal variables.
- This study involves quantitative measurement (maximum load), not categorical data, making Chi-squared inappropriate.
*Wilcoxon rank sum*
- The **Wilcoxon rank sum test** (also known as Mann-Whitney U test) is a **non-parametric test** used to compare two independent groups when the data is not normally distributed or is ordinal.
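- While the study has independent groups, this test compares only two of them; with three groups the non-parametric analogue would be the Kruskal-Wallis test. Because the dependent variable is continuous, ANOVA is the more powerful and appropriate choice, assuming approximately normal data.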
*Pearson r coefficient*
- The **Pearson r coefficient** measures the **strength and direction of a linear relationship between two continuous variables**.
- This study aims to compare means across different groups, not to determine the correlation between two continuous variables.
*Student t-test*
- The **Student t-test** is used to compare the means of **exactly two groups** (either independent or paired) on a continuous dependent variable.
- This study involves comparing three different suture techniques, not just two, making the t-test unsuitable.
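To make the ANOVA idea above concrete, here is a minimal sketch of how the one-way F statistic compares between-group and within-group variability. The load values are hypothetical, made-up numbers (three per technique rather than thirty) chosen only so the arithmetic is easy to follow; they are not data from the study.

```python
from statistics import mean

# Hypothetical maximum-load values in arbitrary units; NOT study data.
groups = {
    "technique_A": [1.0, 2.0, 3.0],
    "technique_B": [2.0, 3.0, 4.0],
    "technique_C": [5.0, 6.0, 7.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)

# Variability of group means around the grand mean.
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)
# Variability of observations around their own group mean.
ss_within = sum(
    (v - mean(values)) ** 2 for values in groups.values() for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

With 30 tendons per technique the degrees of freedom would be 2 and 87. Converting F to a p-value requires the F distribution, which this sketch deliberately leaves out.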
Question 106: A group of researchers studying the relationship between major depressive disorder and unprovoked seizures identified 36 patients via chart review who had been rehospitalized for unprovoked seizures following discharge from an inpatient psychiatric unit and 105 patients recently discharged from the same unit who did not experience unprovoked seizures. The results of the study show:
| | Unprovoked seizure | No seizure |
| --- | --- | --- |
| Major depressive disorder | 20 | 35 |
| No major depressive disorder | 16 | 70 |
Based on this information, which of the following is the most appropriate measure of association between history of major depressive disorder (MDD) and unprovoked seizures?
A. 1.95
B. 2.5 (Correct Answer)
C. 0.19
D. 0.36
E. 0.17
Explanation: ***2.5***
- This is a **case-control study** because it starts with individuals who have the outcome (unprovoked seizures) and individuals who do not, then looks back at their exposure (major depressive disorder).
- For a case-control study, the appropriate measure of association is the **odds ratio (OR)**, calculated as (a/c) / (b/d) = (ad) / (bc). In this case: a = 20 (MDD with seizure), b = 35 (MDD without seizure), c = 16 (no MDD with seizure), d = 70 (no MDD without seizure). So, OR = (20 * 70) / (35 * 16) = 1400 / 560 = 2.5.
*1.95*
- This equals the ratio of the seizure proportions in the two exposure rows: (20/55) / (16/86) ≈ 1.95, i.e., a **relative risk** calculation.
- Relative risk requires incidence, which a case-control study cannot provide because the investigators fix the numbers of cases and controls. The odds ratio of 2.5 is the appropriate measure.
*0.19*
- This equals the proportion of patients without MDD who had a seizure: 16/86 ≈ 0.19.
- It describes a single exposure group rather than an association, and in a case-control study it does not estimate the true **incidence of the outcome** in that group.
*0.36*
- This equals the proportion of patients with MDD who had a seizure: 20/55 ≈ 0.36.
- Like 0.19, it is a **proportion** within one exposure group, not a measure of association between MDD and seizures.
*0.17*
- This equals the difference between the two rounded proportions 20/55 ≈ 0.36 and 16/86 ≈ 0.19 (0.36 - 0.19 = 0.17), i.e., an attributable-risk style calculation.
- Risk differences, like relative risk, require incidence data from a cohort design and are not valid for a case-control study.
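The odds ratio and the quantities that the distractors resemble can all be computed from the 2×2 table in a few lines. This is a sketch using the cell labels a, b, c, d from the explanation; the other variable names are illustrative.

```python
# 2x2 table cells, labelled as in the explanation.
a, b = 20, 35   # MDD:    seizure, no seizure
c, d = 16, 70   # no MDD: seizure, no seizure

# Appropriate for a case-control design.
odds_ratio = (a * d) / (b * c)

# Row proportions and their ratio; these would be risks and a relative risk
# only in a cohort study, so they are shown here purely for comparison.
proportion_mdd = a / (a + b)
proportion_no_mdd = c / (c + d)
proportion_ratio = proportion_mdd / proportion_no_mdd

print(f"odds ratio:            {odds_ratio:.2f}")
print(f"seizure | MDD:         {proportion_mdd:.2f}")
print(f"seizure | no MDD:      {proportion_no_mdd:.2f}")
print(f"ratio of proportions:  {proportion_ratio:.2f}")
```

Only the first figure is a valid measure of association here, because the ratio of cases to controls was set by the investigators rather than by the disease's frequency.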
Question 107: A group of researchers is looking to study the effect of body weight on blood pressure in the elderly. Previous work measuring body weight and blood pressure at 2 time points in a large group of healthy individuals revealed that a 10% increase in body weight was accompanied by a 7 mm Hg increase in blood pressure. If the researchers want to determine if there is a linear relationship between body weight and blood pressure in a subgroup of elderly individuals in this study, which of the following statistical methods would best be employed to answer this question?
A. Spearman’s correlation
B. Pearson’s correlation (Correct Answer)
C. One-way analysis of variance (ANOVA)
D. Two-way analysis of variance (ANOVA)
E. Wilcoxon signed-rank test
Explanation: ***Pearson’s correlation***
- **Pearson's correlation coefficient** measures the **strength and direction of a linear relationship between two continuous variables**. In this case, both body weight and blood pressure are continuous variables, and the researchers are looking for a *linear relationship*.
- The prior work also suggests a linear relationship ("a 10% increase in body weight was accompanied by a 7 mm Hg increase in blood pressure"), making Pearson's correlation the most appropriate choice to investigate this in a subgroup.
*Spearman’s correlation*
- **Spearman's correlation** measures the **strength and direction of a monotonic relationship (not necessarily linear) between two ranked variables or continuous variables that do not meet the assumptions for Pearson's correlation (e.g., non-normal distribution, outliers).**
- Since the question specifies a "linear relationship" and does not suggest violations of Pearson's assumptions, it is less appropriate than Pearson's.
*One-way analysis of variance (ANOVA)*
- **One-way ANOVA** is used to compare the **means of three or more independent groups** on a single continuous dependent variable.
- This method is not suitable because the researchers are investigating the relationship between two continuous variables (body weight and blood pressure), not comparing means across different discrete groups.
*Two-way analysis of variance (ANOVA)*
- **Two-way ANOVA** is used to examine the **effect of two categorical independent variables on a continuous dependent variable** and to assess any interaction between the two independent variables.
- Similar to one-way ANOVA, this test is inappropriate for determining the linear relationship between two continuous variables.
*Wilcoxon signed-rank test*
- The **Wilcoxon signed-rank test** is a **non-parametric test** used to compare two dependent (paired) samples, or to compare a single sample to a hypothesized median. It assesses whether two related samples differ in their ranks.
- This test is not suitable for investigating the linear relationship between two continuous variables in a single group of individuals.
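The distinction drawn above between a linear relationship (Pearson) and a merely monotonic one (Spearman) can be illustrated with a small sketch. The data are hypothetical: `y` is the cube of `x`, so the relationship is perfectly monotonic but not linear. The helper functions are written out by hand for illustration, and the ranking helper assumes there are no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: monotonic but curved relationship.
x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Spearman's coefficient reaches 1 because the ordering is preserved exactly, while Pearson's falls short of 1 because the points do not lie on a straight line. When the question is specifically about linearity, Pearson's coefficient is the one that answers it.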
Question 108: A case-control study with a focus on risk factors that may influence the development of depression was conducted among the elderly population in one tertiary hospital in Malaysia. The study involved 150 elderly patients diagnosed with depressive illness from the psychiatry ward, as well as another group of 150 elderly patients without any history of depressive illness (but hospitalized for other reasons) at the same ward. The data were collected through questionnaires, and 2 principal investigators (who were also the patients’ attending physicians) acted as interviewers after proper training for the purposes of this study. Multivariate analyses of logistic regression with independent variables were employed to determine the adjusted odds ratio for the risk of developing depression. The study results showed that a lower level of social support, lack of education, and the presence of chronic illnesses highly correlated with depression. In order to maximally avoid bias that may stem from this kind of study design, what should the researchers have done differently to increase the validity of their results?
A. Used open-ended questions
B. Blinded the investigators (Correct Answer)
C. Included more interviewers
D. Used closed testing procedures on the data
E. Used Bonferroni correction on data
Explanation: ***Blinded the investigators***
- Blinding the investigators (interviewers) would prevent them from knowing which patients were cases (depressed) and which were controls (non-depressed). This reduces the risk of **interviewer bias**, where their preconceptions or knowledge of participants' status might influence how they ask questions or interpret responses, thereby distorting the results.
- Given that the principal investigators were also the patients' attending physicians, they likely had prior knowledge of the patients' depressive status, which could lead to **detection bias** or information bias. Blinding would help standardize data collection.
*Used open-ended questions*
- While open-ended questions can provide rich qualitative data, they can introduce **variability and subjectivity** in responses and interpretation, potentially making comparisons more challenging and increasing the investigator's influence on data collection.
- For a case-control study focused on quantifiable risk factors, **structured questionnaires** are often preferred for consistency and easier statistical analysis, although a mix can be optimal.
*Included more interviewers*
- Simply including more interviewers does not inherently improve validity; it could even increase **inter-rater variability** if they are not adequately trained and standardized.
- The critical aspect is the **standardization of data collection** and the avoidance of bias, not merely the number of individuals collecting data.
*Used closed testing procedures on the data*
- A **closed testing procedure** is a statistical method for handling multiple comparisons: it controls the family-wise **Type I error** rate across a set of related hypotheses. It operates on data that have already been collected and does not address potential biases in data collection or patient selection.
- The issue here is related to **information bias** and **selection bias** stemming from the study design and interviewer role, not primarily the statistical analysis procedures.
*Used Bonferroni correction on data*
- **Bonferroni correction** is used to adjust the p-values when performing multiple statistical comparisons on the same data set to reduce the chance of making a **Type I error** (false positive).
- This correction addresses issues in **statistical analysis** (minimizing spurious findings due to multiple testing), not biases that arise during the design, data collection, or participant identification phases of a study.
Question 109: A recent study examined trends in incidence and fatality of ischemic stroke in a representative sample of Scandinavian towns. The annual incidence of ischemic stroke was calculated to be 60 per 2,000 people. The 1-year case fatality rate for ischemic stroke was found to be 20%. The health department of a town in southern Sweden with a population of 20,000 is interested in knowing the 1-year mortality conferred by ischemic stroke. Based on the study's findings, which of the following estimates the annual mortality rate for ischemic stroke per 20,000?
A. 600 people
B. 400 people
C. 120 people (Correct Answer)
D. 60 people
E. 12 people
Explanation: ***120 people***
- The annual incidence of ischemic stroke is 60 per 2,000 people. For a population of 20,000, the annual number of new stroke cases would be (60/2,000) * 20,000 = **600 cases**.
- With a 1-year case fatality rate of 20%, the annual mortality from ischemic stroke is 20% of these 600 cases, which is 0.20 * 600 = **120 people**.
*600 people*
- This number represents the estimated **annual incidence of ischemic stroke** in a town of 20,000 people, not the mortality rate.
- It is calculated as (60/2,000) * 20,000 = 600, before applying the case fatality rate.
*400 people*
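- This number equals 20% of 2,000 (0.20 × 2,000 = 400), which is what results from applying the case fatality rate to the incidence denominator instead of to the stroke cases.
- Case fatality applies only to people who develop the disease, here the 600 expected cases, not to a population count.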
*60 people*
- This is the **incidence of ischemic stroke** per 2,000 people, not the mortality rate for a larger population of 20,000.
- It does not account for the total population size or the case fatality rate.
*12 people*
- This equals 20% of 60 (0.20 × 60 = 12), the expected number of stroke deaths per 2,000 people.
- It omits scaling to the town's population of 20,000, which multiplies the figure by 10 to give 120.
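The two-step calculation above (scale incidence to the town, then apply case fatality) can be sketched with exact fractions to avoid floating-point rounding. The variable names are illustrative.

```python
from fractions import Fraction

incidence = Fraction(60, 2000)        # new strokes per person per year
case_fatality = Fraction(20, 100)     # 1-year case fatality rate
town_population = 20_000

expected_cases = incidence * town_population        # step 1: scale to the town
expected_deaths = case_fatality * expected_cases    # step 2: apply case fatality

# The distractor 12 arises from skipping step 1.
deaths_per_2000 = case_fatality * 60

print(f"expected cases:  {expected_cases}")
print(f"expected deaths: {expected_deaths}")
print(f"deaths per 2,000 (unscaled): {deaths_per_2000}")
```

Equivalently, mortality rate = incidence × case fatality, so the order of the two multiplications does not matter as long as both are applied.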
Question 110: A newlywed couple comes to your office for genetic counseling. Both potential parents are known to be carriers of the same Cystic Fibrosis (CF) mutation. What is the probability that at least one of their next three children will have CF if they are all single births?
A. 37/64 (Correct Answer)
B. 0
C. 1/64
D. 1
E. 27/64
Explanation: ***37/64***
- The probability of a child having CF from two carrier parents is **1/4** (recessive inheritance), and the probability of a child not having CF is **3/4**.
- The probability that *none* of the three children will have CF is (3/4)³ = **27/64**. Therefore, the probability that *at least one* child will have CF is 1 - 27/64 = **37/64**.
*0*
- This option is incorrect because there is a **nonzero probability** for each child to inherit CF when both parents are carriers.
- CF is an **autosomal recessive disorder**, meaning there is a 25% chance per child, not a 0% chance.
*1/64*
- This represents the probability that ***all three children*** would have CF: (1/4)³ = 1/64.
- This is an **underestimation** of the probability for at least one child to be affected, as the question asks about "at least one" not "all three."
*1*
- This would imply that it's an **absolute certainty** that at least one child will have CF, which is incorrect.
- Each child's outcome is independent, and there is always a chance (27/64) that none of the three children will have the disease.
*27/64*
- This calculation represents the probability that **none of the three children will have CF**: (3/4)³ = 27/64.
- This is the **complementary probability** to "at least one child having CF", not the actual answer to the question asked.
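The complement rule used above can be verified with exact fractions, and cross-checked by summing the binomial probabilities for one, two, or three affected children. This sketch assumes independent births with a 1/4 risk each, as in the explanation; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

p_affected = Fraction(1, 4)   # risk per child with two carrier parents
n_children = 3

# Complement rule: 1 - P(no child affected).
p_none = (1 - p_affected) ** n_children
p_at_least_one = 1 - p_none

# Cross-check: sum binomial probabilities for k = 1, 2, 3 affected children.
p_binomial_sum = sum(
    comb(n_children, k) * p_affected ** k * (1 - p_affected) ** (n_children - k)
    for k in range(1, n_children + 1)
)

p_all_three = p_affected ** n_children

print(f"P(none)         = {p_none}")
print(f"P(at least one) = {p_at_least_one}")
print(f"P(all three)    = {p_all_three}")
```

Both routes give the same value, but the complement rule needs only one term, which is why it is the usual approach for "at least one" questions.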