Study Design Practice Questions

Q: A study is performed to determine the prevalence of a particular rare fungal pneumonia. A sample population of 100 subjects is monitored for 4 months. Every month, the entire population is screened and the number of new cases is recorded for the group. The data from the study are given in the table below:

| Time point | New cases of fungal pneumonia |
|---|---|
| t = 0 months | 10 |
| t = 1 month | 4 |
| t = 2 months | 2 |
| t = 3 months | 5 |
| t = 4 months | 4 |

Which of the following is correct regarding the prevalence of this rare fungal pneumonia in this sample population?

The prevalence at the conclusion of the study is 25%.

***The prevalence at the conclusion of the study is 25%***
- Prevalence is calculated by dividing the **total number of existing cases** by the total population at a specific point in time. Assuming no case resolves and no subject leaves the population during follow-up, the existing cases at the conclusion of the study (t=4 months) equal the cumulative number of new cases: 10 + 4 + 2 + 5 + 4 = 25.
- The prevalence is therefore 25 cases / 100 subjects = **25%**.

*The prevalence at time point 2 months is 2%*
- At time point 2 months, the **cumulative number of new cases** is 10 (at t=0) + 4 (at t=1) + 2 (at t=2) = 16 cases.
- The prevalence at 2 months would be 16 cases / 100 subjects = **16%**, not 2%.

*The prevalence at time point 3 months is 11%*
- The cumulative number of new cases at time point 3 months is 10 (at t=0) + 4 (at t=1) + 2 (at t=2) + 5 (at t=3) = 21 cases.
- The prevalence at 3 months would be 21 cases / 100 subjects = **21%**, not 11%.

*The prevalence at the conclusion of the study is 15%*
- The cumulative number of new cases at the conclusion of the study (t=4 months) is 10 + 4 + 2 + 5 + 4 = **25 cases**.
- Therefore, the prevalence is 25 cases / 100 subjects = **25%**, not 15%.

*The prevalence and the incidence at time point 2 months are equal*
- **Incidence** refers to the number of *new* cases within a specified period, which at t=2 months is 2 cases.
- **Prevalence** at t=2 months is the cumulative number of cases (10+4+2 = 16 cases), so incidence (2%) and prevalence (16%) are **not equal**.
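The cumulative-count arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names `prevalence_at` and `new_case_fraction_at` are hypothetical, and the sketch assumes, as the explanation does, that no case resolves during follow-up, so existing cases equal cumulative new cases.

```python
# New cases recorded at t = 0, 1, 2, 3, 4 months (from the question's table).
NEW_CASES = [10, 4, 2, 5, 4]
POPULATION = 100


def prevalence_at(month: int) -> float:
    """Cumulative cases through `month` divided by the population.

    Assumes no case resolves and nobody leaves the cohort (hypothetical helper).
    """
    return sum(NEW_CASES[: month + 1]) / POPULATION


def new_case_fraction_at(month: int) -> float:
    """New cases recorded at `month` over the whole population, as in the explanation."""
    return NEW_CASES[month] / POPULATION


for month in range(len(NEW_CASES)):
    print(f"t={month}: prevalence={prevalence_at(month):.2f}, "
          f"new-case fraction={new_case_fraction_at(month):.2f}")
```

Running the loop shows the prevalence climbing month by month while the new-case fraction varies, which is why the two quantities differ at t = 2 months.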

Q: A 25-year-old man with a genetic disorder presents for genetic counseling because he is concerned about the risk that any children he has will have the same disease as himself. Specifically, since childhood he has had difficulty breathing requiring bronchodilators, inhaled corticosteroids, and chest physiotherapy. He has also had diarrhea and malabsorption requiring enzyme replacement therapy. If his wife comes from a population where 1 in 10,000 people are affected by this same disorder, which of the following best represents the likelihood a child would be affected as well?

1%.

***Correct Option: 1%***
- The patient's symptoms (difficulty breathing requiring bronchodilators, inhaled corticosteroids, and chest physiotherapy; diarrhea and malabsorption requiring enzyme replacement therapy) are classic for **cystic fibrosis (CF)**, an **autosomal recessive disorder**.
- For an autosomal recessive disorder with a prevalence of 1 in 10,000 in the general population, **q² = 1/10,000**, so **q = 1/100 = 0.01**. Because p = 1 - q ≈ 1, the carrier frequency **(2pq)** is approximately **2q = 2 × (1/100) = 1/50 = 0.02**.
- The affected man is **homozygous recessive (aa)** and will always pass on the recessive allele. His wife has a **1/50 chance of being a carrier (Aa)**. If she is a carrier, she has a **1/2 chance of passing on the recessive allele**.
- Therefore, the probability of an affected child = **(Probability wife is a carrier) × (Probability wife passes recessive allele) = 1/50 × 1/2 = 1/100 = 1%**.

*Incorrect Option: 0.01%*
- This equals the disease frequency in the general population (1/10,000 = 0.01%). It is too low because it ignores that the father is affected and will always transmit a recessive allele.

*Incorrect Option: 2%*
- This represents approximately the carrier frequency (1/50 ≈ 2%), but does not account for the additional 1/2 probability that a carrier mother would pass on the recessive allele.

*Incorrect Option: 0.5%*
- This value would be correct if the carrier frequency were 1/100 instead of 1/50, which does not match the given population prevalence.

*Incorrect Option: 50%*
- **50%** would be the risk if the wife were known to be a carrier (the affected father always transmits a recessive allele and a carrier mother transmits one half of the time), or if the disorder were **autosomal dominant** with one heterozygous affected parent. Neither applies here: CF is autosomal recessive and the wife's carrier status is unknown.
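The Hardy-Weinberg steps above can be checked numerically. The sketch below is illustrative only: the function name `affected_child_risk` is hypothetical, and it returns both the approximation used in the explanation (carrier frequency ≈ 2q) and the value obtained with the full 2pq term.

```python
import math


def affected_child_risk(disease_frequency: float):
    """Risk of an affected child when one parent is affected (aa) and the
    other comes from a population with the given disease frequency (q^2).

    Returns (risk using carrier ~ 2q, risk using carrier = 2pq).
    Hypothetical helper for illustration.
    """
    q = math.sqrt(disease_frequency)   # recessive allele frequency
    p = 1.0 - q                        # normal allele frequency
    approx_carrier = 2.0 * q           # shortcut used in the explanation
    exact_carrier = 2.0 * p * q        # full Hardy-Weinberg carrier term
    # A carrier mother transmits the recessive allele half of the time;
    # the affected father always transmits it.
    return approx_carrier * 0.5, exact_carrier * 0.5


approx, exact = affected_child_risk(1 / 10_000)
print(f"using 2q:  {approx:.4%}")
print(f"using 2pq: {exact:.4%}")
```

Both values round to 1%, which is why the 2q shortcut is acceptable when the disease is rare.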

Q: A study on cholesterol levels is performed. There are 1000 participants. It is determined that in this population, the mean LDL is 200 mg/dL with a standard deviation of 50 mg/dL. If the population has a normal distribution, how many people have a cholesterol less than 300 mg/dL?

975.

***975***
- A value of 300 mg/dL lies **two standard deviations** above the mean: (300-200)/50 = 2.
- By the **empirical rule (68-95-99.7 rule)**, about 95% of values fall within ±2 SD of the mean, leaving about 2.5% in each tail, so approximately 97.5% of values fall below +2 SD. Therefore, 0.975 × 1000 = 975 people.

*950*
- This number would correspond to 95% of the population, which is the proportion within roughly **±2 standard deviations** (more precisely ±1.96) of the mean, counting both tails.
- However, the question asks for values *less than* 300 mg/dL (a one-tailed scenario at +2 SD), which is 97.5%, not 95%.

*680*
- This corresponds to the 68% of data that falls within **one standard deviation (±1 SD)** of the mean in a normal distribution.
- In this scenario, one standard deviation above the mean is 250 mg/dL, not 300 mg/dL. This option incorrectly applies the 68% rule.

*997*
- This corresponds to the 99.7% of data that falls within **three standard deviations (±3 SD)** of the mean in a normal distribution.
- Three standard deviations *above* the mean would be 350 mg/dL (200 + 3×50), which is beyond the target value of 300 mg/dL. The question asks about 300 mg/dL, which is only 2 SD above the mean.

*840*
- This corresponds to the 84% of data that falls below **one standard deviation (1 SD)** above the mean in a normal distribution.
- Using the empirical rule: 50% (below mean) + 34% (between mean and +1 SD) = 84%. Thus, 0.84 × 1000 = 840 people. However, 300 mg/dL is two standard deviations above the mean (250 mg/dL = +1 SD), not one.
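As a rough check of the empirical-rule reasoning, the sketch below compares the rule-of-thumb count with the value from the normal cumulative distribution function in Python's standard library (`statistics.NormalDist`, available from Python 3.8). The variable names are illustrative.

```python
from statistics import NormalDist

MEAN, SD, N = 200.0, 50.0, 1000
CUTOFF = 300.0

# Number of standard deviations between the cutoff and the mean.
z = (CUTOFF - MEAN) / SD

# Empirical rule: half the population lies below the mean, and half of the
# ~95% that lies within +/-2 SD sits between the mean and +2 SD.
empirical_fraction = 0.5 + 0.95 / 2
empirical_count = round(empirical_fraction * N)

# Normal CDF evaluated at the cutoff, for comparison with the rule of thumb.
exact_fraction = NormalDist(mu=MEAN, sigma=SD).cdf(CUTOFF)

print(f"z = {z}")
print(f"empirical rule: {empirical_count} people")
print(f"normal CDF:     {exact_fraction * N:.1f} people")
```

The CDF gives a slightly higher figure than the empirical rule because the rule rounds the ±2 SD coverage to 95%; exam answers conventionally use the rounded value.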

Q: A surgeon is interested in studying how different surgical techniques impact the healing of tendon injuries. In particular, he will compare 3 different types of suture repairs biomechanically in order to determine the maximum load before failure of the tendon 2 weeks after repair. He collects data on maximum load for 90 different repaired tendons from an animal model. Thirty tendons were repaired using each of the different suture techniques. Which of the following statistical measures is most appropriate for analyzing the results of this study?

ANOVA.

***ANOVA***
- **ANOVA (Analysis of Variance)** is appropriate here because it compares the means of **three or more independent groups** (the three different suture techniques) on a continuous dependent variable (maximum load before failure).
- The study has three distinct repair techniques, each with 30 tendons, making ANOVA suitable for determining if there are statistically significant differences among their mean failure loads.

*Chi-squared*
- The **Chi-squared test** is used for analyzing **categorical data** (frequencies or proportions) to determine if there is an association between two nominal variables.
- This study involves quantitative measurement (maximum load), not categorical data, making Chi-squared inappropriate.

*Wilcoxon rank sum*
- The **Wilcoxon rank sum test** (also known as Mann-Whitney U test) is a **non-parametric test** used to compare two independent groups when the data is not normally distributed or is ordinal.
- While the study has independent groups, it involves three groups, and the dependent variable is continuous, making ANOVA a more powerful and appropriate choice assuming normal distribution.

*Pearson r coefficient*
- The **Pearson r coefficient** measures the **strength and direction of a linear relationship between two continuous variables**.
- This study aims to compare means across different groups, not to determine the correlation between two continuous variables.

*Student t-test*
- The **Student t-test** is used to compare the means of **exactly two groups** (either independent or paired) on a continuous dependent variable.
- This study involves comparing three different suture techniques, not just two, making the t-test unsuitable.
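To make the idea of comparing three group means concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand with the standard library. The function name `one_way_anova_f` is hypothetical and the numbers are invented toy values, not data from the study; the sketch stops at the F statistic and does not compute a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples.

    F = (between-group mean square) / (within-group mean square).
    Hypothetical helper for illustration.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Invented toy values for three techniques (NOT study data).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(toy_groups))
```

A large F indicates that the group means differ by more than the within-group scatter would suggest; in practice a statistics package would also report the corresponding p-value.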

Q: A group of researchers studying the relationship between major depressive disorder and unprovoked seizures identified 36 patients via chart review who had been rehospitalized for unprovoked seizures following discharge from an inpatient psychiatric unit and 105 patients recently discharged from the same unit who did not experience unprovoked seizures. The results of the study show:

| | Unprovoked seizure | No seizure |
|---|---|---|
| Major depressive disorder | 20 | 35 |
| No major depressive disorder | 16 | 70 |

Based on this information, which of the following is the most appropriate measure of association between history of major depressive disorder (MDD) and unprovoked seizures?

2.5.

***2.5***
- This is a **case-control study** because it starts with individuals who have the outcome (unprovoked seizures) and individuals who do not, then looks back at their exposure (major depressive disorder).
- For a case-control study, the appropriate measure of association is the **odds ratio (OR)**, calculated as (a/c) / (b/d) = (ad) / (bc). In this case: a = 20 (MDD with seizure), b = 35 (MDD without seizure), c = 16 (no MDD with seizure), d = 70 (no MDD without seizure). So, OR = (20 × 70) / (35 × 16) = 1400 / 560 = 2.5.

*1.95*
- This is the value obtained by computing a **relative risk** from the table: (20/55) / (16/86) ≈ 1.95.
- Relative risk requires incidence data from a cohort design. In a case-control study the investigators fix the numbers of cases and controls, so the **incidence of the outcome** cannot be determined and relative risk is not an appropriate measure.

*0.19*
- This matches 16/86 ≈ 0.19, the proportion of patients without MDD who had a seizure.
- It is a proportion within one exposure group, not a measure of association, and it is not a true risk in a case-control design.

*0.36*
- This matches 20/55 ≈ 0.36, the proportion of patients with MDD who had a seizure.
- Like the previous option, it is a **proportion** rather than a measure of association.

*0.17*
- This matches the difference between the two rounded proportions above (0.36 - 0.19 = 0.17), which resembles an attributable risk (risk difference).
- A risk difference requires true risks from a cohort study, so it is not the correct measure of association for a case-control study.
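The odds ratio calculation above can be reproduced directly from the 2×2 table. The sketch below is illustrative; the variable names are assumptions, and the row proportions and their ratio are computed only to show what a relative-risk-style calculation would yield on the same table, not because they are valid for a case-control design.

```python
# 2x2 table from the question: rows are exposure (MDD), columns are outcome.
a, b = 20, 35   # MDD:    seizure, no seizure
c, d = 16, 70   # no MDD: seizure, no seizure

# Odds ratio via the cross-product: (a/c) / (b/d) = (a*d) / (b*c).
odds_ratio = (a * d) / (b * c)

# Row proportions and their ratio (what a relative-risk formula would give).
prop_exposed = a / (a + b)
prop_unexposed = c / (c + d)
proportion_ratio = prop_exposed / prop_unexposed

print(f"odds ratio:          {odds_ratio:.2f}")
print(f"proportion, MDD:     {prop_exposed:.2f}")
print(f"proportion, no MDD:  {prop_unexposed:.2f}")
print(f"ratio of proportions: {proportion_ratio:.2f}")
```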

Q: A recent study examined trends in incidence and fatality of ischemic stroke in a representative sample of Scandinavian towns. The annual incidence of ischemic stroke was calculated to be 60 per 2,000 people. The 1-year case fatality rate for ischemic stroke was found to be 20%. The health department of a town in southern Sweden with a population of 20,000 is interested in knowing the 1-year mortality conferred by ischemic stroke. Based on the study's findings, which of the following estimates the annual mortality rate for ischemic stroke per 20,000?

120 people.

***120 people***
- The annual incidence of ischemic stroke is 60 per 2,000 people. For a population of 20,000, the annual number of new stroke cases would be (60/2,000) × 20,000 = **600 cases**.
- With a 1-year case fatality rate of 20%, the annual mortality from ischemic stroke is 20% of these 600 cases, which is 0.20 × 600 = **120 people**.

*600 people*
- This number represents the estimated **annual incidence of ischemic stroke** in a town of 20,000 people, not the mortality rate.
- It is calculated as (60/2,000) × 20,000 = 600, before applying the case fatality rate.

*400 people*
- This number is not directly derived from the provided incidence and fatality rates for a population of 20,000.
- It might represent a miscalculation of either incidence or mortality.

*60 people*
- This is the **incidence of ischemic stroke** per 2,000 people, not the mortality rate for a larger population of 20,000.
- It does not account for the total population size or the case fatality rate.

*12 people*
- This is 20% of the 60 cases per 2,000 people (0.20 × 60 = 12), that is, the mortality per 2,000 people rather than per 20,000.
- It underestimates the answer by a factor of 10 because it is not scaled to the town's population.
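The two-step scaling above (incidence to expected cases, then case fatality to expected deaths) can be sketched with exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

incidence = Fraction(60, 2000)      # new strokes per person per year
case_fatality = Fraction(20, 100)   # share of cases that die within 1 year
town_population = 20_000

# Step 1: scale the incidence to the town's population.
expected_cases = incidence * town_population

# Step 2: apply the case fatality rate to those cases.
expected_deaths = expected_cases * case_fatality

# Distractor check: the same fatality applied per 2,000 people.
deaths_per_2000 = Fraction(60) * case_fatality

print(f"expected cases per year:  {expected_cases}")
print(f"expected deaths per year: {expected_deaths}")
print(f"deaths per 2,000 people:  {deaths_per_2000}")
```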

Q: A newlywed couple comes to your office for genetic counseling. Both potential parents are known to be carriers of the same Cystic Fibrosis (CF) mutation. What is the probability that at least one of their next three children will have CF if they are all single births?

37/64.

***37/64***
- The probability of a child having CF from two carrier parents is **1/4** (recessive inheritance), and the probability of a child not having CF is **3/4**.
- The probability that *none* of the three children will have CF is (3/4)³ = **27/64**. Therefore, the probability that *at least one* child will have CF is 1 - 27/64 = **37/64**.

*0*
- This option is incorrect because there is a **definite statistical probability** for a child to inherit CF when both parents are carriers.
- CF is an **autosomal recessive disorder**, meaning there is a 25% chance per child, not a 0% chance.

*1/64*
- This represents the probability that ***all three children*** would have CF: (1/4)³ = 1/64.
- This is an **underestimation** of the probability for at least one child to be affected, as the question asks about "at least one" not "all three."

*1*
- This would imply that it's an **absolute certainty** that at least one child will have CF, which is incorrect.
- Each child's outcome is independent, and there is always a chance (27/64) that none of the three children will have the disease.

*27/64*
- This calculation represents the probability that **none of the three children will have CF**: (3/4)³ = 27/64.
- This is the **complementary probability** to "at least one child having CF", not the actual answer to the question asked.
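The complement rule used above can be verified with exact fractions. This is a minimal sketch; the function name `p_at_least_one` is hypothetical, and it assumes each birth is an independent event with the same 1/4 risk.

```python
from fractions import Fraction


def p_at_least_one(p_affected: Fraction, n_children: int) -> Fraction:
    """Probability that at least one of n independent children is affected.

    Uses the complement: 1 - P(no child affected). Hypothetical helper.
    """
    return 1 - (1 - p_affected) ** n_children


p_cf = Fraction(1, 4)                   # two carrier parents, recessive trait
p_none = (1 - p_cf) ** 3                # no child affected
p_all = p_cf ** 3                       # all three affected
p_some = p_at_least_one(p_cf, 3)        # at least one affected

print(f"none affected:        {p_none}")
print(f"all three affected:   {p_all}")
print(f"at least one affected: {p_some}")
```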

Question 1

A study is performed to determine the prevalence of a particular rare fungal pneumonia. A sample population of 100 subjects is monitored for 4 months. Every month, the entire population is screened and the number of new cases is recorded for the group. The data from the study are given in the table below:

| Time point | New cases of fungal pneumonia |
|---|---|
| t = 0 months | 10 |
| t = 1 month | 4 |
| t = 2 months | 2 |
| t = 3 months | 5 |
| t = 4 months | 4 |

Which of the following is correct regarding the prevalence of this rare fungal pneumonia in this sample population?

Accepted Answer

The prevalence at the conclusion of the study is 25%.

Answer

The prevalence at time point 2 months is 2%.

Answer

The prevalence at time point 3 months is 11%.

Answer

The prevalence at the conclusion of the study is 15%.

Answer

The prevalence and the incidence at time point 2 months are equal.

Question 2

A 25-year-old man with a genetic disorder presents for genetic counseling because he is concerned about the risk that any children he has will have the same disease as himself. Specifically, since childhood he has had difficulty breathing requiring bronchodilators, inhaled corticosteroids, and chest physiotherapy. He has also had diarrhea and malabsorption requiring enzyme replacement therapy. If his wife comes from a population where 1 in 10,000 people are affected by this same disorder, which of the following best represents the likelihood a child would be affected as well?

Accepted Answer

1%

Answer

0.01%

Answer

2%

Answer

0.5%

Answer

50%

Question 3

A study on cholesterol levels is performed. There are 1000 participants. It is determined that in this population, the mean LDL is 200 mg/dL with a standard deviation of 50 mg/dL. If the population has a normal distribution, how many people have a cholesterol less than 300 mg/dL?

Accepted Answer

975

Answer

950

Answer

680

Answer

997

Answer

840

Question 4

You are reviewing the protocol for a retrospective case-control study investigating risk factors for mesothelioma among retired factory workers. One hundred cases of mesothelioma and 100 age- and sex-matched controls are to be recruited and interviewed about their exposure to industrial-grade fiberglass by blinded interviewers. The investigators' primary hypothesis is that cases of mesothelioma will be more likely to have been exposed to industrial-grade fiberglass. The design of this study is most concerning for which type of bias?

Accepted Answer

Recall bias

Answer

This study design is free of potential bias

Answer

Observer bias

Answer

Interviewer bias

Answer

Lead-time bias

Question 5

A surgeon is interested in studying how different surgical techniques impact the healing of tendon injuries. In particular, he will compare 3 different types of suture repairs biomechanically in order to determine the maximum load before failure of the tendon 2 weeks after repair. He collects data on maximum load for 90 different repaired tendons from an animal model. Thirty tendons were repaired using each of the different suture techniques. Which of the following statistical measures is most appropriate for analyzing the results of this study?

Accepted Answer

ANOVA

Answer

Chi-squared

Answer

Wilcoxon rank sum

Answer

Pearson r coefficient

Answer

Student t-test

Question 6

A group of researchers studying the relationship between major depressive disorder and unprovoked seizures identified 36 patients via chart review who had been rehospitalized for unprovoked seizures following discharge from an inpatient psychiatric unit and 105 patients recently discharged from the same unit who did not experience unprovoked seizures. The results of the study show:

| | Unprovoked seizure | No seizure |
|---|---|---|
| Major depressive disorder | 20 | 35 |
| No major depressive disorder | 16 | 70 |

Based on this information, which of the following is the most appropriate measure of association between history of major depressive disorder (MDD) and unprovoked seizures?

Accepted Answer

2.5

Answer

1.95

Answer

0.19

Answer

0.36

Answer

0.17

Question 7

A group of researchers is looking to study the effect of body weight on blood pressure in the elderly. Previous work measuring body weight and blood pressure at 2 time points in a large group of healthy individuals revealed that a 10% increase in body weight was accompanied by a 7 mm Hg increase in blood pressure. If the researchers want to determine if there is a linear relationship between body weight and blood pressure in a subgroup of elderly individuals in this study, which of the following statistical methods would best be employed to answer this question?

Accepted Answer

Pearson’s correlation

Answer

Spearman’s correlation

Answer

One-way analysis of variance (ANOVA)

Answer

Two-way analysis of variance (ANOVA)

Answer

Wilcoxon signed-rank test

Question 8

A case-control study with a focus on risk factors that may influence the development of depression was conducted among the elderly population in one tertiary hospital in Malaysia. The study involved 150 elderly patients diagnosed with depressive illness from the psychiatry ward, as well as another group of 150 elderly patients without any history of depressive illness (but hospitalized for other reasons) at the same ward. The data were collected through questionnaires, and 2 principal investigators (who were also the patients’ attending physicians) acted as interviewers after proper training for the purposes of this study. Multivariate analyses of logistic regression with independent variables were employed to determine the adjusted odds ratio for the risk of developing depression. The study results showed that a lower level of social support, lack of education, and the presence of chronic illnesses highly correlated with depression. In order to maximally avoid bias that may stem from this kind of study design, what should the researchers have done differently to increase the validity of their results?

Accepted Answer

Blinded the investigators

Answer

Used open-ended questions

Answer

Included more interviewers

Answer

Used closed testing procedures on the data

Answer

Used Bonferroni correction on data

Question 9

A recent study examined trends in incidence and fatality of ischemic stroke in a representative sample of Scandinavian towns. The annual incidence of ischemic stroke was calculated to be 60 per 2,000 people. The 1-year case fatality rate for ischemic stroke was found to be 20%. The health department of a town in southern Sweden with a population of 20,000 is interested in knowing the 1-year mortality conferred by ischemic stroke. Based on the study's findings, which of the following estimates the annual mortality rate for ischemic stroke per 20,000?

Accepted Answer

120 people

Answer

600 people

Answer

400 people

Answer

60 people

Answer

12 people

Question 10

A newlywed couple comes to your office for genetic counseling. Both potential parents are known to be carriers of the same Cystic Fibrosis (CF) mutation. What is the probability that at least one of their next three children will have CF if they are all single births?

Accepted Answer

37/64

Answer

0

Answer

1/64

Answer

1

Answer

27/64
