A statistician wants to study the effects of a medicine in three groups: humans, animals, and plants. He then selects subjects randomly from each of these three groups. Which type of sampling is being performed?
A study was undertaken to establish the relationship between the consumption of a vegetarian or non-vegetarian diet and the presence of diseases. Which statistical test should be used?
A group of 80 people is being studied to determine the effect of diet modification on cholesterol levels. To compare the mean cholesterol levels before and after the diet modification in this group, which statistical test should be used?
A study recorded the survival times (in months) of 8 patients diagnosed with pancreatic cancer who received a new chemotherapy regimen. The survival times were: 2, 3, 4, 4, 5, 6, 7, 8 months. What is the median survival time for these patients?
An investigator has conducted a prospective study to evaluate the relationship between asthma and the risk of myocardial infarction (MI). She stratified her analyses by biological sex and observed that among female patients, asthma was a significant predictor of MI risk (hazard ratio = 1.32, p < 0.001). However, among male patients, no relationship was found between asthma and MI risk (p = 0.23). Which of the following best explains the difference observed between male and female patients?
An investigator studying the effects of dietary salt restriction on atrial fibrillation compares two published studies, A and B. In study A, nursing home patients without atrial fibrillation were randomly assigned to a treatment group receiving a low-salt diet or a control group without dietary salt restriction. When study B began, dietary sodium intake was estimated among elderly outpatients without atrial fibrillation using 24-hour dietary recall. In both studies, patients were reevaluated at the end of one year for atrial fibrillation. Which of the following statements about the two studies is true?
A doctor is interested in developing a new over-the-counter medication that can decrease the symptomatic interval of upper respiratory infections from viral etiologies. The doctor wants one group of affected patients to receive the new treatment and another group of affected patients not to receive it. Of the following clinical trial subtypes, which would be most appropriate in comparing the differences in outcome between the two groups?
A 23-year-old woman presents to her primary care physician because she has been having difficulty seeing despite previously having perfect vision all her life. Specifically, she notes that reading, driving, and recognizing faces have become difficult, and she feels that her vision has become fuzzy. She is worried because both of her older brothers have had visual loss with a similar presentation. Visual exam reveals bilateral loss of central vision with decreased visual acuity and color perception. Pathological examination of this patient's retinas reveals degeneration of retinal ganglion cells bilaterally. She is then referred to a geneticist because she wants to know the probability that her son and daughter will also be affected by this disorder. Her husband's family has no history of this disease. Ignoring the effects of incomplete penetrance, which of the following are the chances that this patient's children will be affected by this disease?
A group of researchers recently conducted a meta-analysis of twenty clinical trials encompassing 10,000 women with estrogen receptor-positive breast cancer who were disease-free following adjuvant radiotherapy. After an observation period of 15 years, the relationship between tumor grade and distant recurrence of cancer was evaluated. The results show:

| Tumor grade | Distant recurrence | No distant recurrence |
| --- | --- | --- |
| Well differentiated | 500 | 4500 |
| Moderately differentiated | 375 | 2125 |
| Poorly differentiated | 550 | 1950 |

Based on this information, which of the following is the 15-year risk for distant recurrence in patients with high-grade breast cancer?
A clinical trial is conducted to determine the role of cerebrospinal fluid (CSF) beta-amyloid levels as a biomarker in the early detection and prognosis of Alzheimer disease. A total of 100 participants are enrolled and separated into three groups according to their Mini-Mental State Examination (MMSE) score: mild dementia (20–24 points), moderate dementia (13–19 points), and severe dementia (< 13 points). Participants' CSF level of beta-amyloid 42 is measured using an immunoassay. It is found that participants with severe dementia have a statistically significantly lower mean CSF level of beta-amyloid 42 compared to the other two groups. Which of the following statistical tests was most likely used to compare measurements between the study groups?
Explanation:

***Stratified random sampling***
- This method involves dividing the population into **distinct subgroups (strata)** based on shared characteristics (in this case, humans, animals, and plants), and then performing a simple random sample within each stratum.
- This ensures that all subgroups are proportionally represented in the sample, which is appropriate when studying effects across different biological categories.

*Simple random sampling*
- This method involves selecting individuals from the entire population **purely by chance**, without first dividing them into subgroups.
- It would not guarantee representation from all three distinct groups (humans, animals, and plants), which is essential for studying differential effects.

*Systematic sampling*
- This involves selecting samples at **regular intervals** from an ordered list or sequence.
- This method is not suitable here because the population is divided into distinct, non-ordered groups rather than a continuous sequence.

*Cluster sampling*
- This method involves dividing the population into **clusters**, then randomly selecting some clusters and sampling all individuals within those selected clusters.
- In this scenario, the initial groups (humans, animals, plants) are strata, not clusters, as the intent is to sample from within each group, not to treat the groups themselves as primary sampling units.

*Convenience sampling*
- This is a **non-probability sampling method** where subjects are selected based on ease of access rather than random selection.
- The question explicitly states that random selection is performed from each group, ruling out convenience sampling.
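A minimal sketch of the stratified draw described above, using Python's standard library. The stratum names and member labels are invented for illustration:

```python
import random

random.seed(0)

# Hypothetical population divided into strata (labels are illustrative)
strata = {
    "humans": [f"human_{i}" for i in range(100)],
    "animals": [f"animal_{i}" for i in range(100)],
    "plants": [f"plant_{i}" for i in range(100)],
}

# Stratified random sampling: a simple random sample drawn within EACH
# stratum, which guarantees every subgroup is represented.
sample = {name: random.sample(members, 10) for name, members in strata.items()}

for name, members in sample.items():
    print(name, len(members))
```

Contrast with simple random sampling, which would draw 30 subjects from the pooled population and could, by chance, miss a subgroup entirely.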
Explanation:

***Chi-square test***
- The **chi-square test** is appropriate when analyzing the relationship between two **categorical variables**. In this scenario, "diet type" (vegetarian/non-vegetarian) and "presence of disease" (yes/no) are both categorical variables.
- This test determines if there is a statistically significant association between the frequency counts of these two variables in a contingency table.

*T-test*
- A **t-test** is used to compare the **means** of two groups, typically when the dependent variable is continuous.
- This test is unsuitable here because the presence of disease and diet type are categorical, not continuous, variables.

*ANOVA*
- **ANOVA** (Analysis of Variance) is used to compare the **means** of three or more groups, often with a continuous dependent variable.
- Similar to the t-test, ANOVA is not applicable as the study involves categorical variables, not the comparison of means across multiple groups.

*Fisher's exact test*
- **Fisher's exact test** is similar to the chi-square test but specifically used for **small sample sizes** where the expected frequencies in any cell of the contingency table are less than 5.
- While it analyzes categorical data, the chi-square test is the more general and commonly preferred test for larger sample sizes, which is generally assumed unless otherwise specified.

*Mann-Whitney U test*
- The **Mann-Whitney U test** is a non-parametric test used to compare differences between two independent groups when the dependent variable is **ordinal or continuous** but not normally distributed.
- This test is not appropriate for analyzing the association between two categorical variables, as it requires at least one variable to have ranked or continuous data.
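The chi-square statistic itself is simple to compute by hand: for each cell of the contingency table, compare the observed count to the count expected under independence. A stdlib sketch with invented counts (not from the question):

```python
# 2x2 contingency table: rows = diet (veg / non-veg), cols = disease (yes / no)
# Counts are illustrative, not from the question.
table = [[30, 70],   # vegetarian: diseased, not diseased
         [50, 50]]   # non-vegetarian: diseased, not diseased

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = (row total * column total) / grand total
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

print(round(chi2, 3))  # → 8.333
```

The statistic is then compared against a chi-square distribution with (rows − 1) × (columns − 1) = 1 degree of freedom to obtain a p-value.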
Explanation:

***Paired t-test***
- A **paired t-test** is appropriate for comparing means from two related samples, such as "before" and "after" measurements on the **same individuals**.
- It assesses whether there is a statistically significant difference between these **dependent observations**.

*Independent t-test*
- The independent t-test compares means between **two separate groups** (unrelated samples).
- It is inappropriate here because we have **paired data** from the same individuals measured twice, not two independent groups.

*McNemar test*
- The McNemar test is used for comparing **paired nominal data**, typically in a 2×2 table, for example, before-after changes in a proportion or categorical outcome.
- It is not suitable for **continuous data** like cholesterol levels.

*Chi-square test*
- The chi-square test is used to assess the association between **two categorical variables** or to compare observed frequencies with expected frequencies.
- It is not designed for comparing means of **continuous variables** in paired samples.

*Wilcoxon signed-rank test*
- The Wilcoxon signed-rank test is a **non-parametric alternative to the paired t-test**, used when the data are not normally distributed or when the sample size is small.
- While it's used for paired data, the paired t-test is generally preferred when parametric assumptions (like **normality**) can be met, especially with a sample size of 80.
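The key idea of the paired t-test is that it operates on per-subject differences, not on the two raw samples. A stdlib sketch with made-up before/after values for a handful of subjects:

```python
import math
from statistics import mean, stdev

# Illustrative before/after cholesterol values (mg/dL); numbers are invented
# for the sketch, not taken from the question.
before = [220, 240, 210, 250, 230, 225, 245, 235]
after  = [210, 232, 205, 240, 228, 215, 238, 230]

# The paired t-test works on the per-subject differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

# t = mean difference / (SD of differences / sqrt(n)), with n - 1 df
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
print(round(t, 2))
```

Running an *independent* t-test on the same numbers would ignore the within-subject pairing and typically lose power, which is why the paired form is the correct choice here.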
Explanation:

***4.5***
- The given survival times are already ordered: 2, 3, 4, 4, 5, 6, 7, 8.
- Since there is an **even number of observations (n=8)**, the median is the average of the two middle values, which are the 4th and 5th values. (4 + 5) / 2 = **4.5**.

*3.5*
- This value would result from incorrectly averaging the 3rd and 4th observations (3 + 4) / 2 = 3.5.
- This error occurs when miscounting the middle positions in an even-numbered dataset.

*4.0*
- This value represents the **fourth observation** in the ordered list, not the true median for an even number of data points.
- While it is one of the middle values, the median for an even dataset requires averaging the two middle-most values.

*5.0*
- This value represents the **fifth observation** in the ordered list, not the true median for an even number of data points.
- It would be the median if the dataset contained an odd number of observations and 5 was the middle term.

*5.5*
- This value would be the mean of 5 and 6, which are the 5th and 6th values, not the correct middle values.
- This calculation does not represent the correct methodology for finding the median in this dataset.
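The calculation is a one-liner with Python's standard library, using the survival times from the question:

```python
from statistics import median

# Survival times (months) from the question stem, already in order
survival_months = [2, 3, 4, 4, 5, 6, 7, 8]

# With an even number of observations, the median is the mean of the
# 4th and 5th ordered values: (4 + 5) / 2
print(median(survival_months))  # → 4.5
```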
Explanation:

***Effect modification***
- **Effect modification** occurs when the relationship between an exposure (asthma) and an outcome (MI) differs across various levels of a third variable (biological sex).
- In this scenario, sex alters the effect of asthma on MI risk, showing a significant relationship in females but not in males, which is the definition of effect modification.

*Measurement bias*
- **Measurement bias** refers to systematic errors in the collection of data, leading to inaccurate assessment of exposure, outcome, or confounders.
- There is no indication in the question that the methods of measuring asthma or MI differed systematically between males and females, or that the measurements themselves were flawed.

*Stratified sampling*
- **Stratified sampling** is a technique used in study design where a population is divided into subgroups (strata) and then samples are randomly selected from each stratum.
- While the analysis was stratified by sex, this choice was made during data analysis to understand differences, not during the initial sampling process to ensure representation.

*Confounding*
- **Confounding** occurs when a third variable is associated with both the exposure and the outcome, and it distorts the true relationship between them.
- The investigator stratified by sex and found different results, implying that sex is not merely a confounder that needs to be controlled, but rather a variable that modifies the effect.

*Random error*
- **Random error** is unsystematic variation in data that can lead to imprecise measurements or findings due to chance.
- While random error can contribute to non-significant findings, the significant p-value (<0.001) in females and the clear difference in effect between sexes suggest a systematic phenomenon rather than mere random chance.
Explanation:

***Study A allows for better control of confounding variables***
- **Random assignment** in Study A helps distribute both known and unknown confounding variables equally between the treatment and control groups, thereby minimizing their impact on the observed outcome.
- Unlike Study B, which is observational, Study A's experimental design creates comparable groups, allowing for a more accurate assessment of the direct effect of the intervention.

*Study A results can be analyzed using a t-test*
- A **t-test** is typically used to compare the means of two groups for a **continuous outcome variable**.
- The outcome variable in this study, the presence or absence of **atrial fibrillation**, is a **dichotomous (categorical) variable**, making a t-test inappropriate.
- The correct statistical test would be a **chi-square test** or **Fisher's exact test**.

*Study B results can be analyzed using a chi-square test*
- While technically a **chi-square test** could be used to analyze the association between categorized dietary sodium intake and atrial fibrillation in Study B, this statement is not the **best answer** to the question.
- The question asks which statement is **most characteristically true** when comparing the two studies, and Study A's superior control of confounding variables through randomization is the most defining difference between an RCT and an observational cohort study.
- Additionally, cohort studies typically report **relative risk** or **hazard ratios** rather than simple chi-square associations.

*Study B allows for better control over selection bias*
- Study B is an **observational cohort study** that relies on existing groups of outpatients, making it susceptible to **selection bias** as participants are not randomly assigned.
- The method of recruiting outpatients without randomization can introduce differences between groups that are not accounted for, leading to biased results.

*Study B is better at inferring causality*
- Study B, being an **observational cohort study**, can only identify **associations** between dietary salt intake and atrial fibrillation, not establish a **causal relationship**.
- The lack of **randomization** means that other unmeasured factors might be responsible for any observed association, making causal inference unreliable.
Explanation:

***Randomized controlled trial***
- This design is ideal for evaluating the **efficacy of an intervention** (new medication) by randomly assigning participants to either a treatment group or a control group.
- **Randomization minimizes bias** and ensures that any observed differences in outcomes between the groups can be attributed to the intervention.

*Case-control study*
- This study design is retrospective and compares individuals with a **disease (cases)** to individuals without the disease (controls) to identify **risk factors** or exposures.
- It would not be suitable for testing the effectiveness of a new treatment as it starts with outcomes and looks backward at exposures, not forward at intervention effects.

*Cohort study*
- A cohort study observes a group of individuals (a cohort) over time to see who develops a disease or outcome, often starting with individuals exposed and unexposed to a **risk factor**.
- While it tracks outcomes, it usually doesn't involve an active intervention or random assignment, making it less suitable for directly comparing a new treatment's efficacy against a control.

*Historical cohort study*
- This is a type of cohort study that uses **past data or records** to identify the cohort and their exposures, then follows them forward in time using existing data to determine outcomes.
- It would not be appropriate for testing a *new* medication because it relies on historical exposures and outcomes, not a prospective, controlled intervention.

*Cross-sectional study*
- This study measures the **prevalence of a disease or condition** and related factors at a single point in time, essentially taking a "snapshot."
- It cannot establish causality or evaluate the effectiveness of an intervention over time due to its lack of follow-up and inability to determine the temporal sequence of events.
Explanation:

***Daughter: 100% and son: 100%***
- This scenario describes **Leber Hereditary Optic Neuropathy (LHON)**, characterized by **bilateral central vision loss** and **degeneration of retinal ganglion cells**, with a maternal inheritance pattern.
- LHON is caused by a **mitochondrial DNA mutation**, meaning the disease is transmitted exclusively from the mother to **all her children, regardless of sex**.
- Since mitochondrial DNA is inherited entirely from the maternal lineage, **100% of offspring will inherit the mutation**.
- The question specifies "ignoring incomplete penetrance," meaning we focus on mutation inheritance rather than symptom development.

*Daughter: 50% and son: 50%*
- This inheritance pattern is characteristic of an **autosomal dominant** trait, where there is a 50% chance of passing the allele to each child.
- This does not fit the described pattern of maternal inheritance where all children inherit the mutation from an affected mother.

*Daughter: ~0% and son: ~0%*
- This would only be true if neither parent was a carrier or affected, or if the disease had a very complex, non-mendelian inheritance with low penetrance.
- Given the mother's affected status and the mitochondrial inheritance pattern, the children will definitely inherit the mutation.

*Daughter: 25% and son: 25%*
- This ratio is typical for an **autosomal recessive** inheritance pattern where both parents are heterozygotes (carriers).
- This does not align with the exclusively maternal transmission observed in LHON.

*Daughter: ~0% and son: 50%*
- This inheritance pattern is typical for an **X-linked recessive** disorder, where daughters of an affected father are unaffected carriers and sons have a 50% chance of being affected if the mother is a carrier.
- This is incorrect because LHON is mitochondrially inherited from the mother to all children, not X-linked.
Explanation:

***550/2500***
- The question asks for the 15-year risk for distant recurrence in patients with **high-grade breast cancer**, which corresponds to **poorly differentiated** tumors in the provided data.
- For poorly differentiated tumors, there were 550 cases of distant recurrence out of a total of 550 + 1950 = **2500 patients** (550 with recurrence + 1950 without recurrence). Therefore, the risk is 550/2500.

*500/5000*
- This calculation represents the risk for distant recurrence in **well-differentiated** tumors (500 recurrences out of 500 + 4500 = 5000 total well-differentiated cases), not high-grade (poorly differentiated) tumors.

*1950/8575*
- This calculation incorrectly uses 1950 (the number of poorly differentiated patients *without* recurrence) as the numerator. The denominator, 8575, is the total number of patients without distant recurrence across all grades (4500 + 2125 + 1950), which is irrelevant to the specific group in question.

*2500/10000*
- This calculation represents the **total number of poorly differentiated patients** (2500) divided by the total number of patients in the study (10000), which is the proportion of patients with poorly differentiated cancer, not the risk of recurrence within that group.

*550/1425*
- This calculation incorrectly uses 1425 as the denominator; 1425 is the total number of recurrences across all grades (500 + 375 + 550). The total number of patients with poorly differentiated tumors is 2500 (550 with recurrence + 1950 without recurrence), not 1425.
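The arithmetic can be checked directly from the counts in the question's table:

```python
# Counts from the meta-analysis table in the question
recurrence =    {"well": 500,  "moderate": 375,  "poor": 550}
no_recurrence = {"well": 4500, "moderate": 2125, "poor": 1950}

# Risk = events / total at risk, computed WITHIN the grade of interest
poor_total = recurrence["poor"] + no_recurrence["poor"]   # 550 + 1950 = 2500
risk_poor = recurrence["poor"] / poor_total               # 550/2500
print(risk_poor)  # → 0.22
```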
Explanation:

***Analysis of variance (ANOVA)***
- This statistical test is used to compare the means of **three or more independent groups**. In this scenario, it would be appropriate for comparing the mean CSF beta-amyloid levels across the mild, moderate, and severe dementia groups.
- ANOVA determines if there is a statistically significant difference between the means of these groups, and if so, post-hoc tests can identify which specific groups differ.

*Chi-square test*
- The chi-square test is used for **categorical data** to determine if there is a significant association between two variables.
- This scenario involves comparing **continuous numerical data** (CSF beta-amyloid levels) across groups, not categorical frequencies.

*Pearson correlation analysis*
- Pearson correlation measures the **linear relationship** and strength of association between **two continuous numerical variables**.
- Here, the goal is to compare means across multiple groups, not to assess the correlation between two continuous variables.

*Fisher's exact test*
- Fisher's exact test is used for analyzing the association between two **categorical variables** in a **2x2 contingency table**, especially with small sample sizes.
- This test is not suitable for comparing the means of a continuous variable across multiple groups.

*Two-sample t-test*
- A two-sample t-test is used to compare the means of **exactly two independent groups**.
- Since this study involves **three distinct groups** (mild, moderate, and severe dementia), a two-sample t-test would be insufficient to analyze all group comparisons simultaneously, requiring multiple t-tests which increases the risk of Type I error.
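One-way ANOVA partitions the total variability into between-group and within-group components and compares them with an F ratio. A stdlib sketch with invented beta-amyloid values (not from the study):

```python
from statistics import mean

# Illustrative CSF beta-amyloid 42 levels (pg/mL) per dementia group;
# the numbers are made up for the sketch.
groups = {
    "mild":     [620, 600, 640, 610],
    "moderate": [580, 560, 590, 570],
    "severe":   [450, 430, 470, 440],
}

values = [v for g in groups.values() for v in g]
grand_mean = mean(values)
k = len(groups)     # number of groups
n = len(values)     # total observations

# Between-group sum of squares: group means vs the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: each value vs its own group mean
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

# F = (SS_between / (k - 1)) / (SS_within / (n - k))
F = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(F, 1))
```

A large F (relative to the F distribution with k − 1 and n − k degrees of freedom) indicates at least one group mean differs; post-hoc tests then identify which.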
Explanation:

***0.17***
- To calculate the risk during the second week, we need the number of new cases in that week (3 students) and the number of **at-risk individuals** at the beginning of that week.
- At the start of the second week, 18 students were at risk (20 total - 2 who contracted flu in the first week). Therefore, risk = 3/18 = **0.1666**, which rounds to **0.17**.
- This correctly applies the formula: **Risk = (new cases) / (population at risk at start of period)**.

*0.1*
- This value corresponds to 2/20, i.e., using the first week's case count with the full class as the denominator.
- This does not correctly account for the **dynamically changing population at risk** and uses the wrong numerator and denominator.

*0.15*
- This incorrectly uses 20 as the denominator (3/20 = 0.15), failing to exclude the 2 students who already had influenza.
- The **population at risk must exclude those already diseased** at the start of the time period.

*0.25*
- This fraction could represent 5 new cases out of 20 total students, or 3 new cases out of 12 students.
- This answer does not reflect the **specific incidence** during the second week with the correct denominator.

*0.5*
- This would mean half of the population contracted influenza, which is significantly higher than the observed 3 new cases in the second week.
- This value is a gross **overestimation of the actual risk** during the specified period.
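The at-risk denominator logic can be written out explicitly:

```python
# Numbers from the scenario: 20 students, 2 cases in week 1, 3 new in week 2
total_students = 20
cases_week_1 = 2
new_cases_week_2 = 3

# Risk = new cases / population at risk at the START of the period.
# Students who contracted influenza in week 1 are no longer at risk.
at_risk_week_2 = total_students - cases_week_1   # 18
risk_week_2 = new_cases_week_2 / at_risk_week_2  # 3/18
print(round(risk_week_2, 2))  # → 0.17
```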
Explanation:

***Cohort study***
- A **cohort study** allows for tracking disease incidence and progression over a defined period (5 years) in a group of individuals (cohort) exposed to conditions in rural Northern Africa, making it optimal for assessing disease burden over time.
- This design is ideal for investigating the natural history of a disease and identifying risk factors within a specific population.

*Case series*
- A **case series** describes characteristics of a group of patients with a particular disease and is useful for hypothesis generation rather than tracking disease burden over time.
- It lacks a comparison group, making it unsuitable for assessing incidence or prevalence in a population.

*Case-control*
- A **case-control study** compares individuals with a disease (cases) to individuals without the disease (controls) and looks retrospectively for exposure differences to identify risk factors.
- This design is efficient for rare diseases but less suitable for tracking overall disease burden or incidence trends over a long period.

*Cross-sectional*
- A **cross-sectional study** measures the prevalence of disease and exposure at a single point in time, providing a snapshot of the population.
- While useful for prevalence, it cannot establish temporality or track changes in disease burden over a 5-year period.

*Randomized controlled trial*
- A **randomized controlled trial (RCT)** is designed to evaluate the effectiveness of an intervention by randomly assigning participants to treatment or control groups.
- This design is unethical and impractical for tracking the natural disease burden of an indigenous viral disease in a population.
Explanation:

***Total sample size of the study***
- To calculate the **confidence interval**, one needs the **sample mean**, **standard deviation**, and critically, the **sample size (n)**.
- The sample size is crucial because it influences the **standard error of the mean** and thus the width of the confidence interval.

*The mean height of all the male students in the undergraduate class*
- This value represents the **true population mean**, which is precisely what the confidence interval is trying to **estimate**.
- If this value were known, there would be no need to calculate a confidence interval for it.

*The given data are adequate, and no more data are needed.*
- While the **sample mean** and **standard deviation** are provided, the problem statement does not explicitly state the **sample size (n)** of male students from which these statistics were derived.
- The number of all male students (2,000) is the **population size**, not the sample size used for the calculation.

*Total number of male students in the undergraduate class who did not take part in the study*
- This information is not directly used in the calculation of a **confidence interval** for the mean.
- It relates to the part of the population that was not sampled, which doesn't impact the formula for the confidence interval itself.

*A sampling frame of all of the male students in the undergraduate class*
- A **sampling frame** is a list of all individuals in the population from which a sample can be drawn; it's essential for the **sampling process** itself.
- However, once the sample mean and standard deviation are obtained, the sampling frame is not directly needed for the *calculation* of the confidence interval.
Explanation:

***Standard deviation, mean, sample size***
- To calculate a **95% confidence interval** for the mean, you need the **sample mean**, the **standard deviation** (which quantifies data variability), and the **sample size** (the number of observations).
- The formula for a confidence interval for the mean involves these three components and a z-score or t-score corresponding to the desired confidence level.

*Power, standard deviation, mean*
- **Power** is related to the probability of correctly rejecting a false null hypothesis and is not directly used in the calculation of a confidence interval for a single mean.
- While **standard deviation** and **mean** are necessary, **sample size** is also crucial for the calculation, which is missing from this option.

*Power, mean, sample size*
- **Power** is a concept relevant to study design and hypothesis testing, not for calculating a confidence interval for an observed mean.
- While **mean** and **sample size** are correctly identified, the **standard deviation** is a critical missing component needed to quantify the variability around the mean.

*Standard deviation, mean, sample size, power*
- While **standard deviation**, **mean**, and **sample size** are all needed for calculating the confidence interval, **power** is not required for this specific calculation.
- Including **power** as a necessary piece of information is incorrect because it relates to the study's ability to detect an effect, not the precision of an estimated mean.

*Power, standard deviation, sample size*
- This option incorrectly includes **power**, which is not needed for calculating a confidence interval for the mean.
- It also omits the **mean** itself, which is a fundamental component of the confidence interval formula as the central estimate.
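Putting the three required pieces together, a 95% confidence interval for a mean can be sketched as follows. The numeric values are illustrative placeholders, not figures from the question stem:

```python
import math

# Illustrative inputs: the three quantities the explanation identifies
sample_mean = 175.0   # sample mean (e.g., height in cm)
sample_sd = 7.0       # sample standard deviation
n = 100               # sample size

# 95% CI using the normal approximation: mean ± 1.96 * SD / sqrt(n)
standard_error = sample_sd / math.sqrt(n)
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error
print(round(lower, 2), round(upper, 2))  # → 173.63 176.37
```

Note that power appears nowhere in the formula, which is exactly why the options including it are wrong. For small samples a t critical value replaces 1.96.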
Explanation:

***Selection bias***
- This scenario exemplifies **selection bias** because the individual actively seeks to participate in the study due to personal concerns and a **family history of colorectal cancer**. This means the study participants may not be representative of the general population younger than 50, potentially skewing the results to show a higher prevalence or different screening utility than would be found in a genuinely random sample.
- **Selection bias** occurs when the selection of subjects for a study (or their retention in the study) results in a sample that is not truly representative of the target population.

*Recall bias*
- **Recall bias** occurs when subjects with a particular condition (e.g., CRC) are more likely to remember exposures or risk factors than healthy controls.
- This bias is typically a problem in **retrospective studies** where subjects are asked to recall past events.

*Measurement bias*
- **Measurement bias** arises from flaws in the way data is collected or measured, leading to systematically inaccurate results.
- Examples include using **faulty equipment** or inconsistent methods for assessing outcomes or exposures, leading to misclassification.

*Length bias*
- **Length bias** in screening refers to the fact that screening tests are more likely to detect cases of disease that are **slower-growing** and have a longer preclinical phase.
- This can make screened populations appear to have a better prognosis, as the more aggressive, fast-growing cases are often missed between screening intervals.

*Lead-time bias*
- **Lead-time bias** refers to the apparent increase in survival time among screened individuals due to the **earlier detection of disease** by screening, rather than an actual prolongation of life.
- It occurs when the time from diagnosis to death is artificially lengthened because the disease was found earlier, even if the actual date of death remains unchanged.
Explanation:

***Generalizability***
- The study population was very specific (**Caucasian men over 65 with coronary heart disease**), making it difficult to **generalize** the findings to a 39-year-old Hispanic female with primary hypertension.
- **External validity** is limited when study results from one population are applied to a different population with distinct demographic and clinical characteristics.
- The medication's efficacy might vary significantly across different **demographic groups** (age, sex, ethnicity) and clinical presentations not represented in the original study.

*Effect modification*
- **Effect modification** (also called interaction) occurs when the magnitude of a treatment effect differs across subgroups *within a study* that included those subgroups.
- The original study only enrolled elderly Caucasian men, so it couldn't assess whether the drug works differently in women, younger patients, or other ethnicities.
- The poor response here reflects applying results **beyond the study population** (a generalizability issue), not effect modification identified *within* the study data.

*Observer bias*
- **Observer bias** occurs when the researcher's expectations or preconceptions influence the observation or measurement of outcomes, leading to systematic errors.
- This is not relevant here as the patient's poor response is an objective clinical outcome (uncontrolled blood pressure), not an observation influenced by the physician's expectations.

*Selection bias*
- **Selection bias** occurs when the way participants are chosen for a study leads to a sample that is not representative of the target population, or when comparison groups are not comparable.
- This concept describes flaws in the *original study design* or participant recruitment, not the applicability of valid study results to a *different, external patient*.

*Confounding*
- **Confounding** occurs when an unmeasured variable is associated with both the exposure and the outcome, distorting the true relationship between them.
- This is a problem *within* a study designed to establish causality, not an explanation for why a medication effective in one population might not work in another due to inherent differences in patient characteristics.
Explanation: **A type 1 error occurs when the null hypothesis is true but is rejected in error.** - A **Type I error**, also known as an **alpha (α) error**, occurs when a study concludes there is a significant effect or difference when, in reality, there isn't one. The **null hypothesis (H0)**, which states there is no effect or no difference, is **incorrectly rejected**. - This error represents a **false positive** result, meaning the researchers incorrectly found a treatment to be effective when it is not. The probability of making a Type I error is set by the **significance level (α)**, typically 0.05. *A type 1 error is a beta (β) error and is usually 0.1 or 0.2.* - A **Type 1 error** is denoted by **alpha (α)**, not beta (β). - **Beta (β)** represents the probability of a **Type II error**, where the null hypothesis is *mistakenly accepted* when it is false. *A type 1 error is dependent on the confidence interval of a study.* - The **confidence interval** and the **significance level (α)** (which determines Type I error) are related but the error itself does not *depend* on the confidence interval. - A 95% confidence interval corresponds to an alpha of 0.05, meaning if the null value falls outside this interval, the null hypothesis is rejected at the 0.05 significance level. *A type 1 error means the study is not significantly powered to detect a true difference between study groups.* - This statement describes a **Type II error (β error)**, not a Type I error. - **Statistical power** is the probability of correctly rejecting a false null hypothesis (1 - β). Low power increases the risk of a Type II error. *A type 1 error occurs when the null hypothesis is false, yet is accepted in error.* - This describes a **Type II error (β error)**. - In a **Type II error**, a study fails to detect a true effect or difference, leading to a **false negative** conclusion.
Explanation: ***The median is now smaller than the mean*** - A single, exceptionally high value like 1400 (**outlier**) will **inflate the mean** significantly, as the mean is sensitive to extreme values. - The median, being the middle value in a sorted dataset, is **resistant to outliers** and will remain relatively unchanged, thus becoming smaller relative to the inflated mean. *The mode is now greater than the mean* - The **mode** is the most frequently occurring value, which would still be in the 130-145 range, and is unlikely to be greater than the heavily inflated mean. - While the mean is significantly increased by the outlier, the mode is driven by the majority of data points and is thus largely unaffected, making it highly improbable to exceed the inflated mean. *The range of the data set is unaffected* - The **range** is the difference between the maximum and minimum values. Replacing 140 with 1400 would dramatically *increase* the maximum value, thereby significantly **increasing the range** of the data set. - The incorrect entry of '1400' creates a new maximum value, directly altering the range. *This is a systematic error* - A **systematic error** is a consistent, repeatable error that biases measurements in a predictable way (e.g., a consistently miscalibrated instrument). - Typing "1400" instead of "140" is a **random transcription error** or a **gross error**, not a consistent bias and therefore is not a systematic error. *The standard deviation of the data set is decreased* - **Standard deviation** measures the spread or dispersion of data points. An extremely high outlier like 1400 will significantly **increase the variability** within the dataset. - This increased variability, due to one data point being very far from the mean, will lead to a substantial *increase* in the standard deviation, not a decrease.
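The effect of such a transcription error can be checked directly. The sketch below uses a hypothetical set of diastolic readings (the stem's raw values are not reproduced here), with one 140 mistyped as 1400:

```python
import statistics

# Hypothetical diastolic readings; in "typo" one 140 was entered as 1400
correct = [130, 132, 135, 138, 140, 140, 142, 145]
typo = [130, 132, 135, 138, 1400, 140, 142, 145]

# The median barely moves, but the outlier inflates the mean, range, and SD
assert statistics.median(typo) == statistics.median(correct) == 139
assert statistics.mean(typo) > statistics.mean(correct)
assert max(typo) - min(typo) > max(correct) - min(correct)
assert statistics.stdev(typo) > statistics.stdev(correct)
```

With this sample the typo drags the mean from about 138 up to about 295 while the median stays at 139, illustrating why the median ends up smaller than the mean.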
Explanation: ***1/40,000*** - This disorder (Friedreich's ataxia) follows **autosomal recessive** inheritance, meaning both parents must be carriers for the child to be affected. - Since there is no family history, we treat both parents as random individuals from the general population with carrier frequency 1/100. - **Calculation**: Probability mother is carrier (1/100) × Probability father is carrier (1/100) × Probability child is affected given both parents are carriers (1/4) = **1/40,000**. - This applies Hardy-Weinberg equilibrium principles for a steady-state population. *1/10,000* - This calculation (1/100 × 1/100 = 1/10,000) represents only the probability that both parents are carriers. - It fails to account for the **1/4 chance** of an affected child when two carriers of an **autosomal recessive** condition conceive. - This would be the answer if both parents being carriers automatically meant the child would be affected, which is incorrect. *1/20,000* - This result would occur if the probability of the child inheriting the disease from carrier parents was 1/2 instead of 1/4 (1/100 × 1/100 × 1/2 = 1/20,000). - A 1/2 probability would apply to **autosomal dominant** conditions where one affected parent passes the disease, not for **autosomal recessive** inheritance. - For autosomal recessive disorders, two carrier parents have a 1/4 (not 1/2) chance of an affected child. *1/200* - This probability (1/100 × 1/2 = 1/200) would suggest only one parent needed to be a carrier with a 1/2 transmission probability. - This does not account for the requirement that **both parents must be carriers** for an **autosomal recessive** disorder. - It represents a fundamental misunderstanding of recessive inheritance patterns. *1/400* - This calculation (1/100 × 1/4 = 1/400) incorrectly assumes only one parent needs to be a carrier. 
- For **autosomal recessive** inheritance, **both parents must be carriers**, so both their carrier probabilities (1/100 each) must be included in the calculation. - It omits the second parent's carrier probability entirely.
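The carrier-frequency arithmetic above can be written out with exact fractions, which also makes the distractor calculations explicit:

```python
from fractions import Fraction

carrier = Fraction(1, 100)         # population carrier frequency, per parent
affected_if_both = Fraction(1, 4)  # autosomal recessive: carrier x carrier cross

risk = carrier * carrier * affected_if_both  # 1/40,000
# Distractor traps: omitting the 1/4 gives 1/10,000; using 1/2 gives 1/20,000
```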
Explanation: ***Confounding; randomization*** - Study Y suggests that **smoking** is a **confounding variable** because it is associated with both increased coffee consumption (exposure) and increased risk of lung cancer (outcome), distorting the apparent relationship between coffee and lung cancer. - **Randomization** in experimental studies (such as randomized controlled trials) helps reduce confounding by ensuring that known and unknown confounding factors are evenly distributed among study groups. - In observational studies where randomization is not possible, confounding can be addressed through **stratification**, **matching**, or **multivariable adjustment** during analysis. *Observer bias; double blind analysis* - **Observer bias** occurs when researchers' beliefs or expectations influence the study outcome, which is not the primary issue described here regarding the relationship between coffee, smoking, and lung cancer. - **Double-blind analysis** is a method to mitigate observer bias by ensuring neither participants nor researchers know who is in the control or experimental groups. *Selection bias; randomization* - **Selection bias** happens when the study population is not representative of the target population, leading to inaccurate results, which is not directly indicated by the interaction between coffee and smoking. - While **randomization** is used to reduce selection bias by creating comparable groups, the core problem identified in Study X is confounding, not flawed participant selection. *Lead time bias; placebo* - **Lead time bias** occurs in screening programs when early detection without improved outcomes makes survival appear longer, an issue unrelated to the described association between coffee, smoking, and lung cancer. - A **placebo** is an inactive treatment used in clinical trials to control for psychological effects, and its relevance here is limited to treatment intervention studies. 
*Measurement bias; blinding* - **Measurement bias** arises from systematic errors in data collection, such as inaccurate patient reporting of coffee consumption, but the main criticism from Study Y points to a third variable (smoking) affecting the association, not just flawed measurement. - **Blinding** helps reduce measurement bias by preventing participants or researchers from knowing group assignments, thus minimizing conscious or unconscious influences on data collection.
Explanation: **Chi-square test** - The **Chi-square test** is appropriate for comparing **categorical data** (mild, moderate, severe) between two or more independent groups (treatment vs. control). - It assesses whether there is a statistically significant association between the two categorical variables (treatment group and nausea severity). *Pearson correlation coefficient* - The **Pearson correlation coefficient** is used to measure the **linear relationship** between two **continuous variables**. - Nausea severity (mild, moderate, severe) is an **ordinal categorical variable**, not a continuous one. *Multiple logistic regression* - **Multiple logistic regression** is used to predict a **binary outcome** (e.g., presence or absence of nausea) based on one or more independent variables, which can be continuous or categorical. - The outcome here is **ordinal categorical** (mild, moderate, severe nausea), not binary. While logistic regression can be adapted for ordinal outcomes, a simpler Chi-square test is more direct for comparing distributions without prediction. *Unpaired t-test* - An **unpaired t-test** is used to compare the **means of two independent continuous variables**. - Nausea levels are categorical, and we are interested in comparing proportions within categories, not means. *Paired t-test* - A **paired t-test** is used to compare the **means of two related (paired) continuous variables**. - The study involves independent treatment and control groups, and the nausea data is categorical, making the paired t-test unsuitable.
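A chi-square statistic for this kind of comparison can be computed directly from its definition. The counts below are hypothetical (the stem's actual data are not shown); for df = 2 the critical value at α = 0.05 is about 5.99:

```python
# Hypothetical 2x3 table: rows = treatment/control, cols = mild/moderate/severe
table = [[30, 15, 5],
         [20, 20, 10]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
grand = sum(row_tot)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = (row total * column total) / grand total
chi2 = sum((obs - row_tot[i] * col_tot[j] / grand) ** 2
           / (row_tot[i] * col_tot[j] / grand)
           for i, row in enumerate(table) for j, obs in enumerate(row))

df = (len(table) - 1) * (len(table[0]) - 1)  # (2 - 1) * (3 - 1) = 2
```

The statistic is then compared against the chi-square distribution with the computed degrees of freedom to obtain a p-value.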
Explanation: ***Randomization*** - The study allocated patients **sequentially** (first 1,000 to losartan, next 1,000 to salisartan), introducing **selection bias** as the two groups may not be comparable at baseline for unmeasured confounders. - **Randomization** ensures that both known and unknown confounding factors are evenly distributed between treatment groups, making the groups comparable and increasing the reliability of the observed treatment effects. - The lack of randomization explains why independent groups found **different results**—the study's internal validity was compromised by systematic differences between groups that were not due to the intervention itself. - Sequential allocation is particularly problematic because patient characteristics may **change over time** (e.g., seasonal variations, changes in referral patterns, or evolution in diagnostic criteria). *Increased study duration* - While a longer study duration might reveal long-term effects or adverse events, the primary issue of **baseline incomparability** due to the lack of randomization would persist. - Increasing duration would not address the fundamental flaw in the **patient allocation method** that led to potential bias. *Posthoc analysis of primary outcome among patients who withdrew from study* - A **post-hoc analysis** of withdrawn patients would be useful for understanding reasons for withdrawal but cannot correct for the initial lack of randomization or the **attrition bias** caused by the large number of withdrawals (500/2,000 = 25%). - This approach would also be susceptible to **selection bias** because the reasons for withdrawal might differ between the two groups. - While **intention-to-treat analysis** was performed, the fundamental allocation bias remains. *Increased sample size* - A larger sample size generally increases statistical power and precision, but it does not correct for **systematic errors** introduced by a flawed study design, such as lack of randomization. 
- Increasing the sample size would simply replicate the biased allocation across more participants, potentially **amplifying** the effects of selection bias rather than reducing them. *Retrial of primary outcome for clinical effectiveness instead of non-inferiority* - Changing the trial design from **non-inferiority** to **superiority** would alter the hypothesis being tested but would not address the underlying methodological flaws. - The mode of patient allocation (sequential assignment) remains the critical weakness, invalidating any conclusions regarding either non-inferiority or superiority. - The discrepancy between this study's findings and independent reports highlights that the **study design** (not the research question) was flawed.
Explanation: ***2,500 per 100,000 live births*** - The maternal mortality rate is calculated as the number of **maternal deaths** per 100,000 **live births**. The given data directly provide these values. - Calculation: (2,500 maternal deaths / 100,000 live births) × 100,000 = **2,500 per 100,000 live births**. *1,000 per 100,000 live births* - This value is incorrect as it does not align with the provided numbers for maternal deaths and live births in the calculation. - It might result from a miscalculation or using incorrect numerator/denominator values from the dataset. *33 per 100,000 live births* - This value is significantly lower than the correct rate and suggests a substantial error in calculation or an incorrect understanding of how the maternal mortality rate is derived. - It could potentially result from dividing the number of live births by maternal deaths, which is the inverse of the correct formula. *3,000 per 100,000 live births* - This option is close to the correct answer but slightly higher, indicating a possible calculation error, for instance, including non-maternal deaths or other causes of deaths in the numerator. - The definition of maternal death is specific to pregnancy-related or aggravated causes, so extraneous deaths would inflate the rate. *33,300 per 100,000 live births* - This figure results from incorrectly calculating the proportion of maternal deaths among all deaths of women of childbearing age: (2,500 / 7,500) × 100,000 = 33,333. - This is a conceptual error as the maternal mortality rate should use live births as the denominator, not total deaths of women of childbearing age.
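As a quick sketch, the correct calculation and the 33,300-distractor calculation side by side:

```python
maternal_deaths = 2500
live_births = 100_000
deaths_childbearing_age = 7500  # all-cause deaths among women of childbearing age

# Maternal mortality rate uses live births as the denominator
mmr = maternal_deaths / live_births * 100_000  # 2,500 per 100,000 live births

# Conceptual error: using deaths of women of childbearing age instead
wrong = maternal_deaths / deaths_childbearing_age * 100_000  # ~33,333
```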
Explanation: ***0.85*** - **Power** is defined as **1 - β**, where β is the **probability of a Type II error**. - Given that the probability of a **Type II error (β)** is 15% or 0.15, the power of the study is 1 - 0.15 = **0.85**. *0.10* - This value represents the **significance level (α)**, which is the probability of committing a **Type I error** (rejecting a true null hypothesis). - The significance level is distinct from the **power of the study**, which relates to Type II errors. *0.90* - This value would be the power if the **Type II error rate (β)** was 0.10 (1 - 0.10 = 0.90), but the question specifies a β of 0.15. - It is also the complement of the significance level (1 - α), which is not the definition of power. *0.15* - This value is the **probability of a Type II error (β)**, not the power of the study. - **Power** is the probability of correctly rejecting a false null hypothesis, which is 1 - β. *0.05* - While 0.05 is a common significance level (α), it is not given as the significance level in this question (which is 0.10). - This value also does not represent the power of the study, which would be calculated using the **Type II error rate**.
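The relationship between the error rates and power reduces to a one-line calculation:

```python
alpha = 0.10  # significance level: probability of a Type I error
beta = 0.15   # probability of a Type II error

power = 1 - beta  # probability of detecting a true effect: 0.85
# Note: alpha plays no role in this formula; it is a distractor value
```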
Explanation: ***Subjects who smoke electronic cigarettes and subjects who do not smoke*** - This design represents a **cohort study**, which is ideal for investigating the **incidence** of a disease (lung cancer) in groups exposed and unexposed to a risk factor (electronic cigarette use). - By following these two groups over time, researchers can directly compare the **risk of developing lung cancer** in e-cigarette users versus non-smokers. *Subjects with lung cancer who smoke and subjects with lung cancer who did not smoke* - This option incorrectly compares two groups both with lung cancer, where the exposure to smoking can either be **electronic or traditional cigarettes,** but does not provide a control group without lung cancer to assess the association. - This design would not allow for the calculation of an **incidence rate** or a **relative risk** of lung cancer development specific to electronic cigarette use. *Subjects who smoke electronic cigarettes and subjects who smoke normal cigarettes* - This design compares two different types of smoking, which might be useful for comparing their relative risks but doesn't include a **non-smoking control group** to establish the absolute association with electronic cigarettes. - While it could show if e-cigarettes are "safer" than traditional cigarettes, it wouldn't directly answer whether e-cigarettes themselves **cause lung cancer**. *Subjects with lung cancer who smoke and subjects without lung cancer who smoke* - This describes a **case-control study** but focuses on smoking in general rather than specifically electronic cigarettes, which is the independent variable of interest. - While valuable for identifying risk factors, it would need to specifically differentiate between **electronic cigarette smokers** and other smokers to answer the question adequately. 
*Subjects with lung cancer and subjects without lung cancer* - This general description of a **case-control study** is too broad; it does not specify the exposure of interest, which is electronic cigarette use. - To be relevant, the study would need to gather data on **electronic cigarette use** in both the lung cancer group and the non-lung cancer control group.
Explanation: ***Type II error*** - A **Type II error** occurs when a study fails to **reject a false null hypothesis**, meaning it concludes there is no significant difference or effect when one actually exists. - In this case, the initial US trial incorrectly concluded that Compound X had no significant effect on HbA1c, while subsequent larger studies and real-world data proved it did. *Type I error* - A **Type I error** (alpha error) occurs when a study incorrectly **rejects a true null hypothesis**, concluding there is a significant difference or effect when there isn't. - This scenario describes the opposite: the initial study failed to find an effect that genuinely existed, indicating a Type II error, not a Type I error. *Hawthorne effect* - The **Hawthorne effect** is a type of reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed. - This effect does not explain the initial trial's failure to detect a real drug effect; rather, it relates to participants changing behavior due to study participation itself. *Publication bias* - **Publication bias** occurs when studies with positive or statistically significant results are more likely to be published than those with negative or non-significant results. - While relevant to the literature as a whole, it doesn't explain the discrepancy in findings within a single drug's development where a real effect was initially missed. *Confirmation bias* - **Confirmation bias** is the tendency to search for, interpret, favor, and recall information in a way that confirms one's preexisting beliefs or hypotheses. - This bias would likely lead researchers to *find* an effect if they expected one, or to disregard data that contradicts their beliefs, which is not what happened in the initial trial.
Explanation: ***65%*** - To find the **95% confidence interval (CI) of the mean**, we use the formula: Mean ± (Z-score × Standard Error). For a 95% CI, the Z-score is approximately **1.96**. - The **Standard Error (SE)** is calculated as SD/√n, where n is the sample size (100 students). So, SE = 25%/√100 = 25%/10 = **2.5%**. - The 95% CI is 70% ± (1.96 × 2.5%) = 70% ± 4.9%. The lower bound is 70% - 4.9% = **65.1%**, making **65%** the closest answer choice for the minimum passing score. *45%* - This value is significantly lower than the calculated lower bound of the 95% confidence interval (approximately 65.1%). - It would represent a score far outside the defined passing range. *63.75%* - This value falls below the calculated lower bound of the 95% confidence interval (approximately 65.1%). - While close, this score would not meet the professor's criterion for passing. *67.5%* - This value is within the 95% confidence interval (65.1% to 74.9%) but is **not the minimum score**. - Lower scores within the interval would still qualify as passing. *20%* - This score is extremely low and falls significantly outside the 95% confidence interval for a mean of 70%. - It would indicate performance far below the defined passing threshold.
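The confidence-interval arithmetic above can be sketched as:

```python
import math

mean, sd, n = 70.0, 25.0, 100
z = 1.96  # z-score for a 95% confidence interval

se = sd / math.sqrt(n)  # standard error: 25 / 10 = 2.5
lower = mean - z * se   # ~65.1, the minimum passing score
upper = mean + z * se   # ~74.9
```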
Explanation: ***The median of IL-1 measurements is now larger than the mean.*** - The new mean is approximately 84.07 (the sum of all 15 IL-1 levels, 1261, divided by 15). The sorted data set is 32, 77, 81, 82, 84, 85, 86, **87**, 89, 90, 90, 91, 93, 95, 99; the median is the 8th value, which is 87. Thus, the new median (87) is larger than the new mean (84.07). - This conclusion requires calculation of both the **mean** and **median** for the combined dataset of 15 patients. *The mean of IL-1 measurements is now larger than the mode.* - The new mean is approximately 84.07. The mode is 90 (it appears twice, while all other values appear once). Therefore, the mean (84.07) is *not* larger than the mode (90). - Calculation of the **mean** and identification of the **mode** for the combined dataset negates this statement. *The range of the data set is unaffected by the addition of five new patients in 2018.* - In 2017, the range was 99 (max) - 77 (min) = 22. With the addition of patient 12 (IL-1 level of 32), the new minimum changed from 77 to 32. - The new range is 99 (max) - 32 (min) = 67, which is a significant increase from the original range of 22. *The standard deviation was decreased by the five new patients who joined the study in 2018.* - The addition of patient 12 with an IL-1 level of 32, which is an **outlier**, significantly increased the **spread of the data**. - A larger spread of data, especially due to an outlier, typically **increases the standard deviation**, not decreases it. *Systematic error was introduced by the five new patients who joined the study in 2018.* - **Systematic error** refers to a consistent, repeatable error in measurement or experimental design that biases results in a particular direction. - The information provided describes individual patient data and does not indicate any **consistent bias** in data collection or measurement methods for the new patients.
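The quoted summary statistics can be verified with the standard library, using the sorted IL-1 values listed in the explanation:

```python
import statistics

# IL-1 levels (pg/mL) for all 15 patients, as listed above
il1 = [32, 77, 81, 82, 84, 85, 86, 87, 89, 90, 90, 91, 93, 95, 99]

mean = statistics.mean(il1)      # ~84.07
median = statistics.median(il1)  # 87, the 8th of 15 sorted values
mode = statistics.mode(il1)      # 90, the only value appearing twice

assert median > mean and mode > mean
assert max(il1) - min(il1) == 67  # range widened from 22 by the outlier of 32
```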
Explanation: ***Correlation*** - **Correlation** is used to assess the strength and direction of a **linear relationship** between two **continuous variables**. - In this scenario, researchers would use it to determine if there's a relationship between drug dosage and systolic blood pressure, as both are continuous. *Chi-square test* - The **chi-square test** is used to examine the relationship between two **categorical variables**. - It is not appropriate for understanding linear relationships between continuous variables like drug dosage and blood pressure. *Analysis of variance* - **Analysis of variance (ANOVA)** is used to compare the means of **three or more groups** or treatments. - It identifies if there are statistically significant differences between group means, rather than analyzing the mutual relationship between two continuous variables. *Paired t-test* - A **paired t-test** is used to compare the means of **two related groups** or repeated measurements from the same subjects. - It is often used to assess the effect of an intervention by comparing measurements before and after the intervention, not for observing a relationship between two continuous variables. *Independent t-test* - An **independent t-test** compares the means of **two independent groups**. - This test is not suitable for exploring a mutual or linear relationship between two continuous variables within a single group or dataset.
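Pearson's r can be computed directly from its definition; the dosage/blood-pressure pairs below are hypothetical, chosen only to show a strong negative linear relationship:

```python
from math import sqrt

# Hypothetical paired observations: dosage (mg) and systolic BP (mm Hg)
dose = [10, 20, 30, 40, 50]
sbp = [150, 144, 139, 133, 128]

n = len(dose)
mx, my = sum(dose) / n, sum(sbp) / n
cov = sum((x - mx) * (y - my) for x, y in zip(dose, sbp))
r = cov / sqrt(sum((x - mx) ** 2 for x in dose)
               * sum((y - my) ** 2 for y in sbp))
# r is close to -1: higher doses track with lower systolic pressures
```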
Explanation: ***Confounding*** - The initial finding of an increased risk (RR = 1.82) between smoking and Barrett esophagus disappears when the population is **stratified by acid reflux**. This suggests that acid reflux was **confounding** the observed association. - A confounder is an **extraneous variable** that is related to both the exposure (smoking) and the outcome (Barrett esophagus) but is not part of the causal pathway, thereby distorting the true association. *Random error* - Random error leads to **imprecise results** due to natural variability and is unlikely to fully explain the disappearance of a statistically significant association (p < 0.001) after stratification. - While it can affect the p-values, it typically wouldn't completely nullify a strong original finding across all stratified groups. *Matching* - Matching is a technique used in study design (e.g., case-control studies) to **control for confounding** by ensuring similar distribution of confounding variables between groups. - The problem describes a **retrospective cohort study** where stratification was performed *after* data collection, not matching during the design phase. *Effect modification* - Effect modification occurs when the **effect of an exposure on an outcome differs across strata** of another variable. If there were effect modification, we would expect to see varying relationships (e.g., a strong association in one stratum and a weak/absent one in another). - In this scenario, the association between smoking and Barrett esophagus becomes **non-significant in *both*** reflux and non-reflux strata (p=0.52 and p=0.48), indicating no differential effect but rather the removal of a spurious association. *Stratification* - Stratification is a **method of analysis** used to assess for confounding or effect modification by examining the association within subgroups (strata) based on a third variable. 
- While stratification was *performed* in the study, it is the *result* (the disappearance of the association) that best explains the phenomenon, indicating **confounding** by acid reflux.
Explanation: ***Improved treatment of breast cancer*** - An **increased prevalence** with **stable incidence** suggests that people are living longer with the disease. - More effective treatments allow individuals with breast cancer to survive for extended periods, contributing to a larger pool of existing cases. *Increased average age of population at risk for breast cancer* - While an increase in the average age of the population could lead to more cases due to age being a risk factor, this would primarily impact **incidence**, not just prevalence with stable incidence. - If incidence remained stable despite an aging population, it wouldn't fully explain the observed increase in prevalence. *Increased awareness of breast cancer among clinicians* - Increased awareness would likely lead to earlier diagnosis, which might temporarily show an increase in **incidence** due to detection of previously undiagnosed cases. - However, it wouldn't directly explain a sustained increase in prevalence without a change in incidence or survival. *Improved screening programs for breast cancer* - Better screening programs would detect more cases, leading to an initial increase in **incidence** (identifying cases earlier) rather than stable incidence. - While it contributes to earlier diagnosis, it doesn't primarily explain why more people are living longer with the disease, thus increasing prevalence. *Increased exposure to risk factors for breast cancer* - An increase in risk factors would typically lead to an increase in the **incidence** of breast cancer, as more people would be developing the disease. - The question explicitly states that the incidence has remained relatively stable, making this option less likely.
Explanation: ***0.24*** - Prevalence is calculated as the **number of existing cases** divided by the **total population**. The number of existing cases is estimated by multiplying the **incidence rate** (75 new cases/month) by the **duration of the disease** (2 years or 24 months): 75 cases/month * 24 months = 1800 cases. - The prevalence is then 1800 cases / 7500 individuals = **0.24**. *0.02* - This value might be obtained by incorrectly using only the monthly incidence or by performing **incorrect calculations** involving the duration and total population. - It does not account for the **cumulative effect** of new cases over the entire disease duration. *0.12* - This answer might result from miscalculating the **duration of the disease** (e.g., using 1 year instead of 2 years), leading to an underestimation of the total existing cases. - It suggests an error in converting the **duration from years to months** when multiplying by the monthly incidence. *0.005* - This value is significantly lower than the correct prevalence, suggesting a major error in calculating the total number of cases or incorrectly dividing the total cases by the entire population. - It does not properly reflect the contribution of new cases over the **duration of the disease**. *0.01* - This result is likely derived from an incorrect application of the incidence rate or a misunderstanding of how the duration of the disease impacts the **total number of prevalent cases**. - It's a calculation error that significantly underestimates the **true disease burden**.
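The prevalence arithmetic above, as a minimal sketch:

```python
incidence_per_month = 75
duration_months = 2 * 12  # the disease lasts about 2 years
population = 7500

existing_cases = incidence_per_month * duration_months  # 1800
prevalence = existing_cases / population                # 0.24
# Using 1 year instead of 2 in the duration gives 0.12, a distractor
```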
Explanation: ***144 kg (320 lb)*** - To find a weight 2 standard deviations above the mean, you use the formula: **mean + (2 × standard deviation)**. - Given a mean of 90 kg and a standard deviation of 27 kg, the calculation is 90 + (2 × 27) = 90 + 54 = **144 kg**. In pounds: 200 lb + (2 × 60 lb) = 200 + 120 = **320 lb**. *36 kg (80 lb)* - This value is significantly below the mean and represents a weight **2 standard deviations below the mean**, not above it. - Calculation: 90 - (2 × 27) = 90 - 54 = 36 kg. *63 kg (140 lb)* - This value is **below the mean** and represents a weight approximately **1 standard deviation below the mean**, not above. - Calculation: 90 - 27 = 63 kg. *172 kg (380 lb)* - This value is **too high** for 2 standard deviations above the mean and would represent a weight closer to **3 standard deviations above the mean**. - Calculation: 90 + (3 × 27) = 90 + 81 = 171 kg (approximately 172 kg). *118 kg (260 lb)* - This value represents a weight approximately **1 standard deviation above the mean**, not 2. - Calculation: 90 + 27 = 117 kg (approximately 118 kg or 260 lb).
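The standard-deviation arithmetic for the answer and the main distractors:

```python
mean_kg, sd_kg = 90, 27

two_sd_above = mean_kg + 2 * sd_kg  # 144 kg, the correct answer
two_sd_below = mean_kg - 2 * sd_kg  # 36 kg, the mirror-image distractor
one_sd_above = mean_kg + 1 * sd_kg  # 117 kg (the ~118 kg distractor)
```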
Explanation: ***99.7% of heights in women are likely to fall between 63.7 and 64.3 inches.*** - For a **normal distribution**, approximately 99.7% of values fall within **±3 standard deviations** of the mean. - For women's height: Mean = 64 inches, Standard Deviation = 0.1 inches. Therefore, 3 SD = 0.3 inches. The range is 64 ± 0.3, which is **63.7 to 64.3 inches**. *86% of heights in women are likely to fall between 63.9 and 64.1 inches.* - The range 63.9 to 64.1 inches represents **±1 standard deviation** (64 ± 0.1 inches). - For a normal distribution, approximately **68%** (not 86%) of values fall within ±1 standard deviation of the mean. *68% of weights in women are likely to fall between 153 and 155 pounds.* - While 153 to 155 pounds represents **±1 standard deviation** (154 ± 1 pound), the problem states that the **distribution of weight is not normally distributed**. - The **68-95-99.7 rule** (empirical rule) only applies to data that follows a normal distribution. *95% of heights in men are likely to fall between 68.85 and 69.15 inches.* - For a normal distribution, 95% of values fall within **±2 standard deviations**. - For men's height: Mean = 69 inches, Standard Deviation = 0.1 inches. Therefore, 2 SD = 0.2 inches. The range for 95% should be 69 ± 0.2, which is **68.8 to 69.2 inches**, not 68.85 to 69.15 inches. *99.7% of heights in men are likely to fall between 68.8 and 69.2 inches.* - For a normal distribution, 99.7% of values fall within **±3 standard deviations**. - For men's height: Mean = 69 inches, Standard Deviation = 0.1 inches. Therefore, 3 SD = 0.3 inches. The range for 99.7% should be 69 ± 0.3, which is **68.7 to 69.3 inches**, not 68.8 to 69.2 inches.
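The empirical-rule ranges for women's height can be generated in one expression:

```python
mean_in, sd_in = 64.0, 0.1  # women's height (inches)

# Empirical (68-95-99.7) rule ranges for a normal distribution
ranges = {k: (mean_in - k * sd_in, mean_in + k * sd_in) for k in (1, 2, 3)}
# 1 SD covers ~68%, 2 SD ~95%, and 3 SD ~99.7% of observations
```

`ranges[3]` reproduces the 63.7-64.3 inch interval in the correct answer; the same construction with mean 69.0 gives the men's intervals.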
Explanation: ***Median*** - The **median** is the middle value in a dataset when ordered from least to greatest, making it inherently resistant to extreme values or **outliers**. - It describes the central tendency without being skewed by a single unusually high or low data point, unlike the mean. - Among measures of central tendency, the median is the **most robust** to outliers. *Standard deviation* - **Standard deviation** measures the spread of data points around the mean, and because it is based on the **mean**, it is highly sensitive to outliers. - A single outlier can significantly increase the standard deviation, making the data appear more dispersed than it actually is for the majority of observations. *Mean* - The **mean** is calculated by summing all values and dividing by the number of values, which makes it directly affected by every data point, especially extreme ones. - A single **outlier** can pull the mean significantly towards its value, misrepresenting the central tendency of the majority of the data. *Variance* - **Variance** is the average of the squared differences from the mean, and like standard deviation, its calculation heavily relies on the **mean**. - Squaring the differences amplifies the impact of outliers, making variance very sensitive to extreme values. *Mode* - The **mode** represents the most frequently occurring value in a dataset and is also resistant to outliers since it only depends on frequency of occurrence. - However, in small datasets or datasets without repeated values, the mode may be **undefined or uninformative**, making it less useful for describing central tendency compared to the median.
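A quick Python illustration with made-up numbers shows why the median resists an outlier while the mean does not: one extreme value barely moves the median but drags the mean far upward.

```python
import statistics

# Hypothetical data: one extreme outlier is appended.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

median_before = statistics.median(data)          # 4
median_after = statistics.median(with_outlier)   # 4.5 (barely moves)
mean_before = statistics.mean(data)              # 4
mean_after = statistics.mean(with_outlier)       # 20  (pulled toward outlier)
```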
Explanation: ***Selection bias*** - This occurs when different patient groups are assigned to different interventions or measurements in a way that creates **systematic differences** between comparison groups. - In this study, having **separate patient groups** assessed with different diagnostic methods (blood test vs. PET scan) means any differences observed could be due to **differences in the patient populations** rather than differences in test performance. - To validly compare two diagnostic tests, both tests should ideally be performed on the **same patients** (paired design) or patients should be **randomly assigned** to receive one test or the other, ensuring comparable groups. - This is a fundamental **study design flaw** that prevents valid comparison of the two diagnostic methods. *Measurement bias* - Also called information bias, this occurs when there are systematic errors in how outcomes or exposures are measured. - While using different measurement tools could introduce measurement variability, the primary issue here is that **different patient populations** are being compared, not just different measurement methods on the same population. - Measurement bias would be more relevant if the same patients were assessed with both methods but one method was systematically misapplied or measured incorrectly. *Confounding bias* - This occurs when an extraneous variable is associated with both the exposure and outcome, distorting the observed relationship. - While patient characteristics could confound results, the fundamental problem is the **study design itself** (separate groups for separate tests), which is selection bias. *Recall bias* - This involves systematic differences in how participants remember or report past events, common in **retrospective case-control studies**. - Not relevant here, as this involves prospective diagnostic testing, not recollection of past exposures. 
*Lead-time bias* - Occurs in screening studies when earlier detection makes survival appear longer without changing disease outcomes. - Not applicable to this scenario, which focuses on comparing two diagnostic methods in separate patient groups, not on survival or disease progression timing.
Explanation: ***Attrition bias*** - **Attrition bias** occurs when participants drop out of a study, especially if the dropout rate differs between the intervention and control groups, which can lead to a **skewed comparison** of outcomes. - The unequal distribution of participants (62 vs. 21) between the treatment and control groups at the 20-year follow-up suggests that a disproportionate number of participants may have dropped out of one group, thus leading to attrition bias. *Volunteer bias* - **Volunteer bias** occurs when individuals who volunteer for a study differ significantly from the general population or those who decline to participate, potentially affecting the study's **generalizability**. - This scenario describes differences in retention *after* initial participation, not differences in initial willingness to join. *Reporting bias* - **Reporting bias** refers to the selective reporting of study findings, where positive or statistically significant results are more likely to be published or emphasized than negative or non-significant ones, which can distort the overall evidence base. - This bias relates to how results are disseminated, not to differential dropout rates or participant retention in a study. *Inadequate sample size* - **Inadequate sample size** means that the number of participants in a study is too small to detect a statistically significant effect if one truly exists, leading to a lack of **statistical power**. - While the overall number of participants at follow-up might be small, the primary concern here is the *unequal distribution* between groups, indicating a problem with participant retention rather than just a low total count. *Lead-time bias* - **Lead-time bias** occurs when early detection of a disease (e.g., through screening) makes survival appear longer than it actually is, without necessarily prolonging the patient's life, by advancing the **point of diagnosis**. 
- This bias is relevant to screening programs and disease detection, not to the differential dropout rates observed in a longitudinal study.
Explanation: ***Surgical hypertension associated with pheochromocytoma is rare*** - The phrase "given the **rare disease assumption**" is critical here, as it allows the **odds ratio** to approximate the **relative risk**. This assumption is valid when the outcome (surgical hypertension) is rare in the general population. - If the outcome is rare, the odds ratio provides a good estimate of how many times more likely the outcome is in the exposed group compared to the unexposed group. *The case-control study used a large sample size* - A large sample size increases the **precision** of the estimate and the **statistical power** but does not inherently allow an odds ratio to be interpreted as a relative risk. - While important for reliable results, sample size alone doesn't validate the "rare disease assumption." *The relationship between MEN syndromes and surgical hypertension is not due to random error* - This statement refers to the **statistical significance** of the findings (p < 0.01), indicating the observed effect is unlikely due to chance. - It does not, however, relate to the specific condition under which an odds ratio approximates a relative risk. *Pheochromocytoma is common in MEN type 2 syndromes* - This statement addresses the prevalence of pheochromocytoma in patients with MEN type 2, not the rarity of the outcome (surgical hypertension) in the general population. - While pheochromocytoma is indeed a feature of MEN2, this fact alone doesn't validate the rare disease assumption regarding surgical hypertension. *The 95% confidence interval for the odds ratio does not include 1.0* - This indicates **statistical significance**, meaning the odds ratio is significantly different from 1, suggesting an association between the exposure and the outcome. - It does not provide the basis for interpreting the odds ratio as a relative risk under the rare disease assumption.
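The rare disease assumption can be illustrated numerically with a hypothetical 2×2 table (all counts invented for illustration): when cases are rare, the odds ratio closely approximates the relative risk.

```python
# 2x2 table: a, b = exposed cases/non-cases; c, d = unexposed cases/non-cases
a, b = 10, 9990     # exposed:   10 cases out of 10,000
c, d = 2, 9998      # unexposed:  2 cases out of 10,000

odds_ratio = (a / b) / (c / d)                       # ~5.004
relative_risk = (a / (a + b)) / (c / (c + d))        # 5.0
# Because the outcome is rare (0.1% and 0.02%), OR ~ RR.
```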
Explanation: ***An 86-year-old man with well-controlled hypertension and mild benign prostate hyperplasia who lives in an assisted-living facility*** - This patient meets all criteria stipulated for the control group: **older than 65 years of age**, and **resides in an assisted-living facility**. - They also have no mention of dementia, making them suitable as a **healthy control** for the study. *A 73-year-old woman with coronary artery disease who was recently discharged to an assisted-living facility from the hospital after a middle cerebral artery stroke* - Although this patient is over 65 and in an assisted-living facility, a recent **middle cerebral artery stroke** could lead to **vascular cognitive impairment**, which might confound the assessment of Alzheimer's dementia. - Controls should ideally be free of conditions that could mimic or predispose to dementia, complicating the analysis of the association with benzodiazepine use. *A 64-year-old man with well-controlled hypertension and mild benign prostate hyperplasia who lives in an assisted-living facility* - This patient does not meet the specified age criterion of being **older than 65 years of age**. - All participants in the study, including controls, must be 65 years or older to maintain the integrity of the study population. *An 80-year-old man with well-controlled hypertension and mild benign prostate hyperplasia who lives in an independent-living community* - This patient does not reside in an **assisted-living facility**, which is a crucial inclusion criterion for all participants in this study. - The study specifically focuses on the elderly population residing in **assisted-living facilities** to ensure a uniform study environment. 
*A 68-year-old man with hypercholesterolemia, mild benign prostate hyperplasia, and poorly-controlled diabetes who is hospitalized for pneumonia* - This patient is currently **hospitalized for pneumonia**, indicating an acute illness that would make them unsuitable for selection into a control group for a chronic disease study. - Controls should be relatively healthy and stable; acute hospitalization suggests a compromised health state not representative of the target control population.
Explanation: ***t-test*** - A **t-test** is appropriate for comparing the means of two independent groups, such as the blood calcium levels between runners and sedentary females. - It assesses whether the observed difference between the two sample means is statistically significant or occurred by chance. *Chi-square test* - The **chi-square test** is used to analyze categorical data to determine if there is a significant association between two variables. - It is not suitable for comparing continuous variables like blood calcium levels. *Linear regression* - **Linear regression** is used to model the relationship between a dependent variable (outcome) and one or more independent variables (predictors). - It aims to predict the value of a variable based on the value of another, rather than comparing means between groups. *ANOVA (Analysis of Variance)* - **ANOVA** is used to compare the means of **three or more independent groups**. - Since there are only two groups being compared in this scenario, a t-test is more specific and appropriate. *F-test* - The **F-test** is primarily used to compare the variances of two populations or to assess the overall significance of a regression model. - While it is the basis for ANOVA, it is not the direct test for comparing the means of two groups.
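As a sketch of what the t-test computes, here is the pooled-variance independent two-sample t statistic on hypothetical calcium values (in practice a statistics package such as scipy's `ttest_ind` would be used; the data below are invented):

```python
import statistics
from math import sqrt

def two_sample_t(x, y):
    """Pooled-variance independent two-sample t statistic."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

# Hypothetical blood calcium levels (mg/dL) in two independent groups
runners = [9.2, 9.5, 9.8, 9.4, 9.6]
sedentary = [9.0, 9.1, 9.3, 8.9, 9.2]
t_stat = two_sample_t(runners, sedentary)
```

The t statistic is then compared against the t distribution with n₁ + n₂ − 2 degrees of freedom to obtain a p-value.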
Explanation: ***Correct: Design bias*** - The **study design** itself is a significant source of systematic error that hampers generalization to the entire population. - The study lacks a **control group** for comparison, making it impossible to determine if the observed effects are truly due to the supplement or other factors. - The sample is very **small (n=16)** and **unrepresentative** - only women over 50 from one neighborhood cannot represent the entire population. - There is no **randomization** or **blinding**, and the sample is a convenience sample rather than a random sample. - These fundamental design flaws prevent valid generalization of the findings to broader populations. *Incorrect: Late-look bias* - **Late-look bias** occurs when outcomes are assessed too late in the course of disease, potentially missing early effects or being influenced by late-occurring events. - This bias is not evident here, as the study focuses on **immediate postprandial responses** and acute blood pressure changes, not long-term follow-up outcomes. *Incorrect: Confounding bias* - While potential confounders (e.g., diet, exercise, medications) may be present, **confounding bias** specifically refers to an unmeasured third variable that affects both the exposure and outcome. - The most pressing issue hampering **generalization** is not confounding, but rather the **small, non-representative sample** and lack of control group - these are structural design limitations. *Incorrect: Expectancy bias* - **Expectancy bias** (also called observer-expectancy effect) occurs when researchers' or participants' expectations influence the results, such as through placebo effects or subjective interpretation of outcomes. - While this could potentially occur due to lack of blinding, the most fundamental flaw hampering **generalization to the entire population** is the unrepresentative sample and poor study design structure. 
*Incorrect: Proficiency bias* - **Proficiency bias** relates to differences in the skills, experience, or techniques of those performing interventions or measurements, leading to variability in outcomes. - There is no information suggesting that inconsistencies in measurement techniques or researcher proficiency are the primary source of systematic error in this study.
Explanation: ***Pygmalion effect*** - The **Pygmalion effect**, also known as **observer-expectancy bias** or experimenter bias, occurs when an investigator's expectations about the outcome of a study unintentionally influence the results. - In this case, the **investigators becoming unblinded** to treatment assignments could lead them to unconsciously influence patient assessments or interactions based on their knowledge of who received ginkgo biloba, potentially leading to inflated positive outcomes for the treatment group. *Effect modification* - **Effect modification** describes a phenomenon where the effect of an exposure on an outcome is different across various strata of a third variable. - This is a true biological interaction and does not represent a bias or flaw in the study design due to unblinding. *Recall bias* - **Recall bias** occurs when participants' memories of past exposures or events differ based on their current health status or knowledge of their condition. - This bias primarily affects studies that rely on **retrospective reporting** of past events and is not relevant to the unblinding of investigators in a prospective clinical trial. *Hawthorne effect* - The **Hawthorne effect** describes a phenomenon where participants in a study change their behavior simply because they are aware of being observed, regardless of the intervention they receive. - While participant blinding is important to prevent this, the scenario describes investigators being unblinded, not the participants. *Procedure bias* - **Procedure bias** (also known as interviewer bias or performance bias) arises from systematic differences in the way data is collected or procedures are performed for different study groups. - While investigator unblinding can lead to elements of procedure bias, the more specific and encompassing term for an investigator's expectations influencing results is the **Pygmalion effect** (observer-expectancy bias).
Explanation: ***Low external validity*** - **External validity** refers to the generalizability of study findings to other populations, settings, or times. - The findings from a study of **upper-middle-class Caucasian patients in the Netherlands** may not apply to low-income African American and Latino patients in New York City due to socioeconomic, genetic, and environmental differences, leading to low external validity. *Confounding bias* - **Confounding bias** occurs when an unobserved variable is associated with both the exposure and the outcome, distorting their true relationship. - While confounding can affect internal validity, the attending's concern is specifically about the applicability of the findings to a different population, not the initial study's internal integrity. *Selection bias* - **Selection bias** arises when the study participants are not representative of the target population, often leading to systematic differences between groups. - While the *initial study* might have had its own selection bias if its sample wasn't representative of the Netherlands population, the attending's concern relates to applying its findings to a *different* population. *Poor reliability* - **Reliability** refers to the consistency or reproducibility of a measurement or study result over time or across different observers. - This concern is about the generalizability of the findings to a different population, not whether the initial study's measurements or results were inconsistent. *Low internal validity* - **Internal validity** refers to the extent to which a study establishes a cause-and-effect relationship between the intervention/exposure and the outcome within its own sample. - The attending's concern is not that the study itself was poorly conducted or failed to demonstrate a true association within its *own* population, but rather that its findings may not hold true for *other* populations.
Explanation: ***99%*** - In **Hardy-Weinberg equilibrium**, the frequencies of alleles and genotypes remain constant from generation to generation. The frequency of the dominant allele (W) is represented by 'p', and the frequency of the recessive allele (w) is represented by 'q'. - Given that the **red-eyed allele is recessive** (w) and has a frequency of **q = 0.1**, then the frequency of the **white-eyed allele (W)**, which is dominant, is **p = 1 - q = 1 - 0.1 = 0.9**. - The proportion of the population with white eyes includes homozygous dominant individuals (WW) and heterozygous individuals (Ww). - The genotype frequencies are: WW = p² = (0.9)² = 0.81, and Ww = 2pq = 2(0.9)(0.1) = 0.18. - Therefore, the proportion of white-eyed flies is **p² + 2pq = 0.81 + 0.18 = 0.99**, or **99%**. *1%* - This represents the frequency of the **homozygous recessive genotype (ww)**, which would be (0.1)² = 0.01 or 1%. - Flies with the **ww genotype** would have **red eyes**, not white eyes. *81%* - This represents the frequency of the **homozygous dominant genotype (WW)**, which is p² = (0.9)² = 0.81 or 81%. - However, white-eyed flies also include **heterozygous individuals (Ww)**, so 81% is an underestimation of the total proportion of white-eyed flies. *18%* - This represents the frequency of the **heterozygous genotype (Ww)**, which is 2pq = 2(0.9)(0.1) = 0.18 or 18%. - This includes only part of the white-eyed population and does not account for the **homozygous dominant (WW) individuals**. *10%* - This represents the frequency of the **recessive allele (q)**, which is 0.1 or 10%. - This is an allele frequency, not a **genotype or phenotype frequency** in the population.
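The Hardy-Weinberg arithmetic above can be verified directly (q = 0.1 for the recessive red-eyed allele, as given in the question):

```python
q = 0.1            # recessive (red-eyed) allele frequency
p = 1 - q          # dominant (white-eyed) allele frequency = 0.9

homozygous_dominant = p ** 2     # WW = 0.81
heterozygous = 2 * p * q         # Ww = 0.18
homozygous_recessive = q ** 2    # ww = 0.01 (red-eyed phenotype)

white_eyed = homozygous_dominant + heterozygous   # 0.99, i.e. 99%
```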
Explanation: ***129 mg/dL*** - To find the **median** of an **even set of numbers**, arrange the set in ascending order: 125, 127, 128, 128, 130, 132, 134, 136. The two middle values are 128 and 130. - The median is the **average of these two middle numbers**: (128 + 130) / 2 = 129 mg/dL. *132 mg/dL* - This is one of the higher values in the dataset and would be the **median** only if the dataset had a different distribution or an odd number of data points with 132 in the middle. - The correct calculation for the given dataset requires averaging the two central values, not selecting a single value from the upper half. *128 mg/dL* - This value is one of the two **middle numbers** when the set is ordered, but it is not the median for an even set. - The **median** for an even set of data involves finding the average of the two middle numbers, not just taking one of them. *127 mg/dL* - This value is in the lower half of the **ordered dataset** and is not one of the two central values. - It would not be considered the median since the median is the value that **divides the data into two equal halves**. *130 mg/dL* - This is one of the two **middle numbers** when the data is ordered, similar to 128 mg/dL. - While it's one of the data points used in the median calculation, it is not the **median itself** because the dataset contains an even number of points.
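Python's `statistics.median` applies exactly this rule, averaging the two middle values of an even-sized dataset:

```python
import statistics

# The eight values from the question, in any order; median() sorts internally.
ldl = [125, 127, 128, 128, 130, 132, 134, 136]
median_ldl = statistics.median(ldl)   # (128 + 130) / 2 = 129.0
```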
Explanation: ***Observer bias*** - Parents, acting as **observers**, rated their own children, leading to potential bias due to their **emotional involvement** and desire for a positive outcome. - The parents' high expectations and positive sentiments, as indicated by the quote, suggest their ratings might have been influenced by their **hopes for the intervention**. *Sampling bias* - This refers to a non-randomized sampling of participants, which can affect the **generalizability** of results but not directly explain the observed increase in scores within the study group. - While there might be some sampling bias (e.g., eager parents), it doesn't primarily explain the difference between pre- and post-intervention scores as reported. *Confounding bias* - Occurs when an **extraneous variable** is associated with both the exposure and the outcome, distorting the true relationship. - While factors like parental involvement could be confounders, the direct increase in scores within the same individuals being observed by the same biased observers points more directly to observer bias. *Recall bias* - This bias arises when participants or observers remember past events differently based on their current state or knowledge, often seen in **retrospective studies**. - In this study, the observers (parents) were completing ratings at baseline and after the intervention, so recall of past scores isn't the primary issue; rather, their subjective assessment of current behavior. *Social desirability bias* - This occurs when participants answer questions in a way that will be viewed favorably by others, often under-reporting undesirable behaviors or over-reporting desirable ones. - While parents might want to appear as "good parents," the primary issue here is their observation and scoring of their child's social skills, which is more aligned with **observer bias** influencing their perceptions.
Explanation: ***Lead time bias; Pygmalion effect*** - In Study A, the MRI technology detects ovarian cancer earlier, artificially making the survival time appear longer simply due to earlier diagnosis, not necessarily improved outcomes, which is characteristic of **lead time bias**. - In Study B, the patients receiving the new drug are told to expect quick resolution of their depression, leading to increased expectation of improvement, which describes the **Pygmalion effect** (a form of observer-expectancy effect where higher expectations lead to increased performance). *Latency Bias; Golem effect* - **Latency bias** refers to a delay in the manifestation of an outcome, which is not the primary issue in Study A's screening context. - The **Golem effect** is a form of negative self-fulfilling prophecy where lower expectations placed upon individuals by superiors/researchers lead to poorer performance, which is opposite to what is described in Study B. *Confounding; Golem effect* - **Confounding** occurs when an unmeasured third variable is associated with both the exposure and the outcome, distorting the observed relationship; while confounding is common, the scenario in Study A specifically points to a screening effect on survival time. - As mentioned, the **Golem effect** refers to negative expectations leading to poorer outcomes, which is not present in Study B. *Lead time bias; Golem effect* - **Lead time bias** correctly identifies the issue in Study A, as it explains the apparently longer survival as a result of earlier detection. - However, the **Golem effect** incorrectly describes the scenario in Study B, where positive expectations are given, not negative ones. *Latency bias; Pygmalion effect* - **Latency bias** is not the primary bias described in Study A; the immediate impact of early detection on survival statistics points to lead time bias. 
- The **Pygmalion effect** correctly describes the bias in Study B, where positive expectations from the researchers influence patient outcomes.
Explanation: ***Mode*** - The **mode** represents the most frequently occurring value in a dataset. - In this dataset of 100 values distributed between 3.5-5.0 mEq/L, there is likely a value (or values) that appears most frequently. - Adding only **2 additional values** (3.1 and 3.3 mEq/L) would not change which value appears most frequently in the dataset. - The mode is **robust to outliers** when the sample size is large relative to the number of outliers added. *Variance* - **Variance** measures the spread of data points around the mean. - Adding values that are lower than the existing data points (3.1 and 3.3 mEq/L are below the observed range of 3.5-5.0) would increase the overall spread, thus **increasing the variance**. *Mean* - The **mean** is the average of all values in a dataset. - Adding lower values (3.1 and 3.3 mEq/L) to the dataset would **decrease the overall mean**, as these values are below the current range. *Median* - The **median** is the middle value in an ordered dataset. - With 100 original values, the median would be between the 50th and 51st values when ordered. - Adding 2 values at the lower end would shift the position, making the new median the average of the 51st and 52nd values in the expanded dataset, which could **change the median value**. *Standard error* - The **standard error** measures the precision of the sample mean and is calculated as: SE = Standard Deviation / √n. - Adding these 2 values would change both the **standard deviation** (due to increased variance) and the **sample size** (from 100 to 102). - Both changes would **affect the standard error**.
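A small Python sketch (a hypothetical 7-value stand-in for the 100-value dataset, clustered at 4.2 mEq/L) shows the mode surviving the two low additions while the mean shifts downward:

```python
import statistics

# Hypothetical potassium values (mEq/L); 4.2 is the most frequent value.
original = [3.5, 3.8, 4.2, 4.2, 4.2, 4.6, 5.0]
expanded = original + [3.1, 3.3]   # the two low values from the question

mode_before = statistics.mode(original)   # 4.2
mode_after = statistics.mode(expanded)    # still 4.2: mode is unchanged
mean_before = statistics.mean(original)
mean_after = statistics.mean(expanded)    # pulled down by the low values
```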
Explanation: ***Correct: Hawthorne effect*** - The **Hawthorne effect** describes the phenomenon where individuals modify their behavior in response to being observed or knowing they are part of a study. In this case, both groups knew they were being studied for dietary compliance, leading to increased adherence even in the control group. - The act of **recording daily food intake** and the knowledge of participation in a study can itself serve as a motivator for improved compliance, leading to higher rates in both intervention and control groups compared to the general population. *Incorrect: Confounding effect* - A **confounding effect** occurs when an unmeasured or uncontrolled third variable influences both the independent and dependent variables, creating a spurious association. However, the study explicitly stated that "multivariate analysis showed no significant demographic or medical differences" between the groups, making confounding less likely. - While confounding can occur in research, the specific scenario describing increased compliance due to awareness of observation is an example of the Hawthorne effect, not general confounding. *Incorrect: Recall bias* - **Recall bias** is a systematic error that occurs when there are differences in the accuracy or completeness of memories of past events between study participants. This typically happens in retrospective studies where participants are asked to remember past exposures or outcomes. - While the study involved recording food intake, the problem describes an unexpected improvement in compliance across both groups due to participation, rather than a systematic distortion of reported past events. The recordings were real-time, reducing recall issues. *Incorrect: Pygmalion effect* - The **Pygmalion effect**, or Rosenthal effect, describes the phenomenon where higher expectations lead to an increase in performance. 
It usually refers to an effect on the participant's performance influenced by the experimenter's expectations. - While expectations can influence behavior, the observation here is tied to the act of being studied and observed, rather than specific high expectations imposed by the researchers on the participants themselves. *Incorrect: Procedure bias* - **Procedure bias**, or selection bias, occurs when the procedures used to select participants or assign them to groups lead to systematic differences between the groups. This can affect the generalizability or internal validity of the study. - The study explicitly states that participants were "randomly selected" and "randomized to no intervention and CBT groups," which aims to minimize procedure or selection bias. The observed effect applies to both groups, suggesting it's not due to a flaw in allocation.
Explanation: ***172.5 mg/dL*** - To find the **median** for an even number of data points, arrange the values in ascending order: 150, 160, 170, 175, 175, 196. The two middle values are 170 and 175. - The median is the **average of these two middle values**: (170 + 175) / 2 = 172.5 mg/dL. *171.0 mg/dL* - This value is incorrect because it represents a **calculated average** that is not derived from the correct middle numbers (170 and 175). - It could result from a simple arithmetic error or from averaging an incorrect pair of values (e.g., 167 and 175). *160.0 mg/dL* - This value is incorrect as it is the **second smallest value** in the dataset, not the middle value. - It is one of the data points but does not represent the central tendency as the median. *175.0 mg/dL* - This value is incorrect because while 175 is one of the two middle numbers, the **median of an even set of numbers** is the **average of the two middle numbers**, not just one of them. - Simply picking one of the middle numbers without averaging is a common error in determining the median for an even set. *170.0 mg/dL* - This value is incorrect because while 170 is one of the two middle numbers, the **median for an even set** of data points requires **averaging the two middle values**. - It represents only the lower of the two central numbers, not the overall median.
Explanation: ***Mean > median > mode*** - In a dataset with a **strong positive skew**, the tail of the distribution is on the right, pulled by a few **unusually large values**. - These extreme high values disproportionately influence the **mean**, pulling it to the right (higher value), while the **median** (middle value) is less affected, and the **mode** (most frequent value) is often located at the peak of the distribution towards the left. *Mean = median = mode* - This relationship between the measures of central tendency is characteristic of a **perfectly symmetrical distribution**, such as a **normal distribution**, where there is no skew. - In a symmetrical distribution, the mean, median, and mode are all located at the exact center of the data. *Mean < median < mode* - This order is typical for a dataset with a **negative skew**, where the tail is on the left due to a few **unusually small values**. - In a negatively skewed distribution, the mean is pulled to the left (lower value) by the small values, making it less than the median and mode. *Mean > median = mode* - This configuration is generally not characteristic of standard skewed distributions and would imply a specific, less common bimodal or complex distribution shape where the mode coincides with the median, but the mean is pulled higher. - While theoretically possible, it doesn't describe a typical positively skewed distribution where the mode is usually the lowest of the three. *Mean < median = mode* - This relationship would suggest a negatively skewed distribution where the median and mode are equal, but the mean is pulled to the left (lower value) by a leftward tail. - Again, this is a less typical representation of a standard negatively skewed distribution, which often follows the Mean < Median < Mode pattern.
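A quick sketch with a hypothetical positively skewed dataset (a few unusually large values on the right) confirms the mean > median > mode ordering:

```python
import statistics

# Right tail (10, 20) pulls the mean up; the mode sits at the peak.
data = [1, 2, 2, 2, 3, 3, 4, 10, 20]

mode_ = statistics.mode(data)       # 2 (most frequent value)
median_ = statistics.median(data)   # 3 (middle of 9 ordered values)
mean_ = statistics.mean(data)       # ~5.2, inflated by the large values
```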
Explanation: ***Confounding*** - This **crossover design** (switching patients to the other diet) effectively controls for **confounding variables** by making each patient their own control, ensuring that inherent patient characteristics do not bias the comparison between diets. - By comparing the effects of both diets within the same individual, individual variability in factors such as genetics, lifestyle, and other co-morbidities is accounted for, reducing their potential as confounders. *Hawthorne effect* - The **Hawthorne effect** refers to subjects modifying their behavior in response to being observed, which this study design does not specifically address or eliminate. - While patients are being monitored, the design aims to compare the diets' effects, not to prevent behavioral changes due to observation itself. *Recall bias* - **Recall bias** occurs when participants' memories of past events are inaccurate, often influenced by their current health status or beliefs. - This study measures **real-time blood pressure** data, not relying on recollection of past exposures or outcomes, thereby mitigating recall bias. *Selection bias* - **Selection bias** arises from non-random selection of participants into study groups, leading to systematic differences between groups. - While patient recruitment could introduce selection bias into the overall study population, the **crossover design** itself helps control for differences between treatment arms because all participants eventually receive both treatments. *Pygmalion effect* - The **Pygmalion effect** (or observer-expectancy effect) describes phenomena where higher expectations lead to increased performance, usually from a researcher influencing a subject. - This effect is not directly addressed by the crossover design; the design focuses on controlling for patient-specific confounders rather than investigator bias in expectations.
Explanation: ***5.45*** - To find the **median**, first arrange the potassium values in ascending order: 3.1, 4.0, 5.1, 5.1, 5.8, 5.9, 6.1, 6.2. - Since there are **eight** (an even number) values, the median is the average of the two middle values (the 4th and 5th values): (5.1 + 5.8) / 2 = 10.9 / 2 = **5.45**. *6.05* - This value might be obtained by incorrectly averaging a different pair of numbers, for example the two largest values: (5.9 + 6.2) / 2 = 6.05. - It is not the correct median for this particular data set of potassium values. *5.10* - While 5.1 is present twice in the data set, and is one of the middle values, it is not the **median** because the **median** for an even number of values is the average of the two middle numbers, not just one of them. - This would be the median if the values were 3.1, 4.0, 5.1, 5.1, 5.1, 5.8, 5.9, 6.1. *5.16* - This value does not correspond to any of the numbers in the data set nor does it result from the correct calculation of the **median**. - It might represent an incorrect average or a miscalculation of a percentile. *3.10* - This value is the **minimum** potassium level recorded, not the median. - The median represents the middle value in a sorted data set, while the minimum is the lowest value.
Explanation: ***Ejection fraction influences both probability of receiving TAVR and risk of death*** - A variable is a **confounder** if it is associated with both the **exposure** (TAVR vs. SAVR) and the **outcome** (risk of death) and is not an intermediate variable in the causal pathway. - If ejection fraction affects the decision to undergo TAVR (exposure) and also directly impacts mortality (outcome), it fits the definition of a confounder, and adjusting for it would change the observed association. *The prevalence of low ejection fraction is higher in the TAVR group* - While this suggests an association between ejection fraction and the exposure (TAVR), it doesn't explicitly state the association between ejection fraction and the outcome (risk of death). - To be a confounder, the variable must be independently associated with the outcome, not just unevenly distributed between exposure groups. *Patients who receive TAVR and SAVR have similar ejection fractions* - If TAVR and SAVR groups have similar ejection fractions, then ejection fraction would not be unevenly distributed between the exposure groups and, thus, would be unlikely to act as a confounder. - This statement would suggest that ejection fraction is *not* a confounder, which contradicts the study's conclusion. *The increase in risk of death conferred by TAVR is higher in patients with low ejection fraction* - This describes **effect modification** or interaction, where the effect of the intervention (TAVR) on the outcome (death) varies depending on the level of another variable (ejection fraction). - While important, this is distinct from confounding, where a variable distorts the observed association between exposure and outcome and needs to be *controlled* for to reveal the true association. 
*TAVR correlates with increased risk of death, but the magnitude of effect differs based on ejection fraction* - Similar to the previous option, this describes **effect modification**, meaning the effect of TAVR on mortality is not constant across different ejection fraction levels. - Confounding occurs when a third variable *explains away* or *changes* the observed association, whereas effect modification describes a true biological or clinical interaction.
Explanation: ***25%*** - The nephew's symptoms of **seizures, failure to thrive, neurodegeneration**, and **sparse, brittle, kinky hair** are highly indicative of **Menkes disease**, an **X-linked recessive** disorder. - Since the patient's sister had an affected son, the sister is an **obligate carrier** of the mutation. - The patient and her sister share the same parents, so their mother must be a carrier (or have the mutation). - The patient herself has a **50% chance of being a carrier**. - **If the patient is a carrier**, each son has a **50% chance** of being affected. - **Overall probability**: 0.5 (chance patient is carrier) × 0.5 (chance son inherits mutation) = **0.25 = 25%**. *Close to 0%* - This would only be correct if the patient had no chance of being a carrier, which is not the case given her family history. - Her sister's affected son confirms the mutation is present in the maternal lineage. *100%* - This would only occur if the patient were definitely a carrier AND all male offspring inherited the mutation, or if the disorder were autosomal dominant with complete penetrance. - For **X-linked recessive** disorders, even carrier mothers only pass the mutation to 50% of sons on average. *12.5%* - This percentage might represent additional generational steps or compound probabilities not relevant to this direct parent-child scenario. - The correct calculation for this scenario is 50% × 50% = 25%. *50%* - This would be correct if we knew with certainty that the patient is a carrier. - However, since we only know her sister is a carrier, the patient has a 50% chance of being a carrier herself, making the overall risk 25%. - This is a common error in genetic counseling calculations—forgetting to account for the uncertain carrier status of the at-risk individual.
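The two-step probability chain described above can be written out directly (a sketch of the carrier calculation, not of any broader pedigree analysis):

```python
# X-linked recessive risk for the patient's future son (Menkes disease scenario).
p_patient_is_carrier = 0.5       # her mother is a carrier, so each daughter has a 1/2 chance
p_son_affected_if_carrier = 0.5  # a carrier mother passes the mutant X to half of her sons

p_son_affected = p_patient_is_carrier * p_son_affected_if_carrier
print(p_son_affected)  # 0.25, i.e., 25%
```

Forgetting the first factor (assuming the patient is definitely a carrier) gives the common wrong answer of 50%.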
Explanation: ***Correct Option: 6.5*** - The given data are 2, 3, 5, 6, 7, 8, 9, 10. Since there are an **even number** (n=8) of observations, the median is the **average of the two middle values**. - The two middle values are the 4th and 5th values in the sorted list: 6 and 7. - Thus, the median is **(6 + 7) / 2 = 6.5**. - This correctly represents the central tendency of the dataset. *Incorrect Option: 6.0* - This is the **4th value** in the ordered dataset, which is one of the two middle values but not the median itself. - For an even number of observations, simply selecting one of the two middle values is incorrect; they must be **averaged** to find the median. - This represents a common error in calculating median for even datasets. *Incorrect Option: 7.0* - This is the **5th value** in the ordered dataset, the other middle value. - Like 6.0, this would only be the median if it were a dataset with an odd number of values where 7 was the single middle value. - For this even set, the median requires **averaging both middle values** (6 and 7). *Incorrect Option: 2.73* - This value appears to be an incorrect calculation or represents a different statistical measure entirely. - This is **not** the geometric mean, mean, or any standard measure of central tendency for this dataset. - The actual mean would be (2+3+5+6+7+8+9+10)/8 = 6.25. *Incorrect Option: 8.0* - This is the **6th value** in the ordered dataset, not representing the central position. - This value is above the median and represents the upper portion of the data distribution. - For a dataset of 8 values, the median position is between the 4th and 5th values, not the 6th.
Explanation: ***Confounding*** - **Confounding** occurs when an observed association between an exposure (unemployment) and an outcome (suicide) is actually due to an unmeasured third variable (psychiatric history) that is associated with both the exposure and the outcome. - The finding that the association disappears or changes significantly when stratified by psychiatric history (p > 0.05 in both groups) indicates that **psychiatric history** was confounding the initial association. *Selection bias* - **Selection bias** occurs when the way participants are selected or retained in a study leads to a systematic difference between the study population and the target population, or between exposure and outcome groups. - While this study uses a national registry, the description does not suggest a problem with how individuals were selected or that this bias explains the observed pattern. *Matching* - **Matching** is a technique used to control for confounding during study design by ensuring that cases and controls are similar with respect to potential confounders. - While the study mentions "matched controls," the issue described (initial association disappearing after stratification) points to an uncontrolled confounder, not the mechanism of matching itself as the explanation for the results. *Effect modification* - **Effect modification** occurs when the relationship between an exposure and an outcome differs depending on the level of a third variable (the effect modifier). - If **effect modification** were present, we would expect to see a significant association in at least one of the stratified groups, but the magnitude or direction of the association would vary. Here, the association essentially disappears in both strata. *Stratification* - **Stratification** is a method used to analyze data by separating participants into different subgroups based on a third variable, often to address confounding or examine effect modification. 
- While stratification was performed in this study (by psychiatric history), it is the *method* used to reveal the problem, not the explanation for why the initial association was observed or why it disappeared.
Explanation: ***10*** - The **number needed to treat (NNT)** is calculated by first finding the **absolute risk reduction (ARR)**. - **ARR** = Risk in control group - Risk in treatment group = 25% - 15% = **10%** (or 0.10). - **NNT = 1 / ARR** = 1 / 0.10 = **10 patients**. - This means that **10 patients must be treated with Noxbinle to prevent one death from HCC** over 5 years. *20* - This would result from an ARR of 5% (1/0.05 = 20), which is not supported by the data. - May arise from miscalculating the risk difference or incorrectly halving the actual ARR. *73* - This value does not correspond to any standard calculation of NNT from the given mortality rates. - May result from confusion with other epidemiological measures or calculation error. *50* - This would correspond to an ARR of 2% (1/0.02 = 50), which significantly underestimates the actual risk reduction. - Could result from incorrectly calculating the difference as a proportion rather than absolute percentage points. *100* - This would correspond to an ARR of 1% (1/0.01 = 100), grossly underestimating the treatment benefit. - May result from confusing ARR with relative risk reduction or other calculation errors.
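The NNT arithmetic above, as a minimal sketch:

```python
# Number needed to treat (NNT) from the absolute risk reduction (ARR).
risk_control = 0.25    # 5-year HCC mortality without treatment
risk_treatment = 0.15  # 5-year HCC mortality with treatment

arr = risk_control - risk_treatment  # absolute risk reduction = 0.10
nnt = 1 / arr                        # patients treated to prevent one death

print(round(nnt))  # 10
```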
Explanation: ***The likelihood of type II errors is decreased.*** - A pooled analysis or **meta-analysis** combines data from multiple studies, significantly increasing the **overall sample size**. - A larger sample size enhances the statistical power, making it less likely to miss a real effect and thus reducing the probability of **Type II errors** (false negatives). *The results are less precise.* - Combining data from multiple studies in a **pooled analysis** generally leads to **more precise estimates** due to the larger sample size and increased statistical power. - Increased precision is reflected in narrower confidence intervals, offering a more reliable estimate of the effect. *It overcomes limitations in the quality of individual studies.* - A pooled analysis **does not inherently overcome limitations** in the design, methodology, or quality of the individual studies included. - If the original studies have significant biases or flaws, these limitations can be propagated or even amplified in the pooled results. *It is able to provide evidence of causality.* - Pooled analyses of **cross-sectional studies**, like the ones described, can identify **associations** but cannot establish **causality**. - Cross-sectional studies measure exposure and outcome simultaneously, making it impossible to determine the temporal sequence necessary to infer cause and effect. *The level of clinical evidence is lower.* - Combining multiple studies, especially well-conducted ones, in a pooled analysis or **meta-analysis** generally **increases the level of clinical evidence**, placing it higher than individual observational studies. - This is because a pooled analysis offers a more robust and comprehensive view of the existing evidence.
Explanation: ***There is a 5.2% chance of observing a difference in reduction of LDL of 11 mg/dL or greater even if the two medications have identical effects*** - The **p-value** represents the probability of observing results as extreme as, or more extreme than, the observed data, assuming the **null hypothesis** is true (i.e., there is no true difference between the groups). - A p-value of 0.052 means there's approximately a **5.2% chance** that the observed 11 mg/dL difference (or a more substantial difference) occurred due to **random variation**, even if both statins were equally effective. *There is a 95% chance that the difference in reduction of LDL observed reflects a real difference between the two groups* - This statement is an incorrect interpretation of the p-value; it confuses the p-value with the **probability that the alternative hypothesis is true**. - A p-value does not directly tell us the probability that the observed difference is "real" or due to the intervention being studied. *Though A is more effective than B, there is a 5% chance the difference in reduction of LDL between the two groups is due to chance* - This statement implies that Statin A is more effective, which cannot be concluded with a p-value of 0.052 if the significance level (alpha) was set at 0.05. - While it's true there's a chance the difference is due to chance, claiming A is "more effective" based on this p-value before statistical significance is usually declared is misleading. *If 100 permutations of this experiment were conducted, 5 of them would show similar results to those described above* - This is an incorrect interpretation. The p-value does not predict the outcome of repeated experiments in this manner. - It refers to the **probability under the null hypothesis in a single experiment**, not the frequency of results across multiple hypothetical repetitions.
*This is a statistically significant result* - A p-value of 0.052 is generally considered **not statistically significant** if the conventional alpha level (significance level) is set at 0.05 (or 5%). - For a result to be statistically significant at alpha = 0.05, the p-value must be **less than 0.05**.
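The frequentist meaning of the p-value (the chance, under the null hypothesis, of a result at least this extreme) can be illustrated by simulation. This is a sketch with made-up group sizes and variability; none of the numbers come from the vignette:

```python
import math
import random

def two_sided_p(n, sigma, rng):
    """P-value of a two-sample z-test for one simulated experiment under the null."""
    g1 = [rng.gauss(0, sigma) for _ in range(n)]
    g2 = [rng.gauss(0, sigma) for _ in range(n)]
    z = abs(sum(g1) / n - sum(g2) / n) / (sigma * math.sqrt(2 / n))
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

rng = random.Random(42)  # fixed seed so the sketch is reproducible
trials = 2000
false_positives = sum(two_sided_p(50, 10.0, rng) < 0.05 for _ in range(trials))

# Under a true null, about 5% of repeated experiments cross p < 0.05 by chance.
print(false_positives / trials)
```

In other words, a p-value of 0.052 says results this extreme arise about 5.2% of the time when the treatments are identical; it is not a 95% probability that the observed difference is real.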
Explanation: ***The probability of detecting an association when an association does exist.*** - **Statistical power** is defined as the probability that a study will correctly reject a false null hypothesis, meaning it will detect a true effect or association if one exists. - A study with **adequate statistical power** is less likely to miss a real effect. *The probability of detecting an association when no association exists.* - This describes a **Type I error** or **false positive**, often represented by **alpha (α)**. - It is the probability of incorrectly concluding an effect or association exists when, in reality, there is none. *The probability of not detecting an association when an association does exist.* - This refers to a **Type II error** or **false negative**, represented by **beta (β)**. - **Statistical power** is calculated as **1 - β**, so this option describes the complement of power. *The first derivative of work.* - The first derivative of work with respect to time represents **power** in physics, which is the rate at which work is done. - This option is a **distractor** from physics and is unrelated to statistical power in research. *The square root of the variance.* - The **square root of the variance** is the **standard deviation**, a measure of the dispersion or spread of data. - This is a statistical concept but is not the definition of statistical power.
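Power (1 - β) rises with effect size and sample size and falls with variability. A minimal normal-approximation sketch for a two-sample comparison; every number here is an illustrative assumption, not taken from the question:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Illustrative inputs (assumed for this sketch).
effect = 5.0    # true difference in group means
sigma = 10.0    # known standard deviation in each group
n = 64          # per-group sample size
z_alpha = 1.96  # two-sided critical value for alpha = 0.05

se = sigma * math.sqrt(2 / n)
# Power = P(reject | the effect is real), ignoring the negligible opposite tail.
power = 1 - normal_cdf(z_alpha - effect / se)

print(round(power, 2))  # about 0.81: an 81% chance of detecting this effect
```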
Explanation: ***Case-control study*** - A **case-control study** compares individuals with a disease (cases) to individuals without the disease (controls) to identify risk factors retrospectively. - In this study, the investigator selects post-transplant patients **with hypertension** (the cases) and looks backward at their exposures, including cyclosporine use, to identify potential risk factors. - The analytical goal of "identifying risk factors" and the observation that **some patients had been treated with cyclosporine** (implying comparison with those who were not) indicates a case-control design. - Even if controls are not explicitly mentioned, the study design involves analyzing exposure patterns among cases to identify associations with risk factors. *Case series* - A **case series** is purely descriptive and involves collecting detailed information on a group of patients with a common condition without any comparison or analytical hypothesis testing. - While this study does describe patients with post-transplant hypertension, the key difference is the **analytical intent** to identify risk factors, which goes beyond simple description. - A true case series would simply report clinical characteristics without attempting to establish associations between exposures and outcomes. *Cross-sectional study* - A **cross-sectional study** assesses both exposure and outcome simultaneously at a single point in time to determine prevalence. - This approach would involve surveying a population of post-transplant patients to determine the prevalence of hypertension and associated factors at that moment. - The study described has already selected patients with the outcome (hypertension), making it retrospective rather than cross-sectional. *Retrospective cohort study* - A **retrospective cohort study** examines past data by first classifying patients based on **exposure status** (e.g., cyclosporine use vs. 
no cyclosporine), then following them forward in time to see who developed the outcome. - The key difference is that cohort studies **start with exposure** and move to outcome, whereas this study **starts with outcome** (hypertension) and looks back at exposures. - If the investigator had selected all transplant patients, divided them by cyclosporine exposure, and then determined hypertension rates in each group, it would be a retrospective cohort study. *Prospective cohort study* - A **prospective cohort study** identifies a cohort at baseline (before the outcome) and follows them forward in time to observe who develops the outcome. - This study has already selected patients **with the outcome present**, making it retrospective rather than prospective. - A prospective design would require identifying transplant patients at the time of transplant and following them over time to see who develops hypertension.
Explanation: ***5,100*** - To solve this, first calculate the **z-score** for 250: (250 - 227) / 22 = 1.045. - Using a **z-table**, the area under the curve from the mean (z=0) to z=1.045 is approximately 0.352. Multiplying this by 15,000 students gives approximately **5,280 students**, which is closest to 5,100. *4,500* - This answer would imply a smaller proportion of students between the mean and 250 (around 30%), which is lower than the calculated z-score of 1.045 suggests. - It does not accurately reflect the area under the **normal distribution curve** for the given range. *6,000* - This option would mean that approximately 40% of students scored in this range, which would correspond to a z-score much higher than 1.045 or a different standard deviation. - This calculation overestimates the number of students within the specified range. *3,750* - This value represents 25% of the total students (15,000 * 0.25), indicating that only a quarter of the distribution lies in this range. - This significantly underestimates the proportion of students scoring between the mean and 250 for the given standard deviation. *6,750* - This option reflects approximately 45% of the total student population (15,000 * 0.45), which would correspond to a much larger z-score or a different distribution. - This value is an overestimation and does not align with the standard normal distribution probabilities for the given parameters.
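The same arithmetic can be checked without a z-table, using the error-function form of the standard normal CDF from the Python standard library:

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mean, sd, total = 227, 22, 15000
z = (250 - mean) / sd                   # about 1.045
area_mean_to_250 = normal_cdf(z) - 0.5  # area between the mean (z=0) and z
students = area_mean_to_250 * total

print(round(z, 3), round(students))  # roughly 1.045 and about 5,280 students
```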
Explanation: ***Family income appears to be an effect modifier.*** - An **effect modifier** occurs when the relationship between an exposure (vaccination campaign) and an outcome (vaccine uptake) differs across categories of a third variable (family income). - Here, the campaign's effect on vaccine uptake is *different* depending on family income (higher-income families were still more likely to vaccinate even within campaign counties), indicating **effect modification**. *The vaccination campaign appears to have been ineffective.* - The campaign actually led to a **3-fold increase** in HPV vaccine uptake in campaign counties compared to non-campaign counties, demonstrating its effectiveness in increasing overall uptake. - While income still played a role, the campaign itself achieved its primary goal of increasing vaccination rates where implemented. *The vaccination campaign is the study outcome.* - The **vaccination campaign** is the **exposure** or intervention being studied, as its impact on vaccination rates is being assessed. - The **outcome** is the **HPV vaccine uptake** (i.e., whether children received the vaccine or not). *The vaccine uptake is the study exposure.* - **Vaccine uptake** is the **outcome** or the dependent variable that is being measured, to see if it changes in response to the campaign. - The **exposure** is the **vaccination campaign** itself, or living in a county with a campaign. *Family income appears to be a confounder.* - A **confounder** is a variable that is associated with both the exposure and the outcome, and *distorts* the observed association between them. - While family income is associated with vaccine uptake, its main role here is to show *how* the campaign's effect varied by income, not necessarily to create a spurious association between the campaign and uptake where none existed. 
If it were a confounder, it would need to be associated with both the campaign (which it isn't, as campaigns were in specific counties regardless of income distribution) and the outcome, and not be on the causal pathway.
Explanation: ***Crossover*** - In a **crossover design**, each participant receives both the **experimental treatment (tDCS)** and the **control treatment (sham tDCS)** at different times. - The study explicitly states that "the two groups were switched" after an initial observation period, which is characteristic of a crossover design. *Parallel group* - A **parallel group design** involves different groups of participants receiving only **one type of intervention** (e.g., one group gets tDCS, another gets sham tDCS throughout the study). - This design does not involve switching treatments between groups, unlike what is described. *Factorial* - A **factorial design** investigates the effects of **two or more independent variables** (factors) on an outcome. - This study primarily focuses on one intervention (tDCS vs. sham) and does not describe multiple independent variables being tested simultaneously. *Meta-analysis* - A **meta-analysis** is a statistical method that combines the results of **multiple independent studies** to derive an overall conclusion. - This description is of a single, new study being conducted, not an analysis of existing research. *Pretest-posttest* - A **pretest-posttest design** involves measuring an outcome **before and after** an intervention in a single group, without necessarily comparing it to another intervention or control in a crossover manner. - While pretest-posttest measurements might be part of this study, it doesn't describe the overarching design where groups switch interventions.
Explanation: ***Improved quality of care for PBC*** - This leads to a **longer survival time** for patients with PBC. When incidence remains stable but patients live longer, the cumulative number of living cases (prevalence) naturally increases. - An increase in prevalence with stable incidence is a classic indicator of **improved patient survival** due to better management or treatment. *Increased availability of diagnostic testing for PBC* - This would primarily impact the **incidence** of PBC by detecting more cases that were previously undiagnosed. The question states that the incidence has remained stable. - While improved diagnostics might initially increase *reported* incidence, if the true incidence is stable, it wouldn't explain a sustained rise in prevalence without a corresponding change in incidence or survival. *Increased exposure to environmental risk factors for PBC* - This would directly lead to an **increase in the incidence** of PBC, as more people would be developing the disease. - Since the incidence is stable, an increase in environmental risk factors is not the most plausible explanation for increased prevalence. *Increased awareness of PBC among clinicians* - Similar to increased diagnostic testing, increased awareness would likely lead to the diagnosis of more new cases, thus **increasing the incidence** of PBC. - A stable incidence despite increased awareness means that the actual rate of new cases developing the disease has not changed, ruling this out as the primary cause of increased prevalence. *Increased average age of the population at risk for PBC* - An aging population could potentially increase the incidence of age-related diseases. However, if the **incidence has remained stable**, it implies that even with an older population, the rate of new diagnoses has not increased. 
- While age is a risk factor for PBC, an increase in prevalence without a change in incidence suggests a factor influencing the duration of the disease rather than its onset.
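The explanation above relies on the steady-state approximation prevalence ≈ incidence × average disease duration: with incidence held fixed, longer survival alone raises prevalence. A sketch with purely illustrative numbers (not epidemiologic data for PBC):

```python
# Steady-state approximation: prevalence ≈ incidence × mean disease duration.
incidence = 0.0004   # new cases per person-year (illustrative)
duration_before = 10 # mean years lived with the disease before care improved
duration_after = 15  # longer survival once quality of care improves

prev_before = incidence * duration_before  # about 0.4% of the population
prev_after = incidence * duration_after    # about 0.6%, with incidence unchanged

print(prev_before, prev_after)
```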
Explanation: ***Latent period*** - This refers to the interval between the **disease onset** (biological initiation) and the appearance of **detectable symptoms**. - In lung cancer, this period can be long, explaining why a large nodule is found in an asymptomatic patient. *Confounding bias* - This occurs when an **unaccounted-for variable** (the confounder) influences both the exposure and the outcome, distorting their true relationship. - It relates to study design and interpretation, not the natural history of a disease. *Induction period* - This is the time from **causal exposure** (e.g., smoking) to the initiation of the disease, which is the **biological onset**. - While smoking is a cause of lung cancer, the doctor is describing the time from the disease's silent progression to symptom manifestation. *Lead time bias* - This bias occurs in screening programs when **early detection** (by screening) makes it seem like patients live longer, even if their actual survival time from disease onset hasn't changed. - The doctor is explaining why the patient is asymptomatic despite a large nodule, not a bias related to screening effectiveness. *Surveillance bias* - Occurs when a **higher rate of diagnosis** is observed in one group due to more frequent or intense monitoring, leading to an apparent increase in disease incidence. - This is a form of information bias in epidemiological studies, not a description of disease progression.
Explanation: ***0.002*** - For **independent events**, the probability of both occurring is: **P(A and B) = P(A) × P(B)** - Rearranging: **P(DVT) = P(UTI and DVT) / P(UTI)** - Calculation: P(DVT) = 0.00008 / 0.04 = **0.002** (or 0.2%) - This represents the baseline risk of DVT despite prophylactic measures (subcutaneous heparin and sequential compression devices) *0.02* - This represents an error in decimal placement during division - This would suggest a 2% DVT risk, which is **10 times higher** than the correct value - Does not result from correct application of the multiplication rule for independent probabilities *Cannot be determined* - This is incorrect because **sufficient information is provided** to calculate P(DVT) - When two events are independent and we know P(A and B) and P(A), we can always determine P(B) - The independence assumption is explicitly stated in the question stem *0.00002* - This value results from a calculation error, such as **shifting the decimal point** two places too far during the division - This would suggest a DVT risk of 0.002%, which is **100 times lower** than the correct value - Does not reflect proper application of probability rules for independent events *0.0002* - This represents a **decimal point error** during calculation (0.00008 / 0.04) - This would suggest a 0.02% DVT risk, which is **10 times lower** than the correct value - Results from miscalculation rather than correct mathematical reasoning
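The rearranged multiplication rule above, as a one-line sketch:

```python
# Multiplication rule for independent events: P(A and B) = P(A) * P(B),
# so P(B) = P(A and B) / P(A).
p_uti = 0.04             # P(UTI)
p_uti_and_dvt = 0.00008  # P(UTI and DVT); independence is given in the stem

p_dvt = p_uti_and_dvt / p_uti
print(round(p_dvt, 6))  # 0.002
```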
Explanation: ***Hawthorne effect*** - This bias occurs when individuals modify their behavior in response to being **observed** or knowing they are part of a study. In this scenario, healthcare workers, knowing they are being observed for handwashing, are likely to wash their hands more frequently than usual. - The intent of the study is to estimate the **prevalence** of handwashing; however, the observed rates will be artificially inflated due to the subjects' awareness of being studied, leading to an inaccurate estimate. *Attrition bias* - **Attrition bias** arises when there is **differential loss to follow-up** between study groups, which can lead to biased results. - This study design involves observing a defined group for a month, but there's no indication of loss of participants or differential dropout from specific intervention or control groups. *Confounding bias* - **Confounding bias** occurs when an unmeasured or uncontrolled factor (a **confounder**) is associated with both the exposure and the outcome, distorting the true association. - While confounding is a common bias in observational studies, the primary issue described here is the direct impact of observation on behavior, not an unmeasured external variable influencing both the behavior and its measurement. *Berksonian bias* - **Berksonian bias** (or admission rate bias) is a type of selection bias that occurs in case-control studies when hospital-based controls or cases are used, and the probability of being admitted to the hospital is influenced by both the exposure and the disease itself. - This study is a **prevalence study** involving direct observation of healthcare workers, not a case-control study, making Berksonian bias irrelevant. *Observer-expectancy bias* - **Observer-expectancy bias** occurs when the **researcher's expectations** or beliefs influence their observations or interpretation of data. 
- The scenario describes the participants (healthcare workers) changing their behavior due to being observed, not the observer's expectations influencing the recorded data, which would be the **Hawthorne effect**.
Explanation: ***0.5*** - To find the **median of the control group's depression scores before intervention**, order the scores: 5, 6, 7, 7, 8, 9. The median is the average of the two middle numbers (7 + 7) / 2 = **7**. - To find the **median of the treatment group's depression scores before intervention**, order the scores: 6, 6, 7, 8, 9, 10. The median is the average of the two middle numbers (7 + 8) / 2 = **7.5**. The difference is 7.5 - 7 = **0.5**. *2.1* - This value is not derived from the correct calculation of medians for either group before intervention. It may arise from an incorrect computation or comparison of other data points. - This answer suggests an error in identifying the **median** or in the subtraction step. *0.7* - This value is not derived from the correct calculation of medians for either group before intervention. It may result from a miscalculation or if the wrong data points were selected for analysis. - This answer indicates a misunderstanding of how to correctly determine the median of an **even set of numbers**. *1* - This value would result if one of the medians was calculated incorrectly, e.g., if the treatment group median was 8 or the control group median was 6. However, both were correctly calculated as 7 and 7.5 respectively. - This answer implies a miscalculation of one or both medians, leading to an incorrect difference. *2* - This value would arise if there was a larger difference between the calculated medians, such as 9 - 7 or 8 - 6. Both of these are not the correct medians. - This answer suggests a significant error in determining the appropriate **median values** from the given datasets.
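The two medians can be reproduced with Python's `statistics` module (scores transcribed from this explanation):

```python
import statistics

control = [5, 6, 7, 7, 8, 9]     # depression scores, control group (pre-intervention)
treatment = [6, 6, 7, 8, 9, 10]  # depression scores, treatment group (pre-intervention)

m_control = statistics.median(control)      # (7 + 7) / 2 = 7.0
m_treatment = statistics.median(treatment)  # (7 + 8) / 2 = 7.5

print(m_treatment - m_control)  # 0.5
```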
Explanation: ***Power*** - **Power** is the probability that a study will correctly reject the null hypothesis when it is, in fact, false (i.e., will find a statistically significant difference when one truly exists). - A study with high power minimizes the risk of a **Type II error** (failing to detect a real effect). *Type II error* - A **Type II error** (or **beta error**) occurs when a study fails to reject a false null hypothesis, meaning it concludes there is no significant difference when one actually exists. - This is the **opposite** of what the question describes, which asks for the probability of *finding* a difference. *Type I error* - A **Type I error** (or **alpha error**) occurs when a study incorrectly rejects a true null hypothesis, concluding there is a significant difference when one does not actually exist. - This relates to the **p-value** and the level of statistical significance (e.g., p < 0.05). *Confidence interval* - A **confidence interval** provides a range of values within which the true population parameter is likely to lie with a certain degree of confidence (e.g., 95%). - It does not directly represent the probability of finding a statistically significant difference when one truly exists. *p-value* - The **p-value** is the probability of observing data as extreme as, or more extreme than, that obtained in the study, assuming the null hypothesis is true. - It is used to determine statistical significance, but it is not the probability of detecting a true effect.
Explanation: ***4.25 mEq/L*** - The question asks for the median including **all four potassium values**: 5.9, 4.3, 4.2, and 4.2 mEq/L. - To find the **median**, first arrange the values in ascending order: **4.2, 4.2, 4.3, 5.9**. - With an **even number of values (4)**, the median is the **average of the two middle numbers**: (4.2 + 4.3) / 2 = **4.25 mEq/L**. - This correctly represents the **central tendency** of all laboratory values obtained that day. *4.3 mEq/L* - This is the **third value** in the sorted dataset (4.2, 4.2, 4.3, 5.9). - This would be the median if there were an **odd number of values**, where you would simply take the middle value. - With an even number of data points, you must **average the two middle values** (4.2 and 4.3), not select just one. *4.65 mEq/L* - This is the **mean** of the dataset: (5.9 + 4.3 + 4.2 + 4.2) / 4 = 18.6 / 4 = **4.65 mEq/L**. - The mean is pulled upward by the outlier (5.9), which is exactly why the median is often preferred as a measure of central tendency for skewed data. - Choosing this value reflects confusing the **mean** with the **median**. *1.7 mEq/L* - This value corresponds to the **range** of the dataset (5.9 - 4.2 = 1.7), which is a measure of spread, not central tendency. - As a standalone potassium level, 1.7 mEq/L would represent **severe hypokalemia**, which is not supported by any of the laboratory values obtained. *4.2 mEq/L* - This is the **mode** of the dataset (the most frequently occurring value, appearing twice). - While mode is a valid measure of central tendency, the question specifically asks for the **median**, not the mode. - The median of this dataset (4.2, 4.2, 4.3, 5.9) is **4.25 mEq/L**, not 4.2 mEq/L.
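A quick sketch showing how the median differs from the distractors, each of which corresponds to a different summary statistic of the same four values:

```python
import statistics

potassium = [5.9, 4.3, 4.2, 4.2]  # all four values from that day, in mEq/L

median_k = statistics.median(potassium)    # (4.2 + 4.3) / 2 = 4.25, the correct answer
mean_k = statistics.mean(potassium)        # 18.6 / 4 = 4.65, the "4.65" distractor
mode_k = statistics.mode(potassium)        # 4.2, the "mode" distractor
range_k = max(potassium) - min(potassium)  # 5.9 - 4.2 = 1.7, the "1.7" distractor

print(median_k, mean_k, mode_k, range_k)
```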
Explanation: ***Mean*** - The **mean** is calculated by summing all values and dividing by the total number of values; thus, it is significantly influenced by **extreme values** or outliers. - The two high blood glucose readings (350 mg/dL and 380 mg/dL) will **disproportionately increase** the mean, pulling it away from the central tendency of the majority of readings. *Median and mode* - The **mode** is the most frequent value, which would likely still be within the 126-134 mg/dL range since most readings fall there, and the **median** (the middle value) is less affected by outliers. - Even with two extreme values, the median of this dataset, assuming several readings in the 126-134 mg/dL range, would remain close to the central cluster of typical values and not be drastically altered. *Median* - The **median** is resistant to outliers because it is determined by the position of values once ordered, not their magnitude. - Adding a few extreme values will only shift the median slightly, if at all, especially if the sample size is large enough that the middle position remains within the range of typical values. *Mean and median* - While the **mean** is heavily affected by outliers, the **median** is relatively robust to them. - Therefore, stating that both would be significantly affected is incorrect because the median would largely retain its representation of the central tendency. *Mode* - The **mode** represents the most frequently occurring value in a dataset and is not influenced by the magnitude of extreme values. - Unless one of the extreme high readings happens to be the most frequently occurring value, the mode would remain within the range of the more common, lower glucose readings.
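The stem's exact readings are not reproduced here, so the sketch below uses an illustrative dataset of typical values in the 126-134 mg/dL range plus the two stated outliers (350 and 380 mg/dL) to show that the mean shifts markedly while the median barely moves:

```python
import statistics

baseline = [126, 128, 129, 130, 131, 132, 133, 134]  # illustrative typical readings, mg/dL
with_outliers = baseline + [350, 380]                # add the two extreme readings

print(statistics.mean(baseline))         # 130.375
print(statistics.mean(with_outliers))    # 177.3  -> mean jumps by ~47 mg/dL
print(statistics.median(baseline))       # 130.5
print(statistics.median(with_outliers))  # 131.5  -> median shifts by only 1 mg/dL
```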
Explanation: ***12,500*** - To find the number of participants with scores greater than 22, first calculate the **z-score** for a score of 22: $Z = \frac{(X - \mu)}{\sigma} = \frac{(22 - 14)}{4} = 2$. - A z-score of 2 means the score is **2 standard deviations above the mean**. Using the **empirical rule** for a normal distribution, approximately **2.5%** of the data falls beyond 2 standard deviations above the mean (5% total in both tails, so 2.5% in each tail). - Therefore, $2.5\%$ of the total 500,000 participants is $0.025 \times 500,000 = 12,500$. *175,000* - This option would imply a much larger proportion of the population scoring above 22, inconsistent with the **normal distribution's properties** and the calculated z-score. - It would correspond to a z-score closer to 0, indicating a score closer to the mean, not two standard deviations above it. *17,500* - This value represents **3.5%** of the total population ($17,500 / 500,000 = 0.035$). - A proportion of 3.5% above the mean corresponds to a z-score that is not exactly 2, indicating an incorrect calculation or interpretation of the **normal distribution table**. *160,000* - This option represents a very large portion of the participants, roughly **32%** of the total population. - This percentage corresponds to the proportion of data lying **more than one standard deviation from the mean** (about 16% in each tail), not to scores more than 2 standard deviations above the mean as calculated. *25,000* - This value represents **5%** of the total population ($25,000 / 500,000 = 0.05$). - A z-score greater than 2 corresponds to the far tail of the normal distribution, where only 2.5% of the data lies, not 5%. This would correspond to a z-score of approximately 1.65.
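The z-score and tail count can be reproduced with `statistics.NormalDist`; note that the empirical rule's 2.5% is a rounding of the exact upper-tail area of about 2.28%:

```python
from statistics import NormalDist

mean, sd, n = 14, 4, 500_000
z = (22 - mean) / sd          # (22 - 14) / 4 = 2.0

# Empirical rule: ~2.5% of a normal distribution lies above +2 SD
empirical = 0.025 * n         # 12,500 participants (the intended answer)

# Exact upper-tail area beyond z = 2 (about 2.28%)
exact_tail = 1 - NormalDist().cdf(z)
exact = exact_tail * n        # roughly 11,400 with the exact tail area

print(empirical, round(exact))
```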
Explanation: ***Dose-response*** - The study demonstrates a **dose-response relationship** as the odds ratio for Raynaud phenomenon increases with the reported packs per day (PPD) of tobacco use. - This graded effect, where a higher exposure (more smoking) leads to a stronger outcome (higher odds of Raynaud phenomenon), is a strong indicator of a potential causal link according to the Bradford Hill criteria. *Confounding* - **Confounding** occurs when a third variable influences both the exposure and the outcome, creating a spurious association. - The study specifically states that the results were **adjusted for gender, age, education, and alcohol consumption**, indicating an attempt to control for potential confounders, rather than confounding itself being represented as a causal link. *Blinding* - **Blinding** involves preventing participants or researchers from knowing who is receiving a particular treatment or exposure to reduce bias. - While important in some study designs, this cross-sectional study describes **collected data** and adjusted odds ratios, not a process of blinding. *Consistency* - **Consistency** refers to the repeated observation of an association in different studies, populations, or circumstances. - This study presents its own findings without reference to other research, so it does not demonstrate consistency; rather, it provides a single observation. *Temporality* - **Temporality** (or temporal relationship) means that the exposure must precede the outcome for a causal relationship to exist. - This is a **cross-sectional study**, which assesses both exposure (smoking) and outcome (Raynaud phenomenon) at the same time, making it difficult to definitively establish temporality.
Explanation: ***Level 3*** - A **non-randomized controlled trial** like the one described, where patient assignment to treatment groups is based on specific characteristics (risk of toxicity), falls into Level 3 evidence. - This level typically includes **non-randomized controlled trials** and **well-designed cohort studies** with comparison groups, which are prone to selection bias and confounding. - The study compares two treatments but lacks randomization, making it Level 3 evidence. *Level 1* - Level 1 evidence is the **highest level of evidence**, derived from **systematic reviews and meta-analyses** of multiple well-designed randomized controlled trials or large, high-quality randomized controlled trials. - The described study is explicitly stated as non-randomized, ruling out Level 1. *Level 2* - Level 2 evidence involves at least one **well-designed randomized controlled trial** (RCT) or **systematic reviews** of randomized trials. - The current study is *non-randomized*, which means it cannot be classified as Level 2 evidence, as randomization is a key criterion for this level. *Level 4* - Level 4 evidence includes **case series**, **case-control studies**, and **poorly designed cohort or case-control studies**. - While the study is non-randomized, it is a controlled comparative trial rather than a case series or retrospective case-control study, placing it at Level 3. *Level 5* - Level 5 evidence is the **lowest level of evidence**, typically consisting of **expert opinion** without explicit critical appraisal, or based on physiology, bench research, or animal studies. - While the drug was initially tested in animal studies, the current human comparative study offers a higher level of evidence than expert opinion or preclinical data.
Explanation: ***20*** - In a steady-state population, prevalence remains constant when the number of new cases (incidence) equals the number of individuals exiting the disease state (through death from any cause). - The average duration of fasting hyperglycemia is **life expectancy (70 years) - age of onset (45 years) = 25 years**. - Using the fundamental relationship **Prevalence = Incidence × Duration**, we can solve for incidence: **Incidence = Prevalence / Duration = 510 / 25 = 20.4 ≈ 20 new cases per year**. - This means approximately 20 individuals must newly develop fasting hyperglycemia each year to maintain the steady-state prevalence of 510 cases. *50* - This would imply a much higher incidence rate, inconsistent with maintaining a steady state. - If 50 new cases developed annually with an average 25-year duration, the prevalence would be 50 × 25 = 1,250 cases, far exceeding the observed 510. - This represents an incidence rate 2.5 times higher than what the steady-state equation supports. *10* - This represents an incidence rate that is too low to maintain the observed prevalence in a steady-state population. - With only 10 new cases per year and a 25-year duration, the steady-state prevalence would be 10 × 25 = 250 cases, which is half the observed 510. - This choice would suggest either a longer disease duration or a declining prevalence over time. *30* - This is about 1.5 times the calculated incidence, suggesting an expanding prevalence rather than a steady state. - With 30 new cases annually over a 25-year duration, the steady-state prevalence would reach 750 cases, exceeding the observed 510. - While closer than other incorrect options, it violates the fundamental principle that Prevalence = Incidence × Duration. *40* - This value is about twice the calculated incidence, indicating a scenario where prevalence would be rapidly increasing. - If 40 new cases developed per year with a 25-year duration, the steady-state prevalence would be 1,000 cases, nearly double the observed 510. - This contradicts the assumption of a steady-state population with stable disease prevalence.
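The steady-state arithmetic above can be sketched in a few lines (510 prevalent cases, onset at 45, life expectancy 70, as given in this explanation):

```python
prevalence = 510    # steady-state prevalent cases of fasting hyperglycemia
duration = 70 - 45  # life expectancy - age of onset = 25-year average disease duration

# Steady state: Prevalence = Incidence x Duration  =>  Incidence = Prevalence / Duration
incidence = prevalence / duration
print(incidence)    # 20.4 -> approximately 20 new cases per year
```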
Explanation: ***1/4*** - If both parents are **carriers** for an autosomal recessive disease, each parent has one copy of the normal allele (A) and one copy of the recessive allele (a). - When two heterozygous (Aa) individuals mate, the probability of their child inheriting two recessive alleles (aa) and expressing the disease is 1 in 4 (25%), according to Mendelian genetics. *1/5* - This value represents the **allele frequency (q)** in the population for the recessive allele, given an incidence of 1 in 25 (q^2 = 1/25, so q = 1/5). - However, this is not the probability of a child being affected if both parents are already known to be carriers. *8/25* - This value represents the **carrier frequency (2pq)** in the population: 2 × (4/5) × (1/5) = **8/25**. - It is the proportion of heterozygous carriers in the general population, not the probability of an affected child when both parents are already known to be carriers. *1/25* - This is the **incidence of the disease (q^2)** in the general population, which means 1 out of 25 individuals express the disease. - It is not the probability of a child inheriting the disease from two parents already identified as carriers. *4/5* - This value represents the **allele frequency (p)** of the dominant allele (p = 1 - q = 1 - 1/5 = 4/5). - It is not the probability of a child expressing the disease from two carrier parents.
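The Hardy-Weinberg quantities behind each answer choice (incidence 1/25, hence q = 1/5) can be verified with exact fractions:

```python
from fractions import Fraction

q_squared = Fraction(1, 25)  # disease incidence = aa genotype frequency
q = Fraction(1, 5)           # recessive allele frequency (square root of 1/25)
p = 1 - q                    # dominant allele frequency = 4/5
carrier_freq = 2 * p * q     # heterozygote (carrier) frequency = 8/25

# Two known carriers (Aa x Aa): each transmits 'a' with probability 1/2,
# so the child is affected (aa) with probability 1/2 * 1/2 = 1/4
p_affected_child = Fraction(1, 2) * Fraction(1, 2)

print(p_affected_child)  # 1/4
```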
Explanation: ***Retrospective cohort study*** - This is the **most appropriate design** because the physician starts with a defined group of patients **with anti-NMDA encephalitis** (the exposure/condition) and then evaluates them for the **presence of ovarian teratomas** (the outcome). - A **cohort study** follows this directional approach: identify individuals with a specific exposure or condition, then assess the frequency or presence of an outcome within that group. - **Retrospective** cohort studies use **existing medical records** to identify the exposed cohort and determine outcome status, making this practical for studying a rare condition like anti-NMDA encephalitis. - This design allows calculation of the **prevalence** of ovarian teratomas among anti-NMDA encephalitis patients and can suggest an association between the two conditions. *Cross-sectional study* - Cross-sectional studies assess **both exposure and outcome simultaneously** at a single point in time in a population, rather than starting with one condition and looking for another. - This design would be appropriate if the physician surveyed a population and assessed both anti-NMDA encephalitis and ovarian teratomas at the same time, but the question describes a **directional evaluation** (first identify encephalitis patients, then evaluate for teratomas). - While cross-sectional studies can identify associations, they do not follow the sequential approach described in the clinical scenario. *Case series* - A **case series** is a descriptive study that reports characteristics or outcomes in a group of patients with a particular condition but lacks a comparison group and does not systematically evaluate associations. - While it could describe ovarian teratoma findings in anti-NMDA encephalitis patients, it does not provide the structured framework for assessing prevalence or association that a cohort study offers. 
*Case-control study* - **Case-control studies** work in the **opposite direction**: they start with the outcome (e.g., ovarian teratoma cases) and look backward for the exposure (e.g., anti-NMDA encephalitis). - The physician's approach starts with the **exposure first** (anti-NMDA encephalitis), making a case-control design inappropriate. - Case-control studies are efficient for studying rare outcomes but are not aligned with the described study plan. *Randomized controlled trial* - **RCTs** are experimental studies that randomly assign participants to different interventions to evaluate treatment efficacy or causation. - This is an **observational research question** about naturally occurring associations, not an intervention study, making RCTs inappropriate and unethical for this scenario.
Explanation: ***Period prevalence*** - **Period prevalence** measures the proportion of individuals in a population who have a disease at any point during a specified time period, which in this study is from April 2017 to February 2018. - The study identified patients with hypodontia within this timeframe, representing existing and new cases during that **period**. *Attack rate* - **Attack rate** is a specific type of incidence rate used typically during outbreaks, representing the proportion of exposed individuals who become ill during a defined short period. - This scenario describes a retrospective study over a longer period, not an acute outbreak. *Cumulative incidence* - **Cumulative incidence** is the proportion of a population at risk that develops the disease over a specified follow-up period. - While it describes new cases over a period, it specifically requires a **disease-free population at baseline** and follow-up for new occurrences, which is not stated for all 1498 patients. *Point prevalence* - **Point prevalence** measures the proportion of individuals having a disease at a single, specific point in time. - The study describes patients identified over a range of months (April 2017 to February 2018), not a single point in time. *Incidence rate* - The **incidence rate** (or incidence density) measures how quickly new cases of a disease develop in a population over a specified time, taking into account the person-time at risk. - The study primarily focuses on the **proportion of existing cases** observed over a period, rather than the rate of new case development while accounting for person-time.
Explanation: ***Carryover effect*** - The primary disadvantage here is the **carryover effect**, where the effects of the first treatment (new medication or gold standard) may persist into the period when the second treatment is administered, even after a washout period. - This can **mask or alter the true effect** of the second treatment, making it difficult to accurately assess their individual efficacy. *Hawthorne effect* - The **Hawthorne effect** refers to subjects improving their behavior or performance in response to being observed or studied, not specifically an issue with sequential treatment administration. - It would affect both groups equally and doesn't explain a disadvantage inherent to the crossover design itself. *Increasing selection bias* - **Selection bias** occurs when the randomization process fails to create comparable groups, but this study design involves **randomization** into two groups, and then a crossover, which typically aims to *reduce* selection bias by having each participant serve as their own control. - The sequential administration within a randomized crossover design actually helps to mitigate selection bias between treatment arms. *Increasing confounding bias* - **Confounding bias** occurs when an unmeasured variable is associated with both the exposure and the outcome, distorting the observed relationship. - This crossover design, where each participant receives both treatments, is intended to *reduce* confounding by inter-individual variability, as each subject acts as their own control, rather than increasing it. *Decreasing power* - **Power** is the ability of a study to detect a true effect if one exists. Crossover designs often *increase* statistical power compared to parallel designs because each participant receives both treatments, reducing inter-individual variability. - This design typically requires a smaller sample size to achieve the same power as a parallel group study, so decreased power is not a disadvantage.
Explanation: ***Cohort study*** - A **cohort study** observes a group of individuals over time to identify risk factors and outcomes, allowing for the assessment of **temporal relationships** between exposure (dietary glucose) and outcome (HFrEF). - This design is suitable for establishing a potential **causal link** as it tracks participants from exposure to outcome, enabling the calculation of incidence rates and relative risks. *Cross-sectional study* - A **cross-sectional study** measures exposure and outcome simultaneously at a single point in time, making it impossible to determine the **temporal sequence** of events. - This design can only identify **associations** or correlations, not causation, as it cannot establish whether high glucose consumption preceded HFrEF. *Case series* - A **case series** describes characteristics of a group of patients with a particular disease or exposure, often to highlight unusual clinical features, but it lacks a **comparison group**. - It cannot assess causality because it does not provide information on the frequency of exposure in healthy individuals or the incidence of the disease in unexposed individuals. *Case-control study* - A **case-control study** compares individuals with the outcome (cases) to those without the outcome (controls) to determine past exposures, which makes it prone to **recall bias**. - While it can suggest associations, it cannot definitively establish a temporal relationship or causation as the outcome is already known when exposure is assessed. *Randomized controlled trial* - A **randomized controlled trial (RCT)** is the gold standard for establishing causation by randomly assigning participants to an intervention or control group, but it may not be ethical or feasible for studying long-term dietary exposures and chronic diseases like HFrEF due to the long follow-up period and complexity of diet. 
- While ideal for causality, directly controlling and randomizing dietary glucose intake over decades to observe HFrEF development might be practically challenging or unethical.
Explanation: ***Inability to control for specific factors*** - Observational studies, especially **retrospective** ones like this, are inherently limited in their ability to control for all **confounding variables** that might influence both phosphate levels and renal function decline or mortality. - The study notes that higher phosphate was an "independent risk factor," but without active intervention and randomization, unmeasured or uncontrolled confounders could still be at play, affecting the observed association. *Lack of inter-rater reliability* - This limitation primarily applies to studies where subjective assessments are made by multiple observers, such as interpreting imaging results or grading symptoms. - The study primarily relies on **objective laboratory measurements** (phosphate levels, renal function) and medical chart data, where inter-rater reliability is less of a concern than in diagnostic assessments. *Selection based on the exposure status* - This describes a **case-control study design**, where participants are selected based on whether they have the outcome (e.g., disease) or not. - The described study design is closer to a **retrospective cohort study**, where patients are identified from a past point (2014-2016) and followed forward in time (until 2018) to observe outcomes, rather than being selected by exposure status at the outset of the research question. *Hypotheses generation* - This is typically a strength, not a limitation, of observational studies, as they can identify potential associations that can then be tested in more rigorous experimental designs. - The study successfully generated the hypothesis that high plasma phosphate is a risk factor for renal decline and mortality, indicating a useful outcome rather than a drawback. 
*Significant time commitment* - While research often requires significant time, this is a practical constraint rather than a methodological limitation inherent to the study's ability to establish valid associations. - Moreover, this study design is **retrospective**, using existing data, which often *reduces* the time commitment compared to prospective studies that track patients forward from a new enrollment.
Explanation: ***Recall bias*** - **Recall bias** occurs when participants in a study remember past events or exposures differently based on their current health status or outcome. In this case, mothers of infants with myelomeningocele (cases) may be more likely to **over-report or more accurately recall** past exposures like pharyngitis during pregnancy, due to their search for a potential cause for their child's condition, compared to mothers of healthy infants (controls). - This differential recall can lead to a **misclassification of exposure** and an artificially inflated association between pharyngitis and myelomeningocele, thus hampering the validity of the study's conclusions. *Assessment bias* - **Assessment bias**, or observer bias, occurs when the **investigator's knowledge** of a participant's exposure or disease status influences the assessment of outcome or exposure. - This scenario describes a difference in participant *recall*, not an interviewer's or researcher's systematic error in measuring or interpreting data. *Neyman bias* - **Neyman bias**, or prevalence-incidence bias, is a form of selection bias where the **prevalence of a disease** is used to approximate its incidence, leading to biased results because individuals with rapidly fatal or quickly resolved diseases are underrepresented. - This bias is not relevant to the described situation, as the study is a case-control design looking at past exposures, not disease duration or survival. *Surveillance bias* - **Surveillance bias** occurs when one group is **monitored more intensely** for an outcome than another, leading to an artificially higher detection rate in the more scrutinized group. - In this study, both groups of mothers are being asked about a past exposure (pharyngitis), not being differentially monitored for a new outcome, thus surveillance bias is less likely. 
*Attrition bias* - **Attrition bias** (or loss to follow-up bias) occurs in prospective studies (like cohort studies) when there are **differential rates of withdrawal** or loss of participants from study groups. - This is a **case-control study**, which is retrospective in nature; participants are selected based on their outcome status (myelomeningocele or not) and then asked about past exposures, so attrition bias is not applicable.
Explanation: ***Uncover more indolent cases of the disease preferentially*** - This scenario describes **length-time bias**, which occurs in studies that identify prevalent cases, especially through screening. This method tends to disproportionately capture **slow-growing, less aggressive cases** of a disease because they survive longer and are more likely to be present at the time of the study. - The study's focus on confirmed cases across all age groups to establish a baseline for mortality comparisons means that individuals with rapidly fatal forms of bronchogenic carcinoma might have already succumbed to the disease and thus are less likely to be included in the prevalent cohort. *Detect only asymptomatic cases of the disease* - The study investigated patients with "confirmed bronchogenic carcinoma," implying that the cases were already diagnosed, potentially due to symptoms or incidental findings. This bias description is more reflective of **ascertainment bias** during initial detection, not necessarily the inherent bias of a prevalent cohort study for mortality comparison. - While some cases might have been asymptomatic, the study design doesn't exclusively target or only detect such cases; it includes all confirmed cases, regardless of symptom status at diagnosis. *Find more cases of the disease in older cohorts* - While age can be a risk factor for bronchogenic carcinoma, the bias described in the question primarily relates to the **duration of the disease** (i.e., fast vs. slow progression), not exclusively the age of the patients. - The study included "patients of all age groups," so while older patients might have more prevalent disease, this option does not directly address the survival bias inherent in using prevalent cases for mortality comparison. 
*Observe only the late stages of a disease with more severe manifestations* - This option describes a bias that would typically lead to an overestimation of disease severity and mortality, which is the opposite of what is expected from **length-time bias**. Studies that only observe late stages might miss the full spectrum of the disease, including less severe cases. - In a prevalent cohort study like this, the longer-surviving (and often less aggressive) cases are more likely to be captured, making it less likely to observe *only* the late stages with severe manifestations. *Identify more instances of fatal disease* - This is incorrect because **length-time bias** actually causes studies to underestimate the true fatality rate. By including disproportionately more prevalent (i.e., longer-surviving) cases, the observed disease course might appear less lethal than it truly is for those who succumb more rapidly. - Patients with rapidly fatal forms of bronchogenic carcinoma would likely have died before being included in the prevalent cohort, thus leading to an underrepresentation of fatal cases.
Explanation: ***The mean will increase; the median will stay the same; the mode will stay the same*** - The **mean** is highly sensitive to outliers. Adding a newborn weighing 10 lbs 2 oz (significantly heavier than the original mean of 7 lbs 2 oz) will increase the total sum of weights, thus **increasing the mean**. - The **median** is the middle value in an ordered dataset. With 37 newborns, the median is the 19th value. Adding one more (38 total) makes the median the average of the 19th and 20th values. Since the new value (10 lbs 2 oz) is added at the extreme high end of the distribution, the 19th and 20th positions contain the same values as before, and both of these middle values equal the original median weight. Therefore, the median will **stay the same**. - The **mode** is the most frequent value. Since there are 7 infants already at 7 lbs 2 oz, adding a single infant at 10 lbs 2 oz will not change the most frequent weight in the dataset. The mode will **stay the same** at 7 lbs 2 oz. *The mean will increase; the median will increase; the mode will stay the same* - While the **mean will increase** due to the added outlier, the **median will not change**. With 38 observations, the median becomes the average of the 19th and 20th values, which remain unchanged since the outlier is added at position 38. - The **mode** correctly stays at 7 lbs 2 oz as the new data point does not become the most frequent value. *The mean will stay the same; the median will increase; the mode will stay the same* - The **mean will not stay the same** because an outlier significantly higher than the current mean will always pull the mean higher. - The **median will also not increase** as the middle values (19th and 20th positions) remain unchanged when adding an extreme outlier. *The mean will increase; the median will increase; the mode will increase* - While the **mean will increase**, the **median will not change** because the middle positions are unaffected by adding one extreme outlier. 
- The **mode will not change** as the new data point (10 lbs 2 oz) is unique and doesn't become the most frequent value; 7 lbs 2 oz remains most frequent with 7 occurrences. *The mean will stay the same; the median will increase; the mode will increase* - This option is incorrect because the **mean will definitely increase** with the addition of a much larger value. - The **median will not increase** as it depends on the middle positions, not extreme values. - The **mode will not increase** as adding one 10 lb 2 oz infant won't make that weight the most frequent.
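The effect of a single extreme value on these three measures can be checked directly. The dataset below is hypothetical: 37 weights in ounces constructed so that the mean, median, and mode are all 114 oz (7 lbs 2 oz) and that weight occurs 7 times; only the behavior of mean, median, and mode is the point.

```python
from statistics import mean, median, mode

# Hypothetical 37 newborn weights in ounces: 7 lbs 2 oz = 114 oz occurs
# 7 times (the mode); 30 other weights sit symmetrically around it.
weights = [114] * 7 + [114 + d for d in range(-15, 16) if d != 0]
assert len(weights) == 37

before = (mean(weights), median(weights), mode(weights))

# Add the 10 lbs 2 oz (162 oz) newborn.
weights.append(162)
after = (mean(weights), median(weights), mode(weights))

print(before)  # mean 114, median 114, mode 114
print(after)   # mean rises above 114; median and mode stay at 114
```

Only the mean moves: the outlier lands at position 38, so the 19th/20th values and the most frequent value are untouched.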
Explanation:

***85–115***

- For a **normal distribution**, approximately 87% of data falls within **±1.5 standard deviations** of the mean.
- With a mean of 100 and a standard deviation of 10, the range is 100 ± (1.5 × 10) = 100 ± 15, which gives **85–115**.

*95–105*

- This range represents **±0.5 standard deviations** from the mean (100 ± 5), which covers only about 38% of the data.
- It is far too narrow to encompass 87% of the observations.

*65–135*

- This range represents **±3.5 standard deviations** from the mean (100 ± 35), which covers over 99.9% of the data.
- This interval is too wide for 87% of the measurements.

*80–120*

- This range represents **±2 standard deviations** from the mean (100 ± 20), which covers approximately 95% of the data.
- Although a common interval, it is wider than necessary for 87% of the data.

*70–130*

- This range represents **±3 standard deviations** from the mean (100 ± 30), which covers approximately 99.7% of the data.
- This interval is significantly wider than required to capture 87% of the data.
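These coverage percentages can be verified from the normal CDF: the fraction of a normal distribution within ±z standard deviations of the mean is erf(z/√2). A quick check:

```python
from math import erf, sqrt

def coverage(z):
    """Fraction of a normal distribution within ±z standard deviations."""
    return erf(z / sqrt(2))

print(round(coverage(0.5), 3))  # 0.383 -> ~38% for 95-105
print(round(coverage(1.5), 3))  # 0.866 -> ~87% for 85-115
print(round(coverage(2.0), 3))  # 0.954 -> ~95% for 80-120
print(round(coverage(3.0), 4))  # 0.9973 -> ~99.7% for 70-130
```

The "87%" in the stem is the conventional rounding of the exact 86.6% within ±1.5 SD.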
Explanation:

***Confounding bias***

- The calculated **odds ratio of 23** suggests a very strong association, which is highly unlikely for coffee as a direct cause of lung cancer and points to a **confounding variable**.
- A major **confounder** in studying coffee and lung cancer is **smoking**: smokers are often also coffee drinkers, and smoking is a known strong cause of lung cancer. The study did not appear to account for this.

*Information bias*

- This bias involves **inaccurate data collection or measurement** of exposure or outcome, such as recall bias or measurement error.
- The scenario describes a problem with interpreting the relationship between variables, not flaws in data collection itself.

*Selection bias*

- This occurs when the **study participants are not representative** of the target population, leading to an incorrect estimate of the association.
- The description mentions a "large number of participants" and "age-adjusted controls," which suggests efforts were made to reduce selection bias, although it cannot be completely ruled out.

*Observer bias*

- This type of **information bias** occurs when the observer's knowledge of the study's aim or the participant's status influences how data are recorded.
- The problem here concerns interpretation of the association between coffee and lung cancer, not how observations were made and recorded.

*Attrition bias*

- This occurs in **longitudinal studies** through **differential loss to follow-up** between exposure groups, biasing the sample at the end of the study.
- This is a **case-control study**, which measures exposure retrospectively and involves no follow-up, so attrition bias is not relevant here.
Explanation:

***Community trial***

- A **community trial** applies an intervention at the community level, comparing outcomes between communities that receive the intervention and those that do not, as with the health education program in matched rural communities.
- This design is suited to interventions aimed at influencing health behaviors or outcomes across entire populations.

*Cross-sectional study*

- A **cross-sectional study** assesses exposure and outcome at a single point in time, providing a "snapshot," and is not suitable for evaluating the effect of an intervention over time.
- It does not track communities or individuals over time to observe changes due to an intervention.

*Crossover study*

- A **crossover study** gives subjects a sequence of different treatments, with a washout period between treatments, often in clinical drug trials.
- It does not apply here because the intervention is at the community level and does not involve alternating treatments within the same subjects.

*Case-control trial*

- A **case-control study** compares individuals with a disease (cases) to individuals without the disease (controls) and retrospectively looks for differences in exposure.
- It is an observational design used to identify risk factors, not to evaluate the impact of an intervention program.

*Explanatory study*

- An **explanatory study** aims to clarify the 'how' or 'why' behind phenomena, focusing on cause-and-effect relationships or mechanisms.
- While a community trial can be considered explanatory, "explanatory study" is too broad and not the most precise classification for this specific experimental design.
Explanation:

***Attrition bias (Correct)***

- Attrition bias, also known as **loss to follow-up bias**, occurs when there is a **differential dropout rate between study groups**.
- In this study, **97 of 120 dropouts (81%) came from the lifestyle modification group** versus only 23 from the combination group, representing significant differential attrition.
- This differential loss can **skew results**: those who dropped out of the lifestyle-only group may have done so for lack of weight loss, so the remaining participants may not represent the true effectiveness of lifestyle modification alone.
- The **combination group retained more participants**, potentially because they were seeing better results, creating a systematic difference between groups that threatens validity.

*Error in randomization (Incorrect)*

- Randomization errors would manifest as **baseline characteristic differences** between groups at study inception.
- The issue here arises **after randomization**, during the follow-up period, not during the initial group assignment.
- Proper randomization is assumed to have occurred; the concern is what happened subsequently.

*Lead-time bias (Incorrect)*

- Lead-time bias applies to **screening studies**, where early detection appears to prolong survival without actually changing the disease outcome.
- It is relevant for **cancer screening and diagnostic studies**, not a randomized controlled trial of a weight loss intervention.

*Confounding bias (Incorrect)*

- Confounding occurs when an **unmeasured variable** is associated with both the exposure and the outcome, distorting their true relationship.
- While randomization helps control for confounding, the main concern here is the **differential dropout pattern**, not an unmeasured confounder; the differential attrition is the more immediate and evident threat to validity.

*Nonresponse bias (Incorrect)*

- Nonresponse bias typically refers to **initial non-participation** or survey non-response affecting generalizability.
- While related to attrition, **attrition bias specifically describes differential dropout in longitudinal studies** such as this clinical trial, making it the more precise term for this scenario.
Explanation:

***Lead-time bias***

- This bias occurs when **early detection** of a disease through screening or surveillance appears to prolong survival, simply because the disease is diagnosed earlier, not because its natural course has changed.
- In this scenario, patients receiving routine imaging had their recurrent cancer detected earlier, making it seem they lived longer after diagnosis than those whose cancer was found later based on symptoms.

*Observer bias*

- **Observer bias** occurs when researchers' expectations or preconceived notions influence their observations or interpretations of data.
- It is unlikely here: the diagnosis of metastatic disease on imaging or through symptoms is relatively objective, and the issue is the timing of diagnosis affecting apparent survival rather than interpretive error.

*Length-time bias*

- **Length-time bias** refers to the over-representation of slower-progressing disease in screening programs because it has a longer detectable preclinical phase.
- Although screening for recurrence is involved, the scenario specifically highlights the impact of earlier diagnosis on perceived survival duration, which aligns with lead-time bias rather than over-representation of certain disease types.

*Surveillance bias*

- **Surveillance bias** (also known as detection bias) occurs when one group is watched more closely than another, raising the chance of detecting outcomes in the more closely monitored group.
- Although the imaging group received more surveillance, the specific issue described, where earlier detection *appears* to lengthen post-diagnosis survival, is characteristic of **lead-time bias**, a direct consequence of that increased surveillance.

*Confounding bias*

- **Confounding bias** occurs when an unmeasured or uncontrolled factor (a confounder) associated with both the exposure (routine imaging) and the outcome (survival) distorts their true relationship.
- While confounding is always a concern in observational studies, the problem described, that earlier diagnosis *itself* artificially lengthens post-diagnosis survival, is a methodological bias tied to the timing of diagnosis, not an unmeasured external variable.
Explanation:

***...the difference between the observed and nonrespondent answers is increased.***

- This scenario indicates that the **nonrespondents have systematically different characteristics or opinions** from those who responded.
- If the nonrespondents differ significantly, the data collected from respondents will not accurately represent the target population, leading to a **biased conclusion**.

*...the proportion of nonrespondents from the targeted sample is decreased.*

- A **decreased proportion of nonrespondents** generally *reduces* the potential for nonresponse bias.
- More of the original targeted sample participated, making the observed data more representative of the target population.

*...the auxiliary population variables are introduced by means of a calibration method.*

- **Calibration methods** use auxiliary population data to adjust survey weights, aiming to *reduce* bias and improve the representativeness of the sample.
- This technique aligns sample characteristics with known population parameters, usually **decreasing bias**.

*...the specific weighting-class adjustments are used on the final data.*

- **Weighting-class adjustments** are statistical methods that correct for nonresponse bias by assigning different weights to observations based on known characteristics.
- These adjustments make the sample more representative of the population structure, thereby **reducing bias**.

*...the imputation techniques for data correction are employed.*

- **Imputation techniques** fill in missing data points and, applied correctly, can *reduce* the bias introduced by incomplete responses.
- Although imputation can introduce its own biases if done poorly, its primary goal is to **mitigate the effects of missing data**, generally producing less bias than large, systematic differences in nonrespondent answers.
Explanation:

***Stratification***

- **Stratification** assesses the impact of potential **confounding variables** by analyzing subgroups separately, addressing internal validity concerns about uncontrolled factors.
- In this study, important **confounders** such as socioeconomic status, physical activity level, or family history of heart disease could, if not considered, distort the true relationship between childhood diet and cardiovascular disease.

*Blinding*

- **Blinding** is primarily used to reduce **observer bias** or **performance bias** in intervention studies.
- While useful in some observational settings (e.g., outcome assessment), it does not address **confounding** when investigating an exposure-outcome relationship in a cohort study.

*Randomization*

- **Randomization** is a key feature of **randomized controlled trials (RCTs)**, minimizing confounding by distributing potential confounders evenly between intervention groups.
- It is not applicable to a **prospective cohort study**, in which participants are observed according to their existing exposures rather than being randomly assigned.

*Matching*

- **Matching** is used in **case-control** or **cohort studies** to make the compared groups similar with respect to certain known confounders.
- While it can control for specific confounders, **stratification** offers a more comprehensive way to analyze and adjust for multiple confounding variables across various levels.

*Crossover*

- A **crossover design** is a type of **randomized controlled trial** in which participants receive a sequence of different treatments.
- It suits comparisons of interventions within individual patients, not the analysis of an exposure (childhood diet) and an outcome (adult cardiovascular disease) in a single cohort.
Explanation:

***57 participants***

- The **empirical rule** (68-95-99.7 rule) states that for a **normal distribution**, approximately 68% of data falls within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
- With a mean of 140 mmHg and a standard deviation of 7 mmHg:
  - 1 standard deviation below the mean is 140 − 7 = 133 mmHg; 1 above is 140 + 7 = 147 mmHg.
  - 2 standard deviations below the mean is 140 − (2 × 7) = 126 mmHg; 2 above is 140 + (2 × 7) = 154 mmHg.
- The range **126 to 154 mmHg** corresponds to **two standard deviations** from the mean, encompassing approximately **95%** of the data.
- For a sample of 60 participants, 95% of 60 is 0.95 × 60 = **57 participants**.

*10 participants*

- This number is far lower than expected for a range covering two standard deviations of a normally distributed dataset.
- It would imply a much narrower range or a much smaller share of the population falling within the given bounds.

*Not enough information provided.*

- Sufficient information is given: the mean, standard deviation, and sample size, along with the assumption of a normal distribution.
- The question is a direct application of the **empirical rule**.

*68 participants*

- This number exceeds the total sample size of 60 participants, making it impossible.
- The 68 refers to the **percentage** of data within one standard deviation, not an absolute number of participants in this context.

*41 participants*

- This is approximately 68% of the 60 participants (0.68 × 60 = 40.8 ≈ 41), corresponding to the range within **one standard deviation (133–147 mmHg)**.
- The question asks for the number of participants **between 126 and 154 mmHg**, which covers two standard deviations.
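The expected count can be computed directly; the exact normal CDF gives essentially the same answer as the empirical rule's 95%. A short sketch using the question's mean of 140 mmHg and SD of 7 mmHg:

```python
from math import erf, sqrt

mean_bp, sd, n = 140, 7, 60

def frac_between(lo, hi):
    """Fraction of a Normal(mean_bp, sd) distribution between lo and hi."""
    cdf = lambda x: 0.5 * (1 + erf((x - mean_bp) / (sd * sqrt(2))))
    return cdf(hi) - cdf(lo)

frac = frac_between(126, 154)  # +/-2 SD, ~0.954 from the exact CDF
print(round(frac * n))         # ~57 participants
print(round(0.95 * n))         # empirical-rule answer: 57
```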
Explanation:

***A study consisting of 500 patients with diabetes and 500 patients without diabetes comparing BMI of subjects in both groups***

- This describes a **case-control study**, which **retrospectively** compares the exposure (BMI > 40) in a group with the outcome (diabetes) against a group without it.
- An **odds ratio (OR)**, here 7.37, is the appropriate measure of association for a case-control study, quantifying the odds of exposure among cases relative to controls.

*A study of 1000 patients with BMI > 40 with diabetes; 500 randomized to inpatient diet and exercise with goal BMI < 25, and 500 randomized to no treatment with an outcome of glycemic control without medication after 1 year*

- This is a **randomized controlled trial (RCT)**, designed to assess the effectiveness of an intervention (diet and exercise) on an outcome (glycemic control).
- Although it involves patients with diabetes and high BMI, it does not directly compare BMI between diabetic and non-diabetic groups or yield an odds ratio for BMI and diabetes risk.

*A study consisting of 1000 genetically similar mice; 500 randomized to diet to maintain normal weight and 500 randomized to high caloric intake with the outcome of diabetes rates in both groups after 1 year*

- This is an **experimental animal study**; while it explores the relationship between diet, weight, and diabetes, its findings do not translate directly into human population-level odds ratios.
- The reported odds ratio of 7.37 (95% CI 6.39–8.50) refers to a human study.

*A study of 1000 patients comparing rates of diabetes diagnoses and BMIs of diabetic and non-diabetic patients*

- Although this collects relevant information, the description is too vague to identify a specific study type that would yield an odds ratio of 7.37.
- An odds ratio comes from case-control (or cross-sectional) studies comparing exposure in cases versus controls, and this description could fit multiple designs without clear methodology.

*A study consisting of 1000 non-diabetic subjects; 500 patients with a BMI > 40 and 500 patients with normal BMI, followed for diagnosis of diabetes over their lifetime*

- This describes a **cohort study**: groups are selected by exposure (BMI) and followed prospectively for the development of disease (diabetes).
- Cohort studies typically report **relative risk (RR)**, not odds ratios; odds ratios from cohort studies approximate relative risk only when the outcome is rare.
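For a 2×2 case-control table, the odds ratio and its 95% confidence interval follow from ad/bc and the standard error of ln(OR). The counts below are hypothetical (they are not the data behind the question's OR of 7.37); the calculation itself is the point:

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR = ad/bc with a Wald 95% CI on the log-odds scale.
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# Hypothetical counts: BMI > 40 among 500 diabetics vs 500 non-diabetics
or_, lo, hi = odds_ratio_ci(a=200, b=40, c=300, d=460)
print(round(or_, 2), (round(lo, 2), round(hi, 2)))  # OR 7.67 with its CI
```

A CI that excludes 1, as here, is what makes the association statistically significant at the 5% level.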
Explanation:

***Ecological study***

- An **ecological study** analyzes data at the **group level** (e.g., countries, populations) rather than the individual level, comparing aggregate measures such as national rates.
- The student is investigating the correlation between country-level cesarean section rates and maternal mortality rates across 119 countries, fitting the definition of an ecological study.

*Case series*

- A **case series** describes characteristics of a group of individuals with a particular disease or exposure, often focusing on individual patient data.
- This study presents aggregated national statistics, not individual patient data.

*Meta-analysis*

- A **meta-analysis** systematically combines results from multiple independent studies to derive a single, more precise estimate of an effect.
- The student is collecting raw population data, not synthesizing existing studies.

*Retrospective cohort study*

- A **retrospective cohort study** identifies a cohort based on past exposures and follows it forward in time using existing records to determine outcomes.
- That design would involve tracking individuals over time, which is not what collecting national rates entails.

*Prospective cohort study*

- A **prospective cohort study** identifies a group based on current exposures and follows it into the future to observe outcomes.
- This study does not follow individuals forward in time from a current point; it uses historical aggregate data.
Explanation:

***Case-control study***

- This study design **compares a group of individuals with a disease (cases) to a group without the disease (controls)** and retrospectively looks for differences in exposure to risk factors.
- The study isolates patients diagnosed with cirrhosis (cases) and compares their past exposures (alcohol use, IV drug abuse) with those of patients without cirrhosis (controls).

*Cohort study*

- A **cohort study** follows a group of individuals over time to see who develops a disease based on their initial exposure status.
- That design would identify individuals by exposure status first and then observe them for the development of cirrhosis.

*Randomized controlled trial*

- An **RCT** is an experimental study in which participants are randomly assigned to an intervention or a control group.
- It tests the efficacy of an intervention and is not suitable for investigating risk factors for a pre-existing condition.

*Meta-analysis*

- A **meta-analysis** is a statistical technique that combines the results of multiple scientific studies addressing the same question.
- It synthesizes existing research rather than collecting new patient data directly.

*Cross-sectional study*

- A **cross-sectional study** assesses both exposure and disease status at a single point in time in a defined population.
- Looking back at past exposures is characteristic of a case-control design, not a single-point-in-time assessment.
Explanation:

***Study B, because it has a larger sample size***

- **Power** in a statistical study is directly related to **sample size**; a larger sample size generally yields higher power, enabling the study to detect a true effect if one exists.
- Study B plans to enroll **300 patients**, larger than Study A's total of 250 patients (5 sites × 50 patients/site).

*Study A, because it is a multi-center trial*

- **Multi-center trials** can increase the generalizability of results and potentially speed recruitment, but they do not inherently increase statistical power unless the total sample size is also larger.
- In this case, Study A's total sample size (250) is smaller than Study B's (300).

*Study B, because it is double blinded*

- **Double-blinding** primarily reduces **bias** by keeping participants and researchers unaware of who receives treatment versus placebo, minimizing observer and participant expectation effects.
- Although critical for study validity, blinding does not directly influence statistical power, which is determined by factors like sample size, effect size, and variance.

*Study A, because it has a superior surgeon*

- Surgeon expertise may affect the quality of the surgical intervention and patient outcomes, but it does not determine a study's **statistical power**.
- Power is a statistical calculation based on **sample size, effect size, variance**, and alpha level.

*Both studies have the same power*

- Incorrect: the studies have different **sample sizes** (250 for Study A vs. 300 for Study B), and sample size is a primary determinant of statistical power.
- Since the expected treatment effect sizes and variance are reported as identical, the difference in sample size leads to different power levels.
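The effect of sample size on power can be sketched with a normal-approximation power calculation for a two-sided two-sample test. The standardized effect size d = 0.35 is an arbitrary assumption for illustration; the comparison between 125 per arm (Study A) and 150 per arm (Study B) is the point:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_sample_power(n_per_arm, d, z_crit=1.96):
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05."""
    return norm_cdf(d * sqrt(n_per_arm / 2) - z_crit)

power_a = two_sample_power(125, 0.35)  # Study A: 250 total, 125/arm
power_b = two_sample_power(150, 0.35)  # Study B: 300 total, 150/arm
print(round(power_a, 2), round(power_b, 2))  # Study B has the higher power
```

With identical effect size and variance, the larger trial wins on power alone, which is the logic behind the correct answer.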
Explanation:

***Ecological study***

- This study design examines the relationship between **exposure** (per capita income) and **outcome** (syphilis rates) at the **population level** (cities, neighborhoods) rather than the individual level.
- It uses **aggregate data** from health agencies to identify patterns and correlations, which is characteristic of an ecological study.

*Double-blind clinical trial*

- A double-blind clinical trial is an **interventional study** in which neither participants nor researchers know who receives the treatment versus placebo.
- This study is **observational** and involves no intervention or blinding.

*Prospective cohort study*

- A prospective cohort study follows **individuals over time** to see who develops a disease based on their exposure status.
- This study does not follow individuals; it examines **population-level data** at a single point or period.

*Case-control study*

- A case-control study compares individuals with a disease (**cases**) to individuals without the disease (**controls**) and retrospectively looks for differences in their past exposures.
- This study does not identify individual cases and controls or look back at individual exposures.

*Case series*

- A case series describes the characteristics of a group of patients with a particular disease or exposure.
- This study analyzes **population-level income and disease rates**, not detailed clinical information on individual cases.
Explanation:

***Attrition bias***

- **Attrition bias** occurs when participants drop out of a study in a non-random way, producing differential loss between study groups. In this case, the more committed smokers who were less likely to quit disproportionately dropped out, making the treatment appear more successful than it was.
- This selective dropout distorts the **study results**: the remaining participants are not representative of the original study population, and the positive outcomes observed are largely due to the loss of those less likely to succeed.

*Detection bias*

- **Detection bias** arises when the outcome of interest is detected unequally between study groups, typically due to different monitoring or diagnostic procedures.
- It would involve differences in how smoking cessation was measured or observed, rather than who remained in the study.

*Ascertainment bias*

- **Ascertainment bias** (closely related to observer and recall bias) occurs when information is collected or interpreted differently due to the observer's expectations or the participant's recall.
- It concerns systematic errors in how outcome data are gathered or recalled, not participants dropping out.

*Exclusion bias*

- **Exclusion bias** can occur when researchers exclude specific individuals or groups from analysis after randomization, often for reasons related to their outcomes or adherence, thereby distorting the results.
- **Attrition bias** specifically refers to participants *dropping out themselves* in a way that distorts results, rather than being excluded by researchers post-randomization.

*Non-response bias*

- **Non-response bias** typically occurs in surveys or questionnaires when certain types of individuals are less likely to respond, making the sample unrepresentative of the population.
- It applies more to initial participation rates in a survey than to participants dropping out of an intervention study after enrollment.
Explanation:

***The prevalence at the conclusion of the study is 25%***

- Prevalence is calculated by dividing the **total number of existing cases** by the total population at a specific point in time. At the conclusion of the study (t = 4 months), the cumulative number of cases is 10 + 4 + 2 + 5 + 4 = 25.
- The prevalence is therefore 25 cases / 100 subjects = **25%**.

*The prevalence at time point 2 months is 2%*

- At 2 months, the **cumulative number of cases** is 10 (at t = 0) + 4 (at t = 1) + 2 (at t = 2) = 16.
- The prevalence at 2 months would be 16 cases / 100 subjects = **16%**, not 2%.

*The prevalence at time point 3 months is 11%*

- The cumulative number of cases at 3 months is 10 + 4 + 2 + 5 = 21.
- The prevalence at 3 months would be 21 cases / 100 subjects = **21%**, not 11%.

*The prevalence at the conclusion of the study is 15%*

- The cumulative number of cases at the conclusion of the study (t = 4 months) is 10 + 4 + 2 + 5 + 4 = **25**.
- Therefore, the prevalence is 25 cases / 100 subjects = **25%**, not 15%.

*The prevalence and the incidence at time point 2 months are equal*

- **Incidence** refers to the number of *new* cases within a specified period, which at t = 2 months is 2.
- **Prevalence** at t = 2 months reflects the cumulative case count (10 + 4 + 2 = 16), so incidence (2%) and prevalence (16%) are **not equal**.
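The running prevalence can be tabulated from the cumulative case counts (assuming, as the question implies, that cases remain prevalent and no one leaves the cohort):

```python
from itertools import accumulate

new_cases = [10, 4, 2, 5, 4]   # new cases at t = 0, 1, 2, 3, 4 months
population = 100

cumulative = list(accumulate(new_cases))           # [10, 14, 16, 21, 25]
prevalence = [100 * c / population for c in cumulative]  # percent

for t, (inc, prev) in enumerate(zip(new_cases, prevalence)):
    print(f"t={t} mo: {inc} new cases, prevalence {prev:.0f}%")
# At t=2 the prevalence is 16%; at the conclusion it is 25%.
```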
Explanation:

***Correct Option: 1%***

- The patient's symptoms (difficulty breathing requiring bronchodilators, inhaled corticosteroids, and chest physiotherapy; diarrhea and malabsorption requiring enzyme replacement therapy) are classic for **cystic fibrosis (CF)**, an **autosomal recessive disorder**.
- For an autosomal recessive disorder with a prevalence of 1 in 10,000 in the general population, **q² = 1/10,000**, so **q = 1/100 = 0.01**. The carrier frequency **(2pq)** is approximately **2q = 2 × (1/100) = 1/50 = 0.02**.
- The affected man is **homozygous recessive (aa)** and will always pass on the recessive allele. His wife has a **1/50 chance of being a carrier (Aa)**; if she is a carrier, she has a **1/2 chance of passing on the recessive allele**.
- Therefore, the probability of an affected child = (probability wife is a carrier) × (probability wife passes the recessive allele) = 1/50 × 1/2 = 1/100 = **1%**.

*Incorrect Option: 0.01%*

- This percentage is far too low; it does not correctly account for the population carrier frequency and the 1/2 transmission probability from a carrier mother.

*Incorrect Option: 2%*

- This represents approximately the carrier frequency (1/50 ≈ 2%) but omits the additional 1/2 probability that a carrier mother would pass on the recessive allele.

*Incorrect Option: 0.5%*

- This value would be correct only if the carrier frequency were 1/100 instead of 1/50, which does not match the given population prevalence.

*Incorrect Option: 50%*

- **50%** would be the risk only if the wife were a *known* carrier: the affected father always transmits the allele, and a carrier mother transmits it half the time (1 × 1/2 = 50%).
- It would also apply if the disorder were **autosomal dominant** with one heterozygous affected parent, which is not the case here.
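The arithmetic can be laid out step by step (Hardy-Weinberg, with the same p ≈ 1 approximation the explanation uses; the exact 2pq gives 0.99%, which the 2q ≈ 1/50 shortcut rounds to 1%):

```python
from math import sqrt

disease_prevalence = 1 / 10_000   # q^2 for an autosomal recessive disorder
q = sqrt(disease_prevalence)      # recessive allele frequency = 0.01
p = 1 - q
carrier_freq = 2 * p * q          # 2pq ~= 2q = 1/50

# Affected father (aa) transmits the allele with probability 1;
# a carrier mother transmits it with probability 1/2.
risk_affected_child = carrier_freq * 1.0 * 0.5

print(f"carrier frequency ~= {carrier_freq:.4f}")        # ~= 0.0198 (~1/50)
print(f"risk of affected child ~= {risk_affected_child:.2%}")  # ~= 1%
```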
Explanation:

***975***

- This value corresponds to **two standard deviations** above the mean in a normal distribution, per the **empirical rule (68-95-99.7 rule)**.
- With a mean of 200 mg/dL and a standard deviation of 50 mg/dL, 300 mg/dL is (300 − 200)/50 = 2 standard deviations above the mean. Approximately 97.5% of data falls below +2 standard deviations in a normal distribution, so 0.975 × 1000 = 975 people.

*950*

- This number corresponds to 95% of the population, the share within **±1.96 standard deviations** of the mean in a two-tailed interval (used for 95% confidence intervals).
- However, the question asks for values *less than* 300 mg/dL (a one-tailed cutoff at exactly +2 SD), which is 97.5%, not 95%.

*680*

- This represents the percentage of data (68%) within **one standard deviation (±1 SD)** of the mean in a normal distribution.
- In this scenario, one standard deviation above the mean is 250 mg/dL, not 300 mg/dL; this option incorrectly applies the 68% rule.

*997*

- This corresponds to the percentage of data (99.7%) within **three standard deviations (±3 SD)** of the mean.
- Three standard deviations *above* the mean would be 350 mg/dL (200 + 3 × 50), beyond the target value of 300 mg/dL, which lies at only 2 SD above the mean.

*840*

- This represents the share of data below **one standard deviation (+1 SD)** above the mean: 50% (below the mean) + 34% (between the mean and +1 SD) = 84%, giving 0.84 × 1000 = 840 people.
- However, 300 mg/dL is two standard deviations above the mean (+1 SD is 250 mg/dL), not one.
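The one-tailed fraction below +2 SD can be checked against the normal CDF; the empirical rule's 97.5% (hence 975 of 1000) is a round-number approximation of the exact 97.7%:

```python
from math import erf, sqrt

mean_chol, sd, n = 200, 50, 1000

def norm_cdf(x):
    """P(X < x) for X ~ Normal(mean_chol, sd)."""
    return 0.5 * (1 + erf((x - mean_chol) / (sd * sqrt(2))))

frac_below_300 = norm_cdf(300)   # 300 mg/dL sits at +2 SD
print(round(frac_below_300, 4))  # 0.9772 from the exact CDF
print(round(0.975 * n))          # empirical-rule answer: 975 people
```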
Explanation:

***Recall bias***

- In a retrospective **case-control study**, individuals with mesothelioma (cases) may be more likely to **recall and report past exposures** to industrial-grade fiberglass than controls, due to their diagnosis and their search for an explanation for their illness.
- This differential recall of past exposures between cases and controls can distort the true association between the exposure and the disease, leading to a biased estimate of risk.
- Cases do not necessarily remember more accurately; rather, they may over-report or selectively remember exposures they believe might be causally related to their disease.

*This study design is free of potential bias*

- This statement is incorrect because **no study design is completely free of potential biases**, especially observational designs like this case-control study.
- While measures like blinded interviewers are taken, the inherent limitations of retrospective data collection can introduce other forms of bias.

*Observer bias*

- **Observer bias** typically refers to situations where the researcher's expectations or beliefs influence the recording of data, but the study uses **blinded interviewers**, which aims to mitigate this.
- With blinding in place, the primary concern relates to the participants' memory of past events.

*Interviewer bias*

- **Interviewer bias** can occur when the interviewer's behavior or questioning influences the participant's responses.
- The protocol mitigates this by using **blinded interviewers** who are unaware of the case/control status of the participants, reducing the risk of differential questioning.

*Lead-time bias*

- **Lead-time bias** is primarily a concern in screening studies, where early detection of a disease might artificially prolong measured survival without actually changing the course of the disease.
- This study investigates risk factors for mesothelioma, not the effectiveness of a screening program, rendering lead-time bias irrelevant to this design.
Explanation: ***ANOVA*** - **ANOVA (Analysis of Variance)** is appropriate here because it compares the means of **three or more independent groups** (the three different suture techniques) on a continuous dependent variable (maximum load before failure). - The study has three distinct repair techniques, each with 30 tendons, making ANOVA suitable for determining if there are statistically significant differences among their mean failure loads. *Chi-squared* - The **Chi-squared test** is used for analyzing **categorical data** (frequencies or proportions) to determine if there is an association between two nominal variables. - This study involves quantitative measurement (maximum load), not categorical data, making Chi-squared inappropriate. *Wilcoxon rank sum* - The **Wilcoxon rank sum test** (also known as Mann-Whitney U test) is a **non-parametric test** used to compare two independent groups when the data is not normally distributed or is ordinal. - While the study has independent groups, it involves three groups, and the dependent variable is continuous, making ANOVA a more powerful and appropriate choice assuming normal distribution. *Pearson r coefficient* - The **Pearson r coefficient** measures the **strength and direction of a linear relationship between two continuous variables**. - This study aims to compare means across different groups, not to determine the correlation between two continuous variables. *Student t-test* - The **Student t-test** is used to compare the means of **exactly two groups** (either independent or paired) on a continuous dependent variable. - This study involves comparing three different suture techniques, not just two, making the t-test unsuitable.
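To make the choice concrete, the one-way ANOVA F statistic can be computed by hand from group means and sums of squares; the load values below are invented for illustration, not data from the study:

```python
# Hypothetical maximum-load measurements for three suture techniques
groups = [[10, 12, 11], [20, 21, 19], [30, 29, 31]]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total observations
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# F = MS_between / MS_within, compared against an F(k-1, N-k) distribution
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A large F (here the between-group spread dwarfs the within-group spread) indicates that at least one technique's mean failure load differs from the others.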
Explanation: ***2.5*** - This is a **case-control study** because it starts with individuals who have the outcome (unprovoked seizures) and individuals who do not, then looks back at their exposure (major depressive disorder). - For a case-control study, the appropriate measure of association is the **odds ratio (OR)**, calculated as (a/c) / (b/d) = (ad) / (bc). In this case: a = 20 (MDD with seizure), b = 35 (MDD without seizure), c = 16 (no MDD with seizure), d = 70 (no MDD without seizure). So, OR = (20 * 70) / (35 * 16) = 1400 / 560 = 2.5. *1.95* - This is the **relative risk (risk ratio)**: (20/55) / (16/86) ≈ 1.95. - Relative risk requires incidence, which can be measured in cohort studies but not in a case-control study, where participants are selected on outcome status; it is therefore not a valid measure here. *0.19* - This is the **risk of seizures among patients without MDD**: 16 / (16 + 70) = 16/86 ≈ 0.19. - It is a within-group proportion, not a measure of association, and incidence cannot be validly estimated from a case-control sample. *0.36* - This is the **risk of seizures among patients with MDD**: 20 / (20 + 35) = 20/55 ≈ 0.36. - Like 0.19, it describes a single group rather than comparing the two groups. *0.17* - This value is not the correct measure of association for a case-control study. - It does not correspond to the odds ratio or to any standard measure derivable from this table and is likely a miscalculation.
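The table arithmetic, including where several distractor values plausibly originate, can be reproduced directly from the counts in the vignette:

```python
# 2x2 table: a/b = MDD with/without seizure, c/d = no-MDD with/without seizure
a, b, c, d = 20, 35, 16, 70

odds_ratio = (a * d) / (b * c)          # (20*70)/(35*16) = 2.5

# Within-group risks -- not valid measures in a case-control design,
# but they show where the distractor values likely come from:
risk_mdd = a / (a + b)                  # 20/55  ~ 0.36
risk_no_mdd = c / (c + d)               # 16/86  ~ 0.19
relative_risk = risk_mdd / risk_no_mdd  # ~1.95
```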
Explanation: ***Pearson’s correlation*** - **Pearson's correlation coefficient** measures the **strength and direction of a linear relationship between two continuous variables**. In this case, both body weight and blood pressure are continuous variables, and the researchers are looking for a *linear relationship*. - The prior work also suggests a linear relationship ("a 10% increase in body weight was accompanied by a 7 mm Hg increase in blood pressure"), making Pearson's correlation the most appropriate choice to investigate this in a subgroup. *Spearman’s correlation* - **Spearman's correlation** measures the **strength and direction of a monotonic relationship (not necessarily linear) between two ranked variables or continuous variables that do not meet the assumptions for Pearson's correlation (e.g., non-normal distribution, outliers).** - Since the question specifies a "linear relationship" and does not suggest violations of Pearson's assumptions, it is less appropriate than Pearson's. *One-way analysis of variance (ANOVA)* - **One-way ANOVA** is used to compare the **means of three or more independent groups** on a single continuous dependent variable. - This method is not suitable because the researchers are investigating the relationship between two continuous variables (body weight and blood pressure), not comparing means across different discrete groups. *Two-way analysis of variance (ANOVA)* - **Two-way ANOVA** is used to examine the **effect of two categorical independent variables on a continuous dependent variable** and to assess any interaction between the two independent variables. - Similar to one-way ANOVA, this test is inappropriate for determining the linear relationship between two continuous variables. *Wilcoxon signed-rank test* - The **Wilcoxon signed-rank test** is a **non-parametric test** used to compare two dependent (paired) samples, or to compare a single sample to a hypothesized median. 
It assesses whether two related samples differ in their ranks. - This test is not suitable for investigating the linear relationship between two continuous variables in a single group of individuals.
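Pearson's r for two continuous variables is a ratio of covariance to the product of spreads; the weight/blood-pressure pairs below are invented for illustration (the study's subgroup data are not given):

```python
from math import sqrt

# Hypothetical (body weight in kg, systolic BP in mm Hg) pairs for one subgroup
weights = [60, 68, 75, 82, 90]
pressures = [118, 124, 129, 135, 142]

mw = sum(weights) / len(weights)
mp = sum(pressures) / len(pressures)

# Pearson r = covariance / (SD of x * SD of y), here via sums of squares
cov = sum((w - mw) * (p - mp) for w, p in zip(weights, pressures))
r = cov / sqrt(sum((w - mw) ** 2 for w in weights)
               * sum((p - mp) ** 2 for p in pressures))
```

For data this close to a straight line, r lands near +1, reflecting the strong positive linear association the researchers expect.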
Explanation: ***Blinded the investigators*** - Blinding the investigators (interviewers) would prevent them from knowing which patients were cases (depressed) and which were controls (non-depressed). This reduces the risk of **interviewer bias**, where their preconceptions or knowledge of participants' status might influence how they ask questions or interpret responses, thereby distorting the results. - Given that the principal investigators were also the patients' attending physicians, they likely had prior knowledge of the patients' depressive status, which could lead to **detection bias** or information bias. Blinding would help standardize data collection. *Used open-ended questions* - While open-ended questions can provide rich qualitative data, they can introduce **variability and subjectivity** in responses and interpretation, potentially making comparisons more challenging and increasing the investigator's influence on data collection. - For a case-control study focused on quantifiable risk factors, **structured questionnaires** are often preferred for consistency and easier statistical analysis, although a mix can be optimal. *Included more interviewers* - Simply including more interviewers does not inherently improve validity; it could even increase **inter-rater variability** if they are not adequately trained and standardized. - The critical aspect is the **standardization of data collection** and the avoidance of bias, not merely the number of individuals collecting data. *Used closed testing procedures on the data* - "Closed testing procedures on the data" is not a standard term in research methodology in this context. Assuming it refers to using a **pre-defined set of statistical tests**, this does not directly address potential biases in data collection or patient selection. 
- The issue here is related to **information bias** and **selection bias** stemming from the study design and interviewer role, not primarily the statistical analysis procedures. *Used Bonferroni correction on data* - **Bonferroni correction** is used to adjust the p-values when performing multiple statistical comparisons on the same data set to reduce the chance of making a **Type I error** (false positive). - This correction addresses issues in **statistical analysis** (minimizing spurious findings due to multiple testing), not biases that arise during the design, data collection, or participant identification phases of a study.
Explanation: ***120 people*** - The annual incidence of ischemic stroke is 60 per 2,000 people. For a population of 20,000, the annual number of new stroke cases would be (60/2,000) * 20,000 = **600 cases**. - With a 1-year case fatality rate of 20%, the annual mortality from ischemic stroke is 20% of these 600 cases, which is 0.20 * 600 = **120 people**. *600 people* - This number represents the estimated **annual incidence of ischemic stroke** in a town of 20,000 people, not the mortality rate. - It is calculated as (60/2,000) * 20,000 = 600, before applying the case fatality rate. *400 people* - This number is not directly derived from the provided incidence and fatality rates; it may reflect applying the 20% case fatality rate to the 2,000-person denominator (0.20 * 2,000 = 400) rather than to the 600 incident cases. *60 people* - This is the **incidence of ischemic stroke** per 2,000 people, not the mortality rate for a larger population of 20,000. - It does not account for the total population size or the case fatality rate. *12 people* - This is the case fatality rate applied to the unscaled incidence figure: 0.20 * 60 = 12. - It omits the step of scaling the incidence from 2,000 people up to the town's population of 20,000.
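The two-step calculation above (scale the incidence to the town, then apply the case fatality rate) is short enough to verify directly:

```python
incidence_per_2000 = 60    # annual ischemic stroke incidence per 2,000 people
population = 20_000
case_fatality = 0.20       # 1-year case fatality rate

new_cases = incidence_per_2000 * population / 2000   # 600 incident strokes
deaths = case_fatality * new_cases                   # 120 deaths per year
```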
Explanation: ***37/64*** - The probability of a child having CF from two carrier parents is **1/4** (recessive inheritance), and the probability of a child not having CF is **3/4**. - The probability that *none* of the three children will have CF is (3/4)³ = **27/64**. Therefore, the probability that *at least one* child will have CF is 1 - 27/64 = **37/64**. *0* - This option is incorrect because there is a **definite statistical probability** for a child to inherit CF when both parents are carriers. - CF is an **autosomal recessive disorder**, meaning there is a 25% chance per child, not a 0% chance. *1/64* - This represents the probability that ***all three children*** would have CF: (1/4)³ = 1/64. - This is an **underestimation** of the probability for at least one child to be affected, as the question asks about "at least one" not "all three." *1* - This would imply that it's an **absolute certainty** that at least one child will have CF, which is incorrect. - Each child's outcome is independent, and there is always a chance (27/64) that none of the three children will have the disease. *27/64* - This calculation represents the probability that **none of the three children will have CF**: (3/4)³ = 27/64. - This is the **complementary probability** to "at least one child having CF", not the actual answer to the question asked.
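The complement rule used above is easy to verify with exact rational arithmetic (a sketch using Python's fractions module):

```python
from fractions import Fraction

p_affected = Fraction(1, 4)    # autosomal recessive, two carrier parents
p_unaffected = 1 - p_affected  # 3/4 per child
p_none = p_unaffected ** 3     # (3/4)**3 = 27/64: none of three children affected
p_at_least_one = 1 - p_none    # complement: 37/64
```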
Explanation: ***Latency period*** - **Lung cancer** typically has a **long latency period**, often **20-30+ years**, between initial exposure to tobacco carcinogens and the development of clinically detectable disease. - A **five-year study duration** in young smokers (ages 20-30) is **far too short** to observe the development of lung cancer, which explains the false negative finding. - This represents a **fundamental flaw in study design** rather than a bias—the biological timeline of disease development was not adequately considered. *Late-look bias* - **Late-look bias** occurs when a study enrolls participants who have already survived the early high-risk period of a disease, leading to **underestimation of true mortality or incidence**. - Also called **survival bias**, it involves studying a population that has already been "selected" by survival. - This is not applicable here, as the study simply ended before sufficient time elapsed for disease to develop. *Confounding* - **Confounding** occurs when a third variable is associated with both the exposure and outcome, distorting the apparent relationship between them. - While confounding can affect study results, it would not completely eliminate the detection of a strong, well-established association like smoking and lung cancer in a properly conducted prospective cohort study. - The issue here is temporal (insufficient follow-up time), not the presence of an unmeasured confounder. *Effect modification* - **Effect modification** (also called interaction) occurs when the magnitude of an association between exposure and outcome differs across levels of a third variable. - This represents a **true biological phenomenon**, not a study design flaw or bias. - It would not explain the complete failure to detect any association. *Pygmalion effect* - The **Pygmalion effect** (observer-expectancy effect) refers to a psychological phenomenon where higher expectations lead to improved performance in the observed subjects. 
- This concept is relevant to **behavioral and educational research**, not to objective epidemiological studies of disease incidence. - It has no relevance to the biological relationship between carcinogen exposure and cancer development.
Explanation: ***Increased probability of rejecting the null hypothesis when it is truly false*** - Including more participants increases the **statistical power** of the study, making it more likely to detect a true effect if one exists. - A higher sample size provides a more precise estimate of the population parameters, leading to a greater ability to **reject a false null hypothesis**. *Wider confidence intervals of results* - A larger sample size generally leads to **narrower confidence intervals**, as it reduces the standard error of the estimate. - Narrower confidence intervals indicate **greater precision** in the estimation of the true population parameter. *Increased probability of committing a type II error* - A **Type II error** (false negative) occurs when a study fails to reject a false null hypothesis. - Increasing the sample size typically **reduces the probability of a Type II error** because it increases statistical power. *Decreased significance level of results* - The **significance level (alpha)** is a pre-determined threshold set by the researcher before the study begins, typically 0.05. - It is independent of sample size and represents the **acceptable probability of committing a Type I error** (false positive). *Increased external validity of results* - **External validity** refers to the generalizability of findings to other populations, settings, or times. - While a larger sample size can enhance the representativeness of the study population, external validity is primarily determined by the **sampling method** and the study's design context, not just sample size alone.
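The power gain from a larger sample can be illustrated with a normal-approximation power formula for a two-sided, two-sample test; the effect size (0.5 SD) and sample sizes below are hypothetical, not taken from the question:

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_alt = delta / (sigma * (2 / n) ** 0.5)  # expected z under the alternative
    return NormalDist().cdf(z_alt - z_crit)

power_small = power_two_sample(delta=0.5, sigma=1.0, n=20)   # ~0.35
power_large = power_two_sample(delta=0.5, sigma=1.0, n=80)   # ~0.89
```

Quadrupling the per-group sample size here raises the probability of rejecting a truly false null hypothesis from roughly one in three to nearly nine in ten.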
Explanation: ***Prospective cohort*** - This study collects baseline data (sociodemographics, health status, hospital use) on a patient population and then follows them forward in time to assess patient satisfaction the following year. This forward-looking approach with follow-up over time defines a **prospective cohort study**. - The study establishes a cohort at baseline, measures initial characteristics and hospital use, then prospectively assesses patient satisfaction and subsequent healthcare utilization, allowing analysis of associations between satisfaction and hospitalization patterns. *Retrospective case-control* - A **case-control study** identifies individuals with an outcome (cases) and without the outcome (controls) and then looks backward in time to determine past exposures. - This study does not select participants based on outcome status; instead, it defines a cohort and follows them forward, which is characteristic of cohort design, not case-control. *Cross-sectional study* - A **cross-sectional study** measures both exposure and outcome at a single point in time, providing a snapshot of the population. - This study involves follow-up over time, as patient satisfaction is assessed "next year" after baseline data collection, making it longitudinal rather than cross-sectional. *Prospective case-control* - **Case-control studies** inherently select participants based on their outcome status (cases vs. controls), whether prospective or retrospective. - This study starts with a defined patient population before outcomes occur and follows them forward without outcome-based selection, which is characteristic of a cohort study, not a case-control design. *Retrospective cohort* - A **retrospective cohort study** uses existing data to define a cohort and then looks back in time to identify exposures and outcomes that have already occurred. 
- This study involves collecting new data prospectively and following participants forward ("next year"), rather than analyzing past records, making it prospective rather than retrospective.
Explanation: ***Retrospective study*** - This study **reviews electronic medical records** that were created in the past, making it retrospective by definition. - Researchers looked **backward in time** during the study period to identify both the exposure (alcohol consumption) and outcome (ACS) from existing records. - The key feature is that **data collection relies on pre-existing documentation** rather than prospectively following patients or collecting data at a single point in time. - This is specifically a **retrospective cohort design** where researchers identified a population and assessed both exposure and outcome from historical records. *Cross-sectional study* - Cross-sectional studies collect data from participants at a **single point in time** through surveys, interviews, or direct assessment—not by reviewing past medical records. - While this study assessed variables "at presentation," the **method of data collection** (reviewing electronic records retrospectively) makes it retrospective, not cross-sectional. - Cross-sectional studies typically involve **active data collection** from living participants, not record review. *Prospective study* - A prospective study follows participants **forward in time** from exposure to outcome, recruiting them before outcomes develop. - This study did not follow patients forward; it reviewed **records of events that already occurred**. *Randomized controlled trial* - An RCT involves **intervention and randomization** of participants to different treatment groups. - This is an observational study with no intervention or randomization. *Case-control study* - A case-control study first identifies **cases (with disease)** and **controls (without disease)**, then looks backward to compare exposures. - This study did not select participants based on disease status first; it reviewed a general hospital population and assessed both variables simultaneously from records.
Explanation: ***Chi-squared*** - The **chi-squared test** is ideal for analyzing two **categorical variables**, such as cholesterol levels (high/normal) and the presence of stable angina (yes/no), to see if there's an association between them. - It assesses whether the observed frequencies in each category differ significantly from the expected frequencies, under the assumption of no association. *Attributable risk* - **Attributable risk** quantifies the proportion of disease in an exposed group that is directly due to the exposure. - While it might be calculated *after* establishing an association (e.g., using a chi-squared test), it's a measure of actual impact rather than a method for *finding the association* between two categorical variables. *Analysis of variance* - **Analysis of variance (ANOVA)** is used to compare the means of **three or more groups** for a continuous outcome variable. - It works when you have a categorical independent variable with multiple levels and a continuous dependent variable, which is not the case here as both variables are categorical. *T-test* - A **t-test** is used to compare the means of **two groups** for a continuous outcome variable. - It is not appropriate for analyzing the association between two categorical variables like cholesterol categories and angina presence. *Pearson correlation* - **Pearson correlation** measures the linear relationship between **two continuous variables**. - It is unsuitable for this study as both cholesterol status and angina presence are categorical variables, not continuous.
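A chi-squared statistic for a 2×2 table is simple to compute by hand; the counts below are invented for illustration (the vignette does not report them):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical 2x2 table: rows = cholesterol (high/normal),
# columns = stable angina (yes/no)
observed = [[30, 70], [10, 90]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# chi2 = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)

# For df = 1, a chi-squared variable is the square of a standard normal, so:
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
```

A p-value this small would indicate that the observed cell counts deviate from the frequencies expected under no association between cholesterol category and angina.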
Explanation: ***If the outcome is ascertained through electronic health records*** - The scenario describes **surveillance bias** (also known as detection bias or diagnostic access bias), where individuals with a history of trauma (exposure) are more likely to undergo X-ray imaging, leading to higher detection of spondyloarthritis (outcome). **Electronic health records (EHRs)** would reflect this increased diagnostic activity and subsequent diagnoses, thus perpetuating the bias. - This bias occurs because routine or increased medical scrutiny of exposed individuals, as documented in EHRs, leads to earlier or more frequent diagnosis of the outcome compared to unexposed individuals who receive less scrutiny. *If the study participants are followed at the same time intervals* - Following participants at the same time intervals aims to control for differences in observation periods, which would reduce temporal biases but not address the **differential ascertainment** of the outcome based on exposure. - This practice helps standardize follow-up duration but does not prevent increased diagnostic efforts for one group over another. *If the study participants are subjected to identical tests at each visit* - If all participants received identical tests regardless of their trauma history, this would mitigate surveillance bias by ensuring **equal diagnostic opportunity**. - The problem in the scenario is precisely that diagnostic tests (X-rays) are *not* identical across groups but are selectively applied based on trauma history. *If the outcome is ascertained while the exposed status is masked* - **Masking (blinding)** the assessors of the outcome to the exposure status of participants is a key strategy to reduce detection bias. - If the assessors did not know who had a history of trauma, they would be less likely to differentially search for or diagnose spondyloarthritis in that group. 
*If the outcome is assessed systematically regardless of exposure* - Systematically assessing the outcome for all participants, whether or not they have a history of trauma, would ensure **equal diagnostic intensity** across groups. - This approach is designed to prevent surveillance bias by treating all participants the same in terms of diagnostic efforts regardless of their exposure status.
Explanation: ***Case-control study*** - This study design **identifies subjects based on their outcome (cases with cirrhosis, controls without cirrhosis)** and then retrospectively investigates their past exposures. - The physician selected patients with cirrhosis (cases) and patients without cirrhosis (controls), then assessed their prior exposures to risk factors like alcohol use and intravenous drug abuse. *Randomized controlled trial* - This design involves randomly assigning participants to an **intervention group** or a **control group** to assess the effect of an intervention. - There is no intervention being tested or randomization occurring in this study; it is observational. *Cross-sectional study* - A cross-sectional study measures the **prevalence of disease and exposure at a single point in time** in a defined population. - This study collects retrospective exposure data and compares two distinct groups (cases and controls), rather than assessing prevalence at one time point. *Cohort study* - A cohort study **follows a group of individuals over time** to see if their exposure to a risk factor is associated with the development of a disease. - This study starts with the outcome (cirrhosis) and looks backward at exposures, which is the opposite direction of a cohort study. *Meta-analysis* - A meta-analysis is a statistical method that **combines the results of multiple independent studies** to produce a single, more powerful estimate of treatment effect or association. - This is an original research study collecting new data, not a systematic review or synthesis of existing studies.
Explanation: ***Meta-analysis*** - A **meta-analysis** involves statistically combining the results of multiple independent studies addressing the same question. This allows for a more precise estimate of the effect than any single study alone. - The phrase **"mathematically pool the results from all of the studies"** is the key indicator for a meta-analysis, as it signifies the quantitative synthesis of data. *Case-cohort study* - A **case-cohort study** is a type of nested case-control study where cases of a disease and a randomly sampled subcohort from the original cohort are compared. - This design is used to evaluate the association between exposures and outcomes within a defined cohort, not to pool results from multiple existing studies. *Systematic review* - A **systematic review** rigorously synthesizes all available evidence on a given topic using explicit methods to identify, select, and critically appraise relevant research. - While a meta-analysis often accompanies a systematic review, a systematic review itself does not necessarily involve the statistical pooling of data; it focuses on qualitative synthesis and critical appraisal. *Randomized control trial* - A **randomized controlled trial (RCT)** is a primary study design where participants are randomly assigned to an intervention or control group to determine the effectiveness of an intervention. - This is a direct research method for gathering new data, not a method for synthesizing existing data from multiple studies. *Cross-sectional study* - A **cross-sectional study** observes data from a population at a single point in time to assess the prevalence of a disease, exposure, or risk factors. - It provides a snapshot of current health status and exposures, but it does not involve combining results from other studies or examining causality over time.
Explanation: ***Decrease in standard error of the mean*** - **Increasing the sample size** (n) leads to a **decrease in the standard error of the mean** (SEM), which is calculated as σ/√n. - A smaller SEM indicates that our sample mean is a more **precise estimate** of the true population mean. *Increase in risk of systematic error* - **Systematic error** is related to flaws in study design or implementation and is not directly affected by an increase in sample size. - A larger sample size generally helps in detecting a true effect if one exists, but does not inherently introduce or correct systematic bias. *Increase in range of the confidence interval* - An **increase in sample size** typically leads to a **narrower confidence interval**, not a wider one, because the standard error of the mean decreases. - A narrower confidence interval implies greater precision in estimating the population parameter. *Decrease in standard deviation* - The **standard deviation** is a measure of the data's spread within a sample or population and is an intrinsic characteristic of the data itself. - Increasing the sample size typically does not change the true standard deviation of the population; it only provides a **more accurate estimate** of it. *Increase in probability of type II error* - An **increase in sample size** generally leads to an **increase in statistical power**, which in turn **decreases the probability of a Type II error** (failing to reject a false null hypothesis). - A larger sample makes it easier to detect a true difference or effect if one exists.
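The SEM relationship above is a one-liner: quadrupling the sample size halves the standard error and, with it, the width of a 95% confidence interval. A minimal sketch:

```python
from math import sqrt

def sem(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def ci_width_95(sigma, n):
    """Width of an approximate 95% CI for the mean: 2 * 1.96 * SEM."""
    return 2 * 1.96 * sem(sigma, n)
```

With sigma = 50, going from n = 25 to n = 100 halves the SEM from 10 to 5, and the CI narrows accordingly, while the population standard deviation itself is unchanged.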
Explanation: ***Reporting bias*** - Women with chlamydia, informed of serious consequences and the need to treat partners, may have been more inclined to **truthfully report** their number of sexual partners. This is because they understand the medical importance of the information for their health and the health of their partners. - Conversely, the control group women, free of STDs, may have been less motivated to disclose accurate information due to social desirability, leading to **underreporting** of sexual partners. Therefore, the difference in reported partners could be an artifact of differential reporting rather than a true difference in behavior. *Detection bias* - **Detection bias** occurs when a condition is more likely to be detected in one group than another, often due to heightened surveillance or screening in an exposed group. In this study, detection of chlamydia was based on clinical diagnosis, and there is no indication that the detection method itself was biased between the groups being compared. - Both groups were drawn from an STD clinic, implying comparable opportunities for detection of STDs. The bias lies in the *reporting* of partner numbers, not the *detection* of the infection itself. *Lost-to-follow-up bias* - **Lost-to-follow-up bias** occurs in longitudinal studies when participants drop out, and those who remain differ significantly from those who are lost, thereby skewing results. This was a **case-control study**, which captures data at a single point in time, and therefore the concept of "lost to follow-up" is not applicable here. - The study design does not involve following participants over time, meaning this type of bias is irrelevant to the scenario described. *Ascertainment bias* - **Ascertainment bias** refers to a situation where the probability of being ascertained (included in the study or having an outcome recorded) differs between groups.
While there is a potential for bias in how information was collected, ascertainment bias specifically pertains to differential identification or inclusion. - In this study, both groups of women were already defined based on the presence or absence of chlamydia. The bias arises from the **differential reporting of an exposure (sexual partners)** *after* ascertainment into case/control groups, not from the ascertainment process itself. *Response bias* - **Response bias** is a general term for various cognitive biases that can influence respondents' answers in surveys, such as social desirability. While reporting bias is a specific type of response bias, response bias is a broader category that can encompass other influences like acquiescence bias or extreme responding. - The most specific and prominent type of bias at play here is **reporting bias**, driven by the differential motivation and perceived consequences impacting the accuracy of reporting in each group.
Explanation: ***1.34 to 2.36*** - A **95% confidence interval** that does not include **1.0** indicates a statistically significant association, consistent with the given **p-value of 0.02** (which is less than 0.05). - This interval contains the **point estimate (odds ratio of 1.74)**, making it the most plausible range for the true effect. - The confidence interval must always encompass the point estimate from which it is derived. *0.36 to 0.94* - This confidence interval is **entirely below 1.0**, suggesting a protective effect, which contradicts the given odds ratio of **1.74** indicating an increased risk. - This interval does not contain the point estimate of **1.74**. *1.75 to 2.48* - While this interval indicates an increased risk and does not include 1.0, it **does not contain the stated odds ratio of 1.74**, as its lower bound is 1.75. - A confidence interval must always encompass the point estimate from which it is derived. *0.56 to 1.88* - This confidence interval **includes 1.0**, which would imply no statistically significant association between phenytoin use and congenital malformations. - This contradicts the given **p-value of 0.02**, which indicates statistical significance. *0.83 to 2.19* - This confidence interval **includes 1.0**, suggesting **no statistically significant association**. - This contradicts the given **p-value of 0.02**, which demonstrates statistical significance.
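The elimination logic in this explanation reduces to a two-condition filter over the answer choices, using the odds ratio and p-value given in the question:

```python
point_estimate = 1.74   # odds ratio reported by the study
null_value = 1.0        # an OR of 1.0 means no association

choices = [(1.34, 2.36), (0.36, 0.94), (1.75, 2.48), (0.56, 1.88), (0.83, 2.19)]

plausible = [
    (lo, hi) for lo, hi in choices
    if lo <= point_estimate <= hi      # a CI must contain its own point estimate
    and not (lo <= null_value <= hi)   # p = 0.02 -> the 95% CI excludes 1.0
]
```

Only (1.34, 2.36) survives both conditions, matching the reasoning above.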
Research question formulation
Case-control studies
Cross-sectional studies
Ecological studies
Quasi-experimental designs
Natural experiments
N-of-1 trials
Mixed methods research
Qualitative study designs
Sampling techniques
Matching methods
Longitudinal vs cross-sectional approaches
Multi-center studies
Pilot and feasibility studies