Study Design Practice Questions

Q: A 4th grade class in Salem, Massachusetts has 20 students. Due to recent media coverage of the fallacious association between vaccines and autism, none of the students have been immunized against influenza this year. Fortunately, up to this point none of the students has come down with the flu. During the first week of flu season, however, 2 students contract influenza. In the second week, 3 more students contract influenza. And in the third week, 5 more students contract influenza. The other students remained healthy throughout the rest of the flu season. In this class, what was the risk of contracting influenza during the second week of the flu season?

Accepted Answer: 0.17

***0.17***
- To calculate the risk during the second week, we need the number of new cases in that week (3 students) and the number of **at-risk individuals** at the beginning of that week.
- At the start of the second week, 18 students were at risk (20 total minus the 2 who contracted influenza in the first week). Therefore, risk = 3/18 ≈ **0.167**, which rounds to **0.17**.
- This applies the formula **Risk = (new cases during the period) / (population at risk at the start of the period)**.

*0.1*
- This is 2/20, the risk during the **first** week, not the second.
- It uses the first week's numerator (2 cases) instead of the second week's (3 cases).

*0.15*
- This incorrectly uses 20 as the denominator (3/20 = 0.15), failing to exclude the 2 students who already had influenza.
- The **population at risk must exclude those already diseased** at the start of the period.

*0.25*
- This could come from 5/20 (the third week's cases over the whole class) or from 3/12.
- Neither pairs the second week's 3 new cases with the 18 students at risk at the start of that week.

*0.5*
- This is 10/20, the **cumulative risk over the entire flu season** (2 + 3 + 5 = 10 cases among 20 students), not the risk for the second week alone.
- It is a gross **overestimation** of the second-week risk.
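The week-by-week reasoning can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the helper name `weekly_risks` is hypothetical, and the case counts (2, 3, 5 in a class of 20) are taken from the question. The key step is shrinking the at-risk pool after each week.

```python
def weekly_risks(class_size, new_cases_per_week):
    """Return the risk (cumulative incidence) of infection for each week."""
    at_risk = class_size
    risks = []
    for new_cases in new_cases_per_week:
        risks.append(new_cases / at_risk)  # risk = new cases / population at risk
        at_risk -= new_cases               # infected students leave the at-risk pool
    return risks


risks = weekly_risks(20, [2, 3, 5])
for week, risk in enumerate(risks, start=1):
    print(f"Week {week}: {risk:.3f}")
# Week 1: 0.100  (2/20)
# Week 2: 0.167  (3/18)
# Week 3: 0.333  (5/15)

season_risk = sum([2, 3, 5]) / 20  # cumulative risk over the whole season
print(f"Season: {season_risk:.2f}")  # Season: 0.50
```

The season-long figure (0.50) is the value behind the 0.5 distractor, which is why the denominator and time window must match the question asked.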

Q: You are interested in tracking the disease burden of a highly contagious viral disease over a time period of 5 years. The virus appears to be indigenous to rural parts of northern Africa. Which of the following research study designs would be optimal for your analysis?

Accepted Answer: Cohort study

***Cohort study***
- A **cohort study** follows a defined group of individuals (a cohort) living in rural northern Africa over a set period (here, 5 years), so new cases can be counted as they occur. This makes it the best option for measuring incidence and disease burden over time.
- The design is also well suited to describing the natural history of a disease and identifying risk factors within a specific population.

*Case series*
- A **case series** describes the characteristics of a group of patients who already have a particular disease; it is useful for generating hypotheses, not for tracking disease burden over time.
- It has no comparison group and no defined population at risk (no denominator), so it cannot measure incidence or prevalence.

*Case-control*
- A **case-control study** compares individuals with a disease (cases) to individuals without it (controls) and looks back in time for differences in exposure to identify risk factors.
- It is efficient for rare diseases but does not measure incidence directly, so it is poorly suited to tracking disease burden over a long period.

*Cross-sectional*
- A **cross-sectional study** measures the prevalence of disease and exposure at a single point in time, providing a snapshot of the population.
- While useful for prevalence, it cannot establish temporality or track changes in disease burden across a 5-year period.

*Randomized controlled trial*
- A **randomized controlled trial (RCT)** evaluates an intervention by randomly assigning participants to treatment or control groups.
- Tracking the natural burden of a disease involves no intervention to assign, so an RCT is impractical here, and assigning exposure to a contagious virus would be unethical.

Q: A researcher faces the task of calculating the mean height of male students in an undergraduate class containing a total of 2,000 male students and 1,750 female students. The mean height of a sample of male students is computed as 176 cm (69.3 in), with a standard deviation of 7 cm (2.8 in). The researcher now tries to calculate the confidence interval for the mean height of the male students in the undergraduate class. Which additional data will be needed for this calculation?

Accepted Answer: Total sample size of the study

***Total sample size of the study***
- To calculate a **confidence interval**, one needs the **sample mean**, the **standard deviation**, and, critically, the **sample size (n)**.
- The sample size determines the **standard error of the mean** (SE = SD/√n) and therefore the width of the confidence interval.

*The mean height of all the male students in the undergraduate class*
- This is the **true population mean**, which is exactly what the confidence interval is trying to **estimate**.
- If this value were known, there would be no need to calculate a confidence interval for it.

*The given data are adequate, and no more data are needed.*
- The **sample mean** and **standard deviation** are provided, but the question never states the **sample size (n)** of male students from which they were derived.
- The 2,000 male students are the **population size**, not the sample size.

*Total number of male students in the undergraduate class who did not take part in the study*
- This number does not appear in the standard **confidence interval** formula for a mean.
- It describes the unsampled part of the population, which does not enter that formula.

*A sampling frame of all of the male students in the undergraduate class*
- A **sampling frame** is a list of all individuals in the population from which a sample can be drawn; it is essential for the **sampling process** itself.
- Once the sample mean and standard deviation have been obtained, however, the sampling frame is not needed to *calculate* the confidence interval.
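To see why n is indispensable, here is a minimal sketch that holds the question's sample mean (176 cm) and SD (7 cm) fixed and varies only the sample size. The sample sizes are hypothetical, `mean_ci` is a hypothetical helper, and the sketch uses the normal approximation (z ≈ 1.96 for 95%); for small samples a t critical value with n − 1 degrees of freedom would normally replace z.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = sd / math.sqrt(n)  # standard error of the mean shrinks as n grows
    return mean - z * se, mean + z * se


# Sample mean 176 cm and SD 7 cm come from the question; the sample sizes are hypothetical.
for n in (25, 100, 400):
    low, high = mean_ci(176, 7, n)
    print(f"n = {n:>3}: {low:.1f} to {high:.1f} cm")
# n =  25: 173.3 to 178.7 cm
# n = 100: 174.6 to 177.4 cm
# n = 400: 175.3 to 176.7 cm
```

The same mean and SD give three different intervals, so without n the interval cannot be computed.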

Q: You are conducting a systematic review on the effect of a new sulfonylurea for the treatment of type II diabetes. For your systematic review you would like to include 95% confidence intervals for the mean of blood glucose levels in the treatment groups. What further information is necessary to abstract from each of the original papers in order to calculate a 95% confidence interval for each study?

Accepted Answer: Standard deviation, mean, sample size

***Standard deviation, mean, sample size***
- To calculate a **95% confidence interval** for the mean, you need the **sample mean**, the **standard deviation** (which quantifies data variability), and the **sample size** (the number of observations).
- The formula for a confidence interval for the mean combines these three components with a z-score or t-score corresponding to the desired confidence level.

*Power, standard deviation, mean*
- **Power** is the probability of correctly rejecting a false null hypothesis and is not used in calculating a confidence interval for a single mean.
- **Standard deviation** and **mean** are necessary, but **sample size**, which this option omits, is also required.

*Power, mean, sample size*
- **Power** is a concept relevant to study design and hypothesis testing, not to calculating a confidence interval for an observed mean.
- **Mean** and **sample size** are correct, but the **standard deviation**, needed to quantify variability around the mean, is missing.

*Standard deviation, mean, sample size, power*
- **Standard deviation**, **mean**, and **sample size** are all needed, but **power** is not.
- Power relates to a study's ability to detect an effect, not to the precision of an estimated mean, so listing it as necessary is incorrect.

*Power, standard deviation, sample size*
- This option includes **power**, which is not needed for a confidence interval for the mean.
- It also omits the **mean** itself, the central estimate around which the interval is built.
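Written out, the interval for each study's mean is the usual t-based formula below, where x̄ is the reported mean, s the reported standard deviation, and n the sample size. It uses only the three quantities to abstract; power appears nowhere. Replacing the t critical value with z = 1.96 is a common large-sample approximation.

```latex
\bar{x} \pm t_{0.975,\,n-1} \cdot \frac{s}{\sqrt{n}}
```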

Q: A study looking to examine the utility of colorectal cancer screening in patients younger than 50 is currently seeking subjects to enroll. A 49-year-old man with a family history of colorectal cancer is very interested in enrolling in the study, due to his own personal concerns about developing cancer. If enrolled in this study, which of the following types of biases will this represent?

Accepted Answer: Selection bias

***Selection bias***
- **Selection bias** occurs when the way subjects are selected for a study (or retained in it) produces a sample that does not represent the target population.
- Here the man volunteers because of personal concern and a **family history of colorectal cancer**, a form of self-selection (volunteer) bias. If such volunteers are overrepresented, participants will not represent the general population younger than 50, and the study may show a higher prevalence of disease or a different screening utility than a random sample would.

*Recall bias*
- **Recall bias** occurs when subjects with a particular condition (e.g., colorectal cancer) are more likely to remember past exposures or risk factors than healthy controls.
- It is typically a problem in **retrospective studies** that ask subjects to recall past events.

*Measurement bias*
- **Measurement bias** arises from flaws in the way data are collected or measured, leading to systematically inaccurate results.
- Examples include **faulty equipment** or inconsistent methods for assessing outcomes or exposures, leading to misclassification.

*Length bias*
- **Length bias** in screening refers to the tendency of screening tests to detect **slower-growing** cases that have a longer preclinical phase.
- This can make screened populations appear to have a better prognosis, because aggressive, fast-growing cases are more likely to present clinically between screening intervals.

*Lead-time bias*
- **Lead-time bias** is the apparent increase in survival time among screened individuals caused by **earlier detection of disease**, rather than by any actual prolongation of life.
- The time from diagnosis to death is artificially lengthened because the disease was found earlier, even though the date of death is unchanged.
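The way self-selection distorts a screening study's results can be illustrated with a weighted average. Every number below is hypothetical, chosen only to show the mechanism: people with a family history are assumed to have a higher screening yield, and volunteers are assumed to include more of them than the target population does. `expected_yield` is a hypothetical helper.

```python
def expected_yield(share_family_history, yield_fh, yield_no_fh):
    """Weighted-average screening yield for a group with the given case mix."""
    return share_family_history * yield_fh + (1 - share_family_history) * yield_no_fh


# Hypothetical yields (fraction of screened people with a significant finding).
YIELD_FH = 0.02      # with a family history of colorectal cancer
YIELD_NO_FH = 0.005  # without one

target = expected_yield(0.10, YIELD_FH, YIELD_NO_FH)      # hypothetical population: 10% with family history
volunteers = expected_yield(0.40, YIELD_FH, YIELD_NO_FH)  # hypothetical volunteers: 40% with family history
print(f"target population: {target:.4f}")    # target population: 0.0065
print(f"volunteer sample:  {volunteers:.4f}")  # volunteer sample:  0.0110
```

The screening test is identical in both groups; only the case mix differs, yet the volunteer sample makes screening look nearly twice as productive.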

Q: A data analyst is putting systolic blood pressure values into a spreadsheet for a research study on hypertension during pregnancy. The majority of systolic blood pressure values fall between 130 and 145. For one of the study participants, she accidentally types “1400” instead of “140”. Which of the following statements is most likely to be correct?

Accepted Answer: The median is now smaller than the mean

***The median is now smaller than the mean***
- A single, extremely high value like 1400 (an **outlier**) **inflates the mean**, because the mean is sensitive to extreme values.
- The median, the middle value of the sorted data, is **resistant to outliers** and stays essentially unchanged, so it now falls well below the inflated mean.

*The mode is now greater than the mean*
- The **mode** is the most frequently occurring value, which still lies in the 130-145 range.
- The mode is driven by the bulk of the data and is largely unaffected by the outlier, so it is very unlikely to exceed the inflated mean.

*The range of the data set is unaffected*
- The **range** is the difference between the maximum and minimum values.
- Replacing 140 with 1400 creates a new maximum, so the range **increases** dramatically.

*This is a systematic error*
- A **systematic error** is a consistent, repeatable error that biases measurements in a predictable direction (e.g., a miscalibrated instrument).
- Typing "1400" instead of "140" is a one-off **random transcription error** (a gross error), not a consistent bias, so it is not a systematic error.

*The standard deviation of the data set is decreased*
- **Standard deviation** measures the spread of data points, and one value lying far from the rest **increases the variability** of the data set.
- The standard deviation therefore increases substantially rather than decreasing.
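The effect of the typo on each statistic can be checked directly. The readings below are hypothetical values inside the 130-145 band the question describes; only one of them is changed from 140 to 1400. `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
from statistics import mean, median, mode, stdev


def summarize(values):
    """Return the summary statistics that the answer options refer to."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "sd": stdev(values),
    }


# Hypothetical systolic readings (mmHg) inside the 130-145 band from the question.
correct = [130, 134, 136, 138, 138, 140, 142, 145]
typo = list(correct)
typo[typo.index(140)] = 1400  # the single data-entry error: 140 typed as 1400

before, after = summarize(correct), summarize(typo)
for name in before:
    print(f"{name:>6}: {before[name]:8.1f} -> {after[name]:8.1f}")
```

With these values the mean jumps from about 137.9 to about 295.4, the median and mode stay at 138, the range grows from 15 to 1270, and the standard deviation grows from under 5 to over 400.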

Question 6

A new antihypertensive medication is studied in 3,000 Caucasian men with coronary heart disease who are over age 65. The results show benefits in terms of improved morbidity and mortality as well as a decreased rate of acute coronary events with minimal side effects. After hearing about this new medication and supporting study at a recent continuing education course, a family physician elects to prescribe this medication to a 39-year-old Hispanic female who presents with primary hypertension. After a one month trial and appropriate adjustments in the dosing, the patient's blood pressure is not well controlled by this medication. Which of the following statistical concepts could explain this patient's poor response to the medication?

Accepted Answer

Generalizability

Answer

Effect modification

Answer

Observer bias

Answer

Selection bias

Answer

Confounding

Question 7

You are trying to design a randomized controlled trial to evaluate the effectiveness of metoprolol in patients with heart failure. In preparing for the statistical analysis, you review some common types of statistical errors. Which of the following is true regarding a type 1 error in a clinical study?

Accepted Answer

A type 1 error occurs when the null hypothesis is true but is rejected in error.

Answer

A type 1 error is a beta (β) error and is usually 0.1 or 0.2.

Answer

A type 1 error is dependent on the confidence interval of a study.

Answer

A type 1 error means the study is not significantly powered to detect a true difference between study groups.

Answer

A type 1 error occurs when the null hypothesis is false, yet is accepted in error.
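The accepted answer (rejecting a null hypothesis that is actually true) can be made concrete with a small simulation. This is a sketch under stated assumptions, not part of the trial design: data are drawn with a true mean of 0 and known SD of 1, so the null hypothesis always holds, and a two-sided z-test at the conventional alpha = 0.05 is applied. `type1_error_rate` is a hypothetical helper.

```python
import random
from statistics import NormalDist


def type1_error_rate(n_trials=10_000, n=30, alpha=0.05, seed=42):
    """Estimate how often a two-sided z-test rejects a TRUE null hypothesis.

    Data are simulated with true mean 0 and known SD 1, so H0 (mean = 0) holds
    and every rejection is a type 1 error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / n ** 0.5)  # sample mean divided by its standard error
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true null hypothesis: a type 1 error
    return rejections / n_trials


print(type1_error_rate())  # close to alpha = 0.05
```

The long-run rejection rate lands near alpha, which is why the type 1 error rate is set by the chosen significance level rather than by the study's power.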

Question 9

A 34-year-old woman, gravida 1, para 0, at 18 weeks' gestation, comes to the physician for a prenatal visit. She recently read about a genetic disorder that manifests with gait ataxia, kyphoscoliosis, and arrhythmia and is concerned about the possibility of her child inheriting the disease. There is no personal or family history of this disorder. The frequency of unaffected carriers in the general population is 1/100. Assuming the population is in a steady state without selection, what is the probability that her child will develop this disease?

Accepted Answer

1/40,000

Answer

1/10,000

Answer

1/20,000

Answer

1/200

Answer

1/400
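The accepted answer of 1/40,000 follows from multiplying three probabilities. The sketch assumes autosomal recessive inheritance, which is consistent with the features described (gait ataxia, kyphoscoliosis, and arrhythmia suggest Friedreich ataxia, an autosomal recessive disorder). With no personal or family history and a steady-state population, each parent's chance of being a carrier is simply the population carrier frequency of 1/100.

```python
from fractions import Fraction

carrier_frequency = Fraction(1, 100)  # given: unaffected carriers in the population

# No personal or family history, so each parent's carrier probability is the
# population carrier frequency (steady state, no selection).
p_mother_carrier = carrier_frequency
p_father_carrier = carrier_frequency

# Assumption: autosomal recessive inheritance, so two carrier parents have a
# 1/4 chance of an affected child.
p_affected_given_two_carriers = Fraction(1, 4)

p_child_affected = p_mother_carrier * p_father_carrier * p_affected_given_two_carriers
print(p_child_affected)  # 1/40000
```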

Question 10

Study X examined the relationship between coffee consumption and lung cancer. The authors of Study X retrospectively reviewed patients' reported coffee consumption and found that drinking greater than 6 cups of coffee per day was associated with an increased risk of developing lung cancer. However, Study X was criticized by the authors of Study Y. Study Y showed that increased coffee consumption was associated with smoking. What type of bias affected Study X, and what study design is geared to reduce the chance of that bias?

Accepted Answer

Confounding; randomization

Answer

Observer bias; double blind analysis

Answer

Selection bias; randomization

Answer

Lead time bias; placebo

Answer

Measurement bias; blinding
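Study Y's criticism is that smoking confounds the coffee and lung cancer association. The sketch below uses hypothetical counts (not from either study), arranged as a cohort for simplicity, in which smoking alone drives lung cancer risk and heavy coffee drinkers smoke more often. Collapsing over smoking produces a spurious risk ratio; comparing coffee groups within each smoking stratum removes it. Randomization addresses the same problem at the design stage by balancing smokers and non-smokers across groups. `risk_ratio` is a hypothetical helper.

```python
# Hypothetical counts: (lung cancer cases, total people) per smoking stratum.
data = {
    "smokers":     {"heavy_coffee": (80, 800), "light_coffee": (20, 200)},
    "non_smokers": {"heavy_coffee": (2, 200),  "light_coffee": (8, 800)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (a, n1), (c, n0) = exposed, unexposed
    return (a / n1) / (c / n0)


# Crude analysis: add the strata together, ignoring the confounder (smoking).
crude = {
    group: tuple(sum(data[s][group][i] for s in data) for i in (0, 1))
    for group in ("heavy_coffee", "light_coffee")
}
crude_rr = risk_ratio(crude["heavy_coffee"], crude["light_coffee"])
print(f"crude RR: {crude_rr:.2f}")  # crude RR: 2.93

# Stratified analysis: compare coffee groups within each smoking stratum.
for stratum, groups in data.items():
    rr = risk_ratio(groups["heavy_coffee"], groups["light_coffee"])
    print(f"{stratum} RR: {rr:.2f}")  # 1.00 in both strata
```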
