Study Design Practice Questions

Q: A biostatistician is processing data for a large clinical trial she is working on. The study is analyzing the use of a novel pharmaceutical compound for the treatment of anorexia after chemotherapy with the outcome of interest being the change in weight while taking the drug. While most participants remained about the same weight or continued to lose weight while on chemotherapy, there were smaller groups of individuals who responded very positively to the orexic agent. As a result, the data had a strong positive skew. The biostatistician wishes to report the measures of central tendency for this project. Just by understanding the skew in the data, which of the following can be expected for this data set?

A: Mean > median > mode

***Mean > median > mode***
- In a dataset with a **strong positive skew**, the tail of the distribution extends to the right, pulled out by a few **unusually large values**.
- These extreme high values disproportionately influence the **mean**, pulling it to the right (toward higher values). The **median** (middle value) is less affected, and the **mode** (most frequent value) usually sits at the peak of the distribution, toward the left.

*Mean = median = mode*
- This relationship is characteristic of a **perfectly symmetrical distribution**, such as a **normal distribution**, where there is no skew.
- In a symmetrical distribution, the mean, median, and mode all lie at the exact center of the data.

*Mean < median < mode*
- This is the typical pattern of a **negatively skewed** distribution, in which a tail of unusually small values pulls the mean to the left.
- It is the mirror image of the positively skewed data described here.

*Mean > median = mode*
- This configuration is not characteristic of standard skewed distributions. It would imply an unusual distribution shape in which the mode coincides with the median while the mean is pulled higher.
- While theoretically possible, it does not describe a typical positively skewed distribution, in which the mode is usually the lowest of the three.

*Mean < median = mode*
- This relationship would suggest a negatively skewed distribution in which the median and mode are equal but the mean is pulled to the left (lower value) by a leftward tail.
- Again, this is a less typical representation of a standard negatively skewed distribution, which usually follows the mean < median < mode pattern.
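A quick numerical check of the rule of thumb, using a small made-up sample of weight changes (kg) with a long right tail; the values are hypothetical, not trial data. Python's standard `statistics` module computes all three measures. Note that mean > median > mode is a strong tendency of right-skewed data rather than a guarantee for every distribution.

```python
import statistics

# Hypothetical weight changes (kg): most participants near zero,
# a few large gains forming the right tail.
weight_change = [0, 0, 0, 1, 1, 2, 3, 4, 10, 15]

mean = statistics.mean(weight_change)      # pulled toward the long tail
median = statistics.median(weight_change)  # middle of the sorted values
mode = statistics.mode(weight_change)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
# mean=3.6, median=1.5, mode=0
```

The two largest values (10 and 15) alone drag the mean well above the median, while the mode stays at the cluster of zeros.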

Q: A scientist is designing a study to determine whether eating a new diet is able to lower blood pressure in a group of patients. In particular, he believes that starting the diet may help decrease peak blood pressures throughout the day. Therefore, he will equip study participants with blood pressure monitors and follow pressure trends over a 24-hour period. He decides that after recruiting subjects, he will start them on either the new diet or a control diet and follow them for 1 month. After this time, he will switch patients onto the other diet and follow them for an additional month. He will analyze the results from the first month against the results from the second month for each patient. This type of study design is best at controlling for which of the following problems with studies?

A: Confounding

***Confounding***
- This **crossover design** (switching patients to the other diet) controls for **confounding variables** by making each patient their own control, so inherent patient characteristics cannot bias the comparison between diets.
- Because both diets are compared within the same individual, stable individual factors such as genetics, lifestyle, and comorbidities are accounted for, which removes them as potential confounders.

*Hawthorne effect*
- The **Hawthorne effect** refers to subjects modifying their behavior in response to being observed; this study design does not specifically address or eliminate it.
- Patients are monitored throughout, and the design compares the diets' effects rather than preventing behavioral changes caused by observation itself.

*Recall bias*
- **Recall bias** occurs when participants' memories of past events are inaccurate, often influenced by their current health status or beliefs.
- This study measures **real-time blood pressure** and does not rely on recollection of past exposures or outcomes, so recall bias is not a major concern here.

*Selection bias*
- **Selection bias** arises from non-random selection of participants into study groups, leading to systematic differences between groups.
- Recruitment could still introduce selection bias into the overall study population, although the **crossover design** does limit differences between treatment arms because all participants eventually receive both treatments.

*Pygmalion effect*
- The **Pygmalion effect** (or observer-expectancy effect) describes higher expectations, usually a researcher's, leading to improved performance by subjects.
- The crossover design does not directly address this effect; it controls for patient-specific confounders, not for investigator expectations.
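To see why a within-patient comparison neutralizes patient-level confounders, here is a deliberately idealized sketch with made-up numbers. It assumes no measurement noise, no period effect, and no carryover between diets; a real crossover analysis would use a paired statistical test and typically a washout period.

```python
# Hypothetical mean daily peak systolic BP (mmHg) on the control diet.
# Patients differ widely at baseline (genetics, lifestyle, comorbidities).
control_diet = {"P1": 150, "P2": 132, "P3": 168, "P4": 141}

# Assumed true effect of the new diet, identical in every patient (hypothetical).
DIET_EFFECT = -6
new_diet = {pid: bp + DIET_EFFECT for pid, bp in control_diet.items()}

# Within-patient comparison: each patient is their own control, so the
# patient-specific baseline cancels out of the difference.
within = {pid: new_diet[pid] - control_diet[pid] for pid in control_diet}
print(within)  # {'P1': -6, 'P2': -6, 'P3': -6, 'P4': -6}

# Between-patient comparison (P3 on the new diet vs P1 on control): the
# baseline difference swamps the diet effect and even reverses its sign.
between = new_diet["P3"] - control_diet["P1"]
print(between)  # 12
```

Every within-patient difference recovers the assumed effect exactly, whereas comparing two different patients confuses who they are with what they ate.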

Q: A patient is in the ICU for diabetic ketoacidosis and is currently on an insulin drip. His electrolytes are being checked every hour and his potassium is notable for the following measures:

1. 5.1 mEq/L
2. 5.8 mEq/L
3. 6.1 mEq/L
4. 6.2 mEq/L
5. 5.9 mEq/L
6. 5.1 mEq/L
7. 4.0 mEq/L
8. 3.1 mEq/L

Which of the following is the median potassium value of this data set?

A: 5.45

***5.45***
- To find the **median**, first arrange the potassium values in ascending order: 3.1, 4.0, 5.1, 5.1, 5.8, 5.9, 6.1, 6.2.
- Since there are **eight** values (an even number), the median is the average of the two middle values (the 4th and 5th): (5.1 + 5.8) / 2 = 10.9 / 2 = **5.45**.

*6.05*
- This is the average of the 4th and 5th values in the order they were recorded (6.2 and 5.9), obtained by skipping the sorting step.
- The values must be sorted before the middle pair is identified, so 6.05 is not the median.

*5.10*
- 5.1 appears twice in the data set and is one of the two middle values (the 4th), but the **median** of an even number of values is the average of both middle values, not just one of them.
- 5.10 would be the median only if both middle values were 5.1, for example 3.1, 4.0, 5.1, 5.1, 5.1, 5.8, 5.9, 6.1.

*5.16*
- This is the **arithmetic mean**, not the median: 41.3 / 8 = 5.1625, which rounds to 5.16.
- The mean falls below the median here because the low values (3.1 and 4.0) pull it down.

*3.10*
- This value is the **minimum** potassium level recorded, not the median.
- The median is the middle value of a sorted data set, whereas the minimum is the lowest value.
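The sort-then-average rule can be written as a small function and checked against `statistics.median`, along with the two arithmetic slips behind the 6.05 and 5.16 distractors. The function name `median_sorted` is illustrative, not from the source.

```python
import statistics

# Hourly potassium values (mEq/L), in the order they were drawn.
potassium = [5.1, 5.8, 6.1, 6.2, 5.9, 5.1, 4.0, 3.1]

def median_sorted(values):
    """Sort first, then take the middle value (odd n) or the mean of the
    two middle values (even n)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

print(round(median_sorted(potassium), 2))       # 5.45
print(round(statistics.median(potassium), 2))   # 5.45 (stdlib agrees)

# Distractor 6.05: middle pair of the UNSORTED list (6.2 and 5.9).
print(round((potassium[3] + potassium[4]) / 2, 2))  # 6.05

# Distractor 5.16: the arithmetic mean, 41.3 / 8.
print(f"{statistics.mean(potassium):.4f}")  # 5.1625
```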

Q: A 23-year-old woman and her husband come to a genetic counselor because she is concerned about the chance of having an inherited defect if they had a child. Family history reveals no significant family history in her husband; however, her sister had a son who has seizures, failure to thrive, and neurodegeneration. She does not remember the name of the disease but remembers that her nephew had sparse, brittle hair that kinked in odd directions. She does not think that any other members of her family including her sister's husband have had this disorder. If this couple had a son, what is the most likely chance that he would have the same disorder that affected the patient's nephew?

A: 25%

***25%***
- The nephew's **seizures, failure to thrive, neurodegeneration**, and **sparse, brittle, kinky hair** are highly indicative of **Menkes disease**, an **X-linked recessive** disorder.
- An affected boy inherits his only X chromosome from his mother, so the patient's sister is presumed to be a **carrier** of the mutation.
- The sisters share the same mother, so the mutation most likely came from her, and she is presumed to be a carrier.
- The patient therefore has a **50% chance of being a carrier**.
- **If the patient is a carrier**, each son has a **50% chance** of being affected.
- **Overall probability**: 0.5 (chance the patient is a carrier) × 0.5 (chance a son inherits the mutation) = **0.25 = 25%**.

*Close to 0%*
- This would be correct only if the patient had no chance of being a carrier, which is not the case given her family history.
- Her sister's affected son indicates that the mutation is present in the maternal lineage.

*100%*
- This would require every son to inherit the mutation, which does not happen even when the mother is a confirmed carrier.
- For **X-linked recessive** disorders, a carrier mother passes the mutation to only 50% of her sons on average.

*12.5%*
- This is the risk for a child of unspecified sex: 0.5 (patient is a carrier) × 0.5 (child is male) × 0.5 (son inherits the mutation) = 0.125.
- The question asks specifically about a son, so the chance of the child being male should not be multiplied in; the correct calculation is 50% × 50% = 25%.

*50%*
- This would be correct if the patient were known with certainty to be a carrier.
- However, only her sister's carrier status is inferred; the patient herself has a 50% chance of being a carrier, making the overall risk 25%.
- This is a common error in genetic counseling calculations: forgetting to account for the uncertain carrier status of the at-risk individual.
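The pedigree arithmetic, written with exact fractions. It assumes, as the explanation does, that the nephew's mutation was inherited rather than de novo, that the patient's mother is a carrier, and that the husband is unaffected; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: the patient's mother (the nephew's grandmother) is a carrier.
p_mother_carrier = Fraction(1)
# The patient inherits one of her mother's two X chromosomes.
p_patient_carrier = p_mother_carrier * Fraction(1, 2)
# A carrier mother passes the mutant X to half of her sons on average.
p_son_gets_mutant_x = Fraction(1, 2)

p_affected_son = p_patient_carrier * p_son_gets_mutant_x
print(p_affected_son, float(p_affected_son))  # 1/4 0.25

# Distractor 12.5%: risk for a child of unspecified sex. With an unaffected
# father, daughters are at most carriers, so an extra 1/2 for "child is male".
p_affected_child = p_patient_carrier * Fraction(1, 2) * p_son_gets_mutant_x
print(p_affected_child, float(p_affected_child))  # 1/8 0.125
```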

Q: An investigator is studying nosocomial infections in hospitals. The weekly incidence of hospital-acquired pulmonary infections within the pediatric wards of eight different hospitals is recorded. The incidence rates are: 2, 3, 5, 6, 7, 8, 9, 10. Which of the following values best represents the median value of these incidence rates?

A: 6.5

***Correct Option: 6.5***
- The given data are 2, 3, 5, 6, 7, 8, 9, 10. Since there is an **even number** (n = 8) of observations, the median is the **average of the two middle values**.
- The two middle values are the 4th and 5th values in the sorted list: 6 and 7.
- Thus, the median is **(6 + 7) / 2 = 6.5**.

*Incorrect Option: 6.0*
- This is the **4th value** in the ordered dataset, one of the two middle values but not the median itself.
- For an even number of observations, selecting only one of the two middle values is incorrect; they must be **averaged** to find the median.
- This is a common error when calculating the median of an even-sized dataset.

*Incorrect Option: 7.0*
- This is the **5th value** in the ordered dataset, the other middle value.
- Like 6.0, it would be the median only in a dataset with an odd number of values in which 7 was the single middle value.
- For this even-sized set, the median requires **averaging both middle values** (6 and 7).

*Incorrect Option: 2.73*
- This value appears to come from a calculation error or from a different statistic entirely.
- It is **not** the mean, the geometric mean, or any other standard measure of central tendency for this dataset.
- The actual mean is (2 + 3 + 5 + 6 + 7 + 8 + 9 + 10) / 8 = 50 / 8 = 6.25.

*Incorrect Option: 8.0*
- This is the **6th value** in the ordered dataset, not the central position.
- It lies above the median, in the upper portion of the distribution.
- For a dataset of 8 values, the median position is between the 4th and 5th values, not at the 6th.
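A quick check of the median and mean with Python's `statistics` module, plus a few other common summary statistics, to confirm that none of them lands near the 2.73 distractor (`geometric_mean` requires Python 3.8 or later).

```python
import statistics

rates = [2, 3, 5, 6, 7, 8, 9, 10]  # weekly incidence, already sorted

median = statistics.median(rates)  # (6 + 7) / 2
mean = statistics.mean(rates)      # 50 / 8
print(median, mean)  # 6.5 6.25

# Other summary statistics, to see whether any of them is close to 2.73.
others = {
    "population SD": statistics.pstdev(rates),
    "sample SD": statistics.stdev(rates),
    "geometric mean": statistics.geometric_mean(rates),
    "harmonic mean": statistics.harmonic_mean(rates),
}
for name, value in others.items():
    print(f"{name}: {value:.2f}")
```

The two standard deviations come closest, at roughly 2.6 and 2.8, but neither is 2.73, and neither measures central tendency anyway.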

Q: An epidemiologist is evaluating the efficacy of Noxbinle in preventing HCC deaths at the population level. A clinical trial shows that over 5 years, the mortality rate from HCC was 25% in the control group and 15% in patients treated with Noxbinle 100 mg daily. Based on this data, how many patients need to be treated with Noxbinle 100 mg to prevent, on average, one death from HCC?

A: 10

***10***
- The **number needed to treat (NNT)** is calculated by first finding the **absolute risk reduction (ARR)**.
- **ARR** = risk in control group - risk in treatment group = 25% - 15% = **10%** (or 0.10).
- **NNT = 1 / ARR** = 1 / 0.10 = **10 patients**.
- This means that, on average, **10 patients must be treated with Noxbinle for 5 years to prevent one death from HCC**.

*20*
- This would result from an ARR of 5% (1/0.05 = 20), which is not supported by the data.
- It may arise from miscalculating the risk difference, for example by halving the actual ARR.

*73*
- This value does not correspond to any standard calculation of NNT from the given mortality rates.
- It may result from confusion with other epidemiological measures or a calculation error.

*50*
- This would correspond to an ARR of 2% (1/0.02 = 50), a fivefold underestimate of the actual 10% risk reduction.
- It is not supported by the mortality rates given.

*100*
- This would correspond to an ARR of 1% (1/0.01 = 100), a tenfold underestimate of the treatment benefit.
- Confusing ARR with the relative risk reduction does not produce this value either: RRR = 10% / 25% = 40%, and 1 / 0.40 = 2.5.
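The ARR/NNT arithmetic as a small function. Using `Fraction` keeps 0.25 - 0.15 exact instead of relying on floating point, and rounding NNT up to a whole patient follows the usual convention. The function name `number_needed_to_treat` is hypothetical.

```python
import math
from fractions import Fraction

def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / ARR, where ARR = control risk - treated risk.
    Risks are proportions (0.25, not 25); the result is rounded up."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction")
    return math.ceil(1 / arr)

control = Fraction("0.25")  # 5-year HCC mortality, control group
treated = Fraction("0.15")  # 5-year HCC mortality, Noxbinle 100 mg

arr = control - treated  # absolute risk reduction: 1/10
rrr = arr / control      # relative risk reduction: 2/5, i.e. 40%
print(arr, rrr, number_needed_to_treat(control, treated))  # 1/10 2/5 10
```

Dividing 1 by the RRR instead would give 2.5, which is why mixing up the two measures cannot explain any of the listed distractors.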

Q: A group of environmental health scientists recently performed a nationwide cross-sectional study that investigated the risk of head and neck cancers in patients with a history of cigar and pipe smoking. In collaboration with three teams of epidemiologists that have each conducted similar cross-sectional studies in their respective countries, they have agreed to contribute their data to an international pooled analysis of the relationship between non-cigarette tobacco consumption and prevalence of head and neck cancers. Which of the following statements regarding the pooled analysis in comparison to the individual studies is true?

A: The likelihood of type II errors is decreased.

***The likelihood of type II errors is decreased.***
- A pooled analysis or **meta-analysis** combines data from multiple studies, substantially increasing the **overall sample size**.
- A larger sample size increases statistical power, making it less likely to miss a real effect and thus reducing the probability of **type II errors** (false negatives).

*The results are less precise.*
- Combining data from multiple studies in a **pooled analysis** generally yields **more precise estimates** because of the larger sample size.
- Greater precision is reflected in narrower confidence intervals.

*It overcomes limitations in the quality of individual studies.*
- A pooled analysis **does not inherently overcome limitations** in the design, methodology, or quality of the individual studies included.
- If the original studies have significant biases or flaws, these limitations can be carried over or even amplified in the pooled results.

*It is able to provide evidence of causality.*
- Pooled analyses of **cross-sectional studies**, like the ones described, can identify **associations** but cannot establish **causality**.
- Cross-sectional studies measure exposure and outcome at the same time, so they cannot establish the temporal sequence needed to infer cause and effect.

*The level of clinical evidence is lower.*
- Combining multiple studies, especially well-conducted ones, in a pooled analysis or **meta-analysis** generally **raises the level of clinical evidence** above that of the individual observational studies.
- A pooled analysis offers a more robust and comprehensive view of the existing evidence.
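The power argument can be made concrete with the normal approximation for comparing two proportions. Every number here is hypothetical: prevalences of 3% vs 2%, 2,000 people per exposure group in one study, and four equally sized studies pooled as if they formed one homogeneous sample (real pooled analyses must also handle between-study heterogeneity). The helper name `two_proportion_power` is illustrative.

```python
from statistics import NormalDist

def two_proportion_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test with equal
    group sizes (normal approximation; ignores the negligible opposite tail).
    Returns (power, standard error of the difference)."""
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit), se

# Hypothetical head and neck cancer prevalence: exposed vs unexposed.
p_exposed, p_unexposed = 0.030, 0.020

single_power, single_se = two_proportion_power(p_exposed, p_unexposed, 2000)
pooled_power, pooled_se = two_proportion_power(p_exposed, p_unexposed, 4 * 2000)

print(f"one study:     power={single_power:.2f}, SE={single_se:.4f}")
print(f"pooled (4x n): power={pooled_power:.2f}, SE={pooled_se:.4f}")
```

With these inputs, power rises from about 0.53 to about 0.98, so the type II error rate falls from roughly 47% to roughly 2%, and the standard error halves (narrower confidence intervals, i.e. more precision).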

Question 1

A biostatistician is processing data for a large clinical trial she is working on. The study is analyzing the use of a novel pharmaceutical compound for the treatment of anorexia after chemotherapy with the outcome of interest being the change in weight while taking the drug. While most participants remained about the same weight or continued to lose weight while on chemotherapy, there were smaller groups of individuals who responded very positively to the orexic agent. As a result, the data had a strong positive skew. The biostatistician wishes to report the measures of central tendency for this project. Just by understanding the skew in the data, which of the following can be expected for this data set?

Accepted Answer

Mean > median > mode

Answer

Mean = median = mode

Answer

Mean < median < mode

Answer

Mean > median = mode

Answer

Mean < median = mode

Question 2

A scientist is designing a study to determine whether eating a new diet is able to lower blood pressure in a group of patients. In particular, he believes that starting the diet may help decrease peak blood pressures throughout the day. Therefore, he will equip study participants with blood pressure monitors and follow pressure trends over a 24-hour period. He decides that after recruiting subjects, he will start them on either the new diet or a control diet and follow them for 1 month. After this time, he will switch patients onto the other diet and follow them for an additional month. He will analyze the results from the first month against the results from the second month for each patient. This type of study design is best at controlling for which of the following problems with studies?

Accepted Answer

Confounding

Answer

Hawthorne effect

Answer

Recall bias

Answer

Selection bias

Answer

Pygmalion effect

Question 3

A patient is in the ICU for diabetic ketoacidosis and is currently on an insulin drip. His electrolytes are being checked every hour and his potassium is notable for the following measures:

1. 5.1 mEq/L
2. 5.8 mEq/L
3. 6.1 mEq/L
4. 6.2 mEq/L
5. 5.9 mEq/L
6. 5.1 mEq/L
7. 4.0 mEq/L
8. 3.1 mEq/L

Which of the following is the median potassium value of this data set?

Accepted Answer

5.45

Answer

6.05

Answer

5.10

Answer

5.16

Answer

3.10

Question 4

A prospective cohort study was conducted to evaluate the effectiveness of transcatheter aortic valve replacement (TAVR) and surgical aortic valve replacement (SAVR) for treatment of aortic stenosis in adults 65 years of age and older. Three hundred patients who received TAVR and another 300 patients who received SAVR were followed for 5 years and monitored for cardiovascular symptoms and all-cause mortality. The study found that patients who received TAVR had a higher risk of death at the end of a 5-year follow-up period (HR = 1.21, p < 0.001). Later, the researchers performed a subgroup analysis by adjusting their data for ejection fraction. After the researchers compared risk of death between the TAVR and SAVR groups among patients of the same ejection fraction, they found that TAVR was no longer associated with a higher risk of death. They concluded that ejection fraction was a potential confounding variable. Which of the following statements would be most supportive of this conclusion?

Accepted Answer

Ejection fraction influences both probability of receiving TAVR and risk of death

Answer

The prevalence of low ejection fraction is higher in the TAVR group

Answer

Patients who receive TAVR and SAVR have similar ejection fractions

Answer

The increase in risk of death conferred by TAVR is higher in patients with low ejection fraction

Answer

TAVR correlates with increased risk of death, but the magnitude of effect differs based on ejection fraction

Question 5

A 23-year-old woman and her husband come to a genetic counselor because she is concerned about the chance of having an inherited defect if they had a child. Family history reveals no significant family history in her husband; however, her sister had a son who has seizures, failure to thrive, and neurodegeneration. She does not remember the name of the disease but remembers that her nephew had sparse, brittle hair that kinked in odd directions. She does not think that any other members of her family including her sister's husband have had this disorder. If this couple had a son, what is the most likely chance that he would have the same disorder that affected the patient's nephew?

Accepted Answer

25%

Answer

100%

Answer

12.5%

Answer

50%

Answer

Close to 0%

Question 6

An investigator is studying nosocomial infections in hospitals. The weekly incidence of hospital-acquired pulmonary infections within the pediatric wards of eight different hospitals is recorded. The incidence rates are: 2, 3, 5, 6, 7, 8, 9, 10. Which of the following values best represents the median value of these incidence rates?

Accepted Answer

6.5

Answer

8.0

Answer

7.0

Answer

2.73

Answer

6.0

Question 7

An investigator is studying the relationship between suicide and unemployment using data from a national health registry that encompasses 10,000 people who died by suicide, as well as 100,000 matched controls. The investigator finds that unemployment was associated with an increased risk of death by suicide (odds ratio = 3.02; p < 0.001). Among patients with a significant psychiatric history, there was no relationship between suicide and unemployment (p = 0.282). Likewise, no relationship was found between the two variables among patients without a psychiatric history (p = 0.32). These results are best explained by which of the following?

Accepted Answer

Confounding

Answer

Selection bias

Answer

Matching

Answer

Effect modification

Answer

Stratification
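When an association disappears within every stratum of a third variable, as in Question 7, that variable is acting as a confounder; with effect modification, by contrast, the stratum-specific estimates would differ from each other. The sketch below uses invented counts, chosen only so that the odds ratio is exactly 1 within each psychiatric-history stratum; they are not the registry data, and the helper `odds_ratio` is illustrative.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return Fraction(a * d, b * c)

# Hypothetical counts: (exposed cases, unexposed cases,
# exposed controls, unexposed controls); exposure = unemployment.
strata = {
    "psychiatric history": (600, 400, 300, 200),
    "no psychiatric history": (100, 900, 1000, 9000),
}

for name, table in strata.items():
    print(name, float(odds_ratio(*table)))  # 1.0 in each stratum

# Collapsing the strata mixes groups that differ in both unemployment
# and suicide risk, which manufactures a crude association.
crude = tuple(sum(t[i] for t in strata.values()) for i in range(4))
print("crude", round(float(odds_ratio(*crude)), 2))  # 3.81
```

Psychiatric history is more common among cases and is also linked to unemployment in these made-up tables, which is exactly the pattern that lets it confound the crude comparison.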

Question 8

An epidemiologist is evaluating the efficacy of Noxbinle in preventing HCC deaths at the population level. A clinical trial shows that over 5 years, the mortality rate from HCC was 25% in the control group and 15% in patients treated with Noxbinle 100 mg daily. Based on this data, how many patients need to be treated with Noxbinle 100 mg to prevent, on average, one death from HCC?

Accepted Answer

10

Answer

20

Answer

73

Answer

50

Answer

100

Question 9

A group of environmental health scientists recently performed a nationwide cross-sectional study that investigated the risk of head and neck cancers in patients with a history of cigar and pipe smoking. In collaboration with three teams of epidemiologists that have each conducted similar cross-sectional studies in their respective countries, they have agreed to contribute their data to an international pooled analysis of the relationship between non-cigarette tobacco consumption and prevalence of head and neck cancers. Which of the following statements regarding the pooled analysis in comparison to the individual studies is true?

Accepted Answer

The likelihood of type II errors is decreased.

Answer

The results are less precise.

Answer

It overcomes limitations in the quality of individual studies.

Answer

It is able to provide evidence of causality.

Answer

The level of clinical evidence is lower.

Question 10

You are conducting a study comparing the efficacy of two different statin medications. Two groups are placed on different statin medications, statin A and statin B. Baseline LDL levels are drawn for each group and are subsequently measured every 3 months for 1 year. Average baseline LDL levels for each group were identical. The group receiving statin A exhibited an 11 mg/dL greater reduction in LDL in comparison to the statin B group. Your statistical analysis reports a p-value of 0.052. Which of the following best describes the meaning of this p-value?

Accepted Answer

There is a 5.2% chance of observing a difference in reduction of LDL of 11 mg/dL or greater even if the two medications have identical effects

Answer

There is a 95% chance that the difference in reduction of LDL observed reflects a real difference between the two groups

Answer

Though A is more effective than B, there is a 5% chance the difference in reduction of LDL between the two groups is due to chance

Answer

If 100 permutations of this experiment were conducted, 5 of them would show similar results to those described above

Answer

This is a statistically significant result
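The accepted answer's definition, the probability of a difference at least as large as the one observed if the two drugs truly had identical effects, is what a permutation test estimates: under that null hypothesis the drug labels are interchangeable, so reshuffling them shows how often chance alone produces such a difference. The LDL reductions below are invented (they do not reproduce the study's 11 mg/dL difference), and the two-sided test, 10,000 shuffles, and the helper name `permutation_p_value` are assumptions.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Monte Carlo two-sided permutation test for a difference in means.
    Shuffles the pooled values to mimic 'identical effects' and returns the
    fraction of shuffles whose difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical 1-year LDL reductions (mg/dL); not the study's data.
statin_a = [52, 61, 40, 58, 47, 66, 55, 49]
statin_b = [45, 38, 50, 41, 57, 36, 44, 39]
p = permutation_p_value(statin_a, statin_b)
print(p)
```

A result such as 0.052 would mean that about 5.2% of label shuffles produced a difference at least as large as the observed one, just above the conventional 0.05 cutoff, so it would not usually be called statistically significant.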

Study Design — MCQs
