Study Design Practice Questions

Q: You are reading through a recent article that reports significant decreases in all-cause mortality for patients with malignant melanoma following treatment with a novel biological infusion. Which of the following choices refers to the probability that a study will find a statistically significant difference when one truly does exist?

Power.

***Power***
- **Power** is the probability that a study will correctly reject the null hypothesis when it is, in fact, false (i.e., will find a statistically significant difference when one truly exists).
- A study with high power minimizes the risk of a **Type II error** (failing to detect a real effect).

*Type II error*
- A **Type II error** (or **beta error**) occurs when a study fails to reject a false null hypothesis, meaning it concludes there is no significant difference when one actually exists.
- This is the **opposite** of what the question describes, which asks for the probability of *finding* a difference.

*Type I error*
- A **Type I error** (or **alpha error**) occurs when a study incorrectly rejects a true null hypothesis, concluding there is a significant difference when one does not actually exist.
- This relates to the **p-value** and the level of statistical significance (e.g., p < 0.05).

*Confidence interval*
- A **confidence interval** provides a range of values within which the true population parameter is likely to lie with a certain degree of confidence (e.g., 95%).
- It does not directly represent the probability of finding a statistically significant difference when one truly exists.

*p-value*
- The **p-value** is the probability of observing data as extreme as, or more extreme than, that obtained in the study, assuming the null hypothesis is true.
- It is used to determine statistical significance, but it is not the probability of detecting a true effect.
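The relationship between power and Type II error (power = 1 - beta) can be illustrated with a small calculation. The sketch below assumes a one-sided, one-sample z-test with known standard deviation; the function name `z_test_power` and the effect size, standard deviation, and sample sizes are hypothetical values chosen for illustration, not figures from the melanoma article in the question.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test (H0: mean = 0 vs. H1: mean > 0).

    effect: assumed true mean difference (hypothetical)
    sigma:  assumed known standard deviation (hypothetical)
    n:      sample size
    alpha:  Type I error rate
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * n ** 0.5 / sigma        # how far the true effect moves the test statistic
    return 1 - std_normal.cdf(z_crit - shift)


power_small = z_test_power(effect=0.5, sigma=1.0, n=30)
power_large = z_test_power(effect=0.5, sigma=1.0, n=100)
beta_small = 1 - power_small                 # Type II error rate for n = 30

print(f"n=30:  power={power_small:.3f}, beta={beta_small:.3f}")
print(f"n=100: power={power_large:.3f}")
```

Under these assumptions, a larger sample gives higher power, and when the true effect is zero the function returns alpha, the Type I error rate.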

Q: On morning labs, a patient's potassium comes back at 5.9 mEq/L. The attending thinks that this result is spurious, and asks the team to repeat the electrolytes. Inadvertently, the medical student, intern, and resident all repeat the electrolytes that same morning. The following values are reported: 4.3 mEq/L, 4.2 mEq/L, and 4.2 mEq/L. What is the median potassium value for that patient that day including the first value?

4.25 mEq/L.

***4.25 mEq/L***
- The question asks for the median including **all four potassium values**: 5.9, 4.3, 4.2, and 4.2 mEq/L.
- To find the **median**, first arrange the values in ascending order: **4.2, 4.2, 4.3, 5.9**.
- With an **even number of values (4)**, the median is the **average of the two middle numbers**: (4.2 + 4.3) / 2 = **4.25 mEq/L**.
- This correctly represents the **central tendency** of all laboratory values obtained that day.

*4.3 mEq/L*
- This is the **third value** in the sorted dataset (4.2, 4.2, 4.3, 5.9).
- A single middle value is the median only when there is an **odd number of values**.
- With an even number of data points, you must **average the two middle values** (4.2 and 4.3), not select just one.

*4.65 mEq/L*
- This is the **mean** of the four values: (4.2 + 4.2 + 4.3 + 5.9) / 4 = 18.6 / 4 = 4.65 mEq/L.
- The mean is pulled upward by the outlying 5.9 mEq/L result, which is why it exceeds the median.
- The question asks for the median, which depends only on the **middle position(s)** of the sorted dataset.

*1.7 mEq/L*
- This is the **range** of the dataset: 5.9 - 4.2 = 1.7 mEq/L.
- The range measures spread, not central tendency, so it cannot be the median.
- Read as a potassium level, 1.7 mEq/L would indicate **severe hypokalemia**, which is not supported by any of the laboratory values obtained.

*4.2 mEq/L*
- This is the **mode** of the dataset (the most frequently occurring value, appearing twice).
- While mode is a valid measure of central tendency, the question specifically asks for the **median**, not the mode.
- The median of this dataset (4.2, 4.2, 4.3, 5.9) is **4.25 mEq/L**, not 4.2 mEq/L.
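As a quick check of the arithmetic, the four potassium values from the question can be run through Python's standard `statistics` module. This is only an illustrative sketch; the variable names are arbitrary.

```python
from statistics import mean, median, mode

# The four potassium results from the question, in mEq/L
potassium = [5.9, 4.3, 4.2, 4.2]

ordered = sorted(potassium)                       # [4.2, 4.2, 4.3, 5.9]
med = median(potassium)                           # average of the two middle values
avg = mean(potassium)                             # sum of all values divided by 4
most_common = mode(potassium)                     # most frequent value
value_range = max(potassium) - min(potassium)     # spread, not central tendency

print("sorted:", ordered)
print(f"median: {med:.2f}  mean: {avg:.2f}  mode: {most_common:.1f}  range: {value_range:.1f}")
```

Each distractor corresponds to a different summary statistic of the same four numbers, which is why sorting the data and naming the statistic being asked for is the safest first step.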

Q: A 24-year-old woman presents to a medical office for a follow-up evaluation. The medical history is significant for type 1 diabetes, for which she takes insulin. She was recently hospitalized for diabetic ketoacidosis following a respiratory infection. Today she brings in a list of her most recent early morning fasting blood glucose readings for review. Her glucose readings range from 126 mg/dL–134 mg/dL, except for 2 readings of 350 mg/dL and 380 mg/dL, taken at the onset of her recent hospitalization. Given this data set, which measure(s) of central tendency would be most likely affected by these additional extreme values?

Mean.

***Mean***
- The **mean** is calculated by summing all values and dividing by the total number of values; thus, it is significantly influenced by **extreme values** or outliers.
- The two high blood glucose readings (350 mg/dL and 380 mg/dL) will **disproportionately increase** the mean, pulling it away from the central tendency of the majority of readings.

*Median and mode*
- The **mode** is the most frequent value, which would likely still be within the 126-134 mg/dL range since most readings fall there, and the **median** (the middle value) is less affected by outliers.
- Even with two extreme values, the median of this dataset, assuming several readings in the 126-134 mg/dL range, would remain close to the central cluster of typical values and not be drastically altered.

*Median*
- The **median** is resistant to outliers because it is determined by the position of values once ordered, not their magnitude.
- Adding a few extreme values will only shift the median slightly, if at all, especially if the sample size is large enough that the middle position remains within the range of typical values.

*Mean and median*
- While the **mean** is heavily affected by outliers, the **median** is relatively robust to them.
- Therefore, stating that both would be significantly affected is incorrect because the median would largely retain its representation of the central tendency.

*Mode*
- The **mode** represents the most frequently occurring value in a dataset and is not influenced by the magnitude of extreme values.
- Unless one of the extreme high readings happens to be the most frequently occurring value, the mode would remain within the range of the more common, lower glucose readings.
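The effect of the two extreme readings can be seen with a small numerical sketch. The question gives only the range of the typical readings (126-134 mg/dL), so the seven individual values in `typical` below are hypothetical; only 350 and 380 mg/dL come from the question.

```python
from statistics import mean, median, mode

# Hypothetical fasting readings within the stated 126-134 mg/dL range (assumed values)
typical = [126, 128, 128, 130, 131, 132, 134]
# The two extreme readings given in the question
with_outliers = typical + [350, 380]

mean_before, mean_after = mean(typical), mean(with_outliers)
median_before, median_after = median(typical), median(with_outliers)
mode_before, mode_after = mode(typical), mode(with_outliers)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
print(f"mode:   {mode_before} -> {mode_after}")
```

With these assumed values, the mean rises by roughly 50 mg/dL, the median moves by a single unit, and the mode does not change.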

Q: A study is being conducted on depression using the Patient Health questionnaire (PHQ-9) survey data embedded within a popular social media network with a response size of 500,000 participants. The sample population of this study is approximately normal. The mean PHQ-9 score is 14, and the standard deviation is 4. How many participants have scores greater than 22?

12,500.

***12,500***
- To find the number of participants with scores greater than 22, first calculate the **z-score** for a score of 22: $Z = \frac{(X - \mu)}{\sigma} = \frac{(22 - 14)}{4} = 2$.
- A z-score of 2 means the score is **2 standard deviations above the mean**. Using the **empirical rule** for a normal distribution, approximately **2.5%** of the data falls beyond 2 standard deviations above the mean (5% total in both tails, so 2.5% in each tail).
- Therefore, $2.5\%$ of the total 500,000 participants is $0.025 \times 500{,}000 = 12{,}500$.

*175,000*
- This value represents **35%** of the total population ($175{,}000 / 500{,}000 = 0.35$), a far larger proportion than the **normal distribution** places above a z-score of 2.
- An upper tail of 35% corresponds to a cutoff well under one standard deviation above the mean, not two standard deviations above it.

*17,500*
- This value represents **3.5%** of the total population ($17{,}500 / 500{,}000 = 0.035$).
- An upper tail of 3.5% corresponds to a z-score below 2, indicating an incorrect calculation or a misreading of the **normal distribution table**.

*160,000*
- This value represents **32%** of the total population ($160{,}000 / 500{,}000 = 0.32$).
- About 32% of a normal distribution lies **more than one standard deviation from the mean** (both tails combined), since about 68% lies within one standard deviation. It is not the proportion scoring more than 2 standard deviations above the mean.

*25,000*
- This value represents **5%** of the total population ($25{,}000 / 500{,}000 = 0.05$).
- Under the empirical rule, 5% is the proportion beyond 2 standard deviations in **both tails combined**; only half of that, 2.5%, lies above the mean. An upper tail of 5% would correspond to a z-score of approximately 1.65.
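The calculation can be reproduced in a few lines. The sketch below uses `statistics.NormalDist` from the Python standard library and compares the empirical-rule answer with the tail area of an exactly normal distribution; treating the scores as exactly normal and continuous is an assumption made only for this comparison.

```python
from statistics import NormalDist

n_participants = 500_000
mu, sigma = 14, 4          # mean and standard deviation from the question
cutoff = 22

z = (cutoff - mu) / sigma                       # (22 - 14) / 4 = 2.0

# Empirical (68-95-99.7) rule: about 2.5% lies above +2 SD
empirical_count = 0.025 * n_participants

# Tail area of an exactly normal distribution, for comparison
exact_tail = 1 - NormalDist(mu, sigma).cdf(cutoff)
exact_count = exact_tail * n_participants

print(f"z = {z}")
print(f"empirical rule: {empirical_count:,.0f} participants")
print(f"exact normal tail: {exact_tail:.4f} of the sample, about {exact_count:,.0f} participants")
```

The exact tail area is slightly under 2.5%, so the exact count comes out somewhat below 12,500. The empirical rule is a rounded approximation, and 12,500 is the closest of the listed options.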

Q: A cross-sectional study is investigating the association between smoking and the presence of Raynaud phenomenon in adults presenting to a primary care clinic in a major city. A standardized 3-question survey that assesses symptoms of Raynaud phenomenon was used to clinically diagnose patients if they answered positively to all 3 questions. Sociodemographics, health-related information, and smoking history were collected by trained interviewers. Subjects were grouped by their reported tobacco use: non-smokers, less than 1 pack per day (PPD), between 1-2 PPD, and over 2 PPD. The results were adjusted for gender, age, education, and alcohol consumption. The adjusted odds ratios (OR) were as follows:

Non-smoker: OR = reference
<1 PPD: OR = 1.49 [95% confidence interval (CI), 1.24-1.79]
1-2 PPD: OR = 1.91 [95% CI, 1.72-2.12]
>2 PPD: OR = 2.21 [95% CI, 2.14-2.37]

Which of the following is represented in this study and suggests a potential causal relationship between smoking and Raynaud phenomenon?

Dose-response.

***Dose-response***
- The study demonstrates a **dose-response relationship** as the odds ratio for Raynaud phenomenon increases with the reported packs per day (PPD) of tobacco use.
- This graded effect, where a higher exposure (more smoking) leads to a stronger outcome (higher odds of Raynaud phenomenon), is a strong indicator of a potential causal link according to the Bradford Hill criteria.

*Confounding*
- **Confounding** occurs when a third variable influences both the exposure and the outcome, creating a spurious association.
- The study specifically states that the results were **adjusted for gender, age, education, and alcohol consumption**, indicating an attempt to control for potential confounders, rather than confounding itself being represented as a causal link.

*Blinding*
- **Blinding** involves preventing participants or researchers from knowing who is receiving a particular treatment or exposure to reduce bias.
- While important in some study designs, this cross-sectional study describes **collected data** and adjusted odds ratios, not a process of blinding.

*Consistency*
- **Consistency** refers to the repeated observation of an association in different studies, populations, or circumstances.
- This study presents its own findings without reference to other research, so it does not demonstrate consistency; rather, it provides a single observation.

*Temporality*
- **Temporality** (or temporal relationship) means that the exposure must precede the outcome for a causal relationship to exist.
- This is a **cross-sectional study**, which assesses both exposure (smoking) and outcome (Raynaud phenomenon) at the same time, making it difficult to definitively establish temporality.
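The graded pattern can be checked informally by listing the adjusted odds ratios from the full question in order of exposure (<1 PPD: 1.49 [1.24-1.79]; 1-2 PPD: 1.91 [1.72-2.12]; >2 PPD: 2.21 [2.14-2.37]). The sketch below only tests whether the point estimates rise monotonically and whether each confidence interval excludes 1; it is not a formal test for trend, and the variable names are arbitrary.

```python
# Adjusted odds ratios from the question, ordered by increasing exposure.
# Each entry: (exposure group, OR, 95% CI lower bound, 95% CI upper bound)
odds_ratios = [
    ("<1 PPD", 1.49, 1.24, 1.79),
    ("1-2 PPD", 1.91, 1.72, 2.12),
    (">2 PPD", 2.21, 2.14, 2.37),
]

point_estimates = [or_value for _, or_value, _, _ in odds_ratios]

# Dose-response pattern: each exposure level has a higher OR than the one before it
is_monotonic = all(
    lower < higher for lower, higher in zip(point_estimates, point_estimates[1:])
)

# Each association is statistically significant if its 95% CI excludes 1 (the reference)
all_exclude_one = all(ci_low > 1 for _, _, ci_low, _ in odds_ratios)

print("ORs increase with exposure:", is_monotonic)
print("All 95% CIs exclude 1:", all_exclude_one)
```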

Q: A research team develops a new monoclonal antibody checkpoint inhibitor for advanced melanoma that has shown promise in animal studies as well as high efficacy and low toxicity in early phase human clinical trials. The research team would now like to compare this drug to existing standard of care immunotherapy for advanced melanoma. The research team decides to conduct a non-randomized study where the novel drug will be offered to patients who are deemed to be at risk for toxicity with the current standard of care immunotherapy, while patients without such risk factors will receive the standard treatment. Which of the following best describes the level of evidence that this study can offer?

Level 3.

***Level 3***
- A **non-randomized controlled trial** like the one described, where patient assignment to treatment groups is based on specific characteristics (risk of toxicity), falls into Level 3 evidence.
- This level typically includes **non-randomized controlled trials** and **well-designed cohort studies** with comparison groups, which are prone to selection bias and confounding.
- The study compares two treatments but lacks randomization, making it Level 3 evidence.

*Level 1*
- Level 1 evidence is the **highest level of evidence**, derived from **systematic reviews and meta-analyses** of multiple well-designed randomized controlled trials or large, high-quality randomized controlled trials.
- The described study is explicitly stated as non-randomized, ruling out Level 1.

*Level 2*
- Level 2 evidence involves at least one **well-designed randomized controlled trial** (RCT) or **systematic reviews** of randomized trials.
- The current study is *non-randomized*, which means it cannot be classified as Level 2 evidence, as randomization is a key criterion for this level.

*Level 4*
- Level 4 evidence includes **case series**, **case-control studies**, and **poorly designed cohort or case-control studies**.
- While the study is non-randomized, it is a controlled comparative trial rather than a case series or retrospective case-control study, placing it at Level 3.

*Level 5*
- Level 5 evidence is the **lowest level of evidence**, typically consisting of **expert opinion** without explicit critical appraisal, or based on physiology, bench research, or animal studies.
- While the drug was initially tested in animal studies, the current human comparative study offers a higher level of evidence than expert opinion or preclinical data.

Q: A study aimed to evaluate the relationship between inflammatory markers and lipid metabolism in individuals with rheumatoid arthritis (RA) recruited 252 patients with RA in a tertiary care hospital. Fasting blood samples were taken for lipid profiling and for the assessment of inflammatory markers such as C-reactive protein (CRP) and erythrocyte sedimentation rate. The relationship between CRP and total cholesterol was assessed using Pearson’s correlation coefficient. A scatter plot between CRP and total cholesterol can be seen in the picture. Based on the scatter plot, which of the following can be correctly concluded about the value of the Pearson correlation coefficient, r, for CRP and total cholesterol?

r value lies between 0 and -1.

***r value lies between 0 and -1***
- The scatter plot shows a **negative association** between CRP and total cholesterol: as CRP levels increase, total cholesterol tends to decrease.
- The data points are somewhat scattered and do not form a perfect straight line, so the correlation is not exactly -1 but falls between **0 and -1**.

*r value is exactly +1*
- An r-value of **+1** would indicate a **perfect positive linear relationship**, where all data points fall precisely on a straight line that slopes upwards, meaning that as CRP increases, total cholesterol also increases.
- The scatter plot clearly shows a downward trend, which contradicts a positive correlation.

*r value is exactly 0*
- An r-value of **0** would suggest **no linear relationship** between the variables, meaning the points would be randomly scattered with no discernible trend.
- The scatter plot demonstrates a clear trend, albeit a negative one, indicating that there is a relationship between CRP and total cholesterol.

*r value is exactly -1*
- An r-value of **-1** would signify a **perfect negative linear relationship**, where all data points would lie exactly on a straight line that slopes downwards.
- While there is a negative trend, the data points show significant scattering, indicating that the relationship is not perfectly linear.

*r value lies between 0 and +1*
- An r-value between **0 and +1** would imply a **positive, but not perfect, linear relationship**.
- This is incorrect because the scatter plot visually depicts a **negative relationship**, where one variable tends to decrease as the other increases.

Q: The incidence of a relatively benign autosomal recessive disease, X, is 1 in 25 in the population. Assuming that the conditions for Hardy Weinberg Equilibrium are met, what is the probability that a male and female, who are carriers, will have a child expressing the disease?

1/4.

***1/4***
- If both parents are **carriers** for an autosomal recessive disease, each parent has one copy of the normal allele (A) and one copy of the recessive allele (a).
- When two heterozygous (Aa) individuals mate, the probability of their child inheriting two recessive alleles (aa) and expressing the disease is 1 in 4 (25%), according to Mendelian genetics.

*1/5*
- This value represents the **allele frequency (q)** in the population for the recessive allele, given an incidence of 1 in 25 (q^2 = 1/25, so q = 1/5).
- However, this is not the probability of a child being affected if both parents are already known to be carriers.

*8/25*
- This value is the **carrier frequency (2pq)** in the population: 2 × 4/5 × 1/5 = 8/25.
- It gives the chance that a randomly chosen individual is a carrier. Here both parents are already known to be carriers, so it does not enter the calculation.

*1/25*
- This is the **incidence of the disease (q^2)** in the general population, which means 1 out of 25 individuals express the disease.
- It is not the probability of a child inheriting the disease from two parents already identified as carriers.

*4/5*
- This value represents the **allele frequency (p)** of the dominant allele (p = 1 - q = 1 - 1/5 = 4/5).
- It is not the probability of a child expressing the disease from two carrier parents.
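The Hardy-Weinberg quantities behind the answer options, and the Punnett-square result for two carriers, can be worked through with exact fractions. This is an illustrative sketch; the allele labels `A` and `a` follow the explanation above, and the variable names are arbitrary.

```python
from fractions import Fraction
from itertools import product

incidence = Fraction(1, 25)          # q^2, disease incidence from the question

q = Fraction(1, 5)                   # recessive allele frequency (square root of 1/25)
assert q * q == incidence            # confirms q is the exact square root
p = 1 - q                            # dominant allele frequency
carrier_frequency = 2 * p * q        # heterozygote (carrier) frequency in the population

# Punnett square for two known carriers (Aa x Aa): one allele from each parent
offspring = [a + b for a, b in product("Aa", "Aa")]   # AA, Aa, aA, aa
affected = Fraction(sum(g == "aa" for g in offspring), len(offspring))

print(f"q = {q}, p = {p}, 2pq = {carrier_frequency}")
print(f"P(affected child | both parents carriers) = {affected}")
```

The population frequencies account for four of the five options, but once both parents are known to be carriers only the Punnett-square step determines the answer.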

Question 1

You are reading through a recent article that reports significant decreases in all-cause mortality for patients with malignant melanoma following treatment with a novel biological infusion. Which of the following choices refers to the probability that a study will find a statistically significant difference when one truly does exist?

Accepted Answer

Power

Answer

Type II error

Answer

Type I error

Answer

Confidence interval

Answer

p-value

Question 2

On morning labs, a patient's potassium comes back at 5.9 mEq/L. The attending thinks that this result is spurious, and asks the team to repeat the electrolytes. Inadvertently, the medical student, intern, and resident all repeat the electrolytes that same morning. The following values are reported: 4.3 mEq/L, 4.2 mEq/L, and 4.2 mEq/L. What is the median potassium value for that patient that day including the first value?

Accepted Answer

4.25 mEq/L

Answer

4.3 mEq/L

Answer

4.65 mEq/L

Answer

1.7 mEq/L

Answer

4.2 mEq/L

Question 3

A 24-year-old woman presents to a medical office for a follow-up evaluation. The medical history is significant for type 1 diabetes, for which she takes insulin. She was recently hospitalized for diabetic ketoacidosis following a respiratory infection. Today she brings in a list of her most recent early morning fasting blood glucose readings for review. Her glucose readings range from 126 mg/dL–134 mg/dL, except for 2 readings of 350 mg/dL and 380 mg/dL, taken at the onset of her recent hospitalization. Given this data set, which measure(s) of central tendency would be most likely affected by these additional extreme values?

Accepted Answer

Mean

Answer

Median and mode

Answer

Median

Answer

Mean and median

Answer

Mode

Question 4

A study is being conducted on depression using the Patient Health questionnaire (PHQ-9) survey data embedded within a popular social media network with a response size of 500,000 participants. The sample population of this study is approximately normal. The mean PHQ-9 score is 14, and the standard deviation is 4. How many participants have scores greater than 22?

Accepted Answer

12,500

Answer

175,000

Answer

17,500

Answer

160,000

Answer

25,000

Question 5

A cross-sectional study is investigating the association between smoking and the presence of Raynaud phenomenon in adults presenting to a primary care clinic in a major city. A standardized 3-question survey that assesses symptoms of Raynaud phenomenon was used to clinically diagnose patients if they answered positively to all 3 questions. Sociodemographics, health-related information, and smoking history were collected by trained interviewers. Subjects were grouped by their reported tobacco use: non-smokers, less than 1 pack per day (PPD), between 1-2 PPD, and over 2 PPD. The results were adjusted for gender, age, education, and alcohol consumption. The adjusted odds ratios (OR) were as follows:

Non-smoker: OR = reference
<1 PPD: OR = 1.49 [95% confidence interval (CI), 1.24-1.79]
1-2 PPD: OR = 1.91 [95% CI, 1.72-2.12]
>2 PPD: OR = 2.21 [95% CI, 2.14-2.37]

Which of the following is represented in this study and suggests a potential causal relationship between smoking and Raynaud phenomenon?

Accepted Answer

Dose-response

Answer

Confounding

Answer

Blinding

Answer

Consistency

Answer

Temporality

Question 6

A research team develops a new monoclonal antibody checkpoint inhibitor for advanced melanoma that has shown promise in animal studies as well as high efficacy and low toxicity in early phase human clinical trials. The research team would now like to compare this drug to existing standard of care immunotherapy for advanced melanoma. The research team decides to conduct a non-randomized study where the novel drug will be offered to patients who are deemed to be at risk for toxicity with the current standard of care immunotherapy, while patients without such risk factors will receive the standard treatment. Which of the following best describes the level of evidence that this study can offer?

Accepted Answer

Level 3

Answer

Level 1

Answer

Level 5

Answer

Level 4

Answer

Level 2

Question 7

During a clinical study on an island with a population of 2540 individuals, 510 are found to have fasting hyperglycemia. Analysis of medical records of deceased individuals shows that the average age of onset of fasting hyperglycemia is 45 years, and the average life expectancy is 70 years. Assuming a steady state of population on the island with no change in environmental risk factors, which of the following is the best estimate of the number of individuals who would newly develop fasting hyperglycemia over 1 year?

Accepted Answer

20

Answer

50

Answer

10

Answer

30

Answer

40
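One way to arrive at the accepted answer of 20 is the steady-state relationship between prevalence, incidence, and disease duration. The sketch below assumes that the population is in steady state (as the question states), that fasting hyperglycemia persists from onset until death, and that average duration is therefore life expectancy minus age of onset. This is a reconstruction of the reasoning under those assumptions, not an explanation given in the source.

```python
population = 2540
prevalent_cases = 510
age_of_onset = 45          # years
life_expectancy = 70       # years

# Assumed average disease duration: from onset until death
duration_years = life_expectancy - age_of_onset          # 25 years

# In steady state, new cases per year balance cases leaving the pool:
# prevalent cases = (new cases per year) x (average duration)
new_cases_per_year = prevalent_cases / duration_years    # 510 / 25 = 20.4

# Same result via the incidence rate among those still at risk:
# prevalence odds = incidence rate x duration
at_risk = population - prevalent_cases                   # 2030 without the condition
incidence_rate = (prevalent_cases / at_risk) / duration_years
new_cases_check = incidence_rate * at_risk

print(f"duration: {duration_years} years")
print(f"new cases per year: {new_cases_per_year:.1f} (closest option: {round(new_cases_per_year)})")
```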

Question 8

A study aimed to evaluate the relationship between inflammatory markers and lipid metabolism in individuals with rheumatoid arthritis (RA) recruited 252 patients with RA in a tertiary care hospital. Fasting blood samples were taken for lipid profiling and for the assessment of inflammatory markers such as C-reactive protein (CRP) and erythrocyte sedimentation rate. The relationship between CRP and total cholesterol was assessed using Pearson’s correlation coefficient. A scatter plot between CRP and total cholesterol can be seen in the picture. Based on the scatter plot, which of the following can be correctly concluded about the value of the Pearson correlation coefficient, r, for CRP and total cholesterol?

Accepted Answer

r value lies between 0 and -1

Answer

r value is exactly +1

Answer

r value is exactly 0

Answer

r value is exactly -1

Answer

r value lies between 0 and +1

Question 9

The incidence of a relatively benign autosomal recessive disease, X, is 1 in 25 in the population. Assuming that the conditions for Hardy Weinberg Equilibrium are met, what is the probability that a male and female, who are carriers, will have a child expressing the disease?

Accepted Answer

1/4

Answer

1/5

Answer

8/25

Answer

1/25

Answer

4/5

Question 10

A 21-year-old woman is diagnosed with a rare subtype of anti-NMDA encephalitis. During the diagnostic workup, she was found to have an ovarian teratoma. Her physician is curious about the association between anti-NMDA encephalitis and ovarian teratomas. A causal relationship between this subtype of anti-NMDA encephalitis and ovarian teratomas is suspected. The physician aims to identify patients with anti-NMDA encephalitis and subsequently evaluate them for the presence of ovarian teratomas. Which type of study design would be the most appropriate?

Accepted Answer

Retrospective cohort study

Answer

Case-control study

Answer

Cross-sectional study

Answer

Case series

Answer

Randomized controlled trial
