Sample size for equivalence trials US Medical PG Practice Questions and MCQs
Practice US Medical PG questions for Sample size for equivalence trials. These multiple choice questions (MCQs) cover important concepts and help you prepare for your exams.
Sample size for equivalence trials US Medical PG Question 1: A study is funded by the tobacco industry to examine the association between smoking and lung cancer. They design a study with a prospective cohort of 1,000 smokers between the ages of 20-30. The length of the study is five years. After the study period ends, they conclude that there is no relationship between smoking and lung cancer. Which of the following study features is the most likely reason for the failure of the study to note an association between tobacco use and cancer?
- A. Late-look bias
- B. Latency period (Correct Answer)
- C. Confounding
- D. Effect modification
- E. Pygmalion effect
Sample size for equivalence trials Explanation: ***Latency period***
- **Lung cancer** typically has a **long latency period**, often **20-30+ years**, between initial exposure to tobacco carcinogens and the development of clinically detectable disease.
- A **five-year study duration** in young smokers (ages 20-30) is **far too short** to observe the development of lung cancer, which explains the false negative finding.
- This represents a **fundamental flaw in study design** rather than a bias—the biological timeline of disease development was not adequately considered.
*Late-look bias*
- **Late-look bias** occurs when a study enrolls participants who have already survived the early high-risk period of a disease, leading to **underestimation of true mortality or incidence**.
- Also called **survival bias**, it involves studying a population that has already been "selected" by survival.
- This is not applicable here, as the study simply ended before sufficient time elapsed for disease to develop.
*Confounding*
- **Confounding** occurs when a third variable is associated with both the exposure and outcome, distorting the apparent relationship between them.
- While confounding can affect study results, it would not completely eliminate the detection of a strong, well-established association like smoking and lung cancer in a properly conducted prospective cohort study.
- The issue here is temporal (insufficient follow-up time), not the presence of an unmeasured confounder.
*Effect modification*
- **Effect modification** (also called interaction) occurs when the magnitude of an association between exposure and outcome differs across levels of a third variable.
- This represents a **true biological phenomenon**, not a study design flaw or bias.
- It would not explain the complete failure to detect any association.
*Pygmalion effect*
- The **Pygmalion effect** (observer-expectancy effect) refers to a psychological phenomenon where higher expectations lead to improved performance in the observed subjects.
- This concept is relevant to **behavioral and educational research**, not to objective epidemiological studies of disease incidence.
- It has no relevance to the biological relationship between carcinogen exposure and cancer development.
Sample size for equivalence trials US Medical PG Question 2: You are conducting a study comparing the efficacy of two different statin medications. Two groups are placed on different statin medications, statin A and statin B. Baseline LDL levels are drawn for each group and are subsequently measured every 3 months for 1 year. Average baseline LDL levels for each group were identical. The group receiving statin A exhibited an 11 mg/dL greater reduction in LDL in comparison to the statin B group. Your statistical analysis reports a p-value of 0.052. Which of the following best describes the meaning of this p-value?
- A. There is a 95% chance that the difference in reduction of LDL observed reflects a real difference between the two groups
- B. Though A is more effective than B, there is a 5% chance the difference in reduction of LDL between the two groups is due to chance
- C. If 100 permutations of this experiment were conducted, 5 of them would show similar results to those described above
- D. This is a statistically significant result
- E. There is a 5.2% chance of observing a difference in reduction of LDL of 11 mg/dL or greater even if the two medications have identical effects (Correct Answer)
Sample size for equivalence trials Explanation: **There is a 5.2% chance of observing a difference in reduction of LDL of 11 mg/dL or greater even if the two medications have identical effects**
- The **p-value** represents the probability of observing results as extreme as, or more extreme than, the observed data, assuming the **null hypothesis** is true (i.e., there is no true difference between the groups).
- A p-value of 0.052 means that, if both statins were equally effective, **random variation** alone would produce a difference of 11 mg/dL or more about **5.2%** of the time.
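As a rough illustration of that definition, the sketch below converts a test statistic into a two-sided p-value. It assumes the test statistic is approximately standard normal under the null hypothesis; the question does not give the statistic, so the z-values and the function name `two_sided_p_from_z` are hypothetical.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a test statistic assumed standard normal under the null."""
    # P(|Z| >= |z|) when the null is true equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical z-values chosen only to bracket the conventional 0.05 threshold
p_at_threshold = two_sided_p_from_z(1.96)   # close to 0.05
p_just_short = two_sided_p_from_z(1.943)    # close to 0.052
print(round(p_at_threshold, 3), round(p_just_short, 3))
```

A slightly smaller test statistic moves the p-value from just under 0.05 to just over it, which is why 0.052 falls short of significance at alpha = 0.05.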
*There is a 95% chance that the difference in reduction of LDL observed reflects a real difference between the two groups*
- This statement is an incorrect interpretation of the p-value; it confuses the p-value with the **probability that the alternative hypothesis is true**.
- A p-value does not directly tell us the probability that the observed difference is "real" or due to the intervention being studied.
*Though A is more effective than B, there is a 5% chance the difference in reduction of LDL between the two groups is due to chance*
- This statement implies that Statin A is more effective, which cannot be concluded with a p-value of 0.052 if the significance level (alpha) was set at 0.05.
- The p-value is also not the probability that the difference is "due to chance", and here it is 5.2%, not 5%. Claiming A is "more effective" when the result does not reach the conventional significance threshold is misleading.
*If 100 permutations of this experiment were conducted, 5 of them would show similar results to those described above*
- This is an incorrect interpretation. The p-value does not predict how many repeated experiments would show similar results.
- It is the probability, **calculated assuming the null hypothesis is true**, of a result at least as extreme as the one observed. Because the true effect is unknown, it cannot tell us how often a repeated experiment would reproduce these results.
*This is a statistically significant result*
- A p-value of 0.052 is generally considered **not statistically significant** if the conventional alpha level (significance level) is set at 0.05 (or 5%).
- For a result to be statistically significant at alpha = 0.05, the p-value must be **less than 0.05**.
Sample size for equivalence trials US Medical PG Question 3: A researcher is trying to determine whether a newly discovered substance X can be useful in promoting wound healing after surgery. She conducts this study by enrolling the next 100 patients that will be undergoing this surgery and separating them into 2 groups. She decides which patient will be in which group by using a random number generator. Subsequently, she prepares 1 set of syringes with the novel substance X and 1 set of syringes with a saline control. Both of these sets of syringes are unlabeled and the substances inside cannot be distinguished. She gives the surgeon performing the surgery 1 of the syringes and does not inform him nor the patient which syringe was used. After the study is complete, she analyzes all the data that was collected and performs statistical analysis. This study most likely provides which level of evidence for use of substance X?
- A. Level 3
- B. Level 1 (Correct Answer)
- C. Level 4
- D. Level 5
- E. Level 2
Sample size for equivalence trials Explanation: ***Level 1***
- The study design described is a **randomized controlled trial (RCT)**, which is considered the **highest level of evidence (Level 1)** in the hierarchy of medical evidence.
- Key features like **randomization**, **control group**, and **blinding (double-blind)** help minimize bias and strengthen the validity of the findings.
*Level 2*
- Level 2 evidence typically comprises **well-designed controlled trials without randomization** (non-randomized controlled trials) or **high-quality cohort studies**.
- While strong, they do not possess the same level of internal validity as randomized controlled trials.
*Level 3*
- Level 3 evidence typically includes **case-control studies** or **cohort studies**, which are observational designs and carry a higher risk of bias compared to RCTs.
- These studies generally do not involve randomization or intervention assignment by the researchers.
*Level 4*
- Level 4 evidence is usually derived from **case series** or **poor quality cohort and case-control studies**.
- These studies provide descriptive information or investigate associations without strong control for confounding factors.
*Level 5*
- Level 5 evidence is the **lowest level of evidence**, consisting of **expert opinion** or **animal research/bench research**.
- This level lacks human clinical data or systematic investigative rigor needed for higher evidence levels.
Sample size for equivalence trials US Medical PG Question 4: A pharmaceutical company conducts a randomized clinical trial to demonstrate that their new anticoagulant drug, Aclotsaban, prevents more thrombotic events following total knee arthroplasty than the current standard of care. A significant number of patients are lost to follow-up, and many fail to complete treatment according to the study arm to which they were assigned. Despite these protocol deviations, the results for the patients who completed the course of Aclotsaban are encouraging. Which of the following analytical approaches is most appropriate for the primary analysis to establish the efficacy of Aclotsaban?
- A. Intention-to-treat analysis (Correct Answer)
- B. Sub-group analysis
- C. Per-protocol analysis
- D. As-treated analysis
- E. Non-inferiority analysis
Sample size for equivalence trials Explanation: ***Intention-to-treat analysis***
- **Intention-to-treat (ITT) analysis** is the gold standard for the **primary analysis in superiority trials** and includes all patients in the groups to which they were originally randomized, regardless of protocol deviations, loss to follow-up, or treatment discontinuation.
- ITT preserves **randomization balance**, prevents bias from selective dropout (patients may drop out due to adverse effects or lack of efficacy), and provides a **conservative, realistic estimate** of treatment effect in actual clinical practice.
- For **regulatory approval and establishing efficacy**, ITT is the most appropriate primary analysis method even when dropout rates are high, as it maintains the integrity of the randomized comparison.
*Per-protocol analysis*
- **Per-protocol analysis** includes only patients who completed the study exactly as planned without protocol deviations.
- While the encouraging results in completers are noted, per-protocol analysis can **introduce significant bias** by excluding patients who dropped out due to adverse events or lack of efficacy, potentially **overestimating treatment benefit**.
- Per-protocol is typically used as a **secondary/supportive analysis**, not the primary method for establishing superiority.
*As-treated analysis*
- **As-treated analysis** categorizes patients according to the treatment they actually received rather than their randomized assignment.
- This violates the principle of randomization and can introduce **confounding bias**, as actual treatment received may be influenced by prognostic factors.
*Sub-group analysis*
- **Sub-group analysis** evaluates treatment effects within specific patient subsets.
- This is **hypothesis-generating** rather than confirmatory and increases the risk of false-positive findings (multiple comparisons problem) unless pre-specified in the protocol.
*Non-inferiority analysis*
- **Non-inferiority analysis** tests whether a new treatment is not worse than control by more than a pre-specified margin.
- The goal here is to demonstrate **superiority** (better than standard care), not non-inferiority, making this approach inappropriate.
Sample size for equivalence trials US Medical PG Question 5: An investigator is measuring the blood calcium level in a sample of female cross country runners and a control group of sedentary females. If she would like to compare the means of the two groups, which statistical test should she use?
- A. Chi-square test
- B. Linear regression
- C. t-test (Correct Answer)
- D. ANOVA (Analysis of Variance)
- E. F-test
Sample size for equivalence trials Explanation: ***t-test***
- A **t-test** is appropriate for comparing the means of two independent groups, such as the blood calcium levels between runners and sedentary females.
- It assesses whether the observed difference between the two sample means is statistically significant or occurred by chance.
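A minimal sketch of a two-sample comparison of means follows. It uses Welch's unequal-variance form of the t-test, which is a choice made here for illustration rather than something the question specifies, and the calcium values are invented.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    # Squared standard error contributed by each group (sample variance / n)
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical calcium values in mg/dL, invented for illustration only
runners = [9.1, 9.4, 9.0, 9.6, 9.3, 9.2]
sedentary = [9.5, 9.7, 9.4, 9.9, 9.6, 9.8]
t_stat, df = welch_t(runners, sedentary)
print(t_stat, df)
```

The statistic would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value.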
*Chi-square test*
- The **chi-square test** is used to analyze categorical data to determine if there is a significant association between two variables.
- It is not suitable for comparing continuous variables like blood calcium levels.
*Linear regression*
- **Linear regression** is used to model the relationship between a dependent variable (outcome) and one or more independent variables (predictors).
- It aims to predict the value of a variable based on the value of another, rather than comparing means between groups.
*ANOVA (Analysis of Variance)*
- **ANOVA** is used to compare the means of **three or more independent groups**.
- Since there are only two groups being compared in this scenario, a t-test is more specific and appropriate.
*F-test*
- The **F-test** is primarily used to compare the variances of two populations or to assess the overall significance of a regression model.
- While it is the basis for ANOVA, it is not the direct test for comparing the means of two groups.
Sample size for equivalence trials US Medical PG Question 6: A study is being conducted on depression using the Patient Health questionnaire (PHQ-9) survey data embedded within a popular social media network with a response size of 500,000 participants. The sample population of this study is approximately normal. The mean PHQ-9 score is 14, and the standard deviation is 4. How many participants have scores greater than 22?
- A. 175,000
- B. 17,500
- C. 160,000
- D. 12,500 (Correct Answer)
- E. 25,000
Sample size for equivalence trials Explanation: ***12,500***
- To find the number of participants with scores greater than 22, first calculate the **z-score** for a score of 22: $Z = \frac{(X - \mu)}{\sigma} = \frac{(22 - 14)}{4} = 2$.
- A z-score of 2 means the score is **2 standard deviations above the mean**. Using the **empirical rule** for a normal distribution, approximately **2.5%** of the data falls beyond 2 standard deviations above the mean (5% total in both tails, so 2.5% in each tail).
- Therefore, $2.5\%$ of the total 500,000 participants is $0.025 \times 500,000 = 12,500$.
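The arithmetic above can be checked with Python's standard library. This sketch computes the z-score, the empirical-rule estimate used in the answer, and the exact normal tail for comparison; the variable names are illustrative.

```python
from statistics import NormalDist

N, MU, SIGMA, CUTOFF = 500_000, 14, 4, 22

z = (CUTOFF - MU) / SIGMA                 # standard deviations above the mean
empirical_count = round(0.025 * N)        # empirical-rule estimate (2.5% in the upper tail)

# Exact upper-tail area of a normal distribution, for comparison
exact_tail = 1 - NormalDist(MU, SIGMA).cdf(CUTOFF)
exact_count = round(exact_tail * N)
print(z, empirical_count, exact_count)
```

The exact tail is closer to 2.3% (roughly 11,400 participants), so the empirical rule slightly overestimates; 12,500 remains the closest answer choice.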
*175,000*
- This option would imply a much larger proportion of the population scoring above 22, inconsistent with the **normal distribution's properties** and the calculated z-score.
- It would correspond to a z-score closer to 0, indicating a score closer to the mean, not two standard deviations above it.
*17,500*
- This value represents **3.5%** of the total population ($17,500 / 500,000 = 0.035$).
- An upper-tail proportion of 3.5% corresponds to a z-score of about 1.8, not 2, indicating an incorrect calculation or interpretation of the **normal distribution table**.
*160,000*
- This option represents a very large portion of the participants, roughly **32%** of the total population.
- About 32% of a normal distribution lies more than one standard deviation from the mean (both tails combined), not more than 2 standard deviations above the mean as calculated.
*25,000*
- This value represents **5%** of the total population ($25,000 / 500,000 = 0.05$).
- Scores above 22 (z > 2) lie in the far upper tail of the normal distribution, where only about 2.5% of the data lies, not 5%. An upper-tail proportion of 5% would correspond to a z-score of approximately 1.65.
Sample size for equivalence trials US Medical PG Question 7: You are currently employed as a clinical researcher working on clinical trials of a new drug to be used for the treatment of Parkinson's disease. Currently, you have already determined the safe clinical dose of the drug in a healthy patient. You are in the phase of drug development where the drug is studied in patients with the target disease to determine its efficacy. Which of the following phases is this new drug currently in?
- A. Phase 4
- B. Phase 1
- C. Phase 2 (Correct Answer)
- D. Phase 0
- E. Phase 3
Sample size for equivalence trials Explanation: ***Phase 2***
- **Phase 2 trials** involve studying the drug in patients with the target disease to assess its **efficacy** and further evaluate safety, typically involving a few hundred patients.
- The question describes a stage after safe dosing in healthy patients (Phase 1) and before large-scale efficacy confirmation (Phase 3), focusing on efficacy in the target population.
*Phase 4*
- **Phase 4 trials** occur **after a drug has been approved** and marketed, monitoring long-term effects, optimal use, and rare side effects in a diverse patient population.
- This phase is conducted post-market approval, whereas the question describes a drug still in development prior to approval.
*Phase 1*
- **Phase 1 trials** primarily focus on determining the **safety and dosage** of a new drug in a **small group of healthy volunteers** (or sometimes patients with advanced disease if the drug is highly toxic).
- The question states that the safe clinical dose in a healthy patient has already been determined, indicating that Phase 1 has been completed.
*Phase 0*
- **Phase 0 trials** are exploratory, very early-stage studies designed to confirm that the drug reaches the target and acts as intended, typically involving a very small number of doses and participants.
- These trials are conducted much earlier in the development process, preceding the determination of safe clinical doses and large-scale efficacy studies.
*Phase 3*
- **Phase 3 trials** are large-scale studies involving hundreds to thousands of patients to confirm **efficacy**, monitor side effects, compare it to commonly used treatments, and collect information that will allow the drug to be used safely.
- While Phase 3 does assess efficacy, it follows Phase 2 and is typically conducted on a much larger scale before submitting for regulatory approval.
Sample size for equivalence trials US Medical PG Question 8: In a randomized controlled trial studying a new treatment, the primary endpoint (mortality) occurred in 14.4% of the treatment group and 16.7% of the control group. Which of the following represents the number of patients needed to treat to save one life, based on the primary endpoint?
- A. 1/(0.144 - 0.167)
- B. 1/(0.167 - 0.144) (Correct Answer)
- C. 1/(0.300 - 0.267)
- D. 1/(0.267 - 0.300)
- E. 1/(0.136 - 0.118)
Sample size for equivalence trials Explanation: ***1/(0.167 - 0.144)***
- The **Number Needed to Treat (NNT)** is calculated as **1 / Absolute Risk Reduction (ARR)**.
- The **Absolute Risk Reduction (ARR)** is the difference between the event rate in the control group (16.7%) and the event rate in the treatment group (14.4%), which is **0.167 - 0.144**.
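The calculation can be sketched as follows. The function name `number_needed_to_treat` is illustrative, and rounding up to a whole patient is the usual reporting convention rather than something stated in the question.

```python
import math

def number_needed_to_treat(control_rate: float, treatment_rate: float) -> float:
    """NNT = 1 / ARR, where ARR = control event rate - treatment event rate."""
    arr = control_rate - treatment_rate
    if arr <= 0:
        # No risk reduction: an NNT is not meaningful in this direction
        raise ValueError("No absolute risk reduction; NNT is undefined here")
    return 1 / arr

nnt = number_needed_to_treat(0.167, 0.144)   # ARR = 0.023
nnt_reported = math.ceil(nnt)                # round up to a whole patient
print(nnt, nnt_reported)
```

With these rates, about 44 patients would need to be treated to prevent one death.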
*1/(0.144 - 0.167)*
- This subtracts the control event rate from the treatment event rate, so the denominator is negative (-0.023); that ordering describes an **Absolute Risk Increase**, which would be relevant only if the treatment increased mortality.
- The **NNT should always be a positive value**, indicating the number of patients to treat to prevent one adverse event.
*1/(0.300 - 0.267)*
- This option uses arbitrary numbers (0.300 and 0.267) that do not correspond to the given **mortality rates** in the problem.
- It does not reflect the correct calculation for **absolute risk reduction** based on the provided data.
*1/(0.267 - 0.300)*
- This option also uses arbitrary numbers not derived from the problem's data, and it would result in a **negative value** for the denominator.
- The difference between event rates of 0.267 and 0.300 is not present in the given information for this study.
*1/(0.136 - 0.118)*
- This calculation uses arbitrary numbers (0.136 and 0.118) that are not consistent with the reported **mortality rates** of 14.4% and 16.7%.
- These values do not represent the **Absolute Risk Reduction** required for calculating NNT in this specific scenario.
Sample size for equivalence trials US Medical PG Question 9: A health system implements a new sepsis protocol across 20 hospitals. A researcher plans to evaluate effectiveness using a stepped-wedge cluster randomized design where hospitals sequentially adopt the protocol every 3 months. She calculates sample size based on individual patient outcomes (mortality) needing 2,000 patients total. The biostatistician identifies a critical error. Evaluate what modification is needed.
- A. Adjust for multiple time periods using Bonferroni correction
- B. Use hospital-level outcomes instead of patient-level outcomes as unit of analysis
- C. Increase alpha to 0.10 to account for cluster randomization reducing power
- D. Include random effects for both hospital and time period in power calculation
- E. Account for intra-cluster correlation coefficient (ICC) requiring substantial sample size inflation (Correct Answer)
Sample size for equivalence trials Explanation: ***Account for intra-cluster correlation coefficient (ICC) requiring substantial sample size inflation***
- In cluster-randomized designs, observations within the same cluster (hospital) are not independent; the **Intra-cluster Correlation Coefficient (ICC)** quantifies this correlation and must be used to calculate a **design effect**.
- Neglecting the ICC leads to an **underpowered study** because the effective sample size is smaller than the total number of individual patients measured.
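To give a sense of scale, the sketch below applies the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC. The ICC of 0.05 is a hypothetical value, not one given in the question, and a full stepped-wedge calculation would also depend on the number of steps and on time effects, so this is only a first-pass approximation.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

patients_individual = 2000                        # from the individually randomized calculation
hospitals = 20
cluster_size = patients_individual / hospitals    # 100 patients per hospital
ASSUMED_ICC = 0.05                                # hypothetical; not stated in the question

deff = design_effect(cluster_size, ASSUMED_ICC)
inflated_n = patients_individual * deff
print(deff, inflated_n)
```

Under these assumptions the design effect is about 5.95, which would raise the requirement from 2,000 to roughly 11,900 patients. Even a small ICC matters when clusters are large.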
*Adjust for multiple time periods using Bonferroni correction*
- **Bonferroni correction** is used to control for **Type I error** when performing multiple independent hypothesis tests, not for determining sample size in nested longitudinal designs.
- While the stepped-wedge design involves multiple time points, the primary analysis typically uses a **single model** (e.g., GEE or GLMM) that accounts for time as a fixed effect.
*Use hospital-level outcomes instead of patient-level outcomes as unit of analysis*
- While the hospital is the **unit of randomization**, using hospital-level means as the unit of analysis simplifies the data and causes a significant loss of **statistical information** and precision.
- Modern biostatistical methods utilize **multilevel modeling** to maintain the richness of patient-level data while adjusting for the cluster-level randomization.
*Include random effects for both hospital and time period in power calculation*
- While random effects are important for the **analysis phase**, the "critical error" identified in the prompt refers to the initial failure to inflate the sample size based on **clustering (ICC)**.
- Power calculations for stepped-wedge designs are complex and certainly involve time parameters, but **ICC-based inflation** is the most fundamental adjustment required when moving from individual to cluster randomization.
*Increase alpha to 0.10 to account for cluster randomization reducing power*
- Increasing the **alpha level** (significance threshold) is not a standard or scientifically acceptable method to compensate for the loss of power due to **clustering**.
- Standard practice mandates maintaining an **alpha of 0.05** while appropriately increasing the **sample size** or number of clusters to reach the desired power (usually 80-90%).
Sample size for equivalence trials US Medical PG Question 10: A 41-year-old research fellow designs a non-inferiority trial comparing oral to IV antibiotics for osteomyelitis. She sets the non-inferiority margin at 10% (cure rate difference), expects 85% cure in both groups, and calculates 300 patients per arm for 80% power with α=0.025 (one-sided). Her mentor suggests this underestimates required sample size. Evaluate the mentor's concern.
- A. Correct; non-inferiority trials require larger samples than superiority trials for equivalent power (Correct Answer)
- B. Incorrect; non-inferiority trials actually require smaller samples due to less stringent hypotheses
- C. Correct; dropout rates in antibiotic trials necessitate 20% inflation of calculated sample size
- D. Incorrect; the calculation appropriately uses one-sided alpha for non-inferiority testing
- E. Correct; the margin should be set at 5% requiring doubling of sample size
Sample size for equivalence trials Explanation: ***Correct; non-inferiority trials require larger samples than superiority trials for equivalent power***
- **Non-inferiority trials** are designed to exclude a difference greater than a pre-specified margin, which typically requires a **larger sample size** than superiority trials investigating the same outcome.
- Because we are proving that the new treatment is "not much worse" (rather than "better"), the **statistical threshold** often necessitates higher enrollment to achieve adequate **power**.
*Incorrect; the calculation appropriately uses one-sided alpha for non-inferiority testing*
- While it is true that **non-inferiority testing** uses a **one-sided alpha (0.025)**, this does not negate the fact that such trials inherently require more participants.
- The mentor's concern is about the **total N**, which remains insufficient despite using the correct one-sided alpha convention.
*Correct; the margin should be set at 5% requiring doubling of sample size*
- There is no universal rule that the **non-inferiority margin** must be 5%; it is determined by **clinical significance** and regulatory standards for the specific condition.
- While a 5% margin would indeed increase the sample size, the 10% margin is often standard in **antibiotic trials** for osteomyelitis.
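How strongly the margin drives enrollment can be sketched with a normal-approximation formula for two proportions. This unpooled formula is one of several in use and is an assumption here; the function name and defaults are illustrative, and the scenario's own calculation method is not stated.

```python
from statistics import NormalDist

def ni_sample_size_per_arm(p_control, p_new, margin, alpha_one_sided=0.025, power=0.80):
    """Approximate per-arm n for a non-inferiority test on a difference in proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_new * (1 - p_new)
    # Distance between the assumed true difference and the non-inferiority margin
    effect = margin - (p_control - p_new)
    return (z_alpha + z_beta) ** 2 * variance_sum / effect ** 2

n_margin_10 = ni_sample_size_per_arm(0.85, 0.85, 0.10)
n_margin_05 = ni_sample_size_per_arm(0.85, 0.85, 0.05)
ratio = n_margin_05 / n_margin_10
print(n_margin_10, n_margin_05, ratio)
```

Because the margin enters as a squared term in the denominator, halving it from 10% to 5% roughly quadruples the required sample size under this approximation rather than doubling it.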
*Incorrect; non-inferiority trials actually require smaller samples due to less stringent hypotheses*
- This is a common misconception; non-inferiority trials are actually more demanding because the **null hypothesis** assumes the treatments are different (inferior).
- Disproving **inferiority** within a tight **margin (delta)** is statistically more intensive than proving a treatment is superior to a placebo.
*Correct; dropout rates in antibiotic trials necessitate 20% inflation of calculated sample size*
- While **attrition bias** is a concern, there is no fixed rule that every trial needs a **20% inflation** factor.
- The mentor's concern is specifically about the **base calculation** and the statistical nature of non-inferiority designs rather than just the **dropout rate**.