Biostatistics Practice Questions

Q: The Child-Pugh score places patients into three classes: A (<7 points), B (7-9 points) and C (10-15 points). This classification is an example of which type of scale?

Ordinal. ***Ordinal*** - An **ordinal scale** ranks data with a meaningful order, like the Child-Pugh score categories (A, B, C), but the difference between ranks isn't necessarily equal or precisely quantifiable. - While categories reflect increasing severity, the "distance" between A and B may not be the same as between B and C in a strictly numerical sense. *Nominal* - A **nominal scale** categorizes data without any order or ranking, such as blood types (A, B, AB, O) or gender. - The Child-Pugh score categories have an inherent order of severity (A < B < C), making it more than just nominal. *Quantitative* - **Quantitative scales** involve numerical data that can be measured and calculated. - While the Child-Pugh score is derived from quantitative variables, the final categorical output (A, B, C) doesn't allow for arithmetic operations between the categories themselves. *Continuous* - A **continuous scale** can take any value within a given range, like height or weight, allowing for infinite precision. - The Child-Pugh score categories (A, B, C) are discrete and specific ranges, not allowing for infinitely precise intermediate values.
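
The sketch below is a minimal Python illustration of why the classes are ordinal. The class and function names are illustrative and not taken from any library. The sketch encodes the classes as an ordered enumeration, so comparisons such as A < C are meaningful while the integer codes say nothing about distance. The cut-offs are the ones given in the question. The 5-15 range assumes the usual five components, each scored 1 to 3 points.

```python
from enum import IntEnum


class ChildPughClass(IntEnum):
    """Ordinal categories: the integer codes record order only, not distance."""
    A = 1
    B = 2
    C = 3


def child_pugh_class(total_points: int) -> ChildPughClass:
    """Map a Child-Pugh point total to its class using the cut-offs from the question."""
    if not 5 <= total_points <= 15:
        raise ValueError("Child-Pugh total must be between 5 and 15 points")
    if total_points < 7:
        return ChildPughClass.A
    if total_points <= 9:
        return ChildPughClass.B
    return ChildPughClass.C


# Ordering is meaningful on an ordinal scale...
print(child_pugh_class(6) < child_pugh_class(12))  # True
# ...but differences such as C - B versus B - A have no clinical interpretation.
```

IntEnum is used here because it keeps the ordering available for comparisons. Python would still allow arithmetic on the codes, so treating them only as ranked labels remains the analyst's responsibility.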

Q: What is the most appropriate measure of central tendency when the data include extreme values?

Median. ***Median*** - The **median** is less affected by **extreme values** or **outliers** because it represents the middle value in an ordered dataset. - It provides a more robust measure of central tendency when the data distribution is **skewed**. *Mode* - The **mode** represents the most frequently occurring value in a dataset; it does not account for the magnitude of other values. - While it is not influenced by extreme values, it may not accurately represent the central tendency of a continuous dataset, especially if there are **multiple modes** or if the most frequent value is not central. *Mean* - The **mean** is calculated by summing all values and dividing by the number of values, making it highly susceptible to **extreme values** or **outliers**. - A single very large or very small value can significantly distort the mean, pulling it away from the true center of most data points. *Geometric mean* - The **geometric mean** is primarily used for data that are **multiplicative** in nature, such as rates of change, or for positively skewed distributions. - It can be less sensitive than the arithmetic mean to very large values, but it requires all values to be positive and is not the general-purpose choice for data with outliers outside such multiplicative settings.
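
A quick check with Python's standard `statistics` module shows the effect of a single extreme value. The length-of-stay figures below are invented for illustration.

```python
from statistics import mean, median

# Hypothetical hospital length-of-stay data (days) with one extreme value
stays = [3, 4, 4, 5, 6, 7, 62]

print(mean(stays))    # 13 -- pulled well above six of the seven observations
print(median(stays))  # 5  -- the middle (4th) value of the sorted data
```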

Q: In a vaccine trial, relative risk is 0.2. What is the vaccine efficacy?

80%. ***80%*** - Vaccine efficacy is calculated as **(1 - Relative Risk) x 100%**. Given a relative risk of 0.2, the efficacy is (1 - 0.2) x 100% = **80%**. - This value represents the **proportionate reduction** in disease incidence in the vaccinated group compared to an unvaccinated group. *90%* - This would imply a relative risk of 0.1, as **(1 - 0.1) x 100% = 90%**. - The given relative risk of **0.2** does not correspond to 90% efficacy. *95%* - This would imply a relative risk of 0.05, as **(1 - 0.05) x 100% = 95%**. - The given relative risk of **0.2** does not correspond to 95% efficacy. *20%* - This value directly represents the **Relative Risk (RR)** itself, or an efficacy calculated incorrectly as RR x 100%. - Vaccine efficacy is a measure of reduction from the unvaccinated state, hence it is **1 - RR**.
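
The formula can be written as a small Python function. The function name is illustrative. The case counts are hypothetical, chosen so that RR = 0.2.

```python
def vaccine_efficacy_percent(relative_risk: float) -> float:
    """Vaccine efficacy (%) = (1 - RR) x 100."""
    return (1 - relative_risk) * 100


# Hypothetical trial: 20 cases per 1,000 vaccinated vs 100 cases per 1,000 unvaccinated
risk_vaccinated = 20 / 1000
risk_unvaccinated = 100 / 1000
rr = risk_vaccinated / risk_unvaccinated       # 0.2, up to floating-point rounding
print(round(vaccine_efficacy_percent(rr), 1))  # 80.0
```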

Q: In a population in Hardy-Weinberg equilibrium, the allele frequencies are p = 0.7 and q = 0.3. What is the frequency of heterozygotes?

0.42. ***0.42*** - In **Hardy-Weinberg equilibrium**, the frequency of heterozygotes is given by the formula **2pq**. - Given **p = 0.7** and **q = 0.3**, the frequency of heterozygotes is 2 × 0.7 × 0.3 = **0.42**. *0.09* - This value is **q²**, the frequency of the **homozygous recessive genotype** (0.3 × 0.3 = 0.09). - It does not represent the frequency of heterozygotes. *0.49* - This value is **p²**, the frequency of the **homozygous dominant genotype** (0.7 × 0.7 = 0.49). - It does not represent the frequency of heterozygotes. *0.21* - This value is only **pq** (0.7 × 0.3 = 0.21), not the full heterozygote frequency of **2pq**. - The factor of 2 is needed because a heterozygote can arise in two ways: the dominant allele from the mother and the recessive allele from the father, or the reverse.
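
Below is a minimal sketch of the three expected genotype frequencies at a two-allele locus. The function name and the allele labels A/a are illustrative.

```python
def hardy_weinberg_genotypes(p: float) -> dict:
    """Expected genotype frequencies at a two-allele locus under Hardy-Weinberg equilibrium."""
    q = 1 - p  # the two allele frequencies sum to 1
    return {
        "AA": p ** 2,     # homozygous dominant, p^2
        "Aa": 2 * p * q,  # heterozygous, 2pq
        "aa": q ** 2,     # homozygous recessive, q^2
    }


genotypes = hardy_weinberg_genotypes(0.7)
print({g: round(f, 2) for g, f in genotypes.items()})  # {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
```

The three frequencies always sum to 1, because p² + 2pq + q² = (p + q)².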

Q: Calculate the sensitivity of a screening test: True Positives=80, False Negatives=20, True Negatives=90, False Positives=10

80%. ***80%*** - Sensitivity is calculated as **True Positives / (True Positives + False Negatives)**. Here, 80 / (80 + 20) = 80/100 = 0.8, or 80%. - It is the proportion of **people who actually have the disease** whom the test correctly identifies as positive. *90%* - This is the **specificity**: True Negatives / (True Negatives + False Positives) = 90 / (90 + 10) = 90%. - The question asks for **sensitivity**, which is a different measure. *85%* - This is the overall **accuracy**: (True Positives + True Negatives) / total = (80 + 90) / 200 = 85%. - Accuracy pools diseased and healthy subjects, so it is not the **sensitivity**. *95%* - This would require the test to detect 95 of 100 actual positive cases (e.g., 95 TP and 5 FN). - The given data of **80 True Positives** and **20 False Negatives** yield a lower sensitivity.
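
The three measures discussed above can be computed from the 2x2 counts in a few lines of Python. The function name is illustrative.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Basic screening-test measures from 2x2 counts, returned as proportions."""
    return {
        "sensitivity": tp / (tp + fn),                # diseased people correctly testing positive
        "specificity": tn / (tn + fp),                # healthy people correctly testing negative
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # all correct results over all subjects
    }


metrics = screening_metrics(tp=80, fn=20, tn=90, fp=10)
print(metrics)  # {'sensitivity': 0.8, 'specificity': 0.9, 'accuracy': 0.85}
```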

Q: What is the correlation coefficient if the regression coefficient of X on Y is 0.8 and that of Y on X is 0.9?

0.85. ***0.85*** - The correlation coefficient (r) is the **geometric mean** of the two regression coefficients. - Formula: r = √(b_xy × b_yx), where b_xy is the regression coefficient of X on Y and b_yx is the regression coefficient of Y on X. - Calculation: r = √(0.8 × 0.9) = √0.72 ≈ **0.8485**, which rounds to **0.85**. - Since both regression coefficients are positive, the correlation is positive. *0.95* - This value does not result from any standard calculation with these regression coefficients. - Because b_xy × b_yx = r² = 0.72, r cannot exceed √0.72 ≈ 0.85. The **arithmetic mean**, (0.8 + 0.9)/2 = 0.85, happens to match the rounded answer here, but it is not the correct method; the arithmetic mean is always at least as large as the geometric mean. *0.81* - This is the square of one regression coefficient (0.9² = 0.81). - The correlation coefficient requires the **square root of the product** of both coefficients, not the square of a single coefficient. *0.72* - This is the **product** of the two regression coefficients (0.8 × 0.9 = 0.72), which equals r². - It is an intermediate step; the correlation coefficient is its **square root**: √0.72 ≈ 0.85.
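
Below is a short sketch of the calculation, assuming only the two regression coefficients are known. The function name is illustrative. The sign of r is taken from the coefficients, which always share the sign of r. The checks reflect the fact that b_xy × b_yx = r² must lie between 0 and 1.

```python
import math


def correlation_from_regressions(b_xy: float, b_yx: float) -> float:
    """r = sign * sqrt(b_xy * b_yx), the geometric mean of the two regression coefficients."""
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("b_xy * b_yx equals r^2, so it cannot exceed 1")
    return math.copysign(math.sqrt(product), b_xy)


r = correlation_from_regressions(0.8, 0.9)
print(round(r, 4))  # 0.8485
```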

Q: Which method is most accurate for estimating the incidence of a disease?

Cohort study. ***Cohort study*** - A **cohort study** tracks a group of individuals over time to observe the development of new cases of a disease, allowing direct calculation of **incidence rates**. - It starts with a disease-free population and identifies who develops the disease, providing the most accurate measure of **risk** and incidence. *Case-control study* - **Case-control studies** are primarily used to investigate **risk factors** for a disease by comparing exposures between individuals with the disease (cases) and those without (controls). - They **cannot directly estimate incidence** because participants are selected on the basis of disease status, so the proportion with the disease is fixed by the study design rather than observed in a population at risk. *Cross-sectional study* - A **cross-sectional study** assesses the prevalence of a disease and/or exposure at a single point in time. - It provides a snapshot of the population's health status but **cannot determine incidence**, as it does not observe new cases developing over time. *Ecological study* - An **ecological study** examines disease rates and exposures across populations rather than individuals. - While useful for generating hypotheses, it is prone to the **ecological fallacy** and cannot determine individual-level incidence.
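
A cohort yields incidence directly because both the numerator (new cases) and the denominator (people or person-time at risk) are observed. The minimal sketch below computes both standard incidence measures. The cohort is invented and the function names are illustrative.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Risk: new cases during follow-up divided by disease-free people at baseline."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases: int, person_years: float) -> float:
    """Incidence density: new cases divided by total person-time at risk."""
    return new_cases / person_years


# Hypothetical cohort: 1,000 disease-free people contribute 4,500 person-years; 50 new cases
print(cumulative_incidence(50, 1000))               # 0.05
print(round(incidence_rate(50, 4500) * 1000, 1))    # 11.1 cases per 1,000 person-years
```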

Q: In the context of diagnostic testing, how does using a series testing approach affect the specificity and sensitivity of the test?

Increases specificity and decreases sensitivity. ***Increases specificity and decreases sensitivity*** - In a **series testing approach**, a diagnosis is made only if **all tests in the sequence are positive**. This approach is designed to reduce false positives, because a patient must test positive at every stage. - Requiring multiple positive results lowers the likelihood of a false-positive overall diagnosis, thereby **increasing the specificity** of the diagnostic process. Conversely, this stricter criterion means that a patient who has the disease is missed whenever any one test in the series is negative, leading to a **decrease in sensitivity**. *Increases sensitivity and decreases specificity* - This outcome is characteristic of a **parallel testing approach**, where a positive result on **any one of several tests** is sufficient for a diagnosis. - While parallel testing increases the chance of catching true positives (higher sensitivity), it also raises the risk of false positives (lower specificity) because fewer criteria need to be met. *Both increase* - It is generally **not possible** to increase both sensitivity and specificity simultaneously through a simple change in testing strategy without altering the intrinsic properties of the tests themselves. - There is typically an **inverse relationship** between sensitivity and specificity; improving one often comes at the expense of the other. *Both decrease* - A decrease in both sensitivity and specificity would indicate a **poorly designed or executed testing strategy**, or using tests that are individually unreliable. - This outcome would be undesirable, as it would lead to both a high rate of missed diagnoses and a high rate of false-positive diagnoses.
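
The direction of these changes can be checked numerically. The sketch below assumes the two tests err independently given true disease status (conditional independence). Real tests are often correlated, which shrinks the effect. The test characteristics are hypothetical.

```python
def series(se1: float, sp1: float, se2: float, sp2: float) -> tuple:
    """Positive only if BOTH tests are positive (assumes conditional independence)."""
    net_se = se1 * se2                    # a case must be detected by both tests
    net_sp = 1 - (1 - sp1) * (1 - sp2)    # a false positive needs both tests to err
    return net_se, net_sp


def parallel(se1: float, sp1: float, se2: float, sp2: float) -> tuple:
    """Positive if EITHER test is positive (same independence assumption)."""
    net_se = 1 - (1 - se1) * (1 - se2)    # a case is missed only if both tests miss it
    net_sp = sp1 * sp2                    # a non-case must be negative on both tests
    return net_se, net_sp


# Hypothetical tests: test 1 (Se 0.90, Sp 0.80), test 2 (Se 0.85, Sp 0.90)
print([round(x, 3) for x in series(0.90, 0.80, 0.85, 0.90)])    # [0.765, 0.98]
print([round(x, 3) for x in parallel(0.90, 0.80, 0.85, 0.90)])  # [0.985, 0.72]
```

In series, the net sensitivity falls below that of either test and the net specificity rises above both; parallel testing shows the mirror image.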

Q: A study is conducted to assess the relationship between smoking and lung cancer by following a group of smokers and non-smokers over time to calculate incidence rates. What type of study design is being described?

Cohort study. ***Cohort study*** - A **cohort study** follows a group of individuals over time based on their exposure status (smokers vs. non-smokers) to see who develops the outcome (lung cancer). - This design allows for the calculation of **incidence rates** and **relative risk**, providing strong evidence for causality and temporal relationships. - Among observational designs, cohort studies give the strongest evidence of **temporality**, because exposure is measured before the disease develops. *Case-control study* - A **case-control study** starts with individuals who already have the outcome (lung cancer cases) and compares their past exposure (smoking) to a control group without the outcome. - While efficient for rare diseases and diseases with long latency periods, the question specifically describes **following subjects over time**, which is characteristic of a cohort design, not case-control. *Cross-sectional study* - A **cross-sectional study** assesses exposure and outcome simultaneously at a single point in time, providing a snapshot of prevalence. - It cannot establish a **temporal relationship** between smoking and lung cancer, and does not involve following subjects over time. *Randomized controlled trial* - A **randomized controlled trial (RCT)** involves randomly assigning participants to an intervention or control group and is best for evaluating the effectiveness of treatments or preventive interventions. - It would be **unethical** to randomize individuals to smoke to assess lung cancer risk, making this design inappropriate for studying harmful exposures.
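
The minimal sketch below shows how incidence and relative risk come out of such a cohort. The counts are invented and the function name is illustrative. It uses cumulative incidence (risk) over a fixed follow-up period rather than person-time rates.

```python
def cohort_risks(exposed_cases: int, exposed_total: int,
                 unexposed_cases: int, unexposed_total: int) -> tuple:
    """Cumulative incidence in each exposure group and their ratio (relative risk)."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed


# Hypothetical follow-up of 2,000 smokers and 4,000 non-smokers
risk_smokers, risk_nonsmokers, rr = cohort_risks(60, 2000, 12, 4000)
print(risk_smokers, risk_nonsmokers, round(rr, 1))  # 0.03 0.003 10.0
```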

Q: A researcher is conducting a study to compare two treatment modalities for hypertension. The study finds a p-value of 0.02 and a 95% confidence interval for the difference in means that does not include zero. How should these results be interpreted?

The results are statistically significant, indicating that the null hypothesis can be rejected. ***The results are statistically significant, indicating that the null hypothesis can be rejected.*** - A **p-value of 0.02** is below the conventional significance level of 0.05. If there were truly no difference between the treatments, a difference at least as large as the one observed would be expected only about 2% of the time. - A **95% confidence interval** for the difference in means that excludes **zero** is consistent with this, and likewise indicates a statistically significant difference at the 5% level. *There is no significant difference between the two treatments.* - A **p-value of 0.02** indicates a statistically significant difference, not an absence of difference. - The **confidence interval excluding zero** also indicates a significant difference between the treatment effects. *The p-value indicates that the results are not significant.* - A **p-value of 0.02** is **statistically significant** at the conventional significance level (α = 0.05), the level that corresponds to a 95% confidence interval. - A p-value **below 0.05** allows the null hypothesis to be rejected. *The confidence interval suggests that the study has low power.* - The **width of the confidence interval** reflects the precision of the estimate, which depends on sample size and variability; it is not a direct measure of power. - Nothing in these results points to low power, since the interval excludes zero and a difference was detected. Power is a design-stage property determined by sample size, effect size, variability, and α.
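
The agreement between the p-value and the confidence interval can be illustrated with a normal-approximation (z) sketch using the standard-library `statistics.NormalDist`. The difference and standard error are hypothetical, chosen to give p ≈ 0.02, and the function name is illustrative. The exact correspondence assumes the test and the interval use the same estimate, standard error, and reference distribution; a t-based analysis behaves the same way with t critical values.

```python
from statistics import NormalDist


def z_test_summary(diff: float, se: float, alpha: float = 0.05) -> tuple:
    """Two-sided z-test p-value and (1 - alpha) confidence interval for a difference."""
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p_value, ci


# Hypothetical: difference in mean BP reduction of 5 mmHg with a standard error of 2.15 mmHg
p, (lo, hi) = z_test_summary(5.0, 2.15)
print(round(p, 3), round(lo, 2), round(hi, 2))  # 0.02 0.79 9.21
```

With the same inputs, p < 0.05 occurs exactly when the 95% interval excludes zero, which is why the two results in the question point the same way.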

Biostatistics — MCQs
