What does the P-value represent in hypothesis testing?
The denominator of the positive predictive value is
Which of the following best reflects the diagnostic power of a test?
Which type of measurement scale is used to rank data without precise intervals, such as satisfaction levels?
When continuous observations are made before and after exposure to a factor in the same subjects, which statistical test is most appropriate to analyze the data?
What does the chi-square test measure in the context of categorical variables?
An investigator concluded that the presence or absence of five factors determines the disease condition. Which of the following would be the most appropriate next study to determine if any of these five factors are independent precursors of the disease?
What is the formula for Pearson's measure of skewness?
What is the death rate among cholera-affected individuals in a population of 5000, where 50 people are affected by cholera, and 10 of these individuals have died?
The best method to show the association between height and weight of children in a class is by:
Explanation:

***The probability of obtaining results as extreme or more extreme than observed, assuming the null hypothesis is true.***
- The **P-value** quantifies the evidence against the **null hypothesis**, representing the likelihood of obtaining the observed results (or more extreme results) if the null hypothesis were indeed correct.
- A **small P-value** (typically < 0.05) suggests that the observed data is unlikely under the null hypothesis, providing evidence to **reject** it.
- It is NOT the probability that the null hypothesis is true or false, nor the probability of the data itself, but rather the probability of obtaining such extreme results by chance alone.

*The probability of not rejecting the null hypothesis when it is true.*
- This describes the **confidence level (1 - α)**, which represents the probability of correctly failing to reject a true null hypothesis.
- It is not what the P-value directly calculates, which focuses on the probability of extreme results under the null hypothesis.

*The probability of rejecting the null hypothesis when it is false.*
- This is known as the **power of the test (1 - β)**, which is the probability of correctly detecting a real effect when it exists.
- The **P-value** itself does not represent the power; rather, it is a tool used to make a decision about the null hypothesis based on observed data.

*The probability of observing the data given that the null hypothesis is false.*
- This statement is related to the **alternative hypothesis** and is not the direct definition of a **P-value**.
- The P-value specifically assesses the probability of obtaining extreme results under the assumption that the **null hypothesis is true**, not false.
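The definition can be made concrete with a toy example (hypothetical numbers, not from the question): the one-sided exact P-value for observing 8 heads in 10 tosses of a coin assumed fair under the null hypothesis.

```python
from math import comb

# One-sided exact P-value: the probability of a result at least as
# extreme as the one observed (8 heads in 10 tosses), computed under
# the null hypothesis that the coin is fair (p = 0.5).
n, observed = 10, 8
p_value = sum(comb(n, k) for k in range(observed, n + 1)) / 2**n
print(p_value)  # 56/1024 = 0.0546875
```

Since 0.055 > 0.05, this result would not reach conventional significance: 8 or more heads is not unusual enough under a fair coin to reject the null hypothesis.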
Explanation:

***True positive + False positive***
- The **positive predictive value** (PPV) is defined as the probability that subjects with a **positive screening test** truly have the disease.
- The formula for PPV is **True Positives / (True Positives + False Positives)**; thus, the denominator includes all positive test results.

*False positive + True negative*
- This combination of values describes the denominator for the **false positive rate** (False Positives / (False Positives + True Negatives)), which is not related to the PPV.
- **True negatives** are correctly identified as not having the disease, which is irrelevant for the calculation of PPV.

*False positive + False negative*
- This sum does not directly represent any standard epidemiological measure or denominator for common test performance metrics like sensitivity, specificity, or PPV.
- Both **false positives** and **false negatives** represent incorrect test outcomes.

*True positive + False negative*
- This represents the total number of individuals who **actually have the disease** and is the denominator for **sensitivity** (True Positives / (True Positives + False Negatives)).
- **False negatives** are individuals with the disease who tested negative, which are not relevant for the denominator of PPV.
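A minimal sketch with hypothetical 2×2 screening counts shows how the denominators differ between PPV and sensitivity:

```python
# Hypothetical counts from a 2x2 screening table (illustrative only)
tp, fp, fn, tn = 90, 40, 10, 860

ppv = tp / (tp + fp)          # denominator: everyone who tested positive
sensitivity = tp / (tp + fn)  # denominator: everyone who has the disease
print(round(ppv, 3))          # 90/130 ≈ 0.692
print(sensitivity)            # 90/100 = 0.9
```

Note that the same true-positive count of 90 sits over different denominators: all positive tests for PPV, all diseased individuals for sensitivity.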
Explanation:

***Sensitivity and specificity***
- **Diagnostic power of a test** refers to its intrinsic ability to correctly identify individuals with and without disease, which is best reflected by **sensitivity and specificity**.
- **Sensitivity** (true positive rate) measures the test's power to detect disease when present: the ability to correctly identify diseased individuals.
- **Specificity** (true negative rate) measures the test's power to rule out disease when absent: the ability to correctly identify non-diseased individuals.
- These are **inherent properties of the test** that remain constant regardless of disease prevalence in the population, making them the true measures of diagnostic power.
- Together, they define how well a test can discriminate between diseased and non-diseased states.

*Predictive value of a test*
- **Predictive values** (positive and negative) indicate the probability of disease given a test result, but they are measures of **clinical utility**, not diagnostic power.
- Predictive values are **dependent on disease prevalence**: the same test, with identical sensitivity and specificity, will have different predictive values in populations with different disease prevalence.
- They answer "Given this result, what is the probability of disease?" rather than measuring the test's inherent diagnostic ability.

*Specificity alone*
- **Specificity alone** is incomplete, as it only measures the test's ability to identify non-diseased individuals.
- Diagnostic power requires assessment of both the ability to detect disease (sensitivity) and to rule it out (specificity).

*Population attributable risk of a test*
- **Population attributable risk (PAR)** is an epidemiological measure that quantifies the proportion of disease in a population attributable to a specific risk factor.
- It is not a measure of diagnostic test performance and is unrelated to diagnostic power.
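The prevalence dependence of predictive values can be demonstrated with a short sketch (hypothetical test with 90% sensitivity and 90% specificity):

```python
def ppv(sens, spec, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Identical test properties, very different populations:
print(round(ppv(0.9, 0.9, 0.50), 2))  # 0.9  at 50% prevalence
print(round(ppv(0.9, 0.9, 0.01), 2))  # 0.08 at 1% prevalence
```

Sensitivity and specificity stay fixed at 0.9, yet the PPV collapses from 90% to about 8% as prevalence falls, which is why predictive values measure clinical utility in a given population rather than the test's inherent diagnostic power.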
Explanation:

***Ordinal***
- An **ordinal scale** allows for the ranking of data into a meaningful order, such as "low," "medium," or "high" satisfaction, but does not provide information about the **precise differences** between these ranks.
- While we know that "high" is better than "medium," we cannot quantify by how much, making it suitable for representing **satisfaction levels** and similar qualitative judgments.

*Nominal*
- A **nominal scale** categorizes data without any order or ranking, such as gender or blood type.
- It only provides labels for different categories and does not imply any quantitative or logical relationship between them.

*Interval*
- An **interval scale** measures data with ordered categories and **equal, meaningful intervals** between them, but it lacks a true zero point.
- Examples include temperature in Celsius or Fahrenheit, where the difference between 20°C and 30°C is the same as between 30°C and 40°C, but 0°C does not mean an absence of temperature.

*Ratio*
- A **ratio scale** is the most informative measurement scale, possessing all the properties of an interval scale while also including a **true and meaningful zero point**.
- This allows for calculations of ratios and proportions; examples include weight, height, or income, where zero truly represents the absence of the measured quantity.
Explanation:

***Paired T-test***
- This test is specifically designed for comparing **means from two related samples**, such as measurements taken from the same subjects before and after an intervention.
- It accounts for the **dependent nature** of the observations, making it suitable for within-subject comparisons.
- When the question states "continuous observations" without mentioning non-normal distribution, the **paired t-test is the standard choice** as it assumes normally distributed differences.

*Chi-square test*
- The **chi-square test** is used for analyzing **categorical data** to determine if there is a significant association between two variables.
- It is not appropriate for comparing continuous measurements from before and after an intervention in the same subjects.

*Unpaired T-test*
- The **unpaired t-test** is used to compare the **means of two independent groups**, where the observations in one group are unrelated to the observations in the other.
- It is unsuitable for this scenario as the data comes from the same subjects, making the samples dependent.

*Wilcoxon signed-rank test*
- The **Wilcoxon signed-rank test** is the **non-parametric alternative** to the paired t-test, used when the data does not meet the assumptions for a paired t-test (e.g., non-normally distributed data or ordinal data).
- While it can handle paired continuous data, it would only be preferred if parametric assumptions are violated. Since the question does not indicate such violations, the paired t-test is the **most appropriate** choice as the first-line parametric test.
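The paired t statistic is just the mean of the within-subject differences divided by its standard error. A minimal sketch with hypothetical before/after values for five subjects:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements on the same five subjects (illustrative only)
before = [120, 118, 130, 125, 122]
after = [118, 115, 129, 121, 120]
d = [b - a for b, a in zip(before, after)]  # within-subject differences

# Paired t statistic: mean difference over its standard error,
# with n - 1 = 4 degrees of freedom
t = mean(d) / (stdev(d) / sqrt(len(d)))
print(round(t, 2))  # 4.71
```

Pairing reduces each subject to a single difference, which is why the test handles the dependence between the before and after measurements.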
Explanation:

***Association between categorical variables***
- The **chi-square test** is used to determine if there is a statistically significant association between two or more **categorical variables**.
- It compares the **observed frequencies** in categories with the **expected frequencies** if there were no association.

*Causal relationships between variables*
- The chi-square test can demonstrate an association, but it **cannot establish causation**; causality requires fulfilling specific criteria beyond mere statistical association (e.g., temporal precedence, dose-response relationship, biological plausibility).
- Inferring causation from an association would be a **logical fallacy**, as confounding factors may explain observed relationships.

*Correlation between categorical variables*
- **Correlation** typically refers to the strength and direction of a linear relationship between **continuous variables**.
- While related to association, the term "correlation" is usually reserved for **quantitative data**, whereas chi-square is designed for nominal or ordinal categorical data.

*Agreement between categorical observations*
- **Agreement** between observations, especially from different observers, is assessed using statistics like **Cohen's Kappa**, which measures consistency beyond chance.
- The chi-square test focuses on whether two distinct categorical variables are related, not on the concordance of independent ratings.
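The observed-versus-expected comparison can be sketched for a hypothetical 2×2 table, where each expected count under independence is (row total × column total) / grand total:

```python
# Hypothetical 2x2 table of observed counts: exposure (rows) x outcome (cols)
observed = [[20, 30],
            [30, 20]]

n = sum(map(sum, observed))
row = [sum(r) for r in observed]          # row totals: 50, 50
col = [sum(c) for c in zip(*observed)]    # column totals: 50, 50

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected[i][j] = row[i] * col[j] / n (here, 25 in every cell)
chi2 = sum((observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))
print(chi2)  # 4.0
```

With 1 degree of freedom, a chi-square of 4.0 exceeds the 5% critical value of 3.84, so the association would be significant at that level.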
Explanation:

***Multiple logistic regression analysis***
- This method is appropriate when the **outcome variable** (disease condition) is **dichotomous** (present or absent).
- It allows assessment of the **independent effect of each factor** while **controlling for other factors**, helping to identify true independent precursors.
- It is the gold standard for modeling **multiple predictors** with a **binary outcome**.

*Multiple linear regression analysis*
- This analysis is used when the **outcome variable is continuous**, not dichotomous like the presence or absence of a disease.
- It would not be suitable for modeling a binary outcome (disease present/absent).

*Analysis of variance (ANOVA)*
- ANOVA is primarily used to compare the **means of three or more groups** on a continuous outcome variable.
- It is not designed to assess multiple independent factors influencing a binary outcome.
- It is used for comparing groups, not for modeling predictors of disease.

*Kruskal-Wallis Analysis of ranks*
- This is a **non-parametric test** used for comparing **three or more independent groups** on an ordinal or continuous variable.
- It is similar to ANOVA but for non-normally distributed data.
- It is not suitable for modeling the independent effect of multiple factors on a binary outcome.
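In a fitted multiple logistic regression, each factor's coefficient is on the log-odds scale, and exponentiating it gives the adjusted odds ratio. A minimal sketch (the coefficient value is hypothetical):

```python
from math import exp

# Hypothetical coefficient for one factor from a fitted multiple
# logistic regression, adjusted for the other four factors
beta = 0.693

# The adjusted odds ratio is exp(beta): here the factor roughly
# doubles the odds of disease, holding the other factors constant
odds_ratio = exp(beta)
print(round(odds_ratio, 2))  # 2.0
```

An adjusted odds ratio whose confidence interval excludes 1 is the usual evidence that a factor is an independent precursor rather than a marker of the other factors.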
Explanation:

***(Mean - Mode) / SD***
- This is the correct formula for **Pearson's first coefficient of skewness**. It measures the degree and direction of **skewness** in a distribution.
- A positive value indicates a **positively skewed distribution** (tail to the right), while a negative value indicates a **negatively skewed distribution** (tail to the left).

*(Mode - Mean) / SD*
- This formula would yield the **negative** of Pearson's first coefficient of skewness, incorrectly representing the direction of skewness.
- Skewness is generally defined in terms of how the tail extends, which is reflected by the mean's position relative to the mode.

*SD / (Mode - Mean)*
- This formula incorrectly places the **standard deviation** in the numerator and reverses the subtraction in the denominator.
- It would not provide a meaningful measure of skewness as it does not follow the established statistical definitions.

*(Median - Mean) / SD*
- This formula is incomplete and not a standard measure of skewness on its own.
- **Pearson's second coefficient of skewness** is actually **3(Mean - Median) / SD**, which uses a coefficient of 3 and is used when the mode is ill-defined or the data distribution has multiple modes.
- The question asks for Pearson's measure of skewness generally, and the first coefficient (using the mode) is the more common and direct definition when a mode exists.
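Both coefficients can be computed directly from summary statistics. A minimal sketch with hypothetical values for a positively skewed distribution:

```python
# Hypothetical summary statistics (mean pulled right of the mode,
# as in a positively skewed distribution)
mean, median, mode, sd = 52.0, 51.0, 48.0, 10.0

sk1 = (mean - mode) / sd        # Pearson's first coefficient
sk2 = 3 * (mean - median) / sd  # Pearson's second coefficient
print(sk1, sk2)  # 0.4 0.3 (both positive: tail to the right)
```

Both coefficients agree on the direction of skew; the second is preferred when the mode is ill-defined because the median is always unique.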
Explanation:

***20 per 100***
- The death rate among cholera-affected individuals is also known as the **case fatality rate (CFR)**.
- This is calculated as (number of deaths / number of *affected* individuals) × 100 = (10 / 50) × 100 = **20% (or 20 per 100)**.
- CFR measures the severity of disease among those who contract it.

*1 per 1000*
- This would represent a case fatality rate of 0.1%, which is far lower than the actual rate.
- This is an incorrect calculation that doesn't match the given data.

*5 per 1000*
- This would represent a case fatality rate of 0.5%, which is also incorrect.
- This calculation does not reflect the proportion of deaths among cholera-affected individuals.

*10 per 1000*
- This appears to confuse the number of deaths (10) with a rate expression.
- The actual **mortality rate** (deaths per total population) would be (10 / 5000) × 1000 = **2 per 1000**, not 10 per 1000.
- The question specifically asks for the death rate among *affected* individuals (CFR), not the population mortality rate.
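The distinction between the two rates comes down to the denominator, as the figures from the question show:

```python
# Figures from the question
population, cases, deaths = 5000, 50, 10

cfr = 100 * deaths / cases                # case fatality rate, per 100 affected
mortality = 1000 * deaths / population    # crude mortality rate, per 1000 population
print(cfr)        # 20.0 -> 20 per 100 affected
print(mortality)  # 2.0  -> 2 per 1000 population
```

Same 10 deaths, two very different rates: the CFR divides by the 50 cholera cases, while the mortality rate divides by the whole population of 5000.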
Explanation:

***Scatter diagram***
- A **scatter plot** is the most appropriate method to visualize the relationship or **association** between two continuous variables, such as height and weight.
- Each point on the graph represents a child's height (x-axis) and weight (y-axis), allowing for the observation of **trends** and **correlation**.

*Bar chart*
- Bar charts are predominantly used for comparing **categorical data** or discrete values, not for showing the relationship between two continuous variables.
- They display the frequency or value of different categories, which is not suitable for visualizing a **correlation** between height and weight.

*Line diagram*
- Line diagrams are primarily used to show **trends over time** or sequences, where data points are connected by lines.
- They are not ideal for illustrating the association between two independent continuous variables at a single point in time.

*Histogram*
- A histogram is used to represent the **distribution of a single continuous variable**, showing its frequency within defined ranges or "bins."
- It does not allow for the display or analysis of the **relationship between two different variables** simultaneously.
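The association a scatter diagram displays can be quantified by Pearson's correlation coefficient. A minimal sketch with hypothetical heights and weights for five children:

```python
from math import sqrt

# Hypothetical heights (cm) and weights (kg) for five children
heights = [110, 115, 120, 125, 130]
weights = [19, 21, 26, 27, 32]

# Pearson's r: covariance of the deviations over the product of
# their root sums of squares
mh = sum(heights) / len(heights)
mw = sum(weights) / len(weights)
cov = sum((h - mh) * (w - mw) for h, w in zip(heights, weights))
r = cov / sqrt(sum((h - mh) ** 2 for h in heights)
               * sum((w - mw) ** 2 for w in weights))
print(round(r, 2))  # 0.98: strong positive association
```

Plotting each (height, weight) pair as a point would show the same pattern visually: a tight upward-sloping cloud corresponding to r close to +1.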
Collection and Presentation of Data
Practice Questions
Measures of Central Tendency
Practice Questions
Measures of Dispersion
Practice Questions
Normal Distribution
Practice Questions
Sampling Methods
Practice Questions
Sample Size Calculation
Practice Questions
Hypothesis Testing
Practice Questions
Tests of Significance
Practice Questions
Correlation and Regression
Practice Questions
Survival Analysis
Practice Questions
Multivariate Analysis
Practice Questions
Statistical Software in Research
Practice Questions