Biostatistics Practice Questions

Q: Specificity of a diagnostic test refers to its ability to correctly identify individuals without the disease. Which of the following options does NOT represent a component or characteristic of specificity?

A True Positive result.

**Explanation**

**1. Why "A True Positive result" is the correct answer:**

Specificity is the ability of a test to correctly identify those **without** the disease (True Negatives). It is calculated as:

$$\text{Specificity} = \frac{\text{True Negatives (TN)}}{\text{True Negatives (TN)} + \text{False Positives (FP)}} \times 100$$

A **True Positive** result is a component of **Sensitivity**, not specificity. Sensitivity measures the ability of a test to identify those who actually have the disease. True Positives therefore play no part in the definition or calculation of specificity.

**2. Analysis of Incorrect Options:**

* **Option A & C:** These are the core definitions of specificity. Specificity looks at the "healthy" population and measures how reliably the test labels them "Negative" (True Negatives).
* **Option D:** While 100% specificity is rarely achieved in practice, it is the "ideal" goal for a confirmatory test. A specificity of 100% means there are zero False Positives, so anyone who tests positive has the disease.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **SNOUT vs. SPIN:**
  * **S**e**N**sitivity rules **OUT** (used for screening; high sensitivity means a negative result reliably rules out disease).
  * **SP**ecificity rules **IN** (used for confirmation; high specificity means a positive result reliably rules in disease).
* **Screening vs. Diagnostic:** Screening tests require high sensitivity (to not miss cases), while diagnostic/confirmatory tests require high specificity (to avoid unnecessary treatment).
* **False Positives:** Specificity is the complement of the False Positive rate. If specificity is 90%, the False Positive rate is 10% ($1 - \text{Specificity}$).
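
As a rough illustration of the formula above, the following Python sketch computes specificity and sensitivity from the four cells of a 2×2 table. The function names and the counts are hypothetical and chosen only for this example.

```python
def specificity(tn: int, fp: int) -> float:
    """Percentage of disease-free individuals the test labels negative."""
    return 100 * tn / (tn + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Percentage of diseased individuals the test labels positive."""
    return 100 * tp / (tp + fn)


# Hypothetical counts: 100 diseased and 100 healthy people tested.
tp, fn = 80, 20   # diseased: test positive / test negative
tn, fp = 90, 10   # healthy:  test negative / test positive

spec = specificity(tn, fp)
print(spec)        # 90.0
print(100 - spec)  # false positive rate: 10.0
```

Note that `tp` never appears in `specificity`, which is the point of the question: True Positives belong to sensitivity only.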

Q: The 'Design Effect' is associated with which of the following sampling techniques?

Cluster sampling.

### Explanation

**Why Cluster Sampling is Correct:**

The **Design Effect (Deff)** is a correction factor used to account for the loss of statistical efficiency when using **Cluster Sampling** instead of Simple Random Sampling (SRS). In cluster sampling, individuals within a cluster (e.g., a village or household) tend to be more similar to each other than to individuals in other clusters (intra-cluster correlation). This "homogeneity" reduces the amount of unique information collected, leading to a larger standard error.

To compensate, the sample size calculated for SRS must be multiplied by the Design Effect to achieve the same precision. For example, in WHO’s EPI cluster surveys for immunization, a default Design Effect of **2** is often used.

**Why Other Options are Incorrect:**

* **A. Stratified Sampling:** This technique usually *increases* precision by dividing the population into homogeneous subgroups. The design effect is typically < 1, meaning a smaller sample size might suffice compared to SRS.
* **B. Systematic Sampling:** This involves selecting subjects at fixed intervals (e.g., every 10th person). While it is a type of probability sampling, it does not inherently require a design effect correction unless clustering occurs.
* **D. Simple Random Sampling (SRS):** This is the "gold standard" or baseline for comparison. By definition, the Design Effect for SRS is **1.0**.

**High-Yield Pearls for NEET-PG:**

* **Formula:** $Deff = \frac{\text{Variance of Cluster Sample}}{\text{Variance of Simple Random Sample}}$, with both variances taken for the same sample size.
* **Sample Size Calculation:** Total Sample Size = $n (SRS) \times \text{Design Effect}$.
* **Cluster Sampling** is the most common method used in field health surveys (e.g., NFHS, Vaccination coverage) because it is logistically easier and more cost-effective than SRS.
* **Key Concept:** As intra-cluster correlation increases, the Design Effect increases.
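
The sample size adjustment can be sketched in a few lines of Python. The function names and the SRS sample size of 96 are hypothetical. The second function uses the commonly cited Kish approximation $Deff = 1 + (m - 1)\rho$ (average cluster size $m$, intra-cluster correlation $\rho$), which is not stated in the explanation above and is included only to illustrate the last pearl.

```python
import math


def cluster_sample_size(n_srs: int, deff: float) -> int:
    """Inflate an SRS sample size by the design effect, rounding up."""
    return math.ceil(n_srs * deff)


def deff_from_icc(cluster_size: int, icc: float) -> float:
    """Kish approximation (an assumption here): Deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical survey: SRS would need 96 children; Deff of 2 as in the EPI default.
print(cluster_sample_size(96, 2))  # 192

# Higher intra-cluster correlation gives a larger design effect.
for icc in (0.0, 0.05, 0.1):
    print(icc, deff_from_icc(11, icc))
```

With zero intra-cluster correlation the design effect is 1, which matches the SRS baseline described above.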

Q: What determines the diagnostic accuracy of a test?

Predictive value.

**Explanation:**

In clinical practice, the **Predictive Value** (Positive and Negative) is the most important measure of a test’s diagnostic accuracy because it gives the probability that the test result reflects the true disease status of a specific patient. While sensitivity and specificity are inherent properties of the test itself, predictive values tell a clinician how "accurate" a result is when the test is applied to a population with a specific disease prevalence.

**Why other options are incorrect:**

* **Sensitivity (A):** This measures the ability of a test to correctly identify those *with* the disease (True Positive Rate). It is used for screening but does not account for false positives.
* **Specificity (B):** This measures the ability of a test to correctly identify those *without* the disease (True Negative Rate). It is used for confirmation but does not account for false negatives.
* **Odds Ratio (D):** This is a measure of association used primarily in Case-Control studies to quantify the relationship between an exposure and an outcome; it is not a measure of diagnostic test accuracy.

**High-Yield Clinical Pearls for NEET-PG:**

1. **Prevalence Dependency:** Predictive values are heavily influenced by the **prevalence** of the disease in the population. As prevalence increases, Positive Predictive Value (PPV) increases, and Negative Predictive Value (NPV) decreases.
2. **Sensitivity vs. Specificity:** Remember the mnemonics **SnNOut** (a highly Sensitive test, when Negative, rules OUT disease) and **SpPIn** (a highly Specific test, when Positive, rules IN disease).
3. **Likelihood Ratio:** This is considered the best way to measure diagnostic accuracy as it is independent of prevalence, but among the given options, Predictive Value is the standard clinical determinant.
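
The prevalence dependency in pearl 1 can be checked numerically. The sketch below derives PPV and NPV from sensitivity, specificity, and prevalence using the standard 2×2 table proportions. The function name and the 90%/90% test characteristics are hypothetical.

```python
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    All inputs are proportions between 0 and 1.
    """
    tp = sens * prev                # diseased, test positive
    fn = (1 - sens) * prev          # diseased, test negative
    tn = spec * (1 - prev)          # healthy, test negative
    fp = (1 - spec) * (1 - prev)    # healthy, test positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Same hypothetical test (90% sensitive, 90% specific) at rising prevalence.
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.9, 0.9, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Running this shows PPV rising and NPV falling as prevalence increases, even though sensitivity and specificity are held fixed.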

Q: Which graphical representation best depicts the correlation between height and weight?

Scatter diagram.

### Explanation

**1. Why Scatter Diagram is Correct:**

A **Scatter diagram** (or scatter plot) is the primary graphical tool used to represent the relationship or **correlation** between two continuous quantitative variables (e.g., height and weight). Each point on the graph represents an individual’s pair of measurements.

* It helps visualize the **nature** (linear or non-linear) and **direction** (positive or negative) of the relationship.
* In this case, as height increases, weight generally increases, showing a **positive correlation**.

**2. Why Other Options are Incorrect:**

* **Histogram:** This is used to represent the **frequency distribution** of a single continuous variable (e.g., the distribution of weights in a population). It does not show the relationship between two different variables.
* **Ogive (Cumulative Frequency Curve):** This graph represents cumulative frequencies. It is used to determine the **median, quartiles, and percentiles** of a dataset, not correlations.
* **Line Chart:** This is primarily used to show **trends over time** (time-series data), such as the incidence of a disease over several months or years.

**3. Clinical Pearls & High-Yield Facts for NEET-PG:**

* **Correlation Coefficient ($r$):** The scatter diagram is the visual precursor to calculating $r$. The value of $r$ ranges from **-1 to +1**.
* **Perfect Correlation:** If all points on a scatter diagram fall exactly on a straight (non-horizontal) line, it indicates a perfect correlation ($r = 1$ or $-1$).
* **No Correlation:** If the points are scattered randomly in a circle or cloud, $r$ is approximately 0.
* **Quantitative vs. Qualitative:** Remember that scatter diagrams are for **quantitative** data. For comparing two **qualitative** variables, a **contingency table** or **grouped bar chart** is used.
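
To make the link between the scatter diagram and $r$ concrete, here is a minimal sketch of the Pearson correlation coefficient computed from paired measurements. The function name and the height/weight values are hypothetical and serve only as an illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (cm) and weights (kg) for six people.
heights = [150, 155, 160, 165, 170, 175]
weights = [48, 53, 55, 61, 66, 70]

r = pearson_r(heights, weights)
print(round(r, 3))  # close to +1: strong positive correlation
```

Each `(height, weight)` pair here corresponds to one point on the scatter diagram; the closer the points lie to a rising straight line, the closer `r` is to +1.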

Q: A district has the following population data and average population served per doctor. Calculate the harmonic mean:

AREA | POPULATION SERVED PER DOCTOR | POPULATION | NUMBER OF DOCTORS
----------------------------------------------------------------------
RURAL | 1000 | 50000 | 50
URBAN | 500 | 50000 | 100
TOTAL | - | 100000 | 150

667.

### Explanation

**1. Why Option C (667) is Correct:**

In biostatistics, the **Harmonic Mean (HM)** is the preferred measure of central tendency for **rates and ratios** (e.g., speed, population per doctor, or cases per unit time). It is defined as the reciprocal of the arithmetic mean of the reciprocals of the values.

When dealing with groups of different sizes (as in this population data), we calculate the **Weighted Harmonic Mean**:

$$HM = \frac{\text{Total Population}}{\sum \left(\frac{\text{Population}}{\text{Value}}\right)}$$

Here the "Value" is the population served per doctor.

* **Total Population** = 100,000
* **Total Doctors** = (50,000 / 1000) + (50,000 / 500) = 50 + 100 = 150
* **Calculation:** $100,000 / 150 = 666.67$

Rounding off gives **667**. This represents the true average workload per doctor across the entire district.

**2. Why Other Options are Incorrect:**

* **Option A (500):** This is simply the lower value (Urban) and ignores the Rural data.
* **Option B (567):** This value does not correspond to any standard statistical measure for this dataset.
* **Option D (750):** This is the **Arithmetic Mean** of the two rates, $(1000 + 500) / 2$. The Arithmetic Mean overestimates the average when dealing with rates and should be avoided here.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Arithmetic Mean:** Best for normally distributed data (e.g., Height, BP).
* **Geometric Mean:** Best for data following a logarithmic scale or growth rates (e.g., bacterial counts, serial dilutions, titers).
* **Harmonic Mean:** Best for **rates, ratios, and speeds**. For positive values that are not all equal, it is the smallest of the three means ($AM > GM > HM$); the three coincide only when every value is the same.
* **Median:** Best for skewed data or data with extreme outliers (e.g., survival time, incubation period).
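
The calculation above can be reproduced with a short Python sketch. The function name `weighted_harmonic_mean` is hypothetical; the input values are the ones given in the question.

```python
import statistics


def weighted_harmonic_mean(values: list[float], weights: list[float]) -> float:
    """Total weight divided by the sum of weight/value terms."""
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


population_per_doctor = [1000, 500]    # rural, urban
population = [50_000, 50_000]          # weights

hm = weighted_harmonic_mean(population_per_doctor, population)
print(round(hm))  # 667

# The arithmetic mean of the two rates is higher than the harmonic mean.
print(statistics.mean(population_per_doctor))  # 750
```

The denominator `sum(w / v ...)` is simply the total number of doctors (50 + 100 = 150), so the result is total population divided by total doctors. Because the two areas have equal populations, the unweighted harmonic mean of 1000 and 500 gives the same figure.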

Q: Which of the following statements regarding the mode is false?

The mode cannot be calculated for any type of data.

### Explanation

**Why Option C is the Correct (False) Statement:**

In biostatistics, the **Mode** is defined as the value that occurs most frequently in a dataset. The statement "The mode cannot be calculated for any type of data" is false because the mode is, in fact, the **only** measure of central tendency that can be used for all levels of data: nominal, ordinal, interval, and ratio. It is especially suited to qualitative (nominal) data, where calculating a mean or median is mathematically impossible.

**Analysis of Incorrect Options:**

* **Option A (True):** If every value in a dataset occurs only once (e.g., 1, 2, 3, 4, 5), there is no repeating value, and thus the dataset has **no mode**.
* **Option B (True):** A dataset can be **bimodal** (two modes) or **multimodal** (more than two modes) if multiple values share the highest frequency.
* **Option D (True):** The mode is the most appropriate measure for **nominal data** (e.g., determining the most common blood group in a population or the most frequent side effect of a drug).

**High-Yield Clinical Pearls for NEET-PG:**

* **Relationship in Normal Distribution:** Mean = Median = Mode.
* **Skewed Distributions:**
  * **Positively Skewed:** Mean > Median > Mode (Mode is at the peak).
  * **Negatively Skewed:** Mode > Median > Mean.
* **Stability:** The mode is the least stable measure of central tendency, as it can change significantly with small changes in the dataset.
* **Empirical Formula:** $Mode = (3 \times \text{Median}) - (2 \times \text{Mean})$ (an approximation for moderately skewed distributions).
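
The three true statements (no mode, more than one mode, mode for nominal data) can be illustrated with a small sketch. The `modes` helper is hypothetical and follows the convention used above, where a dataset in which every value occurs once is treated as having no mode. The blood group data is made up for the example.

```python
from collections import Counter


def modes(data: list) -> list:
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Convention assumed here: if nothing repeats, the dataset has no mode.
    if top == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 3, 4, 5]))                  # []
print(modes([1, 2, 2, 3, 3]))                  # [2, 3]
print(modes(["O", "A", "B", "O", "AB", "O"]))  # ['O']
```

The last call works on nominal data (blood groups), for which a mean or median could not be computed.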

Q: Which of the following is not a measure of dispersion?

Mode.

**Explanation:**

In biostatistics, data is summarized using two primary types of measures: **Measures of Central Tendency** (averages) and **Measures of Dispersion** (variability).

**Why Mode is the correct answer:**

**Mode** is a measure of **Central Tendency**, not dispersion. It is defined as the value that occurs most frequently in a data set. While it identifies the "center" or most popular observation, it provides no information about how spread out or scattered the data points are from one another.

**Why the other options are incorrect (Measures of Dispersion):**

* **Range (D):** The simplest measure of dispersion, calculated as the difference between the maximum and minimum values.
* **Mean Deviation (A):** The arithmetic average of the absolute deviations of observations from the mean.
* **Standard Deviation (B):** The most commonly used measure of dispersion in medical research. It quantifies the amount of variation of a set of values around the mean.

**High-Yield Clinical Pearls for NEET-PG:**

* **Measures of Central Tendency:** Mean (arithmetic average), Median (middle value), and Mode (most frequent).
* **Measures of Dispersion:** Range, Interquartile Range (IQR), Mean Deviation, Standard Deviation, and Coefficient of Variation.
* **Normal Distribution:** In a perfectly symmetrical bell-shaped curve, Mean = Median = Mode.
* **Standard Deviation (SD):** In a normal distribution, mean ± 1 SD covers about 68% of data, mean ± 2 SD about 95%, and mean ± 3 SD about 99.7%.
* **Relative Dispersion:** The **Coefficient of Variation** (SD expressed as a percentage of the mean) is used to compare dispersion between two series with different units.
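
As an illustration, the measures of dispersion listed above can be computed with Python's standard library. The `dispersion_summary` helper and the data values are hypothetical. The sketch uses the sample standard deviation (divisor $n - 1$), which is an assumption; `statistics.pstdev` would give the population version.

```python
import statistics


def dispersion_summary(data: list[float]) -> dict[str, float]:
    """Range, mean deviation, sample SD, and coefficient of variation (%)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD, divisor n - 1 (assumed choice)
    return {
        "range": max(data) - min(data),
        "mean_deviation": sum(abs(x - mean) for x in data) / len(data),
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }


# Hypothetical observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(values)
print(summary["range"])           # 7
print(summary["mean_deviation"])  # 1.5
print(round(summary["sd"], 2))
```

None of these quantities is a "typical value" of the data; each describes how far observations spread, which is why the mode does not belong in this list.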

Q: When a drug is evaluated for its usefulness in controlled conditions, what does this signify?

Efficacy.

### Explanation

In Biostatistics and Epidemiology, the evaluation of a drug or intervention is categorized based on the environment in which the study is conducted.

**1. Why Efficacy is Correct:**

**Efficacy** refers to the performance of an intervention under **ideal and controlled conditions** (e.g., a Randomized Controlled Trial). It answers the question: *"Can the drug work?"* In these settings, factors like patient compliance, co-morbidities, and environmental variables are strictly monitored to isolate the drug's direct biological effect.

**2. Analysis of Incorrect Options:**

* **Effectiveness:** This refers to how well a drug performs in **real-world clinical settings** (routine practice). It accounts for factors like poor patient compliance, provider error, and diverse patient populations. It answers: *"Does the drug work in practice?"*
* **Efficiency:** This measures the results achieved in relation to the **resources consumed** (money, time, manpower). It is essentially a cost-benefit or cost-effectiveness analysis. It answers: *"Is it worth the cost?"*
* **Effect Modification:** This is a phenomenon where the magnitude of the effect of an exposure on an outcome varies according to the level of a third variable (the modifier). It is not a measure of drug usefulness.

**3. NEET-PG High-Yield Pearls:**

* **Phase II & III Clinical Trials** primarily measure **Efficacy**.
* **Phase IV (Post-marketing surveillance)** primarily measures **Effectiveness**.
* **Mnemonic (The 3 E’s):**
  * **Efficacy:** Ideal conditions.
  * **Effectiveness:** Real world.
  * **Efficiency:** Money/Resources.
* **Intention-to-treat (ITT) analysis** is used to preserve the benefits of randomization and is often used to estimate effectiveness.

Q: Sampling error is classified as:

Alpha error and Beta error.

### Explanation

In biostatistics, **Sampling Error** occurs because a sample is only a subset of the population. When we use sample data to make inferences about a population, we risk making two specific types of errors during hypothesis testing:

1. **Alpha (α) Error (Type I Error):** This occurs when we reject a null hypothesis that is actually true (a "False Positive"). In clinical terms, it means concluding a drug works when it actually doesn't.
2. **Beta (β) Error (Type II Error):** This occurs when we fail to reject a null hypothesis that is actually false (a "False Negative"). Clinically, this means missing a real effect or benefit of a treatment.

Since both errors arise from the inherent variability and limitations of sampling, they are both classified as components of sampling error.

**Analysis of Options:**

* **Option A & B:** These are incomplete. While both are sampling errors, they must be considered together as the two primary risks in statistical inference.
* **Option C:** **Gamma error** is not a standard term in basic biostatistics related to hypothesis testing; it is a distractor.
* **Option D (Correct):** This correctly identifies that sampling error encompasses both Type I and Type II errors.

**High-Yield Clinical Pearls for NEET-PG:**

* **P-value:** The probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. It is compared against the significance level α (the accepted probability of a **Type I error**), conventionally set at 0.05.
* **Power of a Study (1 - β):** The probability of correctly identifying a true effect (avoiding a Type II error).
* **Sample Size:** Increasing the sample size is the most effective way to reduce sampling error; for a fixed α, it lowers β and so raises power.
* **Non-sampling errors:** These include bias (selection, information) and cannot be reduced by increasing sample size.

Q: When the mean, median, and mode of a distribution are all zero, what type of distribution is it?

Standard normal distribution.

### Explanation

**1. Why the Correct Answer is Right:**

In a **Normal Distribution** (Gaussian distribution), the curve is perfectly symmetrical and bell-shaped. In such a distribution, the **Mean, Median, and Mode are all equal** (Mean = Median = Mode).

A **Standard Normal Distribution** is a specific type of normal distribution where the data is standardized using Z-scores. It is defined by two specific parameters:

* **Mean ($\mu$) = 0**
* **Standard Deviation ($\sigma$) = 1**

Since it is a symmetrical distribution, the Median and Mode also coincide with the Mean at the center. Therefore, among the listed options, a distribution whose three measures of central tendency are all zero is the Standard Normal Distribution.

**2. Why the Incorrect Options are Wrong:**

* **B & C (Skewed Distributions):** In skewed distributions, the mean, median, and mode are pulled apart. In **Positively skewed** data, Mean > Median > Mode. In **Negatively skewed** data, Mode > Median > Mean. They cannot all be equal to zero.
* **D (J-shaped Distribution):** This is an asymmetrical distribution where the frequency is at its maximum at one end of the scale. It does not follow the central symmetry required for the mean, median, and mode to coincide at zero.

**3. Clinical Pearls & High-Yield Facts for NEET-PG:**

* **Z-score:** Indicates how many standard deviations a value is from the mean. In a standard normal distribution, the Z-score of the mean is 0.
* **Area under the curve:** In a normal distribution, about **68%** of values fall within ±1 SD, **95%** within ±2 SD (more precisely, ±1.96 SD), and **99.7%** within ±3 SD.
* **Symmetry Rule:** If you know a distribution is perfectly symmetrical and unimodal, the Mean, Median, and Mode will always be identical.
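
The properties above can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8+). The helper names `coverage` and `z_score` are hypothetical, and the blood pressure figures in the last line are made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Mean, median, and mode coincide at zero.
print(standard_normal.mean, standard_normal.median, standard_normal.mode)


def coverage(k: float) -> float:
    """Proportion of values within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


for k in (1, 1.96, 3):
    print(k, round(coverage(k), 4))

# Hypothetical example: systolic BP of 140 when mean is 120 and SD is 10.
print(z_score(140, 120, 10))  # 2.0
```

The printed coverage values should come out close to the 68%, 95%, and 99.7% figures quoted in the pearls.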

Question 1

Specificity of a diagnostic test refers to its ability to correctly identify individuals without the disease. Which of the following options does NOT represent a component or characteristic of specificity?

Accepted Answer

A True Positive result

Answer

Ability to identify those without disease (True Negatives)

Answer

A True Negative result

Answer

An ideal screening test should have 100% specificity

Question 2

The 'Design Effect' is associated with which of the following sampling techniques?

Accepted Answer

Cluster sampling

Answer

Stratified sampling

Answer

Systematic sampling

Answer

Simple Random Sampling

Question 3

What determines the diagnostic accuracy of a test?

Accepted Answer

Predictive value

Answer

Sensitivity

Answer

Specificity

Answer

Odds ratio

Question 4

Which graphical representation best depicts the correlation between height and weight?

Accepted Answer

Scatter diagram

Answer

Histogram

Answer

Ogive

Answer

Line chart

Question 5

A district has the following population data and average population served per doctor. Calculate the harmonic mean:

AREA | POPULATION SERVED PER DOCTOR | POPULATION | NUMBER OF DOCTORS
----------------------------------------------------------------------
RURAL | 1000 | 50000 | 50
URBAN | 500 | 50000 | 100
TOTAL | - | 100000 | 150

Accepted Answer

667

Answer

500

Answer

567

Answer

750

Question 6

Which of the following statements regarding the mode is false?

Accepted Answer

The mode cannot be calculated for any type of data.

Answer

Some datasets may have no mode.

Answer

A dataset can have more than one mode.

Answer

The mode can be calculated for nominal data.

Question 7

Which of the following is not a measure of dispersion?

Accepted Answer

Mode

Answer

Mean deviation

Answer

Standard deviation

Answer

Range

Question 8

When a drug is evaluated for its usefulness in controlled conditions, what does this signify?

Accepted Answer

Efficacy

Answer

Effectiveness

Answer

Efficiency

Answer

Effect modification

Question 9

Sampling error is classified as:

Accepted Answer

Alpha error and Beta error

Answer

Alpha error

Answer

Beta error

Answer

Gamma error

Question 10

When the mean, median, and mode of a distribution are all zero, what type of distribution is it?

Accepted Answer

Standard normal distribution

Answer

Negatively skewed distribution

Answer

Positively skewed distribution

Answer

J-shaped distribution
