Biostatistics Practice Questions

Q: What is the order of the margin of error in the graph provided?

3 > 2 > 1.

***3 > 2 > 1***

- In a forest plot, the **margin of error** is half the **width of the confidence interval**, so the item with the widest CI has the largest margin of error. Item 3 shows the **widest CI**.
- **Wider confidence intervals** typically result from **smaller sample sizes** or **greater variability** in the data, making item 3 the most uncertain estimate.

*3 > 1 > 2*

- This incorrectly places item 1 as having a **larger margin of error** than item 2, which contradicts the visual evidence in the forest plot.
- Item 1 actually has the **narrowest confidence interval**, indicating the **smallest margin of error** and highest precision.

*1 > 3 > 2*

- This reverses the correct order by suggesting item 1 has the **largest margin of error**, which is incorrect given its narrow CI.
- Item 1 demonstrates the **highest precision** with the smallest uncertainty, not the largest.

*1 = 2 = 3*

- This suggests **equal margins of error** across all three items, which is contradicted by the **different CI widths** visible in the forest plot.
- The **varying confidence interval widths** demonstrate **unequal precision** and different margins of error between the studies.
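The link between sample size and interval width can be sketched numerically. The snippet below is only an illustration: the function name `margin_of_error`, the standard deviation of 10, and the three sample sizes are assumptions, not values read from the graph in the question. It uses the usual normal-approximation formula for a mean, $z \cdot s / \sqrt{n}$.

```python
import math


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% CI for a mean: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


# Hypothetical studies with the same variability but different sample sizes.
for label, n in [("study 1", 400), ("study 2", 100), ("study 3", 25)]:
    moe = margin_of_error(sd=10.0, n=n)
    print(f"{label}: n={n}, margin of error = {moe:.2f}")
```

With these assumed inputs the smallest study has the widest interval, which is the pattern the accepted answer describes for item 3.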

Q: In a study, every fourth student is selected from a class for sampling. This is a type of sampling method:

Systematic random sampling.

### Explanation

**Why Systematic Random Sampling is Correct:** Systematic random sampling involves selecting subjects at a fixed, periodic interval, referred to as the **sampling interval ($k$)**. In this scenario, the interval is 4 (every 4th student). The process begins by selecting a starting point at random from the first $k$ subjects, and then every $k^{th}$ unit is chosen thereafter. It is commonly used in clinical settings (e.g., selecting every 5th patient entering an OPD) because it is simpler to implement than simple random sampling while ensuring the sample is spread evenly across the population.

**Analysis of Incorrect Options:**

* **A. Simple Random Sampling:** Every individual has an equal and independent chance of being selected (e.g., lottery method or computer-generated random numbers). It does not follow a fixed numerical pattern or interval.
* **C. Stratified Random Sampling:** The population is divided into homogeneous groups (**strata**) based on specific characteristics (e.g., age, gender, or SES), and samples are then drawn from each stratum. This is used when the population is heterogeneous.
* **D. Cluster Random Sampling:** The population is divided into groups called **clusters** (usually based on geography, like villages or blocks). Instead of selecting individuals, entire clusters are randomly selected. This is the method used in the WHO EPI coverage surveys (30 x 7 cluster sampling).

**High-Yield NEET-PG Pearls:**

* **Sampling Interval ($k$):** Calculated as $N/n$ (Total Population / Sample Size).
* **Multistage Sampling:** The most common method used in large-scale national surveys (like NFHS).
* **Snowball Sampling:** A non-probability sampling method used for "hidden populations" (e.g., IV drug users or commercial sex workers).
* **Precision:** Stratified sampling is generally more precise than simple random sampling for the same sample size.
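The procedure described above (random start within the first $k$ units, then every $k^{th}$ unit) can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical class of 40 students numbered 1 to 40; the function name `systematic_sample` is made up for the example.

```python
import random


def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)  # index 0 .. k-1
    return population[start::k]


# Hypothetical roll numbers 1..40; a seeded generator keeps the run reproducible.
students = list(range(1, 41))
sample = systematic_sample(students, k=4, rng=random.Random(0))
print(sample)
```

Here $N = 40$ and $k = 4$, so the sample size is $n = N/k = 10$ whichever start is drawn.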

Q: A cardiologist wants to study the effect of an antihypertensive drug. He notes down the initial systolic blood pressure (mmHg) of 50 patients and then administers the drug. After a week's treatment, he measures the systolic blood pressure again. Which of the following is the most appropriate statistical test of significance to test the statistical significance of the change in blood pressure?

Paired t-test.

### Explanation

**Why Paired t-test is the Correct Answer:** The study design involves measuring the **same quantitative variable** (Systolic Blood Pressure in mmHg) in the **same group of individuals** at two different points in time (Before and After treatment).

* **Quantitative Data:** Blood pressure is measured on a ratio scale (continuous numerical data).
* **Dependent Samples:** Since the "Before" and "After" readings are taken from the same 50 patients, the observations are "paired" or "related."

The Paired t-test is specifically designed to compare the means of two related groups to determine if the observed change (the difference) is statistically significant.

**Analysis of Incorrect Options:**

* **B. Unpaired (Independent) t-test:** This is used to compare the means of two **independent** groups (e.g., comparing BP between Group A receiving Drug X and Group B receiving a Placebo).
* **C. Analysis of Variance (ANOVA):** This is used when comparing the means of **three or more** independent groups. If the study had three different dosage groups, ANOVA would be appropriate.
* **D. Chi-square test:** This is a non-parametric test used for **qualitative (categorical) data** (e.g., comparing the proportion of "cured" vs. "not cured" patients). It is not used for continuous numerical values like mmHg.

**High-Yield Clinical Pearls for NEET-PG:**

* **Parametric Tests:** Assume the data follow a Normal (Gaussian) Distribution. Both Paired and Unpaired t-tests are parametric; for the paired t-test, the assumption applies to the within-pair differences.
* **Before vs. After = Paired t-test:** Whenever you see a study design involving "pre-test/post-test" or "self-control," think Paired t-test.
* **Case-Control Matching:** If a study matches cases and controls 1:1 and compares a quantitative variable, a paired t-test is also used. For a binary (yes/no) variable in matched pairs, McNemar's test applies instead.
* **Standard Error of Difference:** The t-test relies on the ratio of the observed difference to the standard error of that difference.
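The "ratio of the observed difference to the standard error of that difference" can be made concrete with a small calculation. The sketch below uses five made-up before/after readings (not the 50 patients in the question) and a hypothetical helper `paired_t_statistic`; it computes only the t statistic and degrees of freedom, not the p-value.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired before/after measurements."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se_diff, n - 1


# Hypothetical systolic BP readings (mmHg) for five patients.
before = [150, 160, 155, 170, 165]
after = [140, 152, 150, 158, 160]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f}, df = {df}")
```

A negative t here reflects a fall in blood pressure after treatment; the statistic would then be compared against the t distribution with $n - 1$ degrees of freedom.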

Q: What does 'true negative' represent in diagnostic testing?

Specificity.

**Explanation:** In biostatistics, diagnostic test performance is evaluated using a 2x2 contingency table. **Specificity** is defined as the ability of a test to correctly identify those **without the disease**. It represents the proportion of truly healthy individuals who yield a negative test result. Mathematically, it is calculated as:

$$\text{Specificity} = \frac{\text{True Negatives (TN)}}{\text{True Negatives (TN)} + \text{False Positives (FP)}}$$

A highly specific test has few "false alarms," making it essential for **confirming** a diagnosis (Rule: **SpPIn**, Specificity rules IN).

**Analysis of Incorrect Options:**

* **Sensitivity:** Represents the **True Positive** rate. It is the ability of a test to correctly identify those who *have* the disease. (Rule: **SnNOut**, Sensitivity rules OUT).
* **Positive Predictive Value (PPV):** The probability that a patient actually has the disease given a positive test result. It is influenced by the prevalence of the disease.
* **Negative Predictive Value (NPV):** The probability that a patient is truly healthy given a negative test result.

**NEET-PG High-Yield Pearls:**

1. **Screening vs. Diagnosis:** Screening tests require high **Sensitivity** (to catch all cases), while confirmatory tests require high **Specificity** (to avoid false labeling).
2. **Prevalence Impact:** Specificity and Sensitivity are **independent** of disease prevalence. However, PPV increases and NPV decreases as prevalence increases.
3. **Ideal Test:** An ideal diagnostic test has 100% sensitivity and 100% specificity, represented by the top-left corner of an ROC curve.
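The four measures above all come from the same 2x2 table, which a short calculation makes explicit. The counts below are invented for illustration, and `diagnostic_metrics` is a hypothetical helper name.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard test-performance measures from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # P(disease | positive test)
        "npv": tn / (tn + fn),          # P(no disease | negative test)
    }


# Hypothetical counts: 100 diseased and 900 healthy individuals.
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that specificity uses only the healthy column (TN and FP), which is why it does not change with prevalence, while PPV and NPV mix both columns.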

Q: Which of the following is used for qualitative data?

Mode.

**Explanation:** In biostatistics, data is broadly classified into **Quantitative** (numerical) and **Qualitative** (categorical). The choice of descriptive statistics depends entirely on the type of data being analyzed.

**Why Mode is the Correct Answer:** The **Mode** is defined as the most frequently occurring value in a dataset. It is the only measure of central tendency that can be used for **Qualitative (Nominal) data**. For example, in a study of blood groups (A, B, AB, O), the most common blood group is the mode. While the Mean and Median require numerical values to calculate, the Mode simply identifies the most frequent category.

**Analysis of Incorrect Options:**

* **A. Mean:** This is the arithmetic average. It requires numerical values and is used for **Quantitative data** (specifically normally distributed data). It cannot be calculated for categories like "gender" or "color."
* **B. Whisker Plot (Box-and-Whisker):** This is a graphical representation of the dispersion of **Quantitative data**. It displays the five-number summary: minimum, first quartile, median, third quartile, and maximum.
* **D. Histogram:** This is a bar-like representation used for **Continuous Quantitative data**. Unlike a bar chart (used for qualitative data), the bars in a histogram touch each other to represent a continuous range of values.

**NEET-PG High-Yield Pearls:**

* **Qualitative Data:** Best represented by **Bar charts, Pie charts, and Pictograms.**
* **Quantitative Data:** Best represented by **Histograms, Frequency Polygons, and Scatter diagrams.**
* **Central Tendency:**
  * **Mean:** Most sensitive to outliers (extreme values).
  * **Median:** Best for skewed quantitative data.
  * **Mode:** Best for qualitative data and identifying the most "popular" characteristic.
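The blood-group example can be worked through directly: the mode needs only counts, not arithmetic on the values. The ten observations below are invented for illustration.

```python
from collections import Counter

# Hypothetical blood groups recorded for ten patients (nominal data).
blood_groups = ["O", "A", "B", "O", "AB", "O", "A", "B", "O", "A"]

counts = Counter(blood_groups)
mode_group, mode_count = counts.most_common(1)[0]
print(f"Mode: {mode_group} ({mode_count} of {len(blood_groups)} patients)")
```

A mean or median of these labels is undefined, whereas the most frequent category is always available.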

Q: Blood pressure level is an example of which of the following scales?

Metric.

**Explanation:** In biostatistics, data is categorized into four levels of measurement: Nominal, Ordinal, Interval, and Ratio. The correct answer is **Metric** because it serves as an umbrella term for quantitative data (Interval and Ratio scales).

**Why Metric is Correct:** Blood pressure (BP) is measured in millimeters of mercury (mmHg). It is a **Ratio scale** (a type of Metric scale) because it has a constant interval between units and a "true zero" point (though a BP of zero is not compatible with life, it is mathematically possible). Metric scales allow for precise mathematical operations like calculating the mean, standard deviation, and performing t-tests.

**Why other options are incorrect:**

* **Nominal:** This scale is for qualitative data used for labeling or naming (e.g., Gender, Blood Group, Yes/No). It has no numerical value or inherent order.
* **Ordinal:** This scale involves data that can be ranked or ordered, but the distance between ranks is not uniform (e.g., Stages of Cancer, Socio-economic status, Pain scales like Mild/Moderate/Severe). While BP can be *converted* into ordinal data (e.g., Normal vs. Hypertensive), the raw BP level itself is Metric.

**High-Yield Clinical Pearls for NEET-PG:**

* **Qualitative Data:** Includes Nominal and Ordinal scales.
* **Quantitative Data:** Includes Metric scales (Interval and Ratio).
* **Discrete vs. Continuous:** BP is **Continuous Metric data** because it can take any value within a range. In contrast, "number of children" is **Discrete Metric data**.
* **Memory Aid (NOIR):** **N**ominal (Name), **O**rdinal (Order), **I**nterval (Integer/Scale), **R**atio (Relationship/True Zero).

Q: Which type of diagram is best for studying the decline in the percentage of syphilis in men and women over the last 10 years?

Line diagram.

**Explanation** The correct answer is **Line diagram** because it is the most effective tool for visualizing **trends over time** (time-series data).

**1. Why Line Diagram is Correct:** In biostatistics, a line diagram is used to show how a quantitative variable changes against another, most commonly with time (years, months, or days) on the x-axis. Since the question asks to study the "decline over the last 10 years" for two different groups (men and women), a line diagram allows for:

* Clear visualization of the **trend** (upward, downward, or fluctuating).
* Easy **comparison** between two or more series (men vs. women) on the same graph.

**2. Why Other Options are Incorrect:**

* **Pie Chart:** Used to show the **proportional distribution** of a single variable at a specific point in time (e.g., causes of blindness). It cannot show trends over a decade.
* **Histogram:** Used to represent the frequency distribution of **continuous quantitative data** (e.g., age groups, height). It is a snapshot of data, not a tool for chronological trends.
* **Frequency Polygon:** A variation of the histogram created by joining the midpoints of the tops of the bars. While it shows distribution, it is not used for longitudinal time-trend analysis.

**3. NEET-PG High-Yield Pearls:**

* **Trend over time:** Always choose **Line Diagram**.
* **Correlation between two variables:** Choose **Scatter Diagram**.
* **Comparison of discrete/qualitative data:** Choose **Bar Chart**.
* **Geographical distribution:** Choose **Spot Map** (one dot per case). A choropleth map, by contrast, shades whole areas according to a rate or proportion.
* **Relationship between Mean, Median, and Mode:** In a normal distribution, they are equal. In a skewed distribution, the **Median** is the best measure of central tendency.

Q: Which statistical test is used to study ordinal data of two independent groups that are not normally distributed?

Wilcoxon signed-rank test.

### Explanation

The accepted answer is **D. Wilcoxon signed-rank test**.

#### Why it is correct:

When data are **ordinal** (ranked) or **not normally distributed**, standard parametric tests do not apply and a non-parametric test is needed.

* For **two independent groups**, the standard non-parametric test is the **Mann-Whitney U test** (also known as the Wilcoxon Rank-Sum test).
* *Note on the Question/Option:* Strictly, the **Wilcoxon Signed-Rank test** is meant for **paired/dependent** data, so it is not the textbook choice for independent groups. It is the accepted answer here because it is the only non-parametric test among the options. In many competitive exams (including NEET-PG), the "Wilcoxon" family of tests is grouped together as the non-parametric alternative to the t-test when more specific options like Mann-Whitney U are absent.

#### Why other options are incorrect:

* **A. Student’s t-test:** This is a **parametric** test used for comparing means of two groups. It requires the data to be **normally distributed** and on an **interval/ratio** scale.
* **B. Z-test:** Used for large samples (n > 30) where the population variance is known. It is also a **parametric** test.
* **C. One-way ANOVA:** Used to compare means of **three or more** independent groups. It is the parametric equivalent of the Kruskal-Wallis test.

#### High-Yield Clinical Pearls for NEET-PG:

* **Parametric vs. Non-Parametric Mapping:**
  * 2 Independent groups: **Unpaired t-test** $\rightarrow$ **Mann-Whitney U test**.
  * 2 Paired groups: **Paired t-test** $\rightarrow$ **Wilcoxon Signed-Rank test**.
  * 3+ Independent groups: **ANOVA** $\rightarrow$ **Kruskal-Wallis test**.
* **Data Types:** Always check the scale. If the data is **Qualitative/Nominal** (e.g., Male/Female), use the **Chi-square test**. If it is **Ordinal** (e.g., Pain scale: Mild/Moderate/Severe), always choose a **Non-parametric test**.
* **Normal Distribution:** If a question mentions "skewed data" or "not normally distributed," immediately rule out t-tests and ANOVA.
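To show why independent groups call for a rank-based comparison across groups rather than within pairs, here is a minimal sketch of the Mann-Whitney U statistic computed by direct pairwise comparison. The pain scores (1 = mild, 2 = moderate, 3 = severe) and the function name `mann_whitney_u` are assumptions for illustration, and the sketch stops at the statistic without deriving a p-value.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): each cross-group pair scores 1 for a win, 0.5 for a tie."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical ordinal pain scores in two unrelated groups of patients.
group_a = [1, 2, 2, 3, 3]
group_b = [1, 1, 2, 2, 1]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U_a = {u_a}, U_b = {u_b}")
```

Every observation in one group is compared with every observation in the other, so no pairing between individuals is required. A signed-rank test, by contrast, would need each value in `group_a` matched to a specific value in `group_b`.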

Q: Given the number of malaria cases detected over 10 years as 100, 160, 190, 250, 300, 300, 320, 320, 550, 380, how would you calculate the average number of cases per year?

Arithmetic mean.

**Explanation:** The correct answer is **Arithmetic Mean (A)**. In biostatistics, the **Arithmetic Mean** is the most commonly used measure of central tendency for quantitative (numerical) data. It is calculated by summing all observations and dividing by the total number of observations. Here the ten yearly counts sum to 2870, so the mean is 2870 / 10 = 287 cases per year. In this scenario, we are dealing with a discrete numerical variable (number of malaria cases) over a period of time. Since the data points represent a simple series of counts without extreme skewness or logarithmic growth, the arithmetic mean provides the most accurate "average" for routine epidemiological monitoring.

**Why other options are incorrect:**

* **Geometric Mean (B):** This is used for data that follows a logarithmic distribution or shows exponential growth, such as bacterial counts, serial dilutions, or parasite densities (e.g., calculating the average parasite load in a malaria patient).
* **Mode (C):** This represents the most frequently occurring value in a dataset (here, 300 and 320). While useful for identifying the most common observation, it does not account for the entire range of data and is not a true "average."
* **Median (D):** This is the middle-most value when data is arranged in ascending order. It is the preferred measure of central tendency for **skewed data** or data containing **outliers**, as it is not affected by extreme values.

**Clinical Pearls for NEET-PG:**

* **Mean:** Best for normally distributed (symmetrical) data.
* **Median:** Best for skewed data (e.g., incubation periods, survival times).
* **Geometric Mean:** Best for titers, rates of change, and microbiological data.
* **Relationship in Positive Skew:** Mean > Median > Mode.
* **Relationship in Negative Skew:** Mode > Median > Mean.
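The three candidate averages can be checked against the counts given in the question using Python's standard `statistics` module. Only the variable names are assumptions here; the data are the ten values from the question.

```python
import statistics

# Yearly malaria case counts exactly as listed in the question.
cases = [100, 160, 190, 250, 300, 300, 320, 320, 550, 380]

mean_cases = statistics.mean(cases)      # sum of 2870 divided by 10 years
median_cases = statistics.median(cases)  # average of the 5th and 6th sorted values
modes = statistics.multimode(cases)      # every value tied for most frequent

print(f"mean = {mean_cases}, median = {median_cases}, modes = {modes}")
# mean = 287, median = 300.0, modes = [300, 320]
```

The mean (287) sits a little below the median (300), and the data have two modes, which matches the values quoted in the option analysis.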

Q: What is a measurement of dispersion?

Range.

### Explanation

In biostatistics, data is summarized using two primary types of measures: **Measures of Central Tendency** (averages) and **Measures of Dispersion** (variability).

**Why Range is the Correct Answer:** **Range** is a measure of dispersion. It represents the simplest way to quantify the spread or variability of a dataset by calculating the difference between the maximum and minimum values (Range = Highest value - Lowest value). While it is easy to calculate, it is highly sensitive to outliers.

**Analysis of Incorrect Options:**

* **A. Mean:** This is a measure of **central tendency**. It is the arithmetic average of all observations and is the most commonly used measure for normally distributed data.
* **B. Mode:** This is a measure of **central tendency**. It represents the most frequently occurring value in a dataset. It is the only measure that can be used for qualitative (nominal) data.
* **C. Median:** This is a measure of **central tendency**. It is the middle-most value when data is arranged in ascending or descending order. It is the preferred measure for skewed distributions as it is not affected by extreme values.

**High-Yield Clinical Pearls for NEET-PG:**

* **Measures of Dispersion include:** Range, Mean Deviation, Standard Deviation (most common), and Interquartile Range.
* **Standard Deviation (SD):** The most important measure of dispersion in medicine; it summarizes how much individual values deviate from the mean.
* **Relative Measures:** While SD and Range are "absolute" measures, the **Coefficient of Variation** is a "relative" measure used to compare the variability of two different series (e.g., comparing height in cm vs. weight in kg).
* **Normal Distribution:** In a perfectly normal distribution, Mean = Mode = Median.
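The measures of dispersion listed above can be computed side by side. As an illustration, the sketch reuses the malaria counts from the earlier question; the quartiles come from `statistics.quantiles` with its default method, which is one of several conventions and may differ slightly from a hand calculation.

```python
import statistics

# Illustrative data: the yearly malaria counts from the earlier question.
values = [100, 160, 190, 250, 300, 300, 320, 320, 550, 380]

data_range = max(values) - min(values)            # highest minus lowest value
sd = statistics.stdev(values)                     # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)    # quartile cut points
iqr = q3 - q1                                     # interquartile range
cv = sd / statistics.mean(values) * 100           # coefficient of variation, in percent

print(f"range = {data_range}, SD = {sd:.1f}, IQR = {iqr:.1f}, CV = {cv:.1f}%")
```

The range (450) is driven entirely by the two extreme years, which illustrates its sensitivity to outliers, while the interquartile range depends only on the middle half of the data.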

Question 1

What is the order of the margin of error in the graph provided?

Accepted Answer

3 > 2 > 1

Answer

3 > 1 > 2

Answer

1 > 3 > 2

Answer

1 = 2 = 3

Question 2

In a study, every fourth student is selected from a class for sampling. This is a type of sampling method:

Accepted Answer

Systematic random sampling

Answer

Simple random sampling

Answer

Stratified random sampling

Answer

Cluster random sampling

Question 3

A cardiologist wants to study the effect of an antihypertensive drug. He notes down the initial systolic blood pressure (mmHg) of 50 patients and then administers the drug. After a week's treatment, he measures the systolic blood pressure again. Which of the following is the most appropriate statistical test of significance to test the statistical significance of the change in blood pressure?

Accepted Answer

Paired t-test

Answer

Unpaired or independent t-test

Answer

Analysis of variance

Answer

Chi-square test

Question 4

What does 'true negative' represent in diagnostic testing?

Accepted Answer

Specificity

Answer

Sensitivity

Answer

Positive predictive value

Answer

Negative predictive value

Question 5

Which of the following is used for qualitative data?

Accepted Answer

Mode

Answer

Mean

Answer

Whisker plot

Answer

Histogram

Question 6

Blood pressure level is an example of which of the following scales?

Accepted Answer

Metric

Answer

Nominal

Answer

Ordinal

Answer

None of the above

Question 7

Which type of diagram is best for studying the decline in the percentage of syphilis in men and women over the last 10 years?

Accepted Answer

Line diagram

Answer

Pie chart

Answer

Histogram

Answer

Frequency polygon curve

Question 8

Which statistical test is used to study ordinal data of two independent groups that are not normally distributed?

Accepted Answer

Wilcoxon signed-rank test

Answer

Student's t-test

Answer

Z-test

Answer

One-way analysis of variance

Question 9

Given the number of malaria cases detected over 10 years as 100, 160, 190, 250, 300, 300, 320, 320, 550, 380, how would you calculate the average number of cases per year?

Accepted Answer

Arithmetic mean

Answer

Geometric mean

Answer

Mode

Answer

Median

Question 10

What is a measurement of dispersion?

Accepted Answer

Range

Answer

Mean

Answer

Mode

Answer

Median

