Biostatistics Practice Questions

Q: To assess if the sample mean is an accurate estimate of the population mean, which statistical measure should be used?

Standard error.

### Explanation

The correct answer is **Standard Error (SE)**.

**Why Standard Error is correct:** In biostatistics, we rarely study an entire population; instead, we take a sample. The **Standard Error of the Mean (SEM)** measures the precision of the sample mean as an estimate of the true population mean. It quantifies the sampling error, that is, the extent to which the sample mean is likely to deviate from the population mean. A smaller SE indicates that the sample mean is a more precise estimate of the population mean.

**Why the other options are incorrect:**

* **Geometric Mean:** This is a measure of central tendency used for skewed data or data following a logarithmic distribution (e.g., titers, incubation periods). It does not measure estimation accuracy.
* **Range:** This is a measure of dispersion representing the difference between the highest and lowest values in a dataset. It is highly sensitive to outliers and does not relate the sample to the population.
* **Standard Deviation (SD):** While often confused with SE, the SD measures the **variability within a single sample** (how much individual observations spread around the sample mean). It describes the data, whereas SE describes the uncertainty of the estimate.

**High-Yield Clinical Pearls for NEET-PG:**

* **Formula:** $SE = \frac{SD}{\sqrt{n}}$ (where $n$ is the sample size).
* As the **sample size ($n$) increases**, the Standard Error decreases, making the estimate more precise.
* **Confidence Intervals (CI):** SE is used to calculate CI. For a 95% CI, the range is $\text{Mean} \pm 1.96 \times SE$.
* **Key Distinction:** Use **SD** to describe the distribution of a variable; use **SE** to report the precision of your results.
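The formula $SE = SD/\sqrt{n}$ and the 95% confidence interval can be checked with a few lines of Python. This is a minimal sketch: the `weights` sample is made-up illustrative data, and the function names are hypothetical rather than taken from the question bank.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    sd = statistics.stdev(sample)  # sample SD (n - 1 in the denominator)
    return sd / math.sqrt(len(sample))

def ci95(sample):
    """Approximate 95% confidence interval: mean +/- 1.96 * SE."""
    mean = statistics.mean(sample)
    se = standard_error(sample)
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical weights in kg, for illustration only.
weights = [62.0, 65.5, 70.1, 58.4, 66.3, 71.2, 64.8, 68.0]

if __name__ == "__main__":
    print("SE:", standard_error(weights))
    print("95% CI:", ci95(weights))
```

Because the SD is divided by $\sqrt{n}$, a larger sample with similar spread gives a smaller SE and therefore a narrower interval.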

Q: A researcher wants to evaluate the effect of a new diet regimen on the weight of a group of 10 patients. The researcher records their weight before and after the regimen. Which statistical test will be applicable in this case?

Paired t-test.

### Explanation

**Why Paired t-test is Correct:** The **Paired t-test** (also known as the dependent t-test) is used to compare the means of two related groups. In this scenario, the researcher is measuring the **same individuals** (10 patients) at two different time points (**before and after** an intervention). Since the data points are "paired" (each "before" weight corresponds to an "after" weight for the same person), and weight is a **quantitative (numerical) continuous variable**, the paired t-test is the most appropriate statistical tool to determine if the mean difference is statistically significant.

**Why Other Options are Incorrect:**

* **Chi-square test:** This is used for **qualitative (categorical)** data (e.g., comparing the proportion of smokers vs. non-smokers). It cannot be used for continuous data like weight.
* **Unpaired t-test (Student’s t-test):** This is used to compare the means of **two independent groups** (e.g., comparing the weights of 10 men vs. 10 women). In the question, the groups are dependent (same patients).
* **ANOVA (Analysis of Variance):** This is used when comparing the means of **three or more independent groups**. Since there are only two sets of observations here, ANOVA is not required.

**High-Yield Clinical Pearls for NEET-PG:**

* **Parametric Tests:** t-tests and ANOVA assume a normal distribution of data.
* **Non-parametric alternative:** If the data in this scenario were not normally distributed, the **Wilcoxon Signed-Rank Test** would be the non-parametric equivalent of the paired t-test.
* **Rule of Thumb:**
  * 1 Group (Before/After) $\rightarrow$ Paired t-test.
  * 2 Independent Groups $\rightarrow$ Unpaired t-test.
  * $>2$ Independent Groups $\rightarrow$ ANOVA.
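To make the "paired" idea concrete, the sketch below computes the paired t statistic by hand from the per-patient differences. The before/after weights are invented for illustration, and `paired_t` is a hypothetical helper name; in practice a statistics package would normally be used.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    The test works on the within-patient differences, not on the two
    columns separately: t = mean(d) / (SD(d) / sqrt(n)), df = n - 1.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical weights (kg) for 10 patients, before and after the diet.
before = [82, 90, 77, 95, 88, 101, 79, 85, 92, 98]
after = [80, 87, 76, 91, 86, 97, 78, 83, 89, 95]

if __name__ == "__main__":
    t, df = paired_t(before, after)
    # For df = 9 the two-sided 5% critical value is about 2.262.
    print(f"t = {t:.2f}, df = {df}")
```

A negative t here simply reflects that the "after" weights are lower than the "before" weights in this invented sample.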

Q: Positive predictive value is most affected by?

Prevalence.

**Explanation:** **Positive Predictive Value (PPV)** is the probability that a person who tests positive actually has the disease. Unlike sensitivity and specificity, which are inherent properties of a diagnostic test, predictive values are heavily dependent on the **Prevalence** of the disease in the population being tested.

**Why Prevalence is the correct answer:** Mathematically, PPV is calculated as: $TP / (TP + FP)$. For a test with fixed sensitivity and specificity, a higher prevalence means more diseased people are tested (so True Positives, TP, increase) and fewer healthy people are tested (so False Positives, FP, decrease). Therefore, **PPV rises as prevalence rises**, although the relationship is not strictly linear. In a high-prevalence setting (e.g., a tertiary care center), a positive test is more likely to be a true positive than in a low-prevalence setting (e.g., general population screening).

**Why other options are incorrect:**

* **Sensitivity & Specificity:** While these parameters influence the calculation of PPV, they are fixed characteristics of the test itself. They do not fluctuate based on the population. If prevalence changes, PPV changes even if sensitivity and specificity remain constant.
* **Relative Risk:** This is a measure of association used in cohort studies to compare the incidence of disease between exposed and non-exposed groups; it does not determine the accuracy or predictive power of a diagnostic test.

**High-Yield Clinical Pearls for NEET-PG:**

* **PPV vs. Prevalence:** Direct relationship (Prevalence ↑, PPV ↑).
* **NPV vs. Prevalence:** Inverse relationship (Prevalence ↑, NPV ↓).
* **Screening Strategy:** To maximize PPV, screening should be targeted at "high-risk" groups where prevalence is higher.
* **Bayes' Theorem:** This is the mathematical principle that explains how pre-test probability (prevalence) determines post-test probability (predictive value).
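The dependence on prevalence follows directly from Bayes' theorem. The sketch below expresses TP, FP, TN and FN as population fractions and compares two settings; the 90% sensitivity and specificity and the two prevalence values are assumed numbers chosen only for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: TP / (TP + FP), as population fractions."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: TN / (TN + FN), as population fractions."""
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tn / (tn + fn)

if __name__ == "__main__":
    # Same hypothetical test (90% sensitive, 90% specific) in two settings.
    for prevalence in (0.01, 0.50):
        print(prevalence,
              round(ppv(0.9, 0.9, prevalence), 3),
              round(npv(0.9, 0.9, prevalence), 3))
```

With the test held constant, moving from the low-prevalence to the high-prevalence setting raises PPV and lowers NPV, which is the pattern summarized in the pearls above.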

Q: Sample size determination depends upon all EXCEPT:

Test statistic value.

To determine the appropriate sample size for a study, a researcher must estimate certain parameters **before** the study begins. The **Test Statistic Value** (e.g., the calculated Z-score, t-score, or Chi-square value) is the result of the data analysis performed **after** the study is completed. Therefore, it cannot be used to determine the sample size.

### Why the other options are incorrect:

* **Type I Error (Alpha):** This is the probability of rejecting a true null hypothesis (False Positive). A smaller alpha requires a larger sample size to ensure the findings are not due to chance.
* **Power (1 - Beta):** Power is the probability of correctly rejecting a false null hypothesis (detecting a real effect). Higher power (e.g., 80% or 90%) requires a larger sample size.
* **Expected Parameter Value:** This refers to the estimated prevalence, mean, or effect size based on pilot studies or previous literature. The smaller the expected difference (effect size) between groups, the larger the sample size needed to detect it.

### High-Yield Facts for NEET-PG:

* **Precision (d):** Sample size is inversely proportional to the square of precision ($n \propto 1/d^2$). Finer precision requires a larger sample.
* **Standard Deviation ($\sigma$):** Sample size is directly proportional to the variance ($n \propto \sigma^2$). More "noisy" or variable data requires more subjects.
* **Formula for Qualitative Data:** $n = 4pq/L^2$ (where $p$ = prevalence, $q = 1-p$, and $L$ = allowable error).
* **Memory Aid:** To calculate sample size, you need **A-B-C-D**: **A**lpha, **B**eta (Power), **C**linical effect size, and **D**eviation (Standard Deviation).
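The qualitative-data formula $n = 4pq/L^2$ can be tried out numerically. In this sketch, $p$ and $L$ are entered as percentages (so $q = 100 - p$), which is an assumption about units made here for convenience; the factor 4 approximates $1.96^2$ for 95% confidence. The prevalence and allowable-error values are hypothetical.

```python
import math

def sample_size_proportion(p_percent, allowable_error_percent):
    """Sample size for estimating a proportion: n = 4pq / L^2.

    p and L are given in percent, so q = 100 - p. The result is
    rounded up because a fraction of a subject cannot be recruited.
    """
    q_percent = 100 - p_percent
    n = 4 * p_percent * q_percent / allowable_error_percent ** 2
    return math.ceil(n)

if __name__ == "__main__":
    # Hypothetical: expected prevalence 20%, allowable error 5 or 2.5 points.
    print(sample_size_proportion(20, 5))
    print(sample_size_proportion(20, 2.5))
```

Halving the allowable error quadruples the required sample, which is the $n \propto 1/d^2$ relationship listed above.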

Q: Given a normal body temperature of 98.6 F with a standard deviation of 1 F, what is the lower limit of body temperature for 95% of persons?

96 F.

### Explanation

This question tests your understanding of the **Normal Distribution (Gaussian Curve)**, a fundamental concept in biostatistics used to define "normal" biological ranges.

**1. Why Option B is Correct:** In a normal distribution, the data is distributed around the mean ($\mu$) based on standard deviations ($\sigma$). The key property to remember is the **Empirical Rule**:

* Mean ± 1 SD covers **68.3%** of the population.
* Mean ± 2 SD covers **95.4%** (commonly simplified to 95%) of the population.
* Mean ± 3 SD covers **99.7%** of the population.

To find the range for 95% of the population, we use the formula: **Mean ± 2 SD**.

* Upper Limit: $98.6 + (2 \times 1) = 100.6^\circ\text{F}$
* Lower Limit: $98.6 - (2 \times 1) = \mathbf{96.6^\circ\text{F}}$

The options are whole numbers. The accepted answer, **96°F**, corresponds to this 2 SD lower limit of 96.6°F with the decimal dropped; it is not obtained by strict rounding, which would give 97°F.

**2. Why Other Options are Incorrect:**

* **Option A (97°F):** This represents approximately Mean - 1 SD ($98.6 - 1 = 97.6$). This would only account for the lower limit of the 68% range.
* **Option C (95°F) & Option D (94°F):** These values fall beyond the 2 SD range. 95.6°F would be the lower limit for 99.7% of the population (Mean - 3 SD).

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Standard Normal Curve:** Has a mean of 0 and a variance/SD of 1.
* **Confidence Interval (CI):** For a 95% CI, the precise multiplier is **1.96**, though "2" is frequently used in MCQ calculations for simplicity.
* **Z-score:** Indicates how many standard deviations a value is from the mean. A Z-score of ±1.96 corresponds to the 95% confidence limits.
* **Symmetry:** In a normal distribution, Mean = Median = Mode. If the curve is skewed, this equality is lost.
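The Mean ± 2 SD arithmetic, and the more precise 1.96 multiplier, can be reproduced with Python's standard library. This is a small sketch; `central_range` is a hypothetical helper name.

```python
from statistics import NormalDist

def central_range(mean, sd, k=2.0):
    """Return (lower, upper) limits for mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd

# Exact two-sided 95% multiplier from the standard normal curve (about 1.96).
z95 = NormalDist().inv_cdf(0.975)

if __name__ == "__main__":
    print(central_range(98.6, 1.0))           # MCQ shortcut, k = 2
    print(central_range(98.6, 1.0, k=z95))    # precise multiplier
```

Either multiplier puts the lower limit between 96°F and 97°F, which is why the answer options have to be read against the 2 SD value of 96.6°F.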

Q: In a set of data with highly variable values, what is the best measure of central tendency?

Median.

**Explanation:** In biostatistics, the choice of central tendency depends on the distribution of the data. When a dataset contains **highly variable values** or **extreme outliers** (skewed distribution), the **Median** is the most robust measure.

**1. Why Median is correct:** The Median is the middle-most value of a dataset when arranged in ascending or descending order. Unlike the Mean, the Median is **not affected by extreme values (outliers)**. In medical research, data like incubation periods, hospital stay duration, or income levels are often skewed; the Median provides a more "typical" representation of such data because it depends on the position of observations rather than their numerical magnitude.

**2. Why other options are incorrect:**

* **Mean (Arithmetic Average):** While it is the most commonly used measure, it is highly sensitive to outliers. A single extremely high value will pull the Mean toward it, making it unrepresentative of the "center" in skewed data.
* **Mode:** This is the most frequently occurring value. It is often unstable and may not exist or may be far from the center in highly variable datasets.
* **Standard Deviation:** This is a measure of **dispersion (spread)**, not central tendency. It describes how much the values deviate from the Mean.

**High-Yield Clinical Pearls for NEET-PG:**

* **Normal Distribution:** Mean = Median = Mode.
* **Positively Skewed Data:** Mean > Median > Mode (Mean is pulled toward the tail).
* **Negatively Skewed Data:** Mean < Median < Mode.
* **Best measure for Nominal data:** Mode.
* **Best measure for Ordinal/Skewed data:** Median.
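A quick way to see the robustness of the median is to add one extreme value to a small dataset. The hospital-stay figures below are invented for illustration.

```python
import statistics

# Hypothetical lengths of hospital stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 6, 45]
stays_without_outlier = stays[:-1]

if __name__ == "__main__":
    print("mean with outlier:     ", statistics.mean(stays))
    print("median with outlier:   ", statistics.median(stays))
    print("mean without outlier:  ", statistics.mean(stays_without_outlier))
    print("median without outlier:", statistics.median(stays_without_outlier))
```

The single long stay pulls the mean well above most of the observations, while the median moves only slightly, matching the Mean > Median pattern of positively skewed data.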

Q: A study is designed to evaluate the relationship between smoking exposures during the postpartum period and the child's birth weight after delivery. What type of study is this?

Prospective cohort.

### Explanation

**Why Prospective Cohort is Correct:** In this study design, the researcher starts with a group of individuals (postpartum mothers) who are currently exposed to a factor (smoking) and follows them over time to observe the development of an outcome (child’s weight/growth).

* **Directionality:** It moves from **Cause (Exposure) to Effect (Outcome)**.
* **Timing:** Since the exposure is measured *before* the outcome is fully assessed over the postpartum period, it is prospective.
* **Key Concept:** Cohort studies are the gold standard for determining **Incidence** and **Relative Risk**.

**Why Other Options are Incorrect:**

* **A. Case-Control:** This would start with the outcome (e.g., children who already have low birth weight) and look *backwards* in time to see if their mothers smoked. It moves from Effect to Cause.
* **C. Cross-sectional:** This provides a "snapshot" where exposure and outcome are measured at the same single point in time. It cannot establish a temporal relationship (which came first).
* **D. Clinical Trial (RCT):** This involves an intervention. It would be unethical to intentionally assign mothers to a "smoking group" to see the effect on their children.

**High-Yield Clinical Pearls for NEET-PG:**

* **Cohort Study:** Best for rare exposures; can study multiple effects of a single exposure.
* **Case-Control Study:** Best for rare diseases; uses **Odds Ratio** as the measure of association.
* **Temporal Association:** The most essential of Bradford Hill's criteria for causality, best demonstrated by prospective cohort studies.
* **Recall Bias:** Common in Case-Control studies; **Selection Bias/Attrition** is more common in Cohort studies.
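The two measures of association named above, Relative Risk for cohort studies and Odds Ratio for case-control studies, both come from a 2×2 table. The sketch below uses the conventional cell labels a, b, c, d and entirely hypothetical counts.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table (cohort study).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    RR = incidence in exposed / incidence in unexposed.
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds ratio (cross-product ratio) from the same 2x2 table."""
    return (a * d) / (b * c)

if __name__ == "__main__":
    # Hypothetical cohort: 100 exposed and 100 unexposed mothers.
    print("RR:", relative_risk(30, 70, 10, 90))
    print("OR:", odds_ratio(30, 70, 10, 90))
```

Relative risk needs incidence, so it requires following exposed and unexposed groups forward in time; the odds ratio can be computed even when subjects were sampled by outcome.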

Question 1

To assess if the sample mean is an accurate estimate of the population mean, which statistical measure should be used?

Accepted Answer

Standard error

Answer

Geometric mean

Answer

Range

Answer

Standard deviation

Question 2

A researcher wants to evaluate the effect of a new diet regimen on the weight of a group of 10 patients. The researcher records their weight before and after the regimen. Which statistical test will be applicable in this case?

Accepted Answer

Paired t-test

Answer

Chi square test

Answer

Unpaired t-test

Answer

ANOVA

Question 3

Positive predictive value is most affected by?

Accepted Answer

Prevalence

Answer

Sensitivity

Answer

Specificity

Answer

Relative risk

Question 4

Sample size determination depends upon all EXCEPT:

Accepted Answer

Test statistic value

Answer

Type I error

Answer

Power

Answer

Expected parameter value

Question 5

What does reliability mean in research?

Accepted Answer

The consistency of results when a study or measurement is repeated.

Answer

The degree of variation observed in experimental outcomes.

Answer

The degree to which a measurement is accurate.

Answer

The ease with which a test or procedure can be performed.

Question 6

Given a normal body temperature of 98.6 F with a standard deviation of 1 F, what is the lower limit of body temperature for 95% of persons?

Accepted Answer

96 F

Answer

97 F

Answer

95 F

Answer

94 F

Question 7

In a town with a population of 20,000, there were 456 live births in a year. Of these, 56 resulted in stillbirths. The total number of deaths was 247, with 56 deaths occurring within the first 28 days of life and 34 deaths occurring after 28 days but before completing the first year of life. Calculate the Infant Mortality Rate for this area.

Accepted Answer

225

Answer

197

Answer

392

Answer

344
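The accepted answer of 225 can be reproduced with the standard definition, infant deaths per 1,000 live births, under one assumption that the question does not state clearly: that 456 is the total number of births and the 56 stillbirths are subtracted to obtain live births. The sketch below shows the arithmetic under that assumption; the function name is hypothetical.

```python
def infant_mortality_rate(infant_deaths, live_births):
    """Infant deaths (under 1 year) per 1,000 live births."""
    return infant_deaths * 1000 / live_births

# Figures from the question.
total_births = 456
stillbirths = 56
neonatal_deaths = 56        # deaths within the first 28 days
post_neonatal_deaths = 34   # deaths after 28 days but before 1 year

# Assumption: 456 includes the stillbirths, so live births = 456 - 56.
live_births = total_births - stillbirths
infant_deaths = neonatal_deaths + post_neonatal_deaths

if __name__ == "__main__":
    print(infant_mortality_rate(infant_deaths, live_births))
```

If 456 is instead read literally as live births, the same formula gives roughly 197, which appears among the other options.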

Question 8

In a set of data with highly variable values, what is the best measure of central tendency?

Accepted Answer

Median

Answer

Mean

Answer

Mode

Answer

Standard Deviation

Question 9

A study is designed to evaluate the relationship between smoking exposures during the postpartum period and the child's birth weight after delivery. What type of study is this?

Accepted Answer

Prospective cohort

Answer

Case control

Answer

Cross sectional

Answer

Clinical trial

Question 10

A cardiologist found a highly significant correlation coefficient (r = 0.90) between systolic blood pressure and serum cholesterol values of the patients attending his clinic. Which of the following statements is a wrong interpretation of the correlation coefficient observed?

Accepted Answer

Since there is a high correlation, the magnitude of both the measurements are likely to be close to each other.

Answer

A patient with a high level of systolic blood pressure is also likely to have a high level of serum cholesterol.

Answer

A patient with a low level of systolic blood pressure is also likely to have a low level of serum cholesterol.

Answer

About 80% of the variation in systolic blood pressure among his patients can be explained by their serum cholesterol values.
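Two points behind this question can be checked numerically: the share of variation explained is $r^2$ (for $r = 0.90$, $r^2 = 0.81$), and a high correlation says nothing about the two measurements being numerically close. The sketch below uses invented blood pressure and cholesterol values and a hand-written `pearson_r` helper, both hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired readings for six patients.
sbp = [110, 120, 130, 140, 150, 160]    # systolic BP, mmHg
chol = [160, 185, 190, 215, 230, 250]   # serum cholesterol, mg/dL
chol_shifted = [c + 500 for c in chol]  # same pattern, far larger magnitudes

if __name__ == "__main__":
    print("r:", pearson_r(sbp, chol))
    print("r after shifting one variable:", pearson_r(sbp, chol_shifted))
    print("r^2 for r = 0.90:", 0.90 ** 2)
```

Shifting one variable by a constant leaves r unchanged, so correlation measures how the two variables move together, not how close their values are.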
