Biostatistics Practice Questions

Q: The value of one parameter can be estimated from another related parameter by:

Regression.

**Explanation:** The core of this question lies in distinguishing between the **relationship** and the **prediction** of variables.

**Why Regression is Correct:** Regression is a statistical method used to estimate or predict the value of a dependent variable ($Y$) based on the known value of an independent variable ($X$). It uses a mathematical equation (e.g., $Y = a + bX$) to define the functional relationship. In medical research, if we know the regression equation between age and blood pressure, we can **estimate** a person’s blood pressure if their age is known.

**Why Other Options are Incorrect:**

* **Correlation (Option B):** While correlation measures the strength and direction of a linear relationship between two variables (using the correlation coefficient '$r$'), it **cannot** be used to predict or estimate the value of one variable from another. It only tells you how closely they move together.
* **Scatter Diagram (Option C):** This is a visual/graphical representation of the relationship between two continuous variables. It helps identify the pattern (linear, curvilinear, or no relationship) but does not provide a mathematical estimate.
* **Bar Chart (Option D):** This is a tool for representing discrete/nominal data (e.g., number of cases per year). It is not used for showing relationships between two continuous variables.

**High-Yield Clinical Pearls for NEET-PG:**

* **Correlation Coefficient ($r$):** Ranges from $-1$ to $+1$. $0$ means no linear correlation.
* **Coefficient of Determination ($r^2$):** Tells you the proportion of variance in the dependent variable that is predictable from the independent variable.
* **Regression Line:** Also known as the "Line of Best Fit."
* **Key Distinction:** Correlation = Association; Regression = Prediction/Estimation.
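
As a rough illustration of the distinction, the sketch below fits $Y = a + bX$ by ordinary least squares and then estimates $Y$ for a new $X$. The age and blood-pressure figures are invented purely for the example, and `estimate_bp` is a hypothetical helper name, not something taken from the question.

```python
# Least-squares fit of Y = a + bX, then estimation of Y from a known X.
# The data below are made-up illustration values, not real measurements.
ages = [30, 40, 50, 60, 70]               # X: independent variable
systolic_bp = [118, 124, 131, 137, 145]   # Y: dependent variable

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(systolic_bp) / n

# Sums of squares and cross-products around the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, systolic_bp))
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in systolic_bp)

b = sxy / sxx                  # slope of the regression line
a = mean_y - b * mean_x        # intercept
r = sxy / (sxx * syy) ** 0.5   # correlation coefficient (strength only)


def estimate_bp(age):
    """Estimate blood pressure from age using the fitted line."""
    return a + b * age


print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"Estimated BP at age 55: {estimate_bp(55):.2f}")
```

Here `r` summarises how tightly the points cluster around a line, while `a` and `b` are what actually allow a value to be estimated.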

Q: If the prevalence of Candida glabrata infection is found to be 80% in a population of 100, what will be the range of prevalence at a 95% confidence level?

72% to 88%.

### Explanation

**1. Understanding the Correct Answer (C):** The question asks for the **95% Confidence Interval (CI)** for a proportion. In biostatistics, the formula for the 95% Confidence Interval is:

**$CI = p \pm 1.96 \times SE$** *(where $p$ = prevalence/proportion and $SE$ = Standard Error)*

* **Step 1:** Identify the variables. $p = 80\%$ (0.8), $q = 100\% - p = 20\%$ (0.2), and $n = 100$.
* **Step 2:** Calculate the Standard Error (SE) of the proportion, working in percentage points: $SE = \sqrt{\frac{p \times q}{n}} = \sqrt{\frac{80 \times 20}{100}} = \sqrt{\frac{1600}{100}} = \sqrt{16} = 4$.
* **Step 3:** Apply the 95% CI formula (using 2 as a rounded value for 1.96 for quick calculation): $80 \pm (2 \times 4) = 80 \pm 8$.
* **Lower Limit:** $80 - 8 = 72\%$
* **Upper Limit:** $80 + 8 = 88\%$

Thus, we are 95% confident that the true population prevalence lies between **72% and 88%**.

**2. Why Other Options are Incorrect:**

* **Option A & D:** These ranges are wider than $80 \pm 8$ and are not centred on the observed prevalence of 80%. A wider interval would require a larger Standard Error, which would only occur with a smaller sample size than the 100 given here.
* **Option B:** This range (65% to 95%) is $80 \pm 15$, which implies a Standard Error of 7.5 (using the rounded multiplier of 2). That does not match the Standard Error of 4 obtained for a sample size of 100.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Standard Error (SE):** Measures the precision of the sample estimate. As sample size ($n$) increases, SE decreases, and the Confidence Interval becomes narrower (more precise).
* **Confidence Levels:**
  * 95% CI = $Mean \pm 1.96 \times SE$ (Commonly used in research)
  * 99% CI = $Mean \pm 2.58 \times SE$
  * 68% CI = $Mean \pm 1 \times SE$
* **Prevalence vs. Incidence:** Prevalence (Total cases/Total population) is a cross-sectional measure, whereas Incidence (New cases/Population at risk) is a longitudinal measure.
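
The three steps above can be checked with a few lines of Python. This is a minimal sketch of the normal-approximation interval used in the explanation; `proportion_ci` is a hypothetical helper name, and proportions are expressed as fractions (0.80) rather than percentages.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p: observed proportion as a fraction (0.80 for 80%)
    n: sample size
    z: multiplier (1.96 for 95%, 2.58 for 99%)
    Returns (lower, upper, standard_error).
    """
    q = 1.0 - p
    se = math.sqrt(p * q / n)
    return p - z * se, p + z * se, se


# Values from the question: prevalence 80%, n = 100
lower, upper, se = proportion_ci(0.80, 100)
print(f"SE = {se:.2%}, 95% CI = {lower:.2%} to {upper:.2%}")

# Exam shortcut: round the multiplier 1.96 up to 2
lower2, upper2, _ = proportion_ci(0.80, 100, z=2.0)
print(f"Rounded multiplier: {lower2:.0%} to {upper2:.0%}")
```

With the exact multiplier the limits come out at roughly 72.2% and 87.8%, which rounds to the accepted answer of 72% to 88%.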

Q: In random sampling, what is the chance of an item being selected?

Same and known.

### Explanation

In biostatistics, **Simple Random Sampling (SRS)** is the gold standard of probability sampling. The fundamental principle of random sampling is that every individual unit in the population has an **equal (same)** and **known (non-zero, calculable)** probability of being selected for the study.

1. **Why "Same and Known" is correct:**
   * **Same (Equal):** To eliminate selection bias, every member of the sampling frame must have the exact same probability of inclusion ($1/N$ on any single draw, or $n/N$ for a sample of size $n$, where $N$ is the population size).
   * **Known:** For a method to be "probabilistic," the chance of selection must be pre-determined and calculable. If the probability is unknown, the sampling becomes non-random (convenience or purposive), which invalidates many statistical tests.
2. **Analysis of Incorrect Options:**
   * **B & C (Not known):** If the chance is "not known," it is a **Non-Probability Sampling** (e.g., Quota or Snowball sampling). Here, the researcher cannot calculate the sampling error.
   * **D (Not same but known):** This describes certain complex designs like *Stratified Random Sampling* where different strata might have different weights, but in the context of basic "Random Sampling" (SRS), the "Same and Known" rule is the defining characteristic.

### NEET-PG High-Yield Pearls

* **Gold Standard:** Simple Random Sampling is the best method to represent a population, provided a complete **Sampling Frame** (list of all individuals) is available.
* **Methods of Randomization:** Use of a **Random Number Table** (e.g., Tippett’s Table), computer-generated numbers, or a lottery method.
* **Bias Control:** Randomization is the only way to control for **unknown confounders** in clinical trials.
* **Systematic Sampling:** Often called "Quasi-random," it involves selecting every $k^{th}$ item (Sampling Interval = $N/n$).
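
To make "same and known" concrete, the sketch below draws a simple random sample and a systematic sample from a numbered sampling frame using computer-generated random numbers. The frame of 100 units, the sample size of 10, and both function names are assumptions chosen for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has inclusion probability n/N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then every k-th unit, where k = N // n."""
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return frame[start::k][:n]


# Hypothetical sampling frame: units numbered 1..100
frame = list(range(1, 101))

srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_sample(frame, 10, seed=1)

# The selection chance is the same for every unit and known in advance
inclusion_probability = 10 / len(frame)

print("SRS:", sorted(srs))
print("Systematic:", sys_sample)
print("Inclusion probability:", inclusion_probability)
```

Both designs need a complete sampling frame; without one, the inclusion probability cannot be stated and the sample is no longer a probability sample.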

Q: When analyzing data, allocation into similar groups is done to ensure what?

Comparability.

### Explanation

**1. Why "Comparability" is Correct:** In epidemiological studies and clinical trials, the primary goal of **Randomization** or **Matching** is to ensure that the study group and the control group are as similar as possible regarding all variables except the intervention being studied. When groups are allocated into similar categories (homogeneity), it ensures **Comparability**. This minimizes **selection bias** and controls for **confounding factors**, allowing the researcher to attribute any observed difference in outcome solely to the intervention rather than baseline differences between groups.

**2. Why Other Options are Incorrect:**

* **Accuracy:** Refers to how close a measurement is to the true value. It is a function of systematic error (bias); while allocation affects bias, the specific act of making groups "similar" is defined as comparability.
* **Validity:** Refers to whether a test measures what it intends to measure. Internal validity depends on comparability, but validity is a broader concept encompassing the entire study design and execution.
* **Sensitivity:** This is a measure of a diagnostic test's ability to correctly identify those with the disease (True Positive Rate). It is a property of a test, not a result of group allocation.

**3. Clinical Pearls & High-Yield Facts for NEET-PG:**

* **Randomization** is the "Heart of a Clinical Trial" because it is the best method to ensure comparability by distributing both known and unknown confounders equally.
* **Matching** is a technique used primarily in **Case-Control studies** to ensure comparability between cases and controls.
* **Confounding** occurs when the relationship between an exposure and outcome is distorted by a third variable. Comparability is the primary defense against confounding.
* **Blinding** is done to eliminate observer/participant bias, whereas **Allocation** is done to ensure comparability.
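
One way to see how random allocation produces comparable groups is to split a simulated cohort into two arms and compare a baseline variable. Everything in this sketch is hypothetical: the 200 participants, their randomly generated ages, and the helper name `randomize`.

```python
import random
from statistics import mean


def randomize(participants, seed=None):
    """Shuffle the participants and split them into two equal arms."""
    rng = random.Random(seed)
    shuffled = participants[:]      # copy so the input list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical cohort: 200 participants with simulated ages (20 to 70 years)
cohort_rng = random.Random(0)
participants = [{"id": i, "age": cohort_rng.randint(20, 70)} for i in range(200)]

treatment, control = randomize(participants, seed=42)

# Baseline check: with random allocation the mean ages should be close
print("Mean age, treatment arm:", round(mean(p["age"] for p in treatment), 1))
print("Mean age, control arm:  ", round(mean(p["age"] for p in control), 1))
```

The same balancing happens for variables that were never measured, which is why randomization also handles unknown confounders.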

Q: What percentage of the area under a normal distribution curve falls within 2 standard deviations (SD) of the mean?

95%.

### Explanation

This question tests the fundamental concept of the **Normal (Gaussian) Distribution**, which is a symmetrical, bell-shaped curve characterized by its mean and standard deviation (SD). In biostatistics, the area under this curve represents the probability or percentage of observations.

**1. Why the Correct Answer is Right:** The Normal Distribution follows the **Empirical Rule** (also known as the 68-95-99.7 rule). According to this rule:

* **Mean ± 1 SD** covers approximately **68.3%** of the values.
* **Mean ± 2 SD** covers approximately **95.4%** (commonly rounded to **95%** for exams).
* **Mean ± 3 SD** covers approximately **99.7%** of the values.

Therefore, about 95% of the data points in a normally distributed population fall within 2 standard deviations of the mean.

**2. Why the Incorrect Options are Wrong:**

* **Option A (65%):** This is incorrect. The closest standard value is 68% (for 1 SD).
* **Option B (75%):** This does not correspond to a standard SD landmark in a normal distribution. However, according to *Chebyshev's Inequality* (which applies to any distribution shape), at least 75% of data falls within 2 SDs.
* **Option D (99%):** This is incorrect for 2 SD. Approximately 99.7% of the area is covered by **3 SD**, not 2.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Confidence Intervals (CI):** The 95% CI is the most commonly used in medical research. It is calculated as: $Mean \pm (1.96 \times SEM)$. Note that **1.96** is the precise multiplier for 95%, often rounded to 2 in basic MCQ questions.
* **Standard Normal Curve:** A specific normal distribution where the **Mean = 0** and **SD = 1**.
* **Z-score:** Indicates how many standard deviations a value is from the mean. Values with Z-scores between $-2$ and $+2$ make up about 95% of the area under the curve.
* **Symmetry:** In a perfect normal distribution, the **Mean, Median, and Mode are all equal.**
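
The landmark percentages can be reproduced from the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1
std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Fraction of the area lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 1.96, 2, 3)}
for k, area in areas.items():
    print(f"Mean ± {k} SD covers {area:.2%}")

# Chebyshev's lower bound for k = 2, valid for any distribution shape
chebyshev_bound_2sd = 1 - 1 / 2**2
print(f"Chebyshev bound within 2 SD: at least {chebyshev_bound_2sd:.0%}")
```

Comparing the rows for 1.96 and 2 shows why the two multipliers are treated as interchangeable in quick calculations.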

Q: A study was conducted in a population of 2000 individuals. The mean hemoglobin level is 13.5 grams, and the distribution follows a normal distribution curve. What percentage of people will have a hemoglobin level more than 13.5 grams?

50%.

### Explanation

**1. Why the Correct Answer (C) is Right:** The core concept here is the **Normal Distribution (Gaussian Distribution)**. In a perfectly normal distribution, the curve is symmetrical and bell-shaped. A fundamental property of this distribution is that the **Mean, Median, and Mode are all equal** and located exactly at the center of the curve. Since the Median represents the 50th percentile, exactly 50% of the observations lie below the mean and **50% lie above the mean**. In this question, the mean is 13.5 g/dL; therefore, regardless of the standard deviation or total population size (2000), 50% of the individuals will have a hemoglobin level higher than 13.5 g/dL.

**2. Why the Incorrect Options are Wrong:**

* **Option A (5%):** This value is associated with the "tails" of the distribution. In a normal distribution, approximately 5% of the population falls outside ±1.96 Standard Deviations (2.5% in each tail).
* **Option B (25%) & D (75%):** These represent the First (Q1) and Third (Q3) Quartiles, respectively. While these are important markers in skewed distributions or box plots, they do not represent the division at the mean in a normal curve.

**3. NEET-PG Clinical Pearls & High-Yield Facts:**

* **Symmetry:** In a normal distribution, Skewness is **0** and Kurtosis is **3**.
* **The 68-95-99.7 Rule (Empirical Rule):**
  * Mean ± 1 SD covers **68.3%** of values.
  * Mean ± 2 SD covers **95.4%** of values.
  * Mean ± 3 SD covers **99.7%** of values.
* **Standard Normal Distribution:** A specific normal distribution where the Mean is **0** and the Standard Deviation is **1**.
* **Z-score:** Indicates how many standard deviations a value is from the mean. At the mean (13.5 in this case), the Z-score is 0.
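
The claim that the answer does not depend on the standard deviation can be checked directly. In this sketch the mean (13.5) and population size (2000) come from the question, while the three SD values are arbitrary assumptions, since the question does not give one.

```python
from statistics import NormalDist

mean_hb = 13.5       # from the question
population = 2000    # from the question


def proportion_above_mean(mean, sd):
    """Fraction of a normal population lying above its own mean."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(mean)


# The SD values below are assumed for illustration only
results = {sd: proportion_above_mean(mean_hb, sd) for sd in (0.5, 1.2, 2.0)}
for sd, fraction in results.items():
    print(f"SD = {sd}: {fraction:.0%} above the mean, "
          f"about {fraction * population:.0f} of {population} people")
```

Each row gives the same fraction, because the normal curve is symmetric about its mean whatever its spread.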

Q: A standard 'z-score' is related to which statistical distribution?

Normal distribution.

**Explanation:** The **z-score** (also known as the standard score) is a fundamental concept in the **Normal Distribution** (Gaussian distribution). It represents the number of standard deviations a data point is from the mean. In a Standard Normal Distribution, the mean is 0 and the standard deviation is 1. The formula $z = (x - \mu) / \sigma$ allows researchers to compare data from different sets by "standardizing" them onto a common scale.

**Why the other options are incorrect:**

* **Binomial distribution:** This deals with discrete data involving only two possible outcomes (e.g., Success/Failure, Dead/Alive), whereas z-scores apply to continuous data.
* **Chi-square test:** This is a non-parametric test used to analyze categorical data and determine the association between two variables; it does not utilize z-scores for its distribution.
* **t-test:** While similar to the z-test, the t-test is used for small sample sizes ($n < 30$) where the population variance is unknown. It follows a **Student’s t-distribution**, which has "fatter tails" than the normal distribution.

**High-Yield Clinical Pearls for NEET-PG:**

* **68-95-99.7 Rule:** In a normal distribution, a z-score of $\pm1$ covers 68% of data, $\pm2$ covers 95%, and $\pm3$ covers 99.7%.
* **Z-test vs. T-test:** Use a **Z-test** when the sample size is large ($n > 30$) and the population standard deviation is known. Use a **T-test** when the sample is small ($n < 30$) and the population standard deviation is unknown.
* **Symmetry:** In a normal distribution (z-distribution), the Mean, Median, and Mode are all equal.
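
A short sketch of the standardization formula $z = (x - \mu) / \sigma$ follows. All numbers are invented for illustration (a hemoglobin value and a blood-pressure value with assumed means and SDs), and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical measurements on two different scales
z_hb = z_score(15.9, mu=13.5, sigma=1.2)     # hemoglobin, about +2 SD
z_bp = z_score(110.0, mu=120.0, sigma=10.0)  # systolic BP, -1 SD

# Once standardized, both values sit on the same standard normal scale
std_normal = NormalDist()  # mean 0, SD 1
print(f"Hemoglobin z = {z_hb:.2f}, percentile = {std_normal.cdf(z_hb):.1%}")
print(f"Blood pressure z = {z_bp:.2f}, percentile = {std_normal.cdf(z_bp):.1%}")
```

Standardizing is what lets a hemoglobin reading and a blood-pressure reading be compared on one scale despite their different units.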

Question 1

The value of one parameter can be estimated from another related parameter by:

Accepted Answer

Regression

Answer

Bar chart

Answer

Correlation

Answer

Scatter diagram

Question 2

Which of the following best represents a population with age variation?

Accepted Answer

Population pyramid

Answer

Life table

Answer

Correlation coefficient

Answer

Bar chart

Question 3

If the prevalence of Candida glabrata infection is found to be 80% in a population of 100, what will be the range of prevalence at a 95% confidence level?

Accepted Answer

72% to 88%

Answer

4% to 100%

Answer

65% to 95%

Answer

70% to 100%

Question 4

What does relative risk represent?

Accepted Answer

Incidence among exposed divided by incidence among non-exposed

Answer

Incidence among non-exposed divided by incidence among exposed

Answer

Incidence in exposed minus incidence in non-exposed

Answer

None of the above

Question 5

In random sampling, what is the chance of an item being selected?

Accepted Answer

Same and known

Answer

Not same and not known

Answer

Same and not known

Answer

Not same but known

Question 6

When analyzing data, allocation into similar groups is done to ensure what?

Accepted Answer

Comparability

Answer

Accuracy

Answer

Validity

Answer

Sensitivity

Question 7

What percentage of the area under a normal distribution curve falls within 2 standard deviations (SD) of the mean?

Accepted Answer

95%

Answer

65%

Answer

75%

Answer

99%

Question 8

Which of the following is the best method to compare vital statistics between countries?

Accepted Answer

Age-standardized death rate

Answer

Crude death and birth rates

Answer

Proportional mortality rate

Answer

Age-specific death rate

Question 9

A study was conducted in a population of 2000 individuals. The mean hemoglobin level is 13.5 grams, and the distribution follows a normal distribution curve. What percentage of people will have a hemoglobin level more than 13.5 grams?

Accepted Answer

50%

Answer

5%

Answer

25%

Answer

75%

Question 10

A standard 'z-score' is related to which statistical distribution?

Accepted Answer

Normal distribution

Answer

Binomial distribution

Answer

Chi-square test

Answer

t-test
