Biostatistics Practice Questions

Q: The birth weight of all 10 babies born in a nursing home on a day was 2.7 kg. What is the standard deviation?

0.

### Explanation

**1. Why the Correct Answer (A) is Right:**

Standard Deviation (SD) is a measure of **dispersion** or **variability** in a dataset. It quantifies how much the individual values in a distribution deviate from the mean.

* In this scenario, every single observation (all 10 babies) has the exact same value: **2.7 kg**.
* The Mean ($\bar{x}$) is therefore 2.7 kg.
* Since there is no variation between the individual values and the mean (Difference = 0), the sum of squares of deviations is zero.
* **Mathematical Principle:** If all observations in a sample are identical, the variance and the standard deviation will always be **zero**.

**2. Why the Incorrect Options are Wrong:**

* **Option B (1):** This would imply a moderate spread where values typically fall between 1.7 and 3.7 kg. Since there is no spread here, this is incorrect.
* **Option C (0.27):** This is likely a distractor calculated by dividing the weight by 10. This represents a misunderstanding of the SD formula.
* **Option D (2.7):** This value is the Mean, not the SD. SD measures the *deviation* from the mean, not the magnitude of the mean itself.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Measures of Dispersion:** Range, Mean Deviation, Standard Deviation, and Coefficient of Variation.
* **Standard Deviation (SD):** Also called the "Root Mean Square Deviation." It is the most commonly used measure of dispersion in medical statistics.
* **Coefficient of Variation (CV):** Used to compare relative variability between two groups with different units. $CV = (SD / Mean) \times 100$. In this question, the CV would also be 0%.
* **Normal Distribution:** In a normal (Gaussian) distribution:
  * Mean ± 1 SD covers **68.3%** of values.
  * Mean ± 2 SD covers **95.4%** of values.
  * Mean ± 3 SD covers **99.7%** of values.
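The zero-dispersion result can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original question.

```python
import statistics

# All 10 babies have the same birth weight (from the question).
weights_kg = [2.7] * 10

mean_kg = statistics.mean(weights_kg)

# Population SD (divides by n) and sample SD (divides by n - 1).
# Both are zero because every deviation from the mean is zero.
population_sd = statistics.pstdev(weights_kg)
sample_sd = statistics.stdev(weights_kg)

# Coefficient of variation, CV = (SD / Mean) x 100.
cv_percent = population_sd / mean_kg * 100

print(mean_kg, population_sd, sample_sd, cv_percent)
```

Whichever denominator is used (n or n - 1), the SD is zero, so the choice between population and sample formulas does not affect the answer here.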

Q: The prevalence of diabetes mellitus in a population was found to be 10 percent. What is the probability that three people selected at random from the population have the disease?

0.001.

### Explanation

**1. Understanding the Correct Answer (B: 0.001)**

This question tests the application of the **Multiplication Rule of Probability** for independent events.

* **Prevalence (P):** 10% or 0.1. This represents the probability that any single individual selected at random has diabetes.
* **Independence:** Since the individuals are selected at random from a large population, the health status of one person does not influence the status of the next.
* **Calculation:** To find the probability of multiple independent events occurring together (Person 1 AND Person 2 AND Person 3), we multiply their individual probabilities:
  * $0.1 \times 0.1 \times 0.1 = 0.001$ (or $10^{-3}$).

**2. Analysis of Incorrect Options**

* **Option A (0.003):** This is a common error where the student **adds** the probabilities ($0.1 + 0.1 + 0.1 = 0.3$) and misplaces the decimal, or confuses the multiplication rule with simple addition.
* **Option C (0.03):** This represents $0.1 \times 0.3$, which has no mathematical basis in this scenario.
* **Option D (0.01):** This is the probability of only **two** people having the disease ($0.1 \times 0.1$).

**3. High-Yield Clinical Pearls for NEET-PG**

* **Addition Rule:** Used for "Either/Or" scenarios (mutually exclusive events). E.g., Probability of being Blood Group A OR B.
* **Multiplication Rule:** Used for "And" scenarios (independent events). E.g., Probability of two siblings both having an autosomal recessive disorder ($1/4 \times 1/4 = 1/16$).
* **Prevalence vs. Incidence:** Remember that Prevalence = Incidence × Mean Duration of disease ($P = I \times D$). In this question, prevalence is used as the "pre-test probability" for random selection.
* **Complementary Probability:** The probability of a person *not* having diabetes is $1 - 0.1 = 0.9$. The probability that *none* of the three have the disease would be $0.9^3 = 0.729$.
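The multiplication rule and the complementary probability above can be reproduced with exact fractions, which avoids floating-point rounding. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(1, 10)  # 10% prevalence, from the question
n_people = 3                  # three people selected at random

# Multiplication rule for independent events: P(all three diseased).
p_all_diseased = prevalence ** n_people

# Complement: P(none of the three diseased).
p_none_diseased = (1 - prevalence) ** n_people

print(float(p_all_diseased))   # 0.001
print(float(p_none_diseased))  # 0.729
```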

Q: Confidence limits are calculated using which of the following?

Mean and standard deviation.

**Explanation:**

Confidence limits are the lower and upper bounds of a confidence interval: the range of values within which the true population parameter is expected to lie with a specified degree of certainty (usually 95%).

**Why Option B is Correct:**

In biostatistics, the calculation of confidence limits for a normally distributed variable is based on the **Mean** and the **Standard Deviation (SD)**. The standard formula for a 95% Confidence Interval is:

**[Mean ± (1.96 × SE)]**

Since the **Standard Error (SE)** is derived directly from the Standard Deviation ($SE = SD / \sqrt{n}$), the confidence limits are fundamentally dependent on the Mean (measure of central tendency) and the Standard Deviation (measure of dispersion).

**Why Other Options are Incorrect:**

* **Option A (Mean and standard error):** While the Standard Error is used in the final step of the formula, the primary parameters defining the distribution's spread and center are the Mean and SD. In many NEET-PG contexts, SD is emphasized as the core measure of variability required.
* **Options C and D (Median and standard deviation; Median):** The **Median** is a measure of central tendency used for skewed (non-normal) data. Confidence limits for a population mean specifically require the Mean, as they assume a normal distribution (Gaussian curve).

**High-Yield Clinical Pearls for NEET-PG:**

* **95% Confidence Interval:** Corresponds to Mean ± 1.96 SE (often rounded to 2 SE for quick calculations).
* **99% Confidence Interval:** Corresponds to Mean ± 2.58 SE.
* **Interpretation:** If a 95% CI for a Relative Risk or Odds Ratio includes **1**, the results are not statistically significant ($p > 0.05$).
* **Precision:** A narrower confidence interval indicates greater precision, which typically results from a larger sample size or less variable data.
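To make the chain from SD to SE to confidence limits concrete, here is a minimal sketch of the formula Mean ± 1.96 × SE. The function name `confidence_limits` and the sample figures (mean 120, SD 10, n = 100) are hypothetical and chosen only for easy arithmetic.

```python
import math

def confidence_limits(mean, sd, n, z=1.96):
    """Return (lower, upper) confidence limits for a mean.

    z = 1.96 gives the 95% limits; z = 2.58 gives the 99% limits.
    """
    se = sd / math.sqrt(n)  # standard error, SE = SD / sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical sample: mean 120, SD 10, n = 100, so SE = 1.
lower_95, upper_95 = confidence_limits(mean=120.0, sd=10.0, n=100)
lower_99, upper_99 = confidence_limits(mean=120.0, sd=10.0, n=100, z=2.58)

print(round(lower_95, 2), round(upper_95, 2))  # 118.04 121.96
print(round(lower_99, 2), round(upper_99, 2))  # 117.42 122.58
```

Increasing `n` while holding the SD fixed shrinks the SE and therefore narrows the interval.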

Q: Quantitative data can be compared using which of the following graphical methods?

Histogram.

### Explanation

**Correct Answer: B. Histogram**

**Why it is correct:**

In biostatistics, data is primarily classified into **Qualitative (Categorical)** and **Quantitative (Numerical)**. A **Histogram** is the most common graphical method used to represent continuous quantitative data. It consists of a series of rectangles where the area of each rectangle represents the frequency in that class interval. Unlike bar charts, there are no gaps between the rectangles because the data is continuous (e.g., height, weight, hemoglobin levels).

**Analysis of Incorrect Options:**

* **A. Pictograph:** This uses images or symbols to represent data. It is a popular way to present data to the general public but is used for discrete categories (Qualitative), not for comparing complex quantitative distributions.
* **C. Pie Chart:** This is used to show the proportional segments of a whole. It is used for **Qualitative/Nominal data** (e.g., distribution of causes of death in a hospital). It does not show the frequency distribution of continuous variables.
* **D. Spot Map:** This is a geographical method used in epidemiology to show where cases occur within a **local area** (e.g., cases of cholera in a city). It identifies clusters but does not compare quantitative measurements.

**High-Yield Clinical Pearls for NEET-PG:**

* **Quantitative Data Graphs:** Histogram, Frequency Polygon, Line Diagram, Cumulative Frequency Diagram (Ogive), and Scatter Diagram.
* **Qualitative Data Graphs:** Bar Chart (Simple, Multiple, Component), Pie Chart, Pictogram, and Map Diagram (Spot Map).
* **Frequency Polygon:** Created by joining the midpoints of the tops of the bars in a histogram; it is used to compare two or more frequency distributions on the same graph.
* **Scatter Diagram:** Used to show the **correlation** between two quantitative variables.

Q: Calculate the median from the following values: 1.9, 1.9, 1.9, 2.1, 2.4, 2.5, 2.5, 2.9.

2.25.

### Explanation

**1. Understanding the Correct Answer (C: 2.25)**

The **Median** is the middle-most value in a data set when arranged in ascending or descending order. It is a measure of central tendency that is less affected by extreme values (outliers) compared to the Mean.

To calculate the median:

* **Step 1: Arrange the data.** The values are already provided in ascending order: 1.9, 1.9, 1.9, 2.1, 2.4, 2.5, 2.5, 2.9.
* **Step 2: Count the number of observations ($n$).** Here, $n = 8$.
* **Step 3: Apply the formula.** Since $n$ is **even**, the median is the average of the two middle terms (the $\frac{n}{2}$th and the $(\frac{n}{2} + 1)$th).
  * The 4th term is **2.1**.
  * The 5th term is **2.4**.
  * $\text{Median} = \frac{2.1 + 2.4}{2} = \frac{4.5}{2} = \mathbf{2.25}$.

**2. Analysis of Incorrect Options**

* **A (1.2):** This value is not present in the data set and lacks mathematical relevance to the calculation.
* **B (1.9):** This is the **Mode** (the most frequently occurring value), not the median.
* **D (2.5):** This is the value of the 6th and 7th terms; selecting it ignores the rule of averaging the two central terms in an even-numbered data set.

**3. Clinical Pearls & High-Yield Facts for NEET-PG**

* **Best Measure of Central Tendency:** For **skewed data** (e.g., incubation periods, income), the **Median** is the most appropriate measure.
* **Nominal Data:** Use **Mode**.
* **Ordinal Data:** Use **Median**.
* **Interval/Ratio Data (Normal Distribution):** Use **Mean**.
* **Relationship in Positive Skew:** Mean > Median > Mode.
* **Relationship in Negative Skew:** Mode > Median > Mean.
* **Note:** The Median corresponds to the **50th Percentile** and the **2nd Quartile ($Q_2$)**.
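The three steps above can be sketched in a few lines of Python. The manual calculation is shown alongside the standard library's `statistics.median` and `statistics.mode` as a cross-check; variable names are illustrative.

```python
import statistics

values = [1.9, 1.9, 1.9, 2.1, 2.4, 2.5, 2.5, 2.9]

# Step 1: arrange the data in ascending order.
ordered = sorted(values)

# Step 2: count the observations.
n = len(ordered)

# Step 3: for even n, average the (n/2)th and (n/2 + 1)th terms;
# for odd n, take the single middle term. Lists are 0-indexed.
mid = n // 2
if n % 2 == 0:
    manual_median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    manual_median = ordered[mid]

library_median = statistics.median(values)
mode_value = statistics.mode(values)  # most frequent value

print(manual_median, library_median, mode_value)
```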

Q: Which of the following is a prerequisite for the Chi-square test?

Both samples should be mutually exclusive.

The **Chi-square ($\chi^2$) test** is a non-parametric test used to determine if there is a significant association between two categorical variables.

### **Explanation of the Correct Answer**

**Option A (Both samples should be mutually exclusive)** is a fundamental prerequisite. For a Chi-square test to be valid, each subject or observation must fall into **one and only one category**. If an individual could belong to both groups (e.g., being in both the "Treatment" and "Control" group simultaneously), the assumption of independence is violated, leading to an overestimation of statistical significance.

### **Analysis of Incorrect Options**

* **Option B (Both samples need not be mutually exclusive):** This is the opposite of the requirement. If samples are not mutually exclusive, the data points are dependent, and a different test (like **McNemar's test** for paired categorical data) must be used.
* **Option C (Normal distribution):** Chi-square is a **non-parametric test**, meaning it does not require the data to follow a normal (Gaussian) distribution. It is used for nominal or ordinal data, unlike the t-test or ANOVA, which require normality.

### **High-Yield Clinical Pearls for NEET-PG**

* **Qualitative Data:** Chi-square is the most common test for qualitative (categorical) data (e.g., Gender vs. Disease status).
* **Yates' Correction:** Applied when the sample size is small or any expected cell frequency in a 2x2 table is **less than 5**.
* **Degrees of Freedom (df):** For a contingency table, $df = (r-1) \times (c-1)$. For a 2x2 table, $df = 1$.
* **Null Hypothesis ($H_0$):** In Chi-square, the $H_0$ assumes there is **no association** between the variables.
* **Test of Goodness of Fit:** Chi-square can also be used to see if observed data fits an expected theoretical distribution.
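As an illustration of the degrees-of-freedom formula and the test statistic, the sketch below computes an uncorrected Chi-square by hand for a 2x2 table. The observed counts are hypothetical (they do not come from the question), and each subject is counted in exactly one cell, as the mutual-exclusivity prerequisite requires.

```python
# Hypothetical 2x2 table: rows = exposed / not exposed,
# columns = diseased / healthy. Each subject appears in one cell only.
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under H0 (no association):
# (row total x column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_square, 2), df)
```

For df = 1, the conventional critical value at the 0.05 level is 3.84; a statistic above it leads to rejecting the null hypothesis of no association.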

Q: Effective literacy rate is calculated from which age group?

Those above 7 years of age.

### Explanation

**Correct Option: A (Those above 7 years of age)**

In the context of Indian Census and Biostatistics, the **Effective Literacy Rate** is defined as the percentage of the population aged **7 years and above** who can both read and write with understanding in any language.

The underlying concept is that children below the age of 7 are generally in the early stages of primary education and may not have acquired stable literacy skills. Including them in the denominator would artificially deflate the literacy statistics of a developing nation. Therefore, the "effective" rate focuses only on the population segment that has had the opportunity to achieve functional literacy.

**Analysis of Incorrect Options:**

* **Option B & C (10 or 15 years of schooling):** These are measures of educational attainment or "Mean Years of Schooling," not basic literacy. Literacy is defined by the ability to read and write, regardless of formal schooling completed.
* **Option D (Total population):** This is used to calculate the **Crude Literacy Rate**. While it provides a general overview, it is considered less accurate than the effective rate because it includes infants and toddlers who are biologically incapable of being literate.

**High-Yield Facts for NEET-PG:**

* **Crude Literacy Rate:** (Number of literate persons / Total population) × 100.
* **Effective Literacy Rate:** (Number of literate persons aged 7+ / Population aged 7+) × 100.
* **Census 2011 Data:** The overall effective literacy rate in India was **74.04%** (Males: 82.14%, Females: 65.46%).
* **Highest Literacy:** Kerala consistently ranks highest among Indian states.
* **Global Standard:** While India uses age 7, many international organizations (like UNESCO) often use age 15+ for adult literacy statistics; however, for Indian exams, **7 years** is the standard benchmark.
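The difference between the crude and effective rates comes down to the denominator. The sketch below uses made-up population figures (not census data) purely to show the arithmetic.

```python
# Hypothetical figures for illustration only (not census data).
total_population = 1000
population_aged_7_plus = 850
literate_aged_7_plus = 629

# Crude literacy rate: literate persons / total population x 100.
crude_rate = literate_aged_7_plus / total_population * 100

# Effective literacy rate: literate persons aged 7+ / population aged 7+ x 100.
effective_rate = literate_aged_7_plus / population_aged_7_plus * 100

print(round(crude_rate, 1), round(effective_rate, 1))  # 62.9 74.0
```

With the same number of literate persons, excluding the under-7 group from the denominator raises the rate from 62.9% to 74.0% in this example.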

Q: A screening test was positive in 50% of the diseased population and 10% of the healthy population. What is the specificity of the test?

0.9.

### **Explanation**

To solve this question, we must apply the definitions of diagnostic test parameters to the percentages provided.

1. **Understanding the Data:**
   * **Sensitivity:** The probability that the test is positive in a diseased person. The question states the test is positive in **50%** of the diseased population. Thus, Sensitivity = 0.50.
   * **False Positive Rate (α):** The probability that the test is positive in a healthy person. The question states the test is positive in **10%** of the healthy population. Thus, False Positive Rate = 0.10.
2. **Calculating Specificity:** Specificity is the ability of a test to correctly identify those without the disease (True Negative Rate). It is mathematically related to the False Positive Rate:
   * **Specificity = 1 - False Positive Rate**
   * Specificity = 1 - 0.10 = **0.9 (or 90%)**

---

### **Analysis of Options:**

* **Option B (0.9) is Correct:** As calculated above, specificity is the complement of the positive rate in healthy individuals (100% - 10% = 90%).
* **Option A (0.5) is Incorrect:** This represents the **Sensitivity** of the test (positive results in the diseased population).
* **Option C (0.83) is Incorrect:** This value is not the specificity. It matches the Positive Predictive Value the test would have if diseased and healthy people were present in equal numbers ($0.5 / (0.5 + 0.1) \approx 0.83$).
* **Option D (0.064) is Incorrect:** This is a mathematical distractor with no relevance to the basic definitions of sensitivity or specificity.

---

### **High-Yield Clinical Pearls for NEET-PG:**

* **SNOUT:** **S**ensitivity helps rule **OUT** a disease (used in screening).
* **SPIN:** **S**pecificity helps rule **IN** a disease (used in confirmation).
* **False Positive Rate** is also known as **Type I Error (α)**.
* **False Negative Rate** is also known as **Type II Error (β)**; Sensitivity is calculated as **(1 - β)**.
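The relationships between these parameters can be sketched as follows. The inputs come from the question; the helper `positive_predictive_value` and the 50% prevalence passed to it are assumptions added only to show where a value near 0.83 can arise.

```python
# Values given in the question.
sensitivity = 0.50          # test positive in 50% of the diseased
false_positive_rate = 0.10  # test positive in 10% of the healthy

# Specificity is the complement of the false positive rate.
specificity = 1 - false_positive_rate

# False negative rate is the complement of sensitivity.
false_negative_rate = 1 - sensitivity

def positive_predictive_value(sens, spec, prevalence):
    """PPV = true positives / all positives, for a given prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumption for illustration: equal numbers of diseased and healthy people.
ppv_at_half = positive_predictive_value(sensitivity, specificity, prevalence=0.5)

print(specificity, false_negative_rate, round(ppv_at_half, 2))
```

Unlike sensitivity and specificity, the PPV changes with prevalence, which is why it cannot be read off the question's percentages alone.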

Question 1

The birth weight of all 10 babies born in a nursing home on a day was 2.7 kg. What is the standard deviation?

Accepted Answer

0

Answer

1

Answer

0.27

Answer

2.7

Question 2

The prevalence of diabetes mellitus in a population was found to be 10 percent. What is the probability that three people selected at random from the population have the disease?

Accepted Answer

0.001

Answer

0.003

Answer

0.03

Answer

0.01

Question 3

Confidence limits are calculated using which of the following?

Accepted Answer

Mean and standard deviation

Answer

Mean and standard error

Answer

Median and standard deviation

Answer

Median

Question 4

When a patient attending the OPD is asked to evaluate their pain on a scale of 0 (no pain) to 5 (the worst pain), this commonly applied scale is best described as:

Accepted Answer

Ratio scale

Answer

Dichotomous scale

Answer

Continuous scale

Answer

Nominal scale

Question 5

Quantitative data can be compared using which of the following graphical methods?

Accepted Answer

Histogram

Answer

Pictograph

Answer

Pie chart

Answer

Spot map

Question 6

Calculate the median from the following values: 1.9, 1.9, 1.9, 2.1, 2.4, 2.5, 2.5, 2.9.

Accepted Answer

2.25

Answer

1.2

Answer

1.9

Answer

2.5

Question 7

Which of the following is a prerequisite for the Chi-square test?

Accepted Answer

Both samples should be mutually exclusive

Answer

Both samples need not be mutually exclusive

Answer

Normal distribution

Answer

All of the above

Question 8

The age and sex structure of a population may be best described by which of the following?

Accepted Answer

Population pyramid

Answer

Life table

Answer

Correlation coefficient

Answer

Bar chart

Question 9

Effective literacy rate is calculated from which age group?

Accepted Answer

Those above 7 years of age

Answer

Those who have completed 10 years of schooling

Answer

Those who have completed 15 years of schooling

Answer

Total population

Question 10

A screening test was positive in 50% of the diseased population and 10% of the healthy population. What is the specificity of the test?

Accepted Answer

0.9

Answer

0.5

Answer

0.83

Answer

0.064

Biostatistics — MCQs
