Biostatistics Practice Questions

Q: Which of the following is NOT true about the Sample Registration System?

It is the same as the census.

**Explanation:** The **Sample Registration System (SRS)** is a large-scale demographic survey in India designed to provide reliable annual estimates of vital statistics.

**1. Why Option A is the Correct Answer (The "NOT True" statement):** The SRS is **not** the same as the Census. The **Census** is a decennial (every 10 years) exercise that covers the entire population (complete enumeration). In contrast, the **SRS is a sample-based survey** that covers only a representative part of the population to estimate vital rates for the periods between two censuses.

**2. Analysis of Other Options:**

* **Option B (Dual Record System):** This is true. The SRS uses two independent methods to collect data: continuous enumeration by a resident part-time enumerator and an independent retrospective half-yearly survey by a full-time supervisor.
* **Option C (Reliable estimates):** This is true. Because the Civil Registration System (CRS) in India has historically faced under-reporting, the SRS serves as the primary and most reliable source for Birth Rate, Death Rate, and Infant Mortality Rate (IMR) at the national and state levels.
* **Option D (Retrospective half-yearly system):** This is true. As part of the dual-record system, a supervisor conducts an independent check every six months to record events that occurred during the previous half-year.

**High-Yield NEET-PG Pearls:**

* **Origin:** SRS was initiated on a pilot basis in 1964-65 and became fully operational in 1969-70.
* **Authority:** It is conducted by the **Office of the Registrar General of India (RGI)**, Ministry of Home Affairs.
* **Key Utility:** It is the "Gold Standard" for measuring **IMR (Infant Mortality Rate)** and **MMR (Maternal Mortality Ratio)** in India.
* **Comparison:** While the Census provides data on population size and distribution, the SRS provides data on population dynamics (fertility and mortality).

Q: What is the typical correlation coefficient observed between Infant Mortality Rate (IMR) and Socioeconomic Status?

Negative 0.8.

### Explanation

**1. Why Option D is Correct:** The correlation between Infant Mortality Rate (IMR) and Socioeconomic Status (SES) is **negative (inverse)**. As the socioeconomic status of a population improves (better nutrition, sanitation, and healthcare access), the IMR decreases. In biostatistics, a correlation coefficient ($r$) of **-0.8** indicates a **strong negative correlation**. This reflects the real-world observation that IMR is one of the most sensitive indicators of a nation’s socioeconomic development and health equity.

**2. Analysis of Incorrect Options:**

* **Option A (Positive 1):** This implies a perfect direct relationship where IMR increases as SES increases, which is factually incorrect.
* **Option B (Positive 0.5):** This suggests a moderate positive relationship. In reality, IMR falls as SES rises, so the two move in opposite directions.
* **Option C (Negative 1):** While the direction is correct, a correlation of -1 represents a **perfect** linear relationship. In public health, biological and environmental variables rarely follow a perfect line due to confounding factors (e.g., genetic predispositions or sudden natural disasters), making -0.8 a more realistic "typical" observation.

**3. NEET-PG High-Yield Pearls:**

* **IMR Definition:** Number of infant deaths (under 1 year) per 1,000 live births.
* **Sensitivity:** IMR is considered the **best single indicator** of the health status of a community and its socioeconomic development.
* **Correlation Coefficient ($r$):** Ranges from -1 to +1.
  * $r = 0$: No linear correlation.
  * $r = 1$: Perfect positive correlation.
  * $r = -1$: Perfect negative correlation.
* **P-value vs. $r$:** Remember that $r$ tells you the **strength/direction** of the relationship, while the p-value tells you the **statistical significance**.

Q: For an epidemiological study, every 10th person is selected from a population. What is this type of sampling known as?

Systematic random sampling.

### Explanation

**1. Why Systematic Random Sampling is Correct:** Systematic random sampling is a probability sampling method where individuals are selected at regular intervals (the **sampling interval, 'k'**) from a sampling frame.

* **The Process:** You calculate the interval $k = N/n$ (where $N$ is the total population and $n$ is the sample size). A starting point is chosen randomly between 1 and $k$, and then every $k^{th}$ person is selected.
* In this question, selecting "every 10th person" represents a fixed interval ($k=10$), which is the hallmark of systematic sampling.

**2. Why the Other Options are Incorrect:**

* **Simple Random Sampling:** Every individual in the population has an equal and independent chance of being selected (e.g., lottery method or computer-generated random numbers). It does not follow a fixed numerical sequence.
* **Stratified Random Sampling:** The population is first divided into homogeneous sub-groups (**strata**) based on characteristics (e.g., age, gender, SES). Samples are then drawn from each stratum. This is used when the population is heterogeneous.
* **Cluster Random Sampling:** The population is divided into naturally occurring groups called **clusters** (e.g., villages, schools). Instead of selecting individuals, entire clusters are selected randomly. This is the method used in the WHO EPI coverage surveys (30-cluster sampling).

**3. High-Yield Pearls for NEET-PG:**

* **Sampling Interval ($k$):** Total Population ($N$) / Sample Size ($n$).
* **Systematic Sampling:** Often called "Quasi-random" because once the first unit is picked, the rest of the sample is automatically determined.
* **Multistage Sampling:** Used in large-scale national surveys (like NFHS); it involves multiple levels of sampling (e.g., Districts → Villages → Households).
* **Snowball Sampling:** A non-probability method used for "hidden populations" like IV drug users or commercial sex workers.
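The process described above (compute $k = N/n$, pick a random start between 1 and $k$, then take every $k^{th}$ unit) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `systematic_sample`, the `seed` parameter, and the numbered population of 100 people are assumptions made for the example, not part of the question.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select about n units from an ordered sampling frame at a fixed interval k."""
    rng = random.Random(seed)          # seed is only for reproducibility of the demo
    k = len(frame) // n                # sampling interval k = N / n
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    start = rng.randint(1, k)          # random start between 1 and k (inclusive)
    # Take the unit at position `start`, then every k-th unit after it.
    return frame[start - 1::k][:n]

# Hypothetical frame: people numbered 1..100, target sample of 10, so k = 10.
population = list(range(1, 101))
sample = systematic_sample(population, n=10, seed=42)
print(sample)
```

Only the starting position is random; every later pick follows from it, which is why the method is described as quasi-random.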

Q: For a Randomized Control Trial (RCT) to assess dating in adolescents, a study was conducted by selecting random schools, then random classes, then random sections, and finally random students. This is an example of:

Multistage sampling.

### Explanation

**Correct Answer: D. Multistage Sampling**

**Why it is correct:** Multistage sampling is a complex form of probability sampling where the sample is selected in several stages using smaller and smaller sampling units at each stage. In this scenario, the researcher follows a hierarchical progression: **Schools (1st stage) → Classes (2nd stage) → Sections (3rd stage) → Students (Final stage)**. Unlike cluster sampling, where all elements within a selected group are studied, multistage sampling involves further random selection at every level until the final individual unit is reached.

**Why the other options are incorrect:**

* **A. Stratified Sampling:** This involves dividing a heterogeneous population into homogeneous groups (strata) based on a specific characteristic (e.g., age, gender) and then taking a random sample from *each* stratum. Here, the selection is based on hierarchy, not specific population traits.
* **B. Simple Random Sampling:** This is the "lottery method" where every individual in the entire population has an equal chance of being selected. It is impractical for large, geographically dispersed populations like all school students.
* **C. Cluster Sampling:** In this method, the population is divided into groups (clusters), a few clusters are randomly selected, and *everyone* within those selected clusters is studied. If the researcher had studied every student in the selected sections, it would be cluster sampling.

**High-Yield Clinical Pearls for NEET-PG:**

* **Multistage Sampling** is the most common method used in large-scale national health surveys (e.g., NFHS in India).
* **Cluster Sampling** is the method of choice for the **WHO Expanded Programme on Immunization (EPI)** coverage surveys (30 clusters × 7 children).
* **Precision:** For a given sample size, simple random sampling is usually more precise than multistage or cluster sampling, which trade some precision for feasibility in field research.
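The hierarchical selection (schools, then classes, then sections, then students) can be sketched as nested random draws. The code below is a hedged illustration: the nested-dictionary layout, the function name `multistage_sample`, and all school, class, and student labels are hypothetical and chosen only to make the stages visible.

```python
import random

def multistage_sample(schools, n_schools, n_classes, n_sections, n_students, seed=None):
    """Draw a random subset at every stage: schools -> classes -> sections -> students."""
    rng = random.Random(seed)
    chosen = []
    for school in rng.sample(sorted(schools), n_schools):            # stage 1
        classes = schools[school]
        for cls in rng.sample(sorted(classes), n_classes):           # stage 2
            sections = classes[cls]
            for sec in rng.sample(sorted(sections), n_sections):     # stage 3
                # Final stage: only some students are taken, not the whole section.
                chosen.extend(rng.sample(sections[sec], n_students))
    return chosen

# Hypothetical frame: 5 schools x 3 classes x 2 sections x 20 students.
frame = {
    f"school{s}": {
        f"class{c}": {
            f"section{x}": [f"s{s}-c{c}-{x}-{i}" for i in range(20)]
            for x in "AB"
        }
        for c in range(1, 4)
    }
    for s in range(1, 6)
}

sample = multistage_sample(frame, n_schools=2, n_classes=2, n_sections=1, n_students=5, seed=1)
print(len(sample), sample[:3])
```

Replacing the final `rng.sample(...)` with the full student list of each selected section would turn the last stage into cluster sampling, which is the distinction the explanation draws.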

Q: All of the following statistical tests are used to analyze variables with normal distribution except?

Chi square test.

**Explanation**

The core concept tested here is the distinction between **Parametric** and **Non-parametric** tests.

* **Parametric tests** (Options A, C, and D) assume that the data follows a **Normal (Gaussian) Distribution** and are used for quantitative (numerical) data.
* **Non-parametric tests** (Option B) make no assumptions about the distribution of the data (distribution-free) and are primarily used for qualitative (categorical) data.

**Why Chi-square test is the correct answer:** The Chi-square test is a non-parametric test used to compare proportions and associations between **categorical/nominal variables** (e.g., gender, smoking status). Since it does not require the data to follow a normal distribution, it is the "except" in this list.

**Analysis of incorrect options:**

* **Student’s t-test:** A parametric test used to compare the means of two groups. It requires the data to be normally distributed.
* **ANOVA (Analysis of Variance):** An extension of the t-test used to compare the means of three or more groups. It also assumes a normal distribution.
* **Multiple Linear Regression:** A parametric method used to model the relationship between one dependent variable and multiple independent variables. It assumes that the residuals (errors) are normally distributed.

**High-Yield Clinical Pearls for NEET-PG:**

1. **Quantitative Data + 2 groups:** Use Student’s t-test (Unpaired for independent groups, Paired for before-after studies).
2. **Quantitative Data + >2 groups:** Use ANOVA.
3. **Qualitative Data:** Use Chi-square test (or Fisher’s Exact test if the sample size is very small/cell frequency <5).
4. **Non-parametric alternatives:** If data is not normally distributed, use **Mann-Whitney U test** (instead of unpaired t-test) or **Kruskal-Wallis test** (instead of ANOVA).
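To see why the Chi-square test suits categorical data, it may help to compute the statistic by hand from a table of counts: it compares observed counts with the counts expected if the two variables were unrelated, and no means or normality are involved. The sketch below uses only the Python standard library; the function name and the 2×2 table (smoking status by disease status) are made-up illustrations, not data from the question.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = smokers / non-smokers, columns = diseased / healthy.
observed = [[30, 20],
            [10, 40]]
stat = chi_square_statistic(observed)
print(round(stat, 2))  # 16.67
```

For a 2×2 table there is 1 degree of freedom, and the critical value at the 0.05 level is about 3.84, so a statistic of this size would count as a significant association in this invented example.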

Q: What value of the correlation coefficient "r" indicates a high correlation between two variables?

1. **Explanation:** The **Correlation Coefficient (r)**, also known as Pearson’s product-moment correlation, measures the strength and direction of a linear relationship between two quantitative variables. Its value ranges from **-1 to +1**.

**Why Option C is Correct:** A value of **+1** indicates a **perfect positive correlation**. This means that as one variable increases, the other increases in a perfectly predictable linear fashion. In biostatistics, the closer the value of "r" is to 1 (or -1), the stronger or "higher" the correlation. Therefore, 1 represents the maximum possible strength of a positive relationship.

**Analysis of Incorrect Options:**

* **Option A (0):** Indicates **zero correlation** or no linear relationship between the variables.
* **Option B (0.5):** Indicates a **moderate positive correlation**. While there is a relationship, it is not considered "high" or "strong" (usually r > 0.7 is required for a strong correlation).
* **Option D (-1):** Indicates a **perfect negative correlation**. While this is technically as "strong" as +1, in the context of standard MCQ phrasing, "high correlation" typically refers to the magnitude approaching the positive maximum unless "inverse correlation" is specified. However, if the question asks for the *strongest* correlation and both 1 and -0.9 are present, the value furthest from zero is the strongest.

**Clinical Pearls for NEET-PG:**

* **Range:** -1 ≤ r ≤ +1.
* **Coefficient of Determination (r²):** This is the square of the correlation coefficient. It represents the proportion of variance in one variable explained by the other (e.g., if r = 0.6, then r² = 0.36 or 36%).
* **Direction vs. Strength:** The sign (+/-) indicates direction; the numerical value indicates strength.
* **Limitation:** Correlation does **not** imply causation. It only measures linear association.
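A short sketch of how Pearson's r and r² are computed may make the range and the sign convention concrete. The function name `pearson_r` and the small data sets are assumptions for illustration; the formula itself is the standard product-moment definition.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_perfect = pearson_r(x, [2, 4, 6, 8, 10])   # points lie exactly on a rising line
r_inverse = pearson_r(x, [9, 7, 8, 4, 2])    # falling trend with some scatter

print(r_perfect)                  # 1.0
print(round(r_inverse, 2))        # strong negative, but not -1
print(round(r_inverse ** 2, 2))   # r squared: share of variance explained
```

In the second, invented data set r is roughly -0.92: the sign gives the direction, the magnitude gives the strength, and squaring it gives the coefficient of determination.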

Q: In a test of significance, if the P value is 0.023, what can be concluded about the observed difference in the study?

The null hypothesis is rejected, and the study is accepted.

### Explanation

**1. Understanding the Correct Answer (Option B)**

In biostatistics, the **P-value** is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true (loosely, the probability that the observed difference arose by chance). By convention, the threshold for statistical significance (alpha level) is set at **0.05**.

* If **P < 0.05**: The result is "statistically significant." We **reject the Null Hypothesis ($H_0$)** (which claims there is no difference) and **accept the Alternative Hypothesis ($H_1$)**.
* In this case, $P = 0.023$ (which is $< 0.05$), so the null hypothesis is rejected and the observed difference is statistically significant.

**2. Why Other Options are Incorrect**

* **Options A and C:** The null hypothesis is accepted only when $P > 0.05$ (e.g., $P = 0.23$), indicating the results are likely due to chance.
* **Option D:** This is logically inconsistent. If you reject the null hypothesis, you are essentially validating that the study has found a significant effect/difference, so the study results are "accepted" as statistically valid.

**3. High-Yield Clinical Pearls for NEET-PG**

* **P-value vs. Alpha:** $P$ is the calculated probability from the data; $\alpha$ (usually 0.05) is the pre-determined cutoff.
* **Type I Error ($\alpha$):** Rejecting a null hypothesis that is actually true (False Positive). The significance level $\alpha$ is the maximum probability of a Type I error that the investigator is willing to accept; the P-value is compared against it.
* **Type II Error ($\beta$):** Failing to reject a null hypothesis that is actually false (False Negative).
* **Power of Study ($1-\beta$):** The ability of a study to detect a difference if one truly exists.
* **Significant vs. Highly Significant:** $P < 0.05$ is significant; $P < 0.01$ is often termed "highly significant."
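One way to make the meaning of a P-value concrete is an exact permutation test: if the null hypothesis of "no difference" holds, the group labels are interchangeable, so the P-value is the share of all possible relabelings that give a difference at least as large as the one observed. The sketch below is not the test used in the question (which does not name one); it is a swapped-in illustration, and the function name and the two small groups of measurements are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation P-value for a difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        splits += 1
        diff = abs(mean_diff(sum(pooled[i] for i in idx)))
        if diff >= observed - 1e-12:      # small tolerance for float rounding
            extreme += 1
    return extreme / splits

# Hypothetical measurements in two groups of five.
treated = [12, 14, 15, 16, 18]
control = [8, 9, 10, 11, 13]

alpha = 0.05
p = permutation_p_value(treated, control)
print(f"P = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

With these made-up values, 4 of the 252 possible splits are at least as extreme as the observed one, so P is about 0.016 and the null hypothesis would be rejected at the 0.05 level, the same decision rule applied to P = 0.023 in the question.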

Q: Which of the following formulas represents a bimodal distribution?

Mode = 3 median - 2 mean.

### Explanation

**1. Why Option A is Correct:** The formula **Mode = 3 Median – 2 Mean** is known as **Karl Pearson’s Empirical Relationship**. In a perfectly symmetrical (normal) distribution, the mean, median, and mode are all equal. However, in moderately asymmetrical (skewed) distributions, which are often encountered in biological data, this mathematical relationship allows us to estimate the value of one measure if the other two are known. While the question mentions "bimodal," this formula is specifically used to calculate the **"Empirical Mode"** in distributions where a clear single peak is difficult to identify or when the distribution is slightly skewed. In the context of NEET-PG, this is the standard formula for relating the three measures of central tendency.

**2. Why Other Options are Incorrect:**

* **Options B and C:** These are mathematically incorrect variations. Adding the mean and median or using a factor of "2" for the median does not satisfy the geometric properties of a skewed frequency curve.
* **Option D:** This is a tautology (using mode to define mode) and is mathematically invalid.

**3. Clinical Pearls & High-Yield Facts for NEET-PG:**

* **Normal Distribution:** Mean = Median = Mode (Bell-shaped curve).
* **Positive Skew (Right-tailed):** Mean > Median > Mode (e.g., income distribution, incubation periods).
* **Negative Skew (Left-tailed):** Mode > Median > Mean (e.g., age of death in developed countries).
* **Median's Advantage:** It is the best measure of central tendency for **skewed data** because it is not affected by extreme values (outliers).
* **Bimodal Distribution:** Occurs when there are two peaks in the data (e.g., Hodgkin’s lymphoma age incidence or body temperature in certain relapsing fevers). If a distribution is strictly bimodal, the empirical formula may only provide an approximation.
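As a quick worked check of the empirical relationship, the snippet below plugs in a hypothetical right-skewed case (mean 30, median 28). The function name and the numbers are assumptions for illustration; the result is an estimate, since the relationship is only approximate for moderately skewed data.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Karl Pearson's empirical relationship."""
    return 3 * median - 2 * mean

# Hypothetical positively skewed data summary: mean > median.
mean, median = 30, 28
mode = empirical_mode(mean, median)
print(mode)  # 24
```

The estimate of 24 respects the ordering Mean > Median > Mode expected for a positive skew, and when mean and median are equal the formula returns that same value, matching the symmetric case.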

Q: In a community, 20% of the population is below 15 years of age and 15% is above 65 years of age. Calculate the dependency ratio.

54%.

### Explanation

**1. Understanding the Correct Answer (C: 54%)**

The **Dependency Ratio** is a demographic measure used to understand the economic burden on the productive portion of a population. It is defined as the ratio of the "dependent" population (those not typically in the labor force) to the "working-age" population.

* **Formula:** $$\text{Dependency Ratio} = \frac{(\text{Population } < 15 \text{ years}) + (\text{Population } > 65 \text{ years})}{\text{Population between 15–64 years}} \times 100$$
* **Calculation for this question:**
  * Young dependents (< 15 years) = 20%
  * Old dependents (> 65 years) = 15%
  * Total dependents = 20 + 15 = 35%
  * Working-age population = 100% – (Total dependents) = 100 – 35 = 65%
  * **Dependency Ratio** = $(35 / 65) \times 100 \approx \mathbf{53.85\%}$ (rounded to **54%**).

**2. Why Other Options are Incorrect**

* **Option A (34%) & B (40%):** These are mathematical miscalculations or represent only one segment of the dependency (e.g., just the young or old) without dividing by the working-age denominator.
* **Option D (85%):** This likely results from incorrectly using the total population (100) as the denominator or misidentifying the working-age group.

**3. NEET-PG High-Yield Pearls**

* **Total Dependency Ratio:** Sum of young and old dependency.
* **Young Age Dependency Ratio:** $(\text{Pop } < 15 / \text{Pop } 15\text{–}64) \times 100$.
* **Old Age Dependency Ratio:** $(\text{Pop } > 65 / \text{Pop } 15\text{–}64) \times 100$.
* **Demographic Dividend:** Occurs when the dependency ratio declines due to a bulge in the working-age population (15–64 years), leading to potential economic growth.
* **Note:** In some Indian contexts, the working age is occasionally cited as 15–59 years; however, for standard international biostatistics and most NEET-PG questions, **15–64 years** is the gold standard denominator.
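The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `dependency_ratio` and its percentage-based inputs are assumptions for the example, and it takes everyone who is neither young nor old as the working-age group, as the question does.

```python
def dependency_ratio(young_pct, old_pct):
    """Total dependency ratio (%) from the young and old shares of the population."""
    working_pct = 100 - young_pct - old_pct      # remaining share is working age
    return (young_pct + old_pct) / working_pct * 100

young, old = 20, 15                              # values given in the question
total_ratio = dependency_ratio(young, old)
young_ratio = dependency_ratio(young, 0) * (100 - young) / (100 - young - old)
old_ratio = total_ratio - young_ratio

print(round(total_ratio, 2))   # 53.85
print(round(young_ratio, 2))   # 30.77
print(round(old_ratio, 2))     # 23.08
```

The young and old components (20/65 and 15/65, each times 100) add up to the total ratio, which is why the total is described as the sum of young and old dependency.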

Q: In a village population, individuals are arranged alphabetically, and then every 8th person is selected for the study. What type of study design is this?

Systematic random sampling.

### Explanation

**1. Why Systematic Random Sampling is Correct:** Systematic random sampling is a probability sampling method where the sample is chosen based on a fixed, periodic interval (the **sampling interval, 'k'**).

* **The Process:** First, a list of the population is created (the sampling frame). Then, a starting point is chosen at random, and every $k^{th}$ individual is selected thereafter.
* **In this question:** The alphabetical arrangement provides the sampling frame, and selecting every **8th person** represents the sampling interval ($k=8$). This "regular interval" approach is the hallmark of systematic sampling.

**2. Why the Other Options are Incorrect:**

* **Simple Random Sampling:** Every individual has an equal and independent chance of being selected (e.g., lottery method or random number table). It does not follow a fixed numerical pattern like "every 8th person."
* **Stratified Random Sampling:** The population is first divided into homogeneous subgroups (strata) based on characteristics like age, sex, or income, and then samples are drawn from each stratum. No such grouping occurred here.
* **Cluster Sampling:** Used when the population is large and spread out. The population is divided into "clusters" (e.g., city blocks or villages), and entire clusters are selected at random. Here, individuals are being selected, not groups.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Sampling Interval ($k$):** Calculated as $N/n$ (Total Population / Sample Size).
* **Advantage:** It is simpler and more convenient than simple random sampling and ensures even spread across the list.
* **Potential Bias:** If the list has a hidden periodic pattern that coincides with the sampling interval (periodicity), the sample may not be representative.
* **Comparison:** Systematic sampling is often called **"Quasi-random"** because only the first unit is selected truly at random.
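The periodicity bias mentioned in the pearls can be shown with a deliberately contrived list. In the sketch below, the helper `every_kth` and the frame of 80 households in which every 8th entry is a corner house are hypothetical; an alphabetical list of names would not normally have such a pattern.

```python
def every_kth(frame, k, start):
    """Return every k-th unit, beginning at 1-based position `start`."""
    return frame[start - 1::k]

# Hypothetical frame with a hidden cycle of length 8: positions 8, 16, 24, ... are corner houses.
frame = ["corner" if position % 8 == 0 else "inner" for position in range(1, 81)]
true_share = frame.count("corner") / len(frame)

for start in (1, 8):
    sample = every_kth(frame, k=8, start=start)
    share = sample.count("corner") / len(sample)
    print(f"start={start}: corner share in sample = {share}, in population = {true_share}")
```

Because the cycle in the list matches the sampling interval, the sample contains either no corner houses or only corner houses depending on the random start, while the true share is 12.5%. A different interval, or a list without such a cycle, avoids the problem.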

Question 1

Which of the following is NOT true about the Sample Registration System?

Accepted Answer

It is the same as the census.

Answer

It is a dual record system.

Answer

It provides a reliable estimate of birth rate and death rate.

Answer

It is an independent, retrospective, half-yearly system.

Question 2

What is the typical correlation coefficient observed between Infant Mortality Rate (IMR) and Socioeconomic Status?

Accepted Answer

Negative 0.8

Answer

Positive 1

Answer

Positive 0.5

Answer

Negative 1

Question 3

For an epidemiological study, every 10th person is selected from a population. What is this type of sampling known as?

Accepted Answer

Systematic random sampling

Answer

Simple random sampling

Answer

Stratified random sampling

Answer

Cluster random sampling

Question 4

For a Randomized Control Trial (RCT) to assess dating in adolescents, a study was conducted by selecting random schools, then random classes, then random sections, and finally random students. This is an example of:

Accepted Answer

Multistage sampling

Answer

Stratified sampling

Answer

Simple random sampling

Answer

Cluster sampling

Question 5

All of the following statistical tests are used to analyze variables with normal distribution except?

Accepted Answer

Chi square test

Answer

Student t-test

Answer

ANOVA

Answer

Multiple linear regression

Question 6

What value of the correlation coefficient "r" indicates a high correlation between two variables?

Accepted Answer

1

Answer

0

Answer

0.5

Answer

-1

Question 7

In a test of significance, if the P value is 0.023, what can be concluded about the observed difference in the study?

Accepted Answer

The null hypothesis is rejected, and the study is accepted.

Answer

The null hypothesis is accepted, and the study is rejected.

Answer

The null hypothesis is accepted, and the study is accepted.

Answer

The null hypothesis is rejected, and the study is also rejected.

Question 8

Which of the following formulas represents a bimodal distribution?

Accepted Answer

Mode = 3 median - 2 mean

Answer

Mode = 3 median + 2 mean

Answer

Mode = 2 median + 2 mean

Answer

Mode = 3 mode - 3 mean

Question 9

In a community, 20% of the population is below 15 years of age and 15% is above 65 years of age. Calculate the dependency ratio.

Accepted Answer

54%

Answer

34%

Answer

40%

Answer

85%

Question 10

In a village population, individuals are arranged alphabetically, and then every 8th person is selected for the study. What type of study design is this?

Accepted Answer

Systematic random sampling

Answer

Simple random sampling

Answer

Stratified random sampling

Answer

Cluster sampling

Biostatistics — MCQs
