Biostatistics Practice Questions

Q: The median weight of 100 children was 16 kg. The standard deviation was 8 kg. Calculate the percentage coefficient of variation.

50%.

### Explanation

**1. Why Option A is Correct**

The **Coefficient of Variation (CV)** is a measure of relative dispersion that expresses the standard deviation as a percentage of the mean. The formula is:

$$\text{CV} = \left( \frac{\text{Standard Deviation}}{\text{Mean}} \right) \times 100$$

In a **Normal (Gaussian) Distribution**, the mean, median, and mode are equal. In NEET-PG questions like this one, when only the median is given, it is used as an estimate of the mean. This is reasonable only if the distribution is assumed to be approximately symmetrical; the sample size alone does not make the median equal to the mean.

* **Standard Deviation (SD):** 8
* **Mean (estimated from the median):** 16
* **Calculation:** $(8 / 16) \times 100 = 0.5 \times 100 = \mathbf{50\%}$

**2. Why Other Options are Wrong**

* **Options B, C, and D (35%, 45%, 55%):** These values do not follow from the given data. With a mean of 16, they would require an SD of 5.6, 7.2, or 8.8, respectively.

**3. Clinical Pearls & High-Yield Facts**

* **Unitless Measure:** Unlike the standard deviation, the CV has no units. This makes it the preferred measure for comparing the variability of datasets recorded in different units (e.g., height in cm vs. weight in kg).
* **Normal Distribution Properties:** In a perfectly normal distribution, Mean = Median = Mode.
* **Standard Error vs. SD:** Do not confuse CV with the Standard Error (SE), where $SE = SD / \sqrt{n}$. SE measures the precision of the sample mean, whereas CV measures relative spread.
* **Rule of Thumb:** A higher CV indicates greater dispersion relative to the mean; a lower CV indicates greater consistency.
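
The calculation above can be checked with a minimal Python sketch. It carries over the explanation's assumption that the median (16 kg) stands in for the mean; the function name `coefficient_of_variation` is illustrative only.

```python
def coefficient_of_variation(sd, mean):
    """Percentage coefficient of variation: SD expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100


# Assumption (as in the explanation): the median of 16 kg is used as the mean.
cv = coefficient_of_variation(sd=8, mean=16)
print(f"CV = {cv:.0f}%")

# The SD each distractor would require if the mean really were 16 kg.
for option in (35, 45, 55):
    implied_sd = option / 100 * 16
    print(f"{option}% would need SD = {implied_sd:.1f}")
```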

Q: In a cohort study of 7000 smokers over ten years, 70 developed lung cancer. In a concurrent evaluation of 7000 non-smokers in the same catchment area, 7 developed lung cancer. What is the Relative Risk (RR) for developing lung cancer?

10.

### Explanation

**1. Why Option B (10) is Correct:**

Relative Risk (RR) is the ratio of the incidence of disease in the exposed group to the incidence in the non-exposed group. It is the primary measure of association in **Cohort Studies**.

* **Incidence in Exposed (Smokers):** $I_e = \frac{\text{New cases}}{\text{Total exposed}} = \frac{70}{7000} = 0.01$ (or 10 per 1000)
* **Incidence in Non-exposed (Non-smokers):** $I_o = \frac{\text{New cases}}{\text{Total non-exposed}} = \frac{7}{7000} = 0.001$ (or 1 per 1000)
* **Formula for Relative Risk (RR):** $\frac{I_e}{I_o} = \frac{0.01}{0.001} = \mathbf{10}$

This means smokers were 10 times as likely as non-smokers to develop lung cancer over the ten-year follow-up.

**2. Why Other Options are Incorrect:**

* **Option A (1):** An RR of 1 indicates no association between exposure and disease (the value expected under the null hypothesis).
* **Option C (100):** This would imply a far stronger association and usually results from a decimal-placement error in the calculation.
* **Option D (0.1):** An RR < 1 indicates a "Protective Effect" (the exposure reduces the risk of disease). This value is obtained by inverting the ratio ($I_o / I_e$) and is clinically implausible for smoking and lung cancer.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Relative Risk (RR):** A direct measure of the **strength of association**. It is calculated directly from cohort studies (prospective or retrospective), whereas case-control studies can only estimate it.
* **Odds Ratio (OR):** Used in case-control studies as an estimate of RR; the approximation is good when the disease is rare.
* **Attributable Risk (AR):** $\frac{I_e - I_o}{I_e} \times 100$. It indicates the proportion of disease among the exposed that is attributable to the exposure, and hence could be prevented if the exposure were eliminated. For this cohort, $\frac{0.01 - 0.001}{0.01} \times 100 = 90\%$.
* **Population Attributable Risk (PAR):** Useful for public health administrators to prioritize interventions in the community.
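
The RR and AR arithmetic above can be reproduced with a short Python sketch. The helper names are hypothetical, and incidence here means cumulative incidence over the whole follow-up period.

```python
def incidence(new_cases, population_at_risk):
    """Cumulative incidence over the follow-up period."""
    return new_cases / population_at_risk


def relative_risk(i_exposed, i_unexposed):
    """Incidence in exposed divided by incidence in non-exposed."""
    return i_exposed / i_unexposed


def attributable_risk_percent(i_exposed, i_unexposed):
    """(Ie - Io) / Ie x 100, the definition used in the pearls above."""
    return (i_exposed - i_unexposed) / i_exposed * 100


i_e = incidence(70, 7000)  # smokers: 10 per 1000
i_o = incidence(7, 7000)   # non-smokers: 1 per 1000

print(f"RR = {relative_risk(i_e, i_o):.1f}")
print(f"AR = {attributable_risk_percent(i_e, i_o):.1f}%")
# Dividing the wrong way round produces the 0.1 distractor.
print(f"Inverted ratio = {relative_risk(i_o, i_e):.1f}")
```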

Q: The "Crude death rate" is defined as the number of deaths (from all causes) per 1000 estimated what?

Mid-year population.

**Explanation:**

**1. Why "Mid-year population" is correct:**

The Crude Death Rate (CDR) is a fundamental measure of mortality in a population. It is calculated as the number of deaths occurring during a calendar year per 1000 of the **mid-year population**. The mid-year population (estimated as of July 1st) is used as the denominator because it approximates the average population at risk of dying during the year, allowing for births, deaths, and migration over the 12-month period.

**2. Why other options are incorrect:**

* **Total population:** This term is vague. Population size changes daily through births, deaths, and migration, so the mid-year estimate is the standard denominator for annual rates.
* **Total births / Live births:** Birth counts are denominators for mortality indicators tied to pregnancy and early life, not for the death rate of the whole community. Live births are the denominator for the **Infant Mortality Rate (IMR)** and the **Maternal Mortality Ratio (MMR)**, while total births (live births + stillbirths) are used for indicators such as the stillbirth rate.

**3. NEET-PG High-Yield Pearls:**

* **Formula:** $CDR = \frac{\text{Number of deaths during the year}}{\text{Mid-year population}} \times 1000$.
* **Limitation:** The CDR is "crude" because it does not account for the age and sex composition of the population. A population with many elderly individuals will have a higher CDR than a younger population, even if its age-specific death rates are the same or lower.
* **Comparison:** To compare mortality between two different populations (e.g., Kerala vs. UP), **Age-Standardized Death Rates** are the preferred indicator because they remove the effect of differing age distributions.
* **Current Trend:** According to recent SRS (Sample Registration System) data, the CDR for India is approximately **6.0 per 1000** mid-year population.
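
The "Limitation" and "Comparison" pearls can be illustrated with a minimal Python sketch of the CDR and of direct age standardization. All numbers are made up: two hypothetical towns share identical age-specific death rates (2 per 1000 under 60, 50 per 1000 aged 60+) but differ in age structure, and the equal-sized standard population is an arbitrary assumption.

```python
def crude_death_rate(deaths, mid_year_population):
    """Deaths from all causes during the year per 1000 mid-year population."""
    return deaths / mid_year_population * 1000


def crude_rate(groups):
    """CDR for a population given as {age_group: (deaths, mid_year_population)}."""
    deaths = sum(d for d, _ in groups.values())
    population = sum(p for _, p in groups.values())
    return crude_death_rate(deaths, population)


def age_standardized_rate(groups, standard_population):
    """Direct standardization: apply each age-specific death rate to a common
    standard population, then express the expected deaths per 1000."""
    expected_deaths = sum(
        deaths / pop * standard_population[age]
        for age, (deaths, pop) in groups.items()
    )
    return expected_deaths / sum(standard_population.values()) * 1000


# Hypothetical towns: same age-specific rates, different age structures.
younger_town = {"<60": (1_800, 900_000), "60+": (5_000, 100_000)}
older_town = {"<60": (1_200, 600_000), "60+": (20_000, 400_000)}
standard = {"<60": 500_000, "60+": 500_000}  # hypothetical standard population

for name, town in [("younger", younger_town), ("older", older_town)]:
    print(
        f"{name}: CDR = {crude_rate(town):.1f}, "
        f"age-standardized = {age_standardized_rate(town, standard):.1f} per 1000"
    )
```

The crude rates differ more than threefold, yet the age-standardized rates are identical, which is exactly the bias the "Comparison" pearl warns about.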

Q: Type 1 Sampling error is classified as?

Alpha error.

In biostatistics, hypothesis testing involves making a decision about a population based on sample data. An error occurs when this decision does not reflect the true state of the population.

### **Explanation of the Correct Answer**

**A. Alpha (α) Error (Type I Error):** This occurs when a researcher **rejects a null hypothesis that is actually true**. In clinical terms, it is a "False Positive" result: concluding that a treatment works or a difference exists when, in reality, it does not. The probability of committing a Type I error is the significance level (α), commonly set at 0.05 (5%).

### **Explanation of Incorrect Options**

* **B. Beta (β) Error (Type II Error):** This occurs when a researcher **fails to reject a null hypothesis that is actually false**. It is a "False Negative" result: concluding there is no difference when one actually exists.
* **C & D. Gamma and Delta Errors:** These are not standard terms for classifying errors in classical hypothesis testing. "Gamma" appears in specific correlation coefficients and "Delta" often denotes effect size, but neither describes Type I or Type II errors.

### **NEET-PG High-Yield Pearls**

* **Confidence Level:** Calculated as **(1 – α)**. It is the probability of not rejecting the null hypothesis when it is true.
* **Power of a Study:** Calculated as **(1 – β)**. It is the probability that a study detects a difference if one truly exists. Increasing the sample size is the usual way to increase power.
* **P-value:** The probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It is often loosely called the probability of a Type I error, but the Type I error rate is α, fixed before the study. If p < 0.05 (with α = 0.05), the result is statistically significant.
* **Memory Aid:**
  * **Type I (α):** **I**nnocent person goes to jail (False Positive).
  * **Type II (β):** **B**ad person goes free (False Negative).
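
A quick simulation makes α and power concrete. This is a sketch under stated assumptions, not a standard procedure from the source: a two-sided one-sample z-test with known SD = 1, n = 25, and α = 0.05 (critical value 1.96); the true means of 0 and 0.5 are arbitrary choices.

```python
import math
import random


def rejects_null(sample, mu0=0.0, sigma=1.0, z_crit=1.96):
    """Two-sided one-sample z-test (sigma assumed known).
    Returns True if H0: mean == mu0 is rejected at alpha = 0.05."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit


def rejection_rate(true_mean, n=25, trials=10_000, seed=1):
    """Fraction of simulated studies that reject H0 when the true mean is true_mean."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if rejects_null(sample):
            rejections += 1
    return rejections / trials


# H0 is true (mean really is 0): every rejection is a Type I (alpha) error.
alpha_hat = rejection_rate(true_mean=0.0)
# H0 is false (mean really is 0.5): rejections are correct, so the rate estimates power.
power_hat = rejection_rate(true_mean=0.5)

print(f"Estimated alpha: {alpha_hat:.3f}")
print(f"Estimated power (1 - beta): {power_hat:.3f}, beta: {1 - power_hat:.3f}")
```

The estimated α should land near 0.05, and rerunning with a larger `n` should raise the estimated power, in line with the "Power of a Study" pearl.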

Q: A village is divided into five relevant subgroups for the purpose of a survey. Individuals from each subgroup are then selected randomly. What is this type of sampling called?

Stratified sampling.

### Explanation

**1. Why Stratified Sampling is Correct:**

In **Stratified Random Sampling**, a heterogeneous population is first divided into non-overlapping, homogeneous subgroups called **"strata"** based on specific characteristics (e.g., age, gender, socio-economic status, or the "relevant subgroups" mentioned in the question). A **simple random sample** is then drawn from *each* of these strata. This ensures that every subgroup is adequately represented, reducing sampling error compared with simple random sampling.

**2. Why Other Options are Incorrect:**

* **Simple Random Sampling:** Every individual in the entire population has an equal chance of being selected. There is no prior division into subgroups.
* **Cluster Sampling:** The population is divided into groups (clusters), usually based on geographical areas (e.g., villages, wards). Unlike stratified sampling, a few *entire clusters* are selected at random and everyone within them is surveyed, rather than selecting individuals from every group.
* **Systematic Sampling:** This involves selecting every $k^{th}$ individual (the sampling interval) from a list, starting from a random point (e.g., every 5th person entering an OPD).

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Stratified vs. Cluster:** In stratified sampling, the groups are **homogeneous within** (similar people) but **heterogeneous between** (strata differ from each other). In cluster sampling, groups are **heterogeneous within** but **homogeneous between** (each cluster is a mini-reflection of the population).
* **Multistage Sampling:** This is the most common method used in large-scale national health surveys (like NFHS), involving a combination of sampling techniques.
* **Precision:** Stratified sampling is generally more precise than simple random sampling because it accounts for variability between subgroups.
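
The contrast between stratified and cluster sampling can be sketched in Python. The village below is hypothetical: 500 residents, each belonging to one of five subgroups (S1 to S5) and living in one of five hamlets (H1 to H5) that mix all subgroups. Equal allocation of 10 people per stratum is an assumption; proportional allocation is also common.

```python
import random


def group_by(population, key):
    """Split a population into {group_name: [members]} using a key function."""
    groups = {}
    for person in population:
        groups.setdefault(key(person), []).append(person)
    return groups


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Divide into strata, then draw a simple random sample from EVERY stratum."""
    rng = random.Random(seed)
    strata = group_by(population, stratum_of)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample


def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """For contrast: randomly pick a few WHOLE clusters and survey everyone in them."""
    rng = random.Random(seed)
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


# Hypothetical village: subgroup cycles S1..S5, hamlet is a block of 100 households.
village = [
    {"id": i, "subgroup": f"S{i % 5 + 1}", "hamlet": f"H{i // 100 + 1}"}
    for i in range(500)
]

strat = stratified_sample(village, lambda p: p["subgroup"], per_stratum=10, seed=42)
clus = cluster_sample(village, lambda p: p["hamlet"], n_clusters=2, seed=42)

print(len(strat), sorted({p["subgroup"] for p in strat}))
print(len(clus), sorted({p["hamlet"] for p in clus}))
```

The stratified sample always contains 10 people from each of the five subgroups, while the cluster sample contains everyone from just two randomly chosen hamlets.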

Q: What is true about the standard normal distribution?

The total area under the curve is equal to 1.

### Explanation

**1. Why Option A is Correct:**

The Standard Normal Distribution (Z-distribution) is a specific probability density function. In statistics, the **total area under any probability density curve must equal 1 (or 100%)**, representing the sum of all possible outcomes. This property is fundamental for calculating Z-scores and p-values, because the area under a given segment of the curve is the probability of an observation falling within that range.

**2. Why the Other Options are Incorrect:**

* **Option B:** In a *Standard* Normal Distribution, the **Mean is always 0** and the Standard Deviation is 1. A normal distribution with a mean of 1 would not be the "Standard" version.
* **Option C:** The Normal Distribution is perfectly symmetrical. Therefore, the **Mean = Median = Mode**. The relationship "Mean > Median > Mode" describes a **Positively Skewed** distribution.
* **Option D:** A distribution with a tail towards the right is **Positively Skewed**. The Standard Normal Distribution is bell-shaped and symmetrical with no skew; both tails extend infinitely and are mirror images of each other.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Z-score formula:** $Z = (x - \mu) / \sigma$. It tells you how many standard deviations a value lies from the mean.
* **Empirical Rule (68-95-99.7 Rule):**
  * Mean ± 1 SD covers **68.3%** of the area.
  * Mean ± 2 SD covers **95.4%** of the area.
  * Mean ± 3 SD covers **99.7%** of the area.
* **Point of Inflection:** In a normal curve, the points of inflection lie at Mean ± 1 SD, where the curve changes from concave-down (around the peak) to concave-up (in the tails).
* **Standard Error:** As sample size increases, the standard error decreases, making the sampling distribution of the mean narrower.
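
A minimal Python sketch can confirm both the correct option and the Empirical Rule. It integrates the standard normal density numerically (trapezoidal rule over ±8 SD, an assumed cutoff beyond which the tails are negligible) and uses the closed form $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$ for the central areas. The value 24 in the Z-score line is a hypothetical observation.

```python
import math


def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


def standard_normal_pdf(z):
    """Density of the standard normal distribution (mean 0, SD 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_within(k):
    """Area under the standard normal curve between -k and +k SD."""
    return math.erf(k / math.sqrt(2))


def total_area(lo=-8.0, hi=8.0, steps=16_000):
    """Trapezoidal integration of the pdf; tails beyond +/-8 SD are negligible."""
    h = (hi - lo) / steps
    inner = sum(standard_normal_pdf(lo + i * h) for i in range(1, steps))
    return h * (inner + (standard_normal_pdf(lo) + standard_normal_pdf(hi)) / 2)


print(f"Total area ~ {total_area():.6f}")
for k in (1, 2, 3):
    print(f"Mean +/- {k} SD: {area_within(k):.2%}")
print(z_score(24, mean=16, sd=8))  # hypothetical value 24 with mean 16, SD 8
```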

Q: Interpret the statistical graph shown below:

Positive correlation.

***Positive correlation***

- In a **positive correlation**, as one variable increases, the other also tends to increase, creating an **upward-trending pattern** on the scatter plot.
- The data points form a **linear pattern** sloping from bottom-left to top-right, indicating a **direct relationship** between the variables.

*Negative correlation*

- Shows a **downward-trending pattern**: as one variable increases, the other decreases.
- Data points slope from **top-left to bottom-right**, indicating an **inverse relationship** between the variables.

*Absent correlation*

- Data points are **randomly scattered** with no discernible pattern or trend.
- The **correlation coefficient (r)** is close to **zero**, indicating no linear relationship between the variables.

*Spurious correlation*

- An observed correlation between two variables that does **not reflect a causal relationship** between them.
- Often arises from **confounding variables** or **coincidental patterns** in the data.
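
Because the original graph is not reproduced here, a minimal Python sketch on hypothetical data shows how the sign of Pearson's correlation coefficient r matches the three scatter patterns. `pearson_r` is written out by hand for illustration; note that r captures only linear association, so the symmetric "no trend" series gives r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (measures LINEAR association only)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data points.
x = [1, 2, 3, 4, 5, 6]
upward = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # bottom-left to top-right
downward = list(reversed(upward))          # top-left to bottom-right
no_trend = [3, 5, 1, 1, 5, 3]              # symmetric, no linear trend

for label, ys in [("positive", upward), ("negative", downward), ("absent", no_trend)]:
    print(f"{label:>8}: r = {pearson_r(x, ys):+.2f}")
```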

Question 1

Which scale of measurement is used to classify data into categories that have no inherent order?

Accepted Answer

Nominal

Answer

Ordinal

Answer

Interval

Answer

Ratio

Question 2

The median weight of 100 children was 16 kg. The standard deviation was 8 kg. Calculate the percentage coefficient of variation.

Accepted Answer

50%

Answer

35%

Answer

45%

Answer

55%

Question 3

In a cohort study of 7000 smokers over ten years, 70 developed lung cancer. In a concurrent evaluation of 7000 non-smokers in the same catchment area, 7 developed lung cancer. What is the Relative Risk (RR) for developing lung cancer?

Accepted Answer

10

Answer

1

Answer

100

Answer

0.1

Question 4

The "Crude death rate" is defined as the number of deaths (from all causes) per 1000 estimated what?

Accepted Answer

Mid-year population

Answer

Total population

Answer

Total births

Answer

Live births

Question 5

Type 1 Sampling error is classified as?

Accepted Answer

Alpha error

Answer

Beta error

Answer

Gamma error

Answer

Delta error

Question 6

A village is divided into five relevant subgroups for the purpose of a survey. Individuals from each subgroup are then selected randomly. What is this type of sampling called?

Accepted Answer

Stratified sampling

Answer

Simple Random sampling

Answer

Cluster Sampling

Answer

Systematic Sampling

Question 7

A chi-square test would be most appropriate for testing which one of the following hypotheses?

Accepted Answer

That a smaller proportion of people who were immunized against chickenpox subsequently develop zoster than those who were not immunized.

Answer

That the mean score of students is greater than that of other students.

Answer

That the mean blood pressure of black and white male hypertensive patients taking ACE inhibitors is the same as that of black and white female hypertensive patients taking ACE inhibitors and that of black and white males and females taking diuretics and placebos.

Answer

That the mean cost of treating a patient with coronary artery disease with angioplasty is greater than the mean cost of providing medical treatment.
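
The accepted hypothesis compares **proportions** (zoster: yes/no) between two groups, which is what a chi-square test on a 2×2 table does; the other options compare **means** (t-test or ANOVA). A minimal Python sketch of the Pearson chi-square statistic (without Yates' continuity correction) follows. The counts are hypothetical, and 3.841 is the chi-square critical value for 1 degree of freedom at α = 0.05.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

                 outcome+  outcome-
        group 1     a         b
        group 2     c         d
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if H0 (no association) is true.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: zoster among 100 immunized vs. 100 non-immunized people.
stat = chi_square_2x2(a=10, b=90, c=30, d=70)
CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 df, alpha = 0.05
print(f"chi-square = {stat:.2f}; reject H0: {stat > CRITICAL_1DF_005}")
```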

Question 8

What is true about the standard normal distribution?

Accepted Answer

The total area under the curve is equal to 1.

Answer

The mean is equal to 1.

Answer

The mean is greater than the median, which is greater than the mode.

Answer

The distribution has a tail towards the right.

Question 9

What is the definition of neonatal mortality?

Accepted Answer

Number of deaths of children less than 28 days old per 1000 live births.

Answer

Number of deaths of children less than 28 days old per 1000 stillbirths.

Answer

Number of deaths of children less than 28 days old per 1000 total births.

Answer

None of the above.
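
The definition in the accepted answer can be written as a one-line formula. A minimal Python sketch with hypothetical district figures follows; it also shows that substituting total births (live births + stillbirths) for live births gives a different, incorrect rate.

```python
def neonatal_mortality_rate(deaths_under_28_days, live_births):
    """Deaths of children aged < 28 days per 1000 live births in the same year."""
    return deaths_under_28_days / live_births * 1000


# Hypothetical figures for one year.
live_births = 2_400
stillbirths = 60  # stillbirths are NOT part of the denominator
neonatal_deaths = 48

correct = neonatal_mortality_rate(neonatal_deaths, live_births)
wrong = neonatal_mortality_rate(neonatal_deaths, live_births + stillbirths)
print(f"NMR = {correct:.1f} per 1000 live births")
print(f"Using total births instead: {wrong:.1f} (wrong denominator)")
```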

Question 10

Interpret the statistical graph shown below:

Accepted Answer

Positive correlation

Answer

Negative correlation

Answer

Absent correlation

Answer

Spurious correlation

Biostatistics — MCQs
