Biostatistics Practice Questions

Q: Prevalence of cataract at one point in time can be determined by which type of study?

Cross-sectional study.

**Explanation**

The correct answer is **Cross-sectional study**.

**1. Why Cross-sectional study is correct:**

A cross-sectional study (also known as a **Prevalence Study**) examines a population at a single point in time (a "snapshot"). It measures the presence of a condition (cataract) and the exposure simultaneously. Since the question asks for the prevalence at "one point in time," this is the design of choice. It calculates prevalence using the formula: *Total number of cases at a given time / Total population at risk at that time.*

**2. Why other options are incorrect:**

* **Longitudinal study:** This involves repeated observations of the same variables over a long period. While it can track changes, it is not the primary tool for a point-prevalence snapshot.
* **Cohort study:** This is an observational, analytical study that starts with disease-free individuals and follows them forward in time to determine **Incidence** (new cases). It is used to study causation and relative risk, not existing prevalence.
* **Surveillance:** This is the continuous, systematic collection and analysis of health data for public health action (e.g., monitoring a malaria outbreak). It is a process of ongoing monitoring rather than a specific epidemiological study design for point prevalence.

**High-Yield Clinical Pearls for NEET-PG:**

* **Cross-sectional Study:** Provides a "Snapshot" of a population; measures **Prevalence**.
* **Cohort Study:** Prospective; measures **Incidence** and **Relative Risk (RR)**.
* **Case-Control Study:** Retrospective; measures **Odds Ratio (OR)**.
* **Incidence vs. Prevalence:** Prevalence = Incidence × Mean Duration of disease ($P = I \times D$).
* Cataract is the leading cause of blindness in India; prevalence studies are vital for planning the National Programme for Control of Blindness (NPCB).

Q: An experimental diagnostic test is developed to noninvasively detect the presence of trisomy 21, Down's syndrome. The test is administered to a group of 500 women considered to be at risk for a Down's fetus based on blood tests. The results are presented in the table below. What is the sensitivity of this new test?

Trisomy 21: Positive Test = 100, Negative Test = 100
Normal Karyotype: Positive Test = 50, Negative Test = 250

50%.

### Explanation

To calculate the sensitivity of a diagnostic test, we must first organize the data into a standard **2x2 Contingency Table**:

| | Trisomy 21 (Disease +) | Normal (Disease -) | Total |
| :--- | :---: | :---: | :---: |
| **Test Positive** | 100 (TP) | 50 (FP) | 150 |
| **Test Negative** | 100 (FN) | 250 (TN) | 350 |
| **Total** | **200** | **300** | **500** |

**1. Why the Correct Answer (B) is 50%:**

**Sensitivity** is the ability of a test to correctly identify those with the disease (True Positive Rate).

* **Formula:** [TP / (TP + FN)] × 100
* **Calculation:** [100 / (100 + 100)] × 100 = [100 / 200] × 100 = **50%**.

This means the test only identifies half of the actual Down's syndrome cases.

**2. Analysis of Incorrect Options:**

* **Option A (40%):** This is an incorrect calculation, likely derived from dividing TP by TN (100/250).
* **Option C (67%):** This represents the **Positive Predictive Value (PPV)**. Formula: [TP / (TP + FP)] = 100/150 = 66.7%.
* **Option D (71%):** This represents the **Negative Predictive Value (NPV)**. Formula: [TN / (TN + FN)] = 250/350 = 71.4%.

**3. Clinical Pearls & High-Yield Facts for NEET-PG:**

* **Sensitivity (SnNOut):** A highly **S**e**n**sitive test, when **N**egative, helps rule **Out** the disease. It is ideal for **screening** tests.
* **Specificity (SpPIn):** A highly **Sp**ecific test, when **P**ositive, helps rule **In** the disease. It is ideal for **confirmatory** tests.
* **Specificity in this case:** [TN / (TN + FP)] = 250/300 = 83.3%.
* **Prevalence:** In this study group, prevalence is (Total Disease+ / Total Population) = 200/500 = 40%. Note that PPV and NPV are dependent on disease prevalence, whereas Sensitivity and Specificity are inherent properties of the test.
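
The arithmetic in the 2x2 table can be checked with a short Python sketch. The function name `diagnostic_metrics` and its argument names are illustrative choices, not part of the original question; the counts are the ones from the table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the standard 2x2-table measures as fractions (0 to 1)."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "prevalence": (tp + fn) / total, # diseased / everyone tested
    }


# Counts from the trisomy 21 question
metrics = diagnostic_metrics(tp=100, fp=50, fn=100, tn=250)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Run on these counts, the sketch reproduces the figures in the explanation: sensitivity 50%, specificity 83.3%, PPV 66.7%, NPV 71.4%, and prevalence 40%.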

Q: Which of the following study designs is used to investigate more than one possible outcome?

Cohort study.

### Explanation

**Correct Answer: A. Cohort study**

In a **Cohort study**, the investigator starts with a group of individuals who are currently free of the disease but are classified based on their exposure to a specific risk factor. These individuals are followed forward in time (prospectively) to see who develops the disease. Because the study follows an exposed group over time, researchers can observe and record **multiple different outcomes** resulting from a single exposure. For example, a cohort study on smoking can track the development of lung cancer, COPD, and coronary heart disease simultaneously.

**Why the other options are incorrect:**

* **B. Case-control study:** This design starts with the outcome (the disease) and looks backward to identify exposures. It is ideal for studying **multiple exposures** for a single outcome, but it cannot efficiently study multiple outcomes.
* **C. Cross-sectional study:** This provides a "snapshot" of a population at a single point in time. It measures prevalence rather than incidence and is not designed to track the progression of multiple outcomes over time.
* **D. Case reports:** These are descriptive studies focusing on a single patient or a small group. They are used for generating hypotheses about new diseases or rare drug side effects, not for investigating multiple outcomes systematically.

**High-Yield Clinical Pearls for NEET-PG:**

* **Cohort Study:** Best for rare exposures; can calculate **Incidence** and **Relative Risk (RR)**.
* **Case-Control Study:** Best for rare diseases; can calculate **Odds Ratio (OR)**.
* **Mnemonic:**
  * **C**ohort = Multiple **C**onsequences (Outcomes).
  * **C**ase-Control = Multiple **C**auses (Exposures).
* The hallmark of a cohort study is that it proceeds from **Cause to Effect**.

Q: In a statistical study to calculate the effect of a drug on a patient's sugar level, a test showed a significant difference when in reality there was no difference. What is this type of error called?

Alpha error.

### Explanation

This question tests the fundamental understanding of **Hypothesis Testing** in Biostatistics, a high-yield area for NEET-PG.

**1. Why Alpha Error is Correct:**

An **Alpha ($\alpha$) error**, also known as a **Type I error**, occurs when a researcher rejects the Null Hypothesis ($H_0$) even though it is actually true. In clinical terms, this is a **"False Positive"** result. In this scenario, the test showed a "significant difference" (rejected the null) when in reality there was "no difference" (null was true). It is essentially "finding a difference where none exists."

**2. Analysis of Incorrect Options:**

* **Beta ($\beta$) error (Type II error):** This occurs when the researcher fails to reject a false Null Hypothesis. It is a **"False Negative"**: concluding there is no difference when one actually exists ("missing a real difference").
* **Gamma error:** This is not a standard term used in basic hypothesis testing for medical statistics.
* **Power of a test ($1-\beta$):** This is the probability that a test will correctly identify a significant difference if one truly exists. It is the ability of a study to avoid a Type II error.

**3. Clinical Pearls & High-Yield Facts:**

* **Significance level ($\alpha$):** The pre-set maximum probability of committing a Type I error, conventionally 0.05.
* **P-value:** The probability of obtaining a result at least as extreme as the one observed, assuming the Null Hypothesis is true. A p-value below $\alpha$ (usually < 0.05) is considered statistically significant.
* **Confidence level:** $1 - \alpha$. If $\alpha$ is 0.05 (5%), the Confidence Level is 95%.
* **Memory Aid:**
  * **Type I (Alpha):** **I**nnocent person convicted (False Positive).
  * **Type II (Beta):** **B**ad person set free (False Negative).
* **Relationship:** Decreasing the risk of a Type I error usually increases the risk of a Type II error unless the sample size is increased.
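
A small simulation can make the meaning of $\alpha$ concrete. The sketch below is an illustration only, under assumptions that are not in the question: every simulated "study" draws 30 values from a population whose true mean really is 0 (so the null hypothesis is true), the standard deviation is treated as known (a z-test), and the function name, trial count, and seed are arbitrary. The share of studies that still come out "significant" is the observed Type I error rate.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(n_trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of null-true studies that a two-sided z-test calls significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_trials):
        # The null hypothesis is true by construction: true mean 0, SD 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z = mean / (sigma / sqrt(n)) with sigma = 1
        if abs(z) > z_crit:
            rejections += 1  # a false positive, i.e. an alpha error
    return rejections / n_trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

With enough simulated studies the printed rate should settle close to the chosen $\alpha$ of 0.05, which is the sense in which $\alpha$ is the accepted false-positive rate.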

Q: Which of the following best describes a normal curve?

The distribution of data is symmetrical.

In Biostatistics, the **Normal Distribution** (also known as the Gaussian distribution) is a fundamental concept representing how continuous variables are distributed in a population.

### 1. Why Option A is Correct

A normal curve is characterized by its **perfect symmetry** around the center. In a perfectly normal distribution:

* The curve is bell-shaped.
* The **Mean, Median, and Mode are all equal** and coincide at the peak of the curve.
* The total area under the curve is 1 (or 100%), with exactly 50% of observations lying on either side of the center.

### 2. Why Other Options are Incorrect

Options B, C, and D describe **Skewed Distributions**, where the symmetry is lost:

* **Option B (Mean > Mode):** This describes a **Positively Skewed** (Right-skewed) distribution. The tail extends towards the right (higher values), pulling the mean away from the peak.
* **Options C & D (Mode/Median > Mean):** These describe a **Negatively Skewed** (Left-skewed) distribution. The tail extends towards the left (lower values), pulling the mean down below the median and mode.

### 3. NEET-PG High-Yield Pearls

* **Standard Normal Curve:** A specific normal curve where the **Mean is 0** and the **Standard Deviation (SD) is 1**.
* **68-95-99.7 Rule (Empirical Rule):**
  * Mean ± 1 SD covers **68.3%** of values.
  * Mean ± 2 SD covers **95.4%** of values.
  * Mean ± 3 SD covers **99.7%** of values.
* **Clinical Application:** Many biological parameters (e.g., height, blood pressure, IQ) approximately follow a normal distribution in a healthy population. If a distribution is highly skewed, the **Median** is considered a better measure of central tendency than the Mean.
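
The percentages in the 68-95-99.7 rule can be derived rather than memorized. For a normal distribution, the fraction of values within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. The sketch below uses only Python's standard `math` module; the helper name `coverage_within` is an illustrative choice.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Mean ± {k} SD covers {coverage_within(k):.1%} of values")
```

This prints 68.3%, 95.4%, and 99.7%, matching the rule quoted above.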

Q: Calculate the range from the following frequency distribution:

| Class Interval | Frequency |
|----------------|-----------|
| 10 – 15        | 3         |
| 15 – 20        | 7         |
| 20 – 25        | 5         |

Total n = 15

15.

***15***

- **Range** is calculated as the difference between the **maximum value** and the **minimum value** in the dataset.
- From the frequency distribution, the **lowest class boundary is 10** and the **highest class boundary is 25**, so Range = 25 - 10 = **15**.

*10*

- This represents the **lower boundary** of the first class interval, not the range of the distribution.
- Range requires calculating the **difference between extremes**, not just identifying the minimum value.

*20*

- This is the **upper boundary** of the second class interval, which is neither the maximum nor the range.
- It does not represent the **spread** or **variability** of the entire dataset.

*25*

- This is the **upper boundary** of the highest class interval, representing the maximum value but not the range.
- Range is the **difference between maximum and minimum**, not just the maximum value alone.
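
For grouped data the same rule can be written as a few lines of Python. This is a minimal sketch: the list `classes` simply transcribes the table from the question, and the function name `grouped_range` is an illustrative choice.

```python
# Each entry is ((lower boundary, upper boundary), frequency)
classes = [((10, 15), 3), ((15, 20), 7), ((20, 25), 5)]


def grouped_range(classes):
    """Range of grouped data: highest upper boundary minus lowest lower boundary."""
    lowest = min(lower for (lower, _upper), _freq in classes)
    highest = max(upper for (_lower, upper), _freq in classes)
    return highest - lowest


total_n = sum(freq for _interval, freq in classes)
print(f"n = {total_n}, range = {grouped_range(classes)}")
```

Note that the frequencies play no part in the range; they are summed here only to confirm the stated total of 15.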

Q: An investigator finds that 5 independent factors influence the occurrence of a disease. Comparison of multiple factors that are responsible for the disease can be assessed by:

Multiple logistic regression.

### Explanation

The core of this question lies in identifying the relationship between multiple independent variables (risk factors) and a single dependent variable (disease occurrence).

**1. Why Multiple Logistic Regression is Correct:**

In medical research, the "occurrence of a disease" is typically a **dichotomous (binary) outcome**, meaning the patient either has the disease or does not (Yes/No). When you need to assess the influence of multiple independent factors (which can be continuous or categorical) on a single binary outcome, **Multiple Logistic Regression** is the statistical tool of choice. It calculates the **Odds Ratio (OR)** for each factor while controlling for confounders.

**2. Why the Other Options are Incorrect:**

* **ANOVA (Analysis of Variance):** Used to compare the **means** of a continuous variable across three or more categorical groups (e.g., comparing mean blood pressure across three different diet groups).
* **Multiple Linear Regression:** Used when the dependent variable is **continuous** (e.g., predicting exact blood sugar levels based on age, weight, and exercise). It is not used for binary "yes/no" outcomes.
* **Chi-square Test:** Used to find an association between two **categorical** variables (e.g., smoking and lung cancer). It cannot handle multiple independent factors simultaneously in its basic form.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Logistic Regression = Dichotomous Outcome** (Disease vs. No Disease). It yields **Odds Ratio**.
* **Linear Regression = Continuous Outcome** (Height, Weight, BP). It yields **regression coefficients (slopes)**; the strength of a linear association between two variables is summarized separately by the **Correlation Coefficient (r)**.
* **ANOVA** = Comparison of **Means** (3+ groups).
* **Paired t-test** = Comparison of means in the **same group** (Before vs. After treatment).
* **Unpaired t-test** = Comparison of means between **two different groups**.

Q: A 95% confidence interval for the prevalence of cancer in smokers aged over 65 years is 56% to 76%. What is the chance that the true prevalence is less than 56%?

2.50%.

### Explanation

**1. Understanding the Correct Answer (C: 2.50%)**

A **95% Confidence Interval (CI)** represents the range within which we are 95% confident the true population parameter (prevalence) lies. This means there is a **5% total probability** that the true value falls *outside* this range. In a normal distribution (bell curve), this 5% error is distributed equally into two "tails":

* **Lower Tail:** 2.5% chance the true value is *less than* the lower limit (56%).
* **Upper Tail:** 2.5% chance the true value is *greater than* the upper limit (76%).

Therefore, for a symmetric interval, the probability that the true prevalence is less than 56% is **2.5%**.

**2. Why Other Options are Incorrect**

* **A (Nil):** Incorrect. A confidence interval does not provide absolute certainty; there is always a calculated risk of error (alpha).
* **B (44%):** Incorrect. This is simply the complement of the lower limit (100% - 56%), which has no statistical relevance to the probability of the true prevalence.
* **D (5%):** Incorrect. This represents the *total* probability of the true value being outside the interval (both tails combined). The question specifically asks for the probability of being *less than* the lower limit (one tail).

**3. High-Yield Clinical Pearls for NEET-PG**

* **Confidence Interval (CI) Formula:** $Mean \pm (1.96 \times SE)$ for 95% CI; $Mean \pm (2.58 \times SE)$ for 99% CI.
* **Precision vs. Sample Size:** A larger sample size results in a narrower (more precise) confidence interval.
* **P-value vs. CI:** If a 95% CI for a difference between two groups includes **zero**, the results are not statistically significant ($p > 0.05$). If a 95% CI for an Odds Ratio or Relative Risk includes **one**, it is not significant.
* **Interpretation:** A 95% CI means that if the study were repeated 100 times and an interval calculated each time, about 95 of those intervals would be expected to contain the true value.
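
To see where an interval like 56% to 76% could come from, the sketch below applies the $\text{estimate} \pm 1.96 \times SE$ formula to a proportion. The question gives only the interval, so the observed prevalence of 66% (its midpoint) and the sample size of 86 are assumptions chosen so that the numbers roughly reproduce it; the function name `proportion_ci` is likewise illustrative.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)             # standard error
    return p_hat - z * se, p_hat + z * se


# Assumed inputs (not given in the question): 66% observed prevalence, n = 86
low, high = proportion_ci(p_hat=0.66, n=86)
print(f"95% CI: {low:.1%} to {high:.1%}")

# Area in one tail beyond 1.96 standard errors
one_tail = NormalDist().cdf(-1.96)
print(f"One-tail area: {one_tail:.1%}")
```

The one-tail area of about 2.5% is the figure the answer relies on: the 5% left outside a symmetric 95% interval is split evenly between the two tails.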

Q: Due to an effective prevention program, the prevalence of an infectious disease in a community has been reduced by 90%. A physician continues to use the same diagnostic test for the disease that she has always used. How have the test’s characteristics changed?

Its negative predictive value has increased.

### Explanation

The core concept tested here is the relationship between **disease prevalence** and **predictive values**.

**1. Why Option D is Correct:**

Negative Predictive Value (NPV) is the probability that a person who tests negative truly does not have the disease. NPV is **inversely related** to prevalence. When the prevalence of a disease decreases (in this case, by 90%), the number of "True Negatives" in the population increases significantly relative to "False Negatives." Therefore, a negative test result becomes even more reliable, leading to an **increase in NPV**.

**2. Why the Other Options are Incorrect:**

* **Option A & C:** Sensitivity and Specificity are **intrinsic properties** of a diagnostic test. They depend on the test’s design (e.g., the cutoff point) and are independent of the prevalence of the disease in the population. Thus, they remain unchanged.
* **Option B:** Positive Predictive Value (PPV) is **directly related** to prevalence. As prevalence falls, the likelihood that a positive result is a "False Positive" increases. Therefore, the PPV would **decrease**, not increase.

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Prevalence $\uparrow$:** PPV increases, NPV decreases.
* **Prevalence $\downarrow$:** PPV decreases, NPV increases.
* **Sensitivity/Specificity:** These are fixed characteristics of the test and do not change with prevalence.
* **Screening in Low-Prevalence Areas:** When screening for rare diseases, the PPV is usually low, meaning many positive results will be false positives. This is why confirmatory tests are essential.
* **Formula Recall:**
  * $PPV = \text{True Positives} / (\text{True Positives} + \text{False Positives})$
  * $NPV = \text{True Negatives} / (\text{True Negatives} + \text{False Negatives})$
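
The effect of falling prevalence can be checked numerically. The question gives no test characteristics, so the sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity and a starting prevalence of 20%, which a 90% reduction brings down to 2%. The function name `predictive_values` is an illustrative choice.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per person tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test: sensitivity and specificity stay fixed at 90%
ppv_before, npv_before = predictive_values(0.90, 0.90, prevalence=0.20)
ppv_after, npv_after = predictive_values(0.90, 0.90, prevalence=0.02)

print(f"Prevalence 20%: PPV {ppv_before:.1%}, NPV {npv_before:.1%}")
print(f"Prevalence  2%: PPV {ppv_after:.1%}, NPV {npv_after:.1%}")
```

Under these assumed values the PPV drops from about 69% to about 16% while the NPV rises from about 97% to nearly 100%, even though the test itself has not changed.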

Q: In a community of 6000 people, there are 150 cases of TB and 30 deaths due to TB. What is the TB-specific death rate per 1000 population?

0-5.

### Explanation

**1. Understanding the Correct Answer (D)**

The **Specific Death Rate** measures the number of deaths due to a specific cause per 1,000 population in a given year. The formula is:

$$\text{Specific Death Rate} = \frac{\text{Number of deaths from a specific disease}}{\text{Total mid-year population}} \times 1000$$

**Calculation:**

* Total Population = 6,000
* Deaths due to TB = 30
* Calculation: $(30 / 6,000) \times 1,000 = 5$ per 1,000 population.

Since the result is exactly 5, it falls within the range of **Option D (0-5)**.

**2. Why Other Options are Incorrect**

* **Option A (20):** This value is obtained if you calculate the **Case Fatality Rate (CFR)**. CFR is the percentage of people diagnosed with a disease who die from it: $(30 / 150) \times 100 = 20\%$. While 20 is a relevant number, it represents lethality, not the population death rate.
* **Option B (10):** This is a distractor resulting from calculation errors (e.g., using 3,000 as the denominator).
* **Option C (5):** While the numerical value is 5, in many competitive exams, if a range is provided that includes the exact value (0-5), it is selected as the most appropriate category.

**3. NEET-PG High-Yield Pearls**

* **Case Fatality Rate (CFR):** Reflects the **virulence** or killing power of a disease. It is a ratio, not a true rate (expressed as a percentage).
* **Cause-Specific Death Rate:** Reflects the **burden** of a disease on the total community.
* **Proportional Mortality Rate:** (Deaths from TB / Total deaths from all causes) × 100. It indicates the relative importance of a specific cause of death.
* **Prevalence of TB in this scenario:** $(150 / 6,000) \times 100 = 2.5\%$.
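
The three measures that this question contrasts use the same numbers with different denominators, which a short Python sketch makes explicit. The variable names are illustrative; the figures are those given in the question.

```python
population = 6000  # total community population
cases = 150        # existing TB cases
deaths = 30        # deaths due to TB

# Deaths from TB per 1,000 of the whole population
specific_death_rate = 1000 * deaths / population

# Deaths among those who have the disease, as a percentage
case_fatality_rate = 100 * deaths / cases

# Existing cases as a percentage of the whole population
prevalence = 100 * cases / population

print(f"TB-specific death rate: {specific_death_rate:g} per 1,000")
print(f"Case fatality rate: {case_fatality_rate:g}%")
print(f"Prevalence: {prevalence:g}%")
```

The output is 5 per 1,000, 20%, and 2.5%, so the distractor value 20 arises from dividing by the number of cases instead of the population.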

Question 1

Prevalence of cataract at one point in time can be determined by which type of study?

Accepted Answer

Cross-sectional study

Answer

Longitudinal study

Answer

Surveillance

Answer

Cohort study

Question 2

An experimental diagnostic test is developed to noninvasively detect the presence of trisomy 21, Down's syndrome. The test is administered to a group of 500 women considered to be at risk for a Down's fetus based on blood tests. The results are presented in the table below. What is the sensitivity of this new test?

Trisomy 21: Positive Test = 100, Negative Test = 100
Normal Karyotype: Positive Test = 50, Negative Test = 250

Accepted Answer

50%

Answer

40%

Answer

67%

Answer

71%

Question 3

Which of the following study designs is used to investigate more than one possible outcome?

Accepted Answer

Cohort study

Answer

Case control study

Answer

Cross sectional study

Answer

Case reports

Question 4

In a statistical study to calculate the effect of a drug on a patient's sugar level, a test showed a significant difference when in reality there was no difference. What is this type of error called?

Accepted Answer

Alpha error

Answer

Beta error

Answer

Gamma error

Answer

Power of a test

Question 5

Which of the following best describes a normal curve?

Accepted Answer

The distribution of data is symmetrical.

Answer

The mean is greater than the mode.

Answer

The mode is greater than the mean.

Answer

The median is greater than the mean.

Question 6

Calculate the range from the following frequency distribution:

| Class Interval | Frequency |
|----------------|-----------|
| 10 – 15        | 3         |
| 15 – 20        | 7         |
| 20 – 25        | 5         |

Total n = 15

Accepted Answer

15

Answer

10

Answer

20

Answer

25

Question 7

An investigator finds that 5 independent factors influence the occurrence of a disease. Comparison of multiple factors that are responsible for the disease can be assessed by:

Accepted Answer

Multiple logistic regression

Answer

ANOVA

Answer

Multiple linear regression

Answer

Chi-square test

Question 8

A 95% confidence interval for the prevalence of cancer in smokers aged over 65 years is 56% to 76%. What is the chance that the true prevalence is less than 56%?

Accepted Answer

2.50%

Answer

Nil

Answer

44%

Answer

5%

Question 9

Due to an effective prevention program, the prevalence of an infectious disease in a community has been reduced by 90%. A physician continues to use the same diagnostic test for the disease that she has always used. How have the test’s characteristics changed?

Accepted Answer

Its negative predictive value has increased

Answer

Its sensitivity has increased

Answer

Its positive predictive value has increased

Answer

The test’s characteristics have not changed

Question 10

In a community of 6000 people, there are 150 cases of TB and 30 deaths due to TB. What is the TB-specific death rate per 1000 population?

Accepted Answer

0-5

Answer

20

Answer

10

Answer

5
