Biostatistics Practice Questions

Q: Prevalence is a:

Proportion.

**Explanation:**

**Why Proportion is the Correct Answer:** In epidemiology, **prevalence** measures the total number of existing cases (old and new) in a specific population at a given point or period in time. It is mathematically expressed as:

$$\text{Prevalence} = \frac{\text{Number of existing cases of a disease}}{\text{Total population at risk at that time}}$$

Because the **numerator (cases) is a part of the denominator (total population)**, it is by definition a **proportion**. It is usually expressed as a decimal or a percentage (e.g., 0.05 or 5%), but its fundamental mathematical nature is a proportion.

**Why Other Options are Incorrect:**

* **Rate:** A rate measures the speed of occurrence of an event over time (e.g., Incidence Rate). It must have a **time dimension** in the denominator (e.g., per 1,000 person-years). Prevalence is a "snapshot" and lacks this time element.
* **Ratio:** A ratio expresses the relation between two independent quantities where the numerator is *not* part of the denominator (e.g., Sex Ratio, Waist-Hip Ratio).
* **Percentage:** While prevalence is often *reported* as a percentage, "Proportion" is the more accurate mathematical classification. A percentage is simply a proportion multiplied by 100.

**High-Yield Clinical Pearls for NEET-PG:**

* **Incidence is a Rate:** It measures only *new* cases.
* **Prevalence = Incidence × Mean Duration of disease ($P = I \times D$).** This formula is valid only when the disease is stable (stationary population).
* **Factors increasing Prevalence:** Longer duration of illness, prolongation of life without a cure, increase in new cases (incidence), and in-migration of cases.
* **Factors decreasing Prevalence:** Shorter duration of disease, high case fatality rate, and improved cure rates.
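
As a rough illustration of the two formulas in this explanation, the Python sketch below computes a prevalence proportion and the steady-state approximation $P = I \times D$. The function names and every figure used are hypothetical and chosen only for the example.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Prevalence as a proportion: the cases are a subset of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= existing_cases <= population:
        raise ValueError("cases must lie between 0 and the population size")
    return existing_cases / population


def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Approximate prevalence as P = I x D (assumes a stable disease)."""
    return incidence_rate * mean_duration


# Hypothetical figures: 50 existing cases among 1,000 people.
p = prevalence(50, 1000)
print(f"Prevalence: {p:.3f} (reported as {p:.1%})")

# Hypothetical figures: 10 new cases per 1,000 person-years, mean duration 5 years.
print(f"P = I x D: {steady_state_prevalence(10 / 1000, 5):.3f}")
```

The range check in `prevalence` encodes the defining property of a proportion: the numerator can never exceed the denominator.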

Q: What is the population attributable risk given the incidence rate of lung cancer among smokers is 8 per 1000 and among non-smokers is 1 per 1000?

89%.

### Explanation

**1. Understanding the Concept**

The calculation used here is the **Attributable Risk (AR)** expressed as a percentage of the incidence among the exposed (also called the attributable fraction among the exposed). It measures the share of disease incidence among smokers that can be attributed to the exposure (smoking) and represents the potential reduction in disease among the exposed if the exposure were eliminated. The absolute **Risk Difference** is simply $I_e - I_o$, here 7 per 1000.

**Formula:**

$$AR = \frac{\text{Incidence in Exposed} (I_e) - \text{Incidence in Non-exposed} (I_o)}{\text{Incidence in Exposed} (I_e)} \times 100$$

**Calculation:**

* $I_e$ (Smokers) = 8 per 1000
* $I_o$ (Non-smokers) = 1 per 1000
* $AR = \frac{8 - 1}{8} \times 100 = \frac{7}{8} \times 100 = 87.5\%$

The calculated value is 87.5%, so the nearest listed option, **89% (Option A)**, is the accepted answer. This means 87.5% of lung cancer cases among smokers are specifically due to smoking. Although the question stem says "population attributable risk", a true PAR also requires the prevalence of smoking in the population, which the question does not provide.

**2. Analysis of Incorrect Options**

* **Option B (95%):** This would require a much higher ratio between exposed and non-exposed (e.g., 20 per 1000 vs 1 per 1000).
* **Option C (10%):** This would occur if the incidence rates were very close (e.g., 1.1 vs 1.0), suggesting the exposure has a weak association with the disease.
* **Option D (100%):** This is only possible if the incidence in non-exposed is zero, which is not realistic for lung cancer because it also occurs in non-smokers.

**3. High-Yield Clinical Pearls for NEET-PG**

* **Relative Risk (RR):** Measures the *strength* of association (Formula: $I_e / I_o$). Here, $RR = 8/1 = 8$. It is best for identifying etiological roles.
* **Attributable Risk (AR):** Measures the *public health impact*. It tells us how much of the disease can be prevented by removing the risk factor.
* **Population Attributable Risk (PAR):** Differs from AR as it considers the prevalence of the exposure in the total population, not just the exposed group.
* **Key Distinction:** RR is used in **Cohort studies**, while Odds Ratio (OR) is used in **Case-control studies**.
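
To make the arithmetic concrete, here is a minimal Python sketch of the measures named in this explanation. The function names are illustrative, and the exposure prevalence of 30% passed to the PAR function is an assumption made up for the example, since the question does not supply one.

```python
def relative_risk(i_exposed: float, i_unexposed: float) -> float:
    """RR = Ie / Io, the strength of association."""
    return i_exposed / i_unexposed


def risk_difference(i_exposed: float, i_unexposed: float) -> float:
    """Absolute excess incidence among the exposed: Ie - Io."""
    return i_exposed - i_unexposed


def attributable_risk_percent(i_exposed: float, i_unexposed: float) -> float:
    """AR% = (Ie - Io) / Ie x 100."""
    return (i_exposed - i_unexposed) / i_exposed * 100


def population_attributable_risk_percent(
    i_exposed: float, i_unexposed: float, exposure_prevalence: float
) -> float:
    """PAR% = (It - Io) / It x 100, where It is the total-population incidence."""
    i_total = (exposure_prevalence * i_exposed
               + (1 - exposure_prevalence) * i_unexposed)
    return (i_total - i_unexposed) / i_total * 100


ie, io = 8 / 1000, 1 / 1000  # figures from the question
print(f"RR  = {relative_risk(ie, io):.1f}")
print(f"RD  = {risk_difference(ie, io) * 1000:.1f} per 1000")
print(f"AR% = {attributable_risk_percent(ie, io):.1f}")

# Assumed smoking prevalence of 30%; this value is NOT given in the question.
print(f"PAR% = {population_attributable_risk_percent(ie, io, 0.30):.1f}")
```

With these incidences the PAR percentage can never exceed the AR percentage of 87.5%, which it reaches only if everyone in the population is exposed.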

Q: In a normal distribution curve, what percentage of the data falls within the limits of mean +/- 2 standard deviations?

95%.

### Explanation

**Underlying Concept: The Empirical Rule (68-95-99.7 Rule)**

In Biostatistics, a **Normal (Gaussian) Distribution** is a symmetrical, bell-shaped curve where the mean, median, and mode coincide. The spread of data in this distribution is described by the **Standard Deviation (SD)**. According to the Empirical Rule, fixed percentages of data points fall within given SD intervals from the mean:

* Mean ± 1 SD covers **68.3%** of the data.
* Mean ± 2 SD covers **95.4%** (commonly rounded to **95%**) of the data.
* Mean ± 3 SD covers **99.7%** of the data.

**Analysis of Options:**

* **Option C (Correct):** 95% is the standard statistical threshold for the "Normal Range" in clinical medicine. A value outside Mean ± 2 SD lies in the outer 5% (approximately) of the distribution, which matches the conventional significance threshold of p < 0.05 (the exact cutoff is 1.96 SD).
* **Option A (66%):** This is an incorrect approximation. The actual value for 1 SD is about 68%.
* **Option B (78%):** This value has no specific significance in the standard normal distribution curve.
* **Option D (99%):** This is wider than 2 SD. Exactly 99% of the data lies within about 2.58 SD, and **3 SD** covers 99.7%.

**High-Yield Clinical Pearls for NEET-PG:**

1. **Confidence Intervals:** The 95% Confidence Interval (CI) is the most frequently used in medical research to denote precision.
2. **Z-Score:** A Z-score indicates how many SDs a value is from the mean. For this question, the limits correspond to Z = ±2.
3. **Skewness:** If the mean is greater than the median, it is a **Positively Skewed** (right-tailed) distribution; if the mean is less than the median, it is **Negatively Skewed** (left-tailed).
4. **Standard Normal Curve:** A specific normal distribution where the **Mean is 0** and the **SD is 1**.
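
The percentages in the Empirical Rule can be checked numerically. The sketch below is one way to do it, using the standard identity that the fraction of a normal distribution within $k$ SDs of the mean equals $\operatorname{erf}(k/\sqrt{2})$; the function name is just an illustrative choice.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


# 1, 2 and 3 SD are the Empirical Rule limits; 1.96 and 2.58 are the
# commonly quoted cutoffs for 95% and 99% coverage.
for k in (1, 1.96, 2, 2.58, 3):
    print(f"mean +/- {k} SD covers {coverage_within(k):.2%}")
```

Running it shows why 2 SD is rounded to 95% in exam questions while 1.96 SD is used when the exact 95% limit matters.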

Q: Which of the following is used to compare two data sets taken on two different scales of measurement?

Coefficient of variation.

### Explanation

The correct answer is **Coefficient of Variation (CV)**.

**1. Why Coefficient of Variation is correct:**

In biostatistics, when we need to compare the variability of two datasets that have different units (e.g., comparing height in cm vs. weight in kg) or significantly different means (e.g., comparing the weight of newborns vs. adults), absolute measures like standard deviation cannot be used. The **Coefficient of Variation** is a relative measure of dispersion. It is calculated as:

$$CV = \frac{\text{Standard Deviation (SD)}}{\text{Mean}} \times 100$$

Because the SD and the mean are in the same units, the units cancel in the ratio, making CV a "unitless" measure that is conventionally expressed as a percentage. This allows for a fair comparison of consistency or precision across different scales.

**2. Why other options are incorrect:**

* **Variance (A):** This is the square of the standard deviation. It is expressed in squared units of the original data, making it unsuitable for comparing different scales.
* **Standard Error of Mean (C):** This measures the dispersion of sample means around the true population mean. It is used for statistical inference (calculating confidence intervals), not for comparing variability between different scales.
* **Standard Deviation (D):** This measures the typical distance of data points from the mean in the same units as the data. If one set is in grams and another in kilograms, the SDs cannot be directly compared.

**3. NEET-PG High-Yield Pearls:**

* **Unitless Measure:** CV is the only measure of dispersion in this list that has no units.
* **Consistency:** A lower CV indicates higher consistency/reliability of the data.
* **Precision:** In laboratory medicine, CV is frequently used to check the precision of diagnostic equipment.
* **Standard Deviation vs. Standard Error:** Remember that $SEM = \frac{SD}{\sqrt{n}}$. SEM is smaller than SD for any sample with $n > 1$.
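
A small Python sketch can show why CV works across scales. The height and weight values are invented for the example, and the helper name is hypothetical; the sample standard deviation from the standard library is used as one reasonable choice.

```python
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """CV = SD / mean x 100, using the sample standard deviation."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


# Hypothetical measurements on two different scales.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 60, 65, 72, 90]

print(f"CV of height: {coefficient_of_variation(heights_cm):.1f}%")
print(f"CV of weight: {coefficient_of_variation(weights_kg):.1f}%")

# Changing the unit (cm to m) rescales SD and mean equally, so CV is unchanged.
heights_m = [h / 100 for h in heights_cm]
print(f"CV of height in metres: {coefficient_of_variation(heights_m):.1f}%")
```

The last line is the key point: the SDs of the two height lists differ by a factor of 100, yet their CVs agree.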

Q: Which type of variable is "Social Class" if it has five categories (I to V), with Class I representing the highest social class and Class V representing the lowest?

Ordinal.

**Explanation:**

The correct answer is **Ordinal**. In biostatistics, variables are classified based on the nature of the data they represent.

**1. Why Ordinal is Correct:**

An **Ordinal variable** is a type of qualitative (categorical) data where the categories have a **natural, inherent order or rank**, but the mathematical distance between the categories is not defined. In this question, Social Class (I to V) follows a clear hierarchy from highest to lowest status. Other common medical examples include stages of cancer (I-IV), pain scales (mild, moderate, severe), and Glasgow Coma Scale scores.

**2. Why Incorrect Options are Wrong:**

* **Dichotomous:** These are variables with only two mutually exclusive categories (e.g., Dead/Alive, Male/Female). Social class here has five categories.
* **Nominal:** These are categorical variables with no intrinsic ranking or order (e.g., Blood groups A, B, AB, O; or Religion). While social class is categorical, the presence of a "rank" makes it ordinal rather than nominal.
* **Interval:** This is a quantitative variable where the distance between values is equal and meaningful, but there is no true zero point (e.g., Temperature in Celsius). Social class is a qualitative rank, not a precise numerical measurement.

**Clinical Pearls for NEET-PG:**

* **Qualitative Data:** Includes Nominal (lowest level) and Ordinal.
* **Quantitative Data:** Includes Discrete (whole numbers, e.g., number of beds) and Continuous (decimals possible, e.g., Height, Weight).
* **High-Yield Tip:** For Ordinal data, the best measure of central tendency is the **Median**, whereas for Nominal data, it is the **Mode**.
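
The tip about median versus mode can be sketched in Python. The sample of social classes below is invented, and `CLASS_ORDER` and `ordinal_median` are hypothetical names; the idea is only that ordered categories can be ranked, so a median category exists even though arithmetic on the labels is meaningless.

```python
from statistics import median_low, mode

# Rank order of the categories, from highest (I) to lowest (V) social class.
CLASS_ORDER = ["I", "II", "III", "IV", "V"]


def ordinal_median(observations):
    """Median category of ordinal data, found by ranking the categories."""
    ranks = [CLASS_ORDER.index(c) for c in observations]
    # median_low always returns an actual data point, so it maps back to a label.
    return CLASS_ORDER[median_low(ranks)]


# Hypothetical sample of seven households.
sample = ["II", "II", "II", "III", "IV", "IV", "V"]
print("Median class:", ordinal_median(sample))
print("Modal class:", mode(sample))
```

A mean is deliberately not computed: averaging the ranks would assume equal distances between classes, which ordinal data does not guarantee.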

Q: All of the following are continuous variables except?

Human blood group.

**Explanation:**

In biostatistics, variables are broadly classified into **Quantitative (Numerical)** and **Qualitative (Categorical)**.

**Why Human Blood Group is the correct answer:**

Human blood group (A, B, AB, O) is a **Qualitative/Categorical variable**. Specifically, it is a **Nominal variable** because the categories have no inherent numerical value or logical ranking. You cannot have a blood group of "A.5." Since it consists of distinct categories rather than a range of values on a continuum, it is not a continuous variable.

**Why the other options are incorrect:**

* **Weight (kg), Height (cm), and Hb levels:** These are all **Quantitative Continuous variables**.
* A continuous variable can take any value within a given range, including decimals and fractions (e.g., a weight of 70.45 kg or an Hb of 12.8 g/dL).
* The precision of these variables is limited only by the measuring instrument used.

**High-Yield Clinical Pearls for NEET-PG:**

1. **Discrete vs. Continuous:** Discrete variables are counted in whole numbers (e.g., number of children in a family, number of beds in a hospital), whereas continuous variables are measured (e.g., BP, Serum Cholesterol).
2. **Scales of Measurement (NOIR):**
   * **Nominal:** Categories with no order (e.g., Gender, Religion, Blood Group).
   * **Ordinal:** Categories with a natural rank (e.g., Stages of Cancer, Socio-economic status).
   * **Interval:** Numerical scale with no absolute zero (e.g., Temperature in Celsius).
   * **Ratio:** Numerical scale with an absolute zero (e.g., Pulse rate, Height).
3. **Note:** While "Pulse rate" is often treated as continuous in calculations, it is technically a discrete variable (counts per minute). However, for most PG exams, physical measurements are categorized as continuous.

Q: A scatter plot is used to display which of the following?

Correlation.

**Explanation:**

A **scatter plot** (or scatter diagram) is a graphical representation used to display the relationship between two continuous (quantitative) variables. By plotting individual data points on an X and Y axis, we can visually assess the **correlation**, that is, the degree and direction of the linear relationship between the variables.

* **Why Correlation is Correct:** In a scatter plot, if the points cluster along a line rising from left to right, it indicates a **positive correlation** (e.g., height and weight). If they fall from left to right, it indicates a **negative correlation** (e.g., exercise and resting heart rate). If points are randomly scattered, there is **no correlation**.

**Analysis of Incorrect Options:**

* **Causality (A):** A scatter plot shows association, not causation. "Correlation does not imply causation." To prove causality, experimental designs like Randomized Controlled Trials (RCTs) are required.
* **Statistical Power (C):** This is the probability (1-β) of correctly rejecting a null hypothesis when it is false. It is a numerical value, not something visualized via a scatter plot.
* **Type II Error (D):** Also known as a "False Negative" (β), this occurs when we fail to reject a null hypothesis that is actually false. It is related to sample size and power, not graphical correlation.

**High-Yield Clinical Pearls for NEET-PG:**

* **Correlation Coefficient (r):** The scatter plot is the visual precursor to calculating 'r'. The value of 'r' ranges from **-1 to +1**.
* **Line of Best Fit:** A regression line can be drawn through a scatter plot to predict the value of one variable based on another.
* **Other Graphs:**
  * **Histogram:** For continuous data (frequency distribution).
  * **Bar Chart:** For discrete/nominal data.
  * **Box-and-Whisker Plot:** To show median and quartiles (dispersion).
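
The correlation coefficient that a scatter plot visualizes can be computed directly. The sketch below implements the standard Pearson formula by hand; the function name and the paired height and weight values are made up for illustration.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (height in cm, weight in kg).
heights = [150, 160, 165, 170, 180]
weights = [50, 58, 63, 70, 82]
print(f"r = {pearson_r(heights, weights):.3f}")
```

A value near +1 corresponds to points rising from left to right, a value near -1 to points falling, and a value near 0 to no linear pattern.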

Q: Birth and death registration must be done within what time period?

21 days each.

### Explanation

**1. Why Option C is Correct:**

The registration of vital events (births, deaths, and stillbirths) in India is governed by the **Registration of Births and Deaths (RBD) Act, 1969**. According to the uniform rules implemented across the country since 2000, the statutory time limit for reporting these events to the Registrar is **21 days**. This uniform window was established to streamline data collection for the Civil Registration System (CRS), ensuring timely demographic tracking and legal documentation.

**2. Why Other Options are Incorrect:**

* **Options A & B:** These reflect older guidelines or specific state-level rules that existed prior to the 2000 amendment. Previously, the limits were often 7 days for deaths and 14 days for births, but these are no longer applicable under the current national mandate.
* **Option D:** There is no provision in the RBD Act that allows for a 28-day standard reporting window. While delayed registration is possible with a late fee (after 21 days but within 30 days) or a magistrate's order (after one year), the standard legal requirement remains 21 days.

**3. High-Yield Facts for NEET-PG:**

* **The Act:** Registration of Births and Deaths Act was passed in **1969**.
* **Hierarchy:** The **Registrar General of India** operates at the central level, while the **Chief Registrar** operates at the state level.
* **Delayed Registration:**
  * *21–30 days:* Registered on payment of a late fee.
  * *30 days to 1 year:* Requires written permission from the prescribed authority and an affidavit.
  * *>1 year:* Requires an order from a First Class Magistrate.
* **International Comparison:** Note that the WHO recommends registration within 24 hours, but for Indian legal exams, **21 days** is the gold standard.

Q: What does a scatter diagram represent?

Correlation or association.

### Explanation

**Why Option C is Correct:**

A **Scatter Diagram** (or Scatter Plot) is a graphical tool used in biostatistics to represent the relationship between two quantitative (numerical) variables. Each point on the graph represents a pair of values $(x, y)$. By observing the pattern of these points, we can determine the **correlation or association** between the variables:

* **Positive Correlation:** Points move from bottom-left to top-right (e.g., as BMI increases, Blood Pressure increases).
* **Negative Correlation:** Points move from top-left to bottom-right (e.g., as exercise increases, resting heart rate decreases).
* **No Correlation:** Points are scattered randomly with no discernible pattern.

**Why Other Options are Incorrect:**

* **Option A (Frequency of occurrence):** This is typically represented by a **Histogram**, Frequency Polygon, or Bar Chart. These tools show how often a particular value occurs in a dataset.
* **Option B (Trend over time):** This is represented by a **Line Diagram** (or Line Graph). It is specifically used to show changes in a variable (like disease incidence or mortality rates) over a chronological period.

**High-Yield Clinical Pearls for NEET-PG:**

* **Correlation Coefficient ($r$):** The scatter diagram provides a visual qualitative assessment, while the Pearson correlation coefficient ($r$) provides the quantitative measure (ranging from $-1$ to $+1$).
* **Regression:** While a scatter diagram shows association, a **Regression Line** (line of best fit) drawn through the points is used to predict the value of a dependent variable based on an independent variable.
* **Qualitative Data:** Remember that scatter diagrams are only for **quantitative** data. For qualitative (categorical) data, use Bar Charts or Pie Charts.
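
The regression line mentioned in this explanation can be sketched with ordinary least squares. This is a minimal illustration, not a full regression analysis; the function name and the BMI and blood pressure values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line of best fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: BMI (independent) and systolic BP in mmHg (dependent).
bmi = [20, 22, 25, 28, 31]
systolic = [112, 118, 124, 131, 139]

slope, intercept = fit_line(bmi, systolic)
print(f"BP = {slope:.2f} x BMI + {intercept:.2f}")

# Predict the dependent variable for a new value of the independent variable.
print(f"Predicted BP at BMI 27: {slope * 27 + intercept:.1f}")
```

The sign of the slope matches the direction seen on the scatter diagram: positive when the points rise from bottom-left to top-right.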

Q: High specificity detects?

High true negatives.

**Explanation:**

**Specificity** is defined as the ability of a diagnostic test to correctly identify those **without the disease**. It is the proportion of truly healthy people who are identified as healthy by the test.

1. **Why Option A is correct:** The formula for Specificity is: **True Negatives (TN) / (True Negatives + False Positives)**. A test with high specificity has a very low rate of False Positives. Therefore, it is highly efficient at identifying **True Negatives**. If a test is 100% specific, it means all individuals without the disease will test negative.
2. **Why the other options are incorrect:**
   * **Low true negative (B):** This would indicate a test with low specificity, meaning it fails to identify healthy individuals correctly.
   * **High false positive (C):** Specificity and False Positives are inversely related. High specificity means **Low False Positives**. (Formula: Specificity = 1 – False Positive Rate).
   * **High true positive (D):** This refers to **Sensitivity**, which is the ability of a test to correctly identify those who *have* the disease.

**NEET-PG High-Yield Pearls:**

* **SpPIn:** A **Sp**ecific test, when **P**ositive, rules **IN** the disease (because false positives are rare, a positive result is highly likely to be a true positive).
* **SnNOut:** A **S**e**n**sitive test, when **N**egative, rules **OUT** the disease.
* Specificity is used for **Confirmatory tests** (e.g., Western Blot for HIV) to avoid the psychological and economic trauma of a false diagnosis.
* In a 2x2 contingency table, Specificity is calculated vertically in the second column: **d / (b + d)**, where b is the number of false positives and d is the number of true negatives.
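
The formulas above translate directly into code. The following Python sketch uses the conventional 2x2 cell labels (a = true positives, b = false positives, c = false negatives, d = true negatives); the counts are invented for the example and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of diseased people who test positive: a / (a + c)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of healthy people who test negative: d / (b + d)."""
    return tn / (tn + fp)


# Hypothetical 2x2 table: 100 diseased and 100 healthy people.
a, b, c, d = 80, 10, 20, 90  # TP, FP, FN, TN

spec = specificity(tn=d, fp=b)
false_positive_rate = b / (b + d)

print(f"Sensitivity: {sensitivity(tp=a, fn=c):.0%}")
print(f"Specificity: {spec:.0%}")
print(f"False positive rate: {false_positive_rate:.0%}")
```

In this table specificity and the false positive rate sum to 1, which is the inverse relationship described for Option C.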

Question 1

Prevalence is a:

Accepted Answer

Proportion

Answer

Rate

Answer

Ratio

Answer

Percentage

Question 2

What is the population attributable risk given the incidence rate of lung cancer among smokers is 8 per 1000 and among non-smokers is 1 per 1000?

Accepted Answer

89%

Answer

95%

Answer

10%

Answer

100%

Question 3

In a normal distribution curve, what percentage of the data falls within the limits of mean +/- 2 standard deviations?

Accepted Answer

95%

Answer

66%

Answer

78%

Answer

99%

Question 4

Which of the following is used to compare two data sets taken on two different scales of measurement?

Accepted Answer

Coefficient of variation

Answer

Variance

Answer

Standard error of mean

Answer

Standard deviation

Question 5

Which type of variable is "Social Class" if it has five categories (I to V), with Class I representing the highest social class and Class V representing the lowest?

Accepted Answer

Ordinal

Answer

Dichotomous

Answer

Nominal

Answer

Interval

Question 6

All of the following are continuous variables except?

Accepted Answer

Human blood group

Answer

Weight in kgs

Answer

Height in cm

Answer

Hb levels in mg/dl

Question 7

A scatter plot is used to display which of the following?

Accepted Answer

Correlation

Answer

Causality

Answer

Statistical power

Answer

Type II error

Question 8

Birth and death registration must be done within what time period?

Accepted Answer

21 days each

Answer

7 days and 14 days

Answer

14 days and 7 days

Answer

14 days and 28 days

Question 9

What does a scatter diagram represent?

Accepted Answer

Correlation or association

Answer

Frequency of occurrence

Answer

Trend over time

Answer

None of the above

Question 10

High specificity detects?

Accepted Answer

High true negatives

Answer

Low true negative

Answer

High false positive

Answer

High true positive

Biostatistics — MCQs
