Biostatistics Practice Questions

Q: Which graphical representation best represents the number of dengue cases in Delhi from 2000 to 2015?

Line chart.

**Explanation:** The correct answer is **Line chart** because the question describes **Time Series Data** (the trend of a disease over a continuous period).

**1. Why Line Chart is Correct:** A line chart (or line graph) is the most effective way to represent trends, fluctuations, or changes in a variable over time. In epidemiology, it is used to visualize the secular trend of a disease (e.g., dengue cases over 15 years), allowing clinicians to identify patterns like seasonality, outbreaks, or the effectiveness of public health interventions.

**2. Why Other Options are Incorrect:**

* **Histogram:** Used to represent the frequency distribution of **continuous quantitative data** (e.g., age groups or hemoglobin levels) within a single time frame. It does not show trends over years.
* **Scatter Diagram:** Used to show the **correlation or relationship** between two different quantitative variables (e.g., the relationship between rainfall and the number of mosquito breeding sites).
* **Bar Chart:** Primarily used for **discrete/qualitative data** (e.g., comparing the number of cases in Delhi vs. Mumbai). While it can show yearly data, it is less effective than a line chart for visualizing a continuous "flow" or trend over a long duration.

**Clinical Pearls for NEET-PG:**

* **Trend Visualization:** For "Time Series" data, always choose a Line Chart.
* **Epidemic Curve:** This is a special type of histogram used to show the distribution of cases over time during an outbreak.
* **Frequency Polygon:** Created by joining the midpoints of histogram bars; it is useful for comparing two or more frequency distributions on the same graph.
* **Pie Chart:** Best for showing the relative proportion of different categories (e.g., percentage of different dengue serotypes).

Q: Which of the following statements is true about bar charts?

Rectangular bars are used to represent the data.

**Explanation:** In biostatistics, a **Bar Chart** is a fundamental tool used to represent **qualitative (categorical) data**. It consists of a series of discrete rectangular bars where the length or height of each bar is proportional to the frequency or value of the category it represents.

**Why Option D is Correct:** The defining characteristic of a bar chart is the use of **rectangular bars** to represent data. These bars are separated by **equal spaces** to indicate that the data is discrete (nominal or ordinal) and not continuous.

**Analysis of Incorrect Options:**

* **Option A:** In a bar chart, the **height/length** of the bar is proportional to the value, not the width. The width is arbitrary and must be uniform for all bars to avoid visual bias.
* **Option B:** Bar charts are used for **qualitative data** (e.g., gender, blood groups, types of anemia). **Quantitative data** (e.g., height, weight, BP) is typically represented using histograms or line diagrams.
* **Option C:** A bar chart is **not** the same as a histogram. Histograms are used for continuous quantitative data, and the bars touch each other (no gaps), whereas bar charts have distinct gaps between bars.

**High-Yield Clinical Pearls for NEET-PG:**

* **Types of Bar Charts:**
  1. **Simple:** Single variable (e.g., number of cases per state).
  2. **Multiple (Grouped):** Comparing two or more variables (e.g., prevalence of DM vs. HTN in different cities).
  3. **Proportional (Stacked):** Shows the relative contribution of different components to a whole.
* **Memory Aid:** **B**ar = **B**roken (gaps between bars); **H**istogram = **H**eaped (bars touch).
* **Most common error:** Confusing the "Area" of a histogram (which represents frequency) with the "Height" of a bar chart.

Q: Which of the following distributions is symmetrical?

Normal distribution.

### Explanation

**Correct Answer: A. Normal distribution**

The **Normal distribution** (also known as Gaussian distribution) is the cornerstone of biostatistics. It is characterized by a **symmetrical, bell-shaped curve** where the data points are distributed evenly around the center. In a perfectly normal distribution, the **Mean, Median, and Mode are all equal** and coincide at the peak of the curve. This symmetry implies that 50% of the values lie above the mean and 50% lie below it.

**Why the other options are incorrect:**

* **B. Bimodal distribution:** This distribution has **two distinct peaks** (modes). While it can occasionally be symmetrical, it is defined by its two peaks rather than its symmetry. In medicine, this often represents two different populations (e.g., Hodgkin lymphoma incidence peaks).
* **C. Skewed distribution:** By definition, these are asymmetrical. In **Positively skewed** (right-skewed) distributions, the tail is longer on the right (Mean > Median > Mode). In **Negatively skewed** (left-skewed) distributions, the tail is longer on the left (Mode > Median > Mean).
* **D. U-shaped distribution:** This has peaks at both ends and a dip in the middle. While it can be symmetrical, symmetry is not its defining feature, and it does not satisfy the normality assumption that parametric tests rely on.

**High-Yield Clinical Pearls for NEET-PG:**

* **Standard Normal Curve:** Has a Mean = 0 and Standard Deviation (SD) = 1.
* **68-95-99.7 Rule:** In a normal distribution, about 68% of values fall within ±1 SD, about 95% within ±2 SD (exactly 95% within ±1.96 SD), and about 99.7% within ±3 SD.
* **Parametric Tests:** These (like the t-test and ANOVA) assume that the data follows a normal distribution. If the data is markedly skewed, non-parametric tests are generally preferred.

Q: When height and weight are perfectly positively correlated, the coefficient of correlation is:

1 (i.e., +1).

**Explanation:** The **Correlation Coefficient (r)**, also known as Pearson’s correlation coefficient, is a statistical measure that quantifies the strength and direction of a linear relationship between two continuous variables (e.g., height and weight).

**Why Option A is correct:** The value of 'r' ranges from **-1 to +1**, inclusive.

* **"Positive correlation"** means that as one variable increases, the other also increases.
* **"Perfect correlation"** means that all data points lie exactly on a straight line.

Therefore, a **perfectly positive correlation** is represented by a value of **+1**. In this scenario, every unit increase in height would correspond to a fixed, predictable increase in weight.

**Why other options are incorrect:**

* **Option B (-1):** This represents a **perfect negative correlation**, where one variable increases as the other decreases (e.g., as age increases, lung function/FEV1 might decrease).
* **Option C (0):** This indicates **zero correlation**, meaning there is no linear relationship between the variables (e.g., height and blood group).
* **Option D (More than 1):** This is mathematically impossible. The coefficient of correlation can never exceed 1 or be less than -1.

**High-Yield Clinical Pearls for NEET-PG:**

1. **Coefficient of Determination ($r^2$):** This represents the proportion of variance in one variable that is predictable from the other. If $r = 0.8$, then $r^2 = 0.64$ (64% of the variance is explained).
2. **Scatter Diagram:** This is the best visual method to represent correlation. Points lying exactly on a straight line rising from left to right indicate $r = +1$.
3. **Correlation vs. Causation:** A high correlation coefficient does **not** necessarily imply a cause-and-effect relationship.
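The behaviour of $r$ at its extremes can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the height/weight values are hypothetical and chosen only so that weight is an exact linear function of height.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: every point lies exactly on a rising straight line.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [0.9 * h - 90 for h in heights_cm]

r_perfect = pearson_r(heights_cm, weights_kg)
# Flipping the sign of one variable turns the rising line into a falling one.
r_negative = pearson_r(heights_cm, [-w for w in weights_kg])

print(round(r_perfect, 6), round(r_negative, 6))
print("r^2 for r = 0.8:", round(0.8 ** 2, 2))
```

Up to floating-point rounding, the two results sit at the ends of the permitted range (+1 and -1), which is why a value above 1 cannot occur.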

Q: In a normal distribution curve, what percentage of the area lies between one standard deviation on either side of the mean?

68%.

### Explanation

**Concept: The Normal (Gaussian) Distribution**

In Biostatistics, the Normal Distribution is a symmetrical, bell-shaped curve where the mean, median, and mode coincide at the center. The spread of data around the mean is measured by the **Standard Deviation (SD)**. According to the **Empirical Rule** (also known as the 68-95-99.7 rule), fixed percentages of data fall within specific SD ranges from the mean:

* **Mean ± 1 SD:** Covers approximately **68.2%** of the total area.
* **Mean ± 2 SD:** Covers approximately **95.4%** of the total area.
* **Mean ± 3 SD:** Covers approximately **99.7%** of the total area.

Therefore, **Option B** is the correct answer as it represents the area within one standard deviation.

**Analysis of Incorrect Options:**

* **Option A (62%):** This value does not correspond to any standard milestone in a normal distribution curve.
* **Option C (90%):** While 90% is a common confidence level, it corresponds to ± 1.645 SD, not ± 1 SD.
* **Option D (99%):** This is close to the area covered by ± 3 SD (99.7%). A range of ± 2.58 SD specifically covers 99% of the area.

**High-Yield Clinical Pearls for NEET-PG:**

* **Z-score:** This indicates how many standard deviations a value is from the mean. A Z-score of +1 means the value is 1 SD above the mean.
* **Standard Normal Distribution:** A special case where the **Mean is 0** and the **SD is 1**.
* **Skewness:** If the curve is not symmetrical, it is "skewed." If the tail is longer on the right, it is **Positively Skewed** (Mean > Median > Mode). If the tail is longer on the left, it is **Negatively Skewed** (Mode > Median > Mean).
* **Precision vs. Accuracy:** SD is a measure of precision (reliability); the smaller the SD, the more precise the data.
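The percentages quoted for each SD range follow from the normal curve itself: the area within ±k SD of the mean equals $\operatorname{erf}(k/\sqrt{2})$. A short sketch using only the standard library reproduces them; the helper name `area_within` is an illustrative choice, not a standard function.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))


# Whole-SD ranges from the empirical rule plus the usual confidence-level cut-offs.
for k in (1, 1.645, 1.96, 2, 2.576, 3):
    print(f"mean ± {k} SD covers {area_within(k) * 100:.2f}% of the area")
```

The output shows why 90%, 95% and 99% correspond to the non-integer multipliers 1.645, 1.96 and 2.576 rather than to whole numbers of SDs.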

Q: In clinical trials, what is the major purpose of randomization?

To reduce selection bias in allocation to treatment.

### Explanation

**Correct Option: A (To reduce selection bias in allocation to treatment)**

Randomization is the "heart" of a Randomized Controlled Trial (RCT). Its primary statistical purpose is to **eliminate selection bias** by ensuring that the assignment of participants to either the treatment or control group is determined purely by chance, rather than the investigator's conscious or subconscious preference. This ensures that every participant has an equal opportunity of being assigned to any group.

**Analysis of Incorrect Options:**

* **Option B:** Blinding (Masking) is the technique used to reduce performance and detection bias. While randomization *facilitates* blinding (especially double-blinding), it is not the primary purpose of the randomization process itself.
* **Option C:** While randomization does help in balancing baseline characteristics (both known and unknown confounders), this is a **secondary benefit**. The fundamental procedural goal is the unbiased allocation of subjects.
* **Option D:** Representativeness of the general population is achieved through **Random Sampling** (External Validity), not Randomization. Randomization deals with **Internal Validity** (how participants are split *within* the study).

**High-Yield Clinical Pearls for NEET-PG:**

* **Confounding:** Randomization is the only method that controls for both **known and unknown confounders**.
* **Selection Bias:** Prevented by Randomization + Allocation Concealment.
* **Observation/Measurement Bias:** Prevented by Blinding.
* **Gold Standard:** The RCT is the gold standard for evaluating the efficacy of a new drug or intervention.
* **Sequence Generation:** Common methods include computer-generated random numbers or random number tables. (Note: Alternating patients or using Date of Birth is "Quasi-randomization" and is prone to bias).
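As an illustration of computer-generated sequence generation, the sketch below produces an allocation list in two ways. Both function names are hypothetical. Permuted-block randomization is not discussed in the explanation above; it is included here only as one common way to keep group sizes balanced, and the block size and seed are arbitrary assumptions.

```python
import random

ARMS = ("treatment", "control")


def simple_randomization(n_participants, seed=None):
    """Assign each participant independently by chance (a coin toss per person)."""
    rng = random.Random(seed)
    return [rng.choice(ARMS) for _ in range(n_participants)]


def block_randomization(n_participants, block_size=4, seed=None):
    """Permuted blocks: each block holds equal numbers of both arms, in random order."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = [ARMS[0]] * (block_size // 2) + [ARMS[1]] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


# Hypothetical trial of 12 participants; the seed only makes the list reproducible.
allocation = block_randomization(12, block_size=4, seed=2024)
print(allocation)
print("treatment:", allocation.count("treatment"), "control:", allocation.count("control"))
```

In practice the generated list would be kept away from the recruiting investigator (allocation concealment), since a sequence that can be foreseen no longer protects against selection bias.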

Q: Nine families surveyed have 1, 2, 2, 2, 3, 4, 4, 6, 7 children respectively. What are the mean, median, and mode, respectively?

3.4, 3, 2.

### Explanation

This question tests the fundamental understanding of **measures of central tendency**, which are essential in biostatistics for summarizing epidemiological data.

**1. Calculation of the Correct Answer (Option D):**

* **Mean (Arithmetic Average):** Sum of all observations divided by the number of observations.
  * Sum = $1 + 2 + 2 + 2 + 3 + 4 + 4 + 6 + 7 = 31$
  * Mean = $31 / 9 \approx \mathbf{3.44}$
* **Median (Middle Value):** The middle value when data is arranged in ascending order.
  * Data: 1, 2, 2, 2, **3**, 4, 4, 6, 7.
  * Since $n=9$ (odd), the median is the $(\frac{n+1}{2})^{th}$ value, which is the $5^{th}$ value.
  * Median = $\mathbf{3}$
* **Mode (Most Frequent Value):** The value that appears most frequently in the dataset.
  * The number '2' appears three times, more than any other number.
  * Mode = $\mathbf{2}$

**2. Why Other Options are Incorrect:**

* **Option A & B:** These incorrectly identify the mode as 3. While 3 is the median, it only appears once, whereas 2 appears thrice.
* **Option C:** This assumes all three measures are equal. This only occurs in a perfectly symmetrical **Normal Distribution** (Bell Curve).

**3. High-Yield Clinical Pearls for NEET-PG:**

* **Sensitivity to Outliers:** The **Mean** is the most sensitive to extreme values (outliers). In skewed distributions, the **Median** is the preferred measure of central tendency.
* **Relationship in Skewed Data:**
  * **Positively Skewed:** Mean > Median > Mode (Tail to the right).
  * **Negatively Skewed:** Mode > Median > Mean (Tail to the left).
* **Note:** In this specific dataset, Mean (3.4) > Median (3) > Mode (2), indicating the data is **positively skewed**.
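The three values can be verified with Python's standard `statistics` module. This is only a quick check of the arithmetic; the variable names are illustrative.

```python
import statistics

# Number of children in each of the nine surveyed families.
children = [1, 2, 2, 2, 3, 4, 4, 6, 7]

mean = statistics.mean(children)      # 31 / 9
median = statistics.median(children)  # 5th value of the sorted data
mode = statistics.mode(children)      # most frequent value

print(round(mean, 2), median, mode)  # prints: 3.44 3 2

# Mean > median > mode is the ordering associated with a positive (right) skew.
print(mean > median > mode)
```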

Q: All of the following are true regarding standard error of the mean, except?

It increases with an increased number of samples.

### Explanation

**Standard Error of the Mean (SEM)** is a measure of the dispersion of sample means around the true population mean. It indicates how much the sample mean is likely to vary from the actual population mean.

**1. Why Option A is the Correct Answer (The False Statement):**

The formula for Standard Error is: **$SEM = \frac{SD}{\sqrt{n}}$** (where $SD$ is Standard Deviation and $n$ is sample size). Mathematically, SEM is **inversely proportional** to the square root of the sample size. Therefore, as the sample size ($n$) increases, the SEM **decreases**, making the estimate of the population mean more precise. The statement that it "increases" is incorrect.

**2. Analysis of Other Options:**

* **Option B:** SEM is derived from the **Sampling Distribution of the Mean**, which is approximately normal for sufficiently large samples (Central Limit Theorem), even if the underlying population is not normal.
* **Option C:** SEM is used to calculate **Confidence Intervals (CI)**. For example, the 95% Confidence Limit is calculated as $Mean \pm (1.96 \times SEM)$.
* **Option D:** SEM is technically the **Standard Deviation of the sampling distribution**. While it differs from the SD of a single sample, it represents the "standard deviation of the means."

**3. High-Yield Clinical Pearls for NEET-PG:**

* **SD vs. SEM:** Use **SD** to describe the variability of individual observations within a single sample. Use **SEM** to describe the precision of the sample mean compared to the population mean.
* **Precision:** A smaller SEM indicates higher precision of the study.
* **Relationship:** $SEM$ is always smaller than $SD$ (provided $n > 1$).
* **Sample Size Impact:** To reduce the SEM by half, you must increase the sample size fourfold (due to the square root relationship).
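The inverse square-root relationship is easy to see with numbers. The sketch below assumes a hypothetical sample SD of 10 and sample mean of 120, with hypothetical sample sizes; the function names `sem` and `ci95` are illustrative only.

```python
import math


def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


def ci95(mean, sd, n):
    """Approximate 95% confidence limits: mean ± 1.96 × SEM."""
    margin = 1.96 * sem(sd, n)
    return mean - margin, mean + margin


sd = 10.0  # hypothetical sample SD
for n in (25, 100, 400):
    # Each fourfold increase in n halves the SEM.
    print(f"n = {n:3d}  SEM = {sem(sd, n):.2f}")

low, high = ci95(mean=120.0, sd=sd, n=100)  # hypothetical sample mean
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With these assumed values the SEM falls from 2.0 to 1.0 to 0.5 as n goes from 25 to 100 to 400, so the SEM shrinks, never grows, as the sample gets larger.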

Q: What is the validity of a test?

Accuracy.

### Explanation

**Validity** refers to the ability of a screening or diagnostic test to measure what it is intended to measure. In biostatistics, the hallmark of validity is **Accuracy**, which indicates how close the test result is to the "True Value" (usually determined by a Gold Standard test). Validity has two main components: **Sensitivity** and **Specificity**.

#### Why Accuracy is Correct:

Accuracy represents the proportion of correct results (both true positives and true negatives) out of the total tests performed. A valid test must be accurate; if a test consistently gives results far from the true status of the disease, it lacks validity.

#### Why Other Options are Incorrect:

* **Precision (Option A):** This refers to the consistency of the results when the test is repeated. A test can be highly precise (giving the same result every time) but still be invalid if it consistently gives the *wrong* result.
* **Reproducibility and Repeatability (Options B & C):** These are synonyms for **Reliability** or **Precision**. They measure the degree of agreement between repeated measurements under the same conditions. While a good test should be both valid and reliable, reliability does not guarantee validity.

---

### High-Yield Clinical Pearls for NEET-PG:

* **Validity = Accuracy:** Measured by Sensitivity and Specificity.
* **Reliability = Precision/Reproducibility:** Measured by Variation (Observer, Biological, or Instrumental).
* **The Bullseye Analogy:**
  * Hits the center consistently = **Valid and Reliable**.
  * Hits the same spot away from the center = **Reliable but not Valid**.
  * Hits all over the target = **Neither Valid nor Reliable**.
* **Sensitivity** is the ability of a test to correctly identify those **with** the disease (True Positive Rate).
* **Specificity** is the ability of a test to correctly identify those **without** the disease (True Negative Rate).
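Sensitivity, specificity and accuracy all come from the same 2×2 table comparing the test with the gold standard. The sketch below uses made-up counts purely for illustration; the function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Validity measures from a 2x2 table of test result vs. gold standard."""
    total = tp + fp + fn + tn
    return {
        # True positives among all who have the disease.
        "sensitivity": tp / (tp + fn),
        # True negatives among all who do not have the disease.
        "specificity": tn / (tn + fp),
        # All correct results (TP + TN) among everyone tested.
        "accuracy": (tp + tn) / total,
    }


# Hypothetical screening of 1000 people, 100 of whom have the disease.
metrics = screening_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, sensitivity is 90/100, specificity is 860/900, and accuracy is 950/1000.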

Question 1

Which graphical representation best represents the number of dengue cases in Delhi from 2000 to 2015?

Accepted Answer

Line chart

Answer

Histogram

Answer

Scatter diagram

Answer

Bar chart

Question 2

Which of the following statements is true about bar charts?

Accepted Answer

Rectangular bars are used to represent the data.

Answer

The width of the bar is proportional to the representative values.

Answer

Used for quantitative data.

Answer

Same as a histogram.

Question 3

Which of the following distributions is symmetrical?

Accepted Answer

Normal distribution

Answer

Bimodal distribution

Answer

Skewed distribution

Answer

U-shaped distribution

Question 4

When height and weight are perfectly positively correlated, the coefficient of correlation is:

Accepted Answer

1

Answer

-1

Answer

0

Answer

More than 1

Question 5

In a normal distribution curve, what percentage of the area lies between one standard deviation on either side of the mean?

Accepted Answer

68%

Answer

62%

Answer

90%

Answer

99%

Question 6

In maternal and child welfare programs, what sampling method is used?

Accepted Answer

Cluster

Answer

Systematic

Answer

Group

Answer

Stratified

Question 7

In clinical trials, what is the major purpose of randomization?

Accepted Answer

To reduce selection bias in allocation to treatment

Answer

To facilitate double blinding

Answer

To ensure groups are comparable on baseline characteristics

Answer

To help ensure the study subjects are representative of the general population

Question 8

Nine families surveyed have 1, 2, 2, 2, 3, 4, 4, 6, 7 children respectively. What are the mean, median, and mode, respectively?

Accepted Answer

3.4, 3, 2

Answer

3.4, 2, 3

Answer

3, 2, 3

Answer

3, 3, 3

Question 9

All of the following are true regarding standard error of the mean, except?

Accepted Answer

It increases with an increased number of samples.

Answer

It is based on the normal distribution curve.

Answer

It measures the confidence limit.

Answer

It is the standard deviation.

Question 10

What is the validity of a test?

Accepted Answer

Accuracy

Answer

Precision

Answer

Reproducibility

Answer

Repeatability
