Biostatistics Practice Questions

Q: Which of the following study designs does NOT involve hypothesis testing?

Descriptive studies.

**Explanation:** The core distinction between study designs in epidemiology lies in whether they **generate** or **test** a hypothesis.

**Why Descriptive Studies are the Correct Answer:** Descriptive studies (e.g., Case Reports, Case Series, and Cross-sectional surveys) are the first step in an epidemiological investigation. They describe the distribution of a disease in terms of **Time, Place, and Person**. Their primary goal is to **formulate or generate a hypothesis** rather than test it. Because a descriptive study has no comparison group, statistical hypothesis testing (calculating p-values to assess an association) is not performed.

**Why the Other Options are Incorrect:**

* **B. Analytical Studies:** This is a broad category that includes Case-control and Cohort studies. The fundamental purpose of any analytical study is to **test a hypothesis** by comparing two or more groups to determine whether an exposure is statistically associated with an outcome.
* **C. Case-control Studies:** These are retrospective analytical studies that compare "cases" (with disease) to "controls" (without disease) to test the hypothesis that a specific risk factor led to the outcome.
* **D. Cohort Studies:** These are prospective or retrospective analytical studies that follow a group over time to test the hypothesis that an exposure leads to the development of a disease.

**NEET-PG High-Yield Pearls:**

* **Descriptive Studies:** Generate hypotheses (Who, Where, When?).
* **Analytical Studies:** Test hypotheses (Why, How?).
* **Experimental Studies (RCTs):** Confirm hypotheses and establish the highest level of causality.
* **Sequence of Investigation:** Descriptive $\rightarrow$ Analytical $\rightarrow$ Experimental.
* **Unit of Study:** In Ecological studies (a type of descriptive/analytical hybrid), the unit of study is a **population**, not an individual.

Q: All of the following are true regarding standard error of the mean, except?

It increases with an increased number of samples.

### Explanation

**Standard Error of the Mean (SEM)** is a measure of the dispersion of sample means around the true population mean. It indicates how much the sample mean is likely to vary from the actual population mean.

**1. Why Option A is the Correct Answer (The False Statement):**

The formula for Standard Error is **$SEM = \frac{SD}{\sqrt{n}}$** (where $SD$ is Standard Deviation and $n$ is sample size). SEM is **inversely proportional** to the square root of the sample size. Therefore, as the number of samples ($n$) increases, the SEM **decreases**, making the estimate of the population mean more precise. The statement that it "increases" is incorrect.

**2. Analysis of Other Options:**

* **Option B:** SEM is derived from the **Sampling Distribution of the Mean**, which is approximately Normal for sufficiently large samples (Central Limit Theorem), even if the underlying population is not normal.
* **Option C:** SEM is used to calculate **Confidence Intervals (CI)**. For example, the 95% Confidence Limit is calculated as $Mean \pm (1.96 \times SEM)$.
* **Option D:** SEM is technically the **Standard Deviation of the sampling distribution**. While it differs from the SD of a single sample, it represents the "standard deviation of the means."

**3. High-Yield Clinical Pearls for NEET-PG:**

* **SD vs. SEM:** Use **SD** to describe the variability of individual observations within a single sample. Use **SEM** to describe the precision of the sample mean as an estimate of the population mean.
* **Precision:** A smaller SEM indicates higher precision of the estimate.
* **Relationship:** $SEM$ is always smaller than $SD$ (provided $n > 1$).
* **Sample Size Impact:** To reduce the SEM by half, you must increase the sample size fourfold (due to the square root relationship).
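The square-root relationship in the SEM formula can be checked numerically. The following is a minimal Python sketch; the function names `sem_from_sd` and `ci95` and the numbers used (SD of 10, mean of 120) are illustrative assumptions, not values from the question.

```python
import math

def sem_from_sd(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def ci95(mean, sd, n):
    """Approximate 95% confidence limits: mean +/- 1.96 * SEM."""
    margin = 1.96 * sem_from_sd(sd, n)
    return (mean - margin, mean + margin)

# Hypothetical values: SD = 10, sample size raised fourfold from 25 to 100
sem_small = sem_from_sd(10, 25)    # SEM with n = 25
sem_large = sem_from_sd(10, 100)   # SEM with n = 100 (half of sem_small)
limits = ci95(120, 10, 100)        # 95% limits around a mean of 120
```

Quadrupling $n$ halves the SEM here, and the confidence limits narrow by the same factor.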

Q: All of the following are true regarding increasing sample size except?

Decreases power of the test.

### Explanation

The relationship between sample size and statistical parameters is a high-yield concept in biostatistics. Increasing the sample size ($n$) generally improves the precision and reliability of a study.

**Why Option A is the Correct Answer (The "Except"):**

Increasing the sample size **increases** the power of the test; it does not decrease it.

* **Power ($1 - \beta$)** is the probability of correctly rejecting a null hypothesis when it is false (detecting a true effect).
* As $n$ increases, the study becomes more sensitive to detecting even small differences between groups, thereby increasing the power.

**Analysis of Incorrect Options:**

* **B. Standard error of the mean (SEM) decreases:** The formula for SEM is $\sigma / \sqrt{n}$, where $\sigma$ is the standard deviation. Since $\sqrt{n}$ is in the denominator, increasing the sample size reduces the SEM, leading to more precise estimates.
* **C. Decreases the Confidence Interval (CI):** The width of a CI is determined by the SEM ($CI = Mean \pm Z \times SEM$). As SEM decreases with a larger sample size, the CI becomes narrower (more precise).
* **D. Decreases alpha error:** Alpha ($\alpha$) error (Type I error) is the probability of rejecting a true null hypothesis. Strictly speaking, $\alpha$ is preset by the investigator (e.g., 0.05) and does not change with sample size at a fixed significance threshold; what a larger sample reduces is sampling variability and $\beta$ error. The option is treated as acceptable here only in the loose sense that larger samples give more stable results. Option A remains the answer because it is unambiguously false.

**NEET-PG High-Yield Pearls:**

1. **Sample Size $\propto$ Power:** To detect a smaller effect size, you need a larger sample size.
2. **Sample Size $\propto$ Precision:** Larger samples yield narrower Confidence Intervals (CI width $\propto 1/\sqrt{n}$).
3. **Type II Error ($\beta$):** Increasing sample size is the most effective way to decrease $\beta$ error.
4. **Law of Large Numbers:** As $n$ increases, the sample mean gets closer to the actual population mean.
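The effect of sample size on power can be illustrated with a normal-approximation calculation. This is a sketch under stated assumptions: a two-sided one-sample z-test with known standard deviation, and a hypothetical true difference of 5 units with SD 10. The function name `z_test_power` is illustrative.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known SD, normal approximation)."""
    z = NormalDist()                        # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)       # critical value, about 1.96 for alpha = 0.05
    shift = effect * math.sqrt(n) / sd      # true difference expressed in SEM units
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical scenario: true difference of 5 units, SD of 10
powers = [z_test_power(5, 10, n) for n in (10, 40, 160)]

# With no true effect, the rejection probability is alpha, whatever the sample size
false_positive_rates = [z_test_power(0, 10, n) for n in (10, 40, 160)]
```

In this sketch `powers` rises as $n$ grows, while `false_positive_rates` stays at the preset $\alpha$ of 0.05 for every $n$.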

Q: A correlation coefficient of +1 indicates which of the following?

A perfect positive correlation.

**Explanation:** The **Correlation Coefficient (r)**, also known as Pearson’s ‘r’, is a statistical measure that quantifies the strength and direction of a linear relationship between two quantitative variables. The value of ‘r’ always ranges from **-1 to +1**.

1. **Why the correct answer is right:** A value of **+1** signifies a **perfect positive correlation**. This means that for every unit increase in one variable, there is a proportional increase in the other. On a scatter diagram, all data points would fall exactly on a straight line sloping upwards from left to right.
2. **Why the incorrect options are wrong:**
   * **Option A & D (Weak/Strong):** These terms describe values between 0 and 1. Generally, 0.1–0.3 is considered weak, 0.4–0.6 is moderate, and 0.7–0.9 is considered a strong correlation.
   * **Option B (Moderate):** A moderate correlation (e.g., r = +0.5) indicates a visible trend, but the data points are scattered around the regression line rather than sitting perfectly on it.

**High-Yield Clinical Pearls for NEET-PG:**

* **Direction:** A positive sign (+) means variables move in the same direction; a negative sign (-) means they move in opposite directions (e.g., as age increases, vital capacity decreases).
* **Strength:** The closer the absolute value is to 1 (regardless of the sign), the stronger the relationship.
* **Zero Correlation (r = 0):** Indicates no linear relationship between the variables.
* **Coefficient of Determination ($r^2$):** This represents the proportion of variance in one variable that is predictable from the other. If r = 0.6, then $r^2$ = 0.36 (or 36%).
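The behaviour of Pearson's r at its extremes can be verified with a small calculation. The sketch below computes r from its definition; the helper name `pearson_r` and the data are hypothetical, chosen so that the points lie exactly on a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of both variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2 * v + 3 for v in x]     # every point on an upward-sloping line
falling = [10 - v for v in x]       # every point on a downward-sloping line

r_pos = pearson_r(x, rising)        # perfect positive correlation
r_neg = pearson_r(x, falling)       # perfect negative correlation
r_squared = 0.6 ** 2                # coefficient of determination for r = 0.6
```

Points on an upward-sloping line give r = +1, points on a downward-sloping line give r = -1, and squaring r = 0.6 gives the 0.36 quoted in the pearls.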

Q: In the calculation of crude death rate, which point in the year is the population typically considered?

July 1st.

### Explanation

**Correct Answer: B. July 1st**

In biostatistics and demography, the **Crude Death Rate (CDR)** is defined as the number of deaths per 1,000 population in a given year. The denominator used for this calculation is the **Mid-Year Population**.

**Why July 1st?**

The population of any region is dynamic, changing daily due to births, deaths, and migration. To represent the average population exposed to the risk of death throughout the entire year, we use the population as it stands on **July 1st** (the approximate midpoint of the calendar year). This "Mid-Year Population" approximates the person-years lived by the population during that year.

**Analysis of Incorrect Options:**

* **A. March 1st:** In India, the National Census (conducted every 10 years) traditionally uses March 1st as the reference date for enumeration. However, for annual vital statistics like CDR, the mid-year estimate is preferred.
* **C. April 1st:** This marks the beginning of the financial year in India but holds no specific statistical significance for calculating demographic rates.
* **D. August 15th:** While significant as India’s Independence Day, it is not a standard reference point for demographic data.

**High-Yield Clinical Pearls for NEET-PG:**

* **Mid-Year Population** is the standard denominator for most annual vital rates, including Crude Birth Rate (CBR). The General Fertility Rate (GFR) uses the mid-year population of women of reproductive age.
* **Crude Death Rate** is "crude" because it does not account for the age and sex composition of the population.
* **Age-Specific Death Rate** is considered a better indicator of the health status of a specific cohort.
* **Standardized Death Rate** is the best tool for comparing mortality between two different populations (e.g., two different states or countries) as it eliminates the bias of age distribution.
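The CDR definition above reduces to one line of arithmetic. A minimal sketch follows; the function name `crude_death_rate` and the figures (7,000 deaths in a mid-year population of 1,000,000) are hypothetical and serve only to show the calculation.

```python
def crude_death_rate(deaths, mid_year_population):
    """Deaths in the year per 1,000 mid-year (July 1st) population."""
    return deaths * 1000 / mid_year_population

# Hypothetical figures for one calendar year
cdr = crude_death_rate(deaths=7_000, mid_year_population=1_000_000)
```

With these assumed figures the rate works out to 7 deaths per 1,000 population.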

Q: Which of the following methods is ideal to ensure similarity between experimental and control groups?

Stratified randomization.

**Explanation:** In experimental studies, the goal is to ensure that the study and control groups are as similar as possible, except for the intervention being tested.

**1. Why Stratified Randomization is Correct:**

While simple randomization ensures that every participant has an equal chance of being assigned to a group, it may fail to balance specific prognostic factors (like age, sex, or disease severity) in smaller samples. **Stratified Randomization** is the ideal method because it first categorizes participants into "strata" based on these important variables and then performs randomization within each stratum. This improves the balance of the groups on the chosen (known) confounding factors, which simple randomization alone cannot assure in small trials.

**2. Analysis of Incorrect Options:**

* **Randomization (Simple):** This is the "heart" of a clinical trial and eliminates selection bias. However, by chance, it may result in an imbalance of key variables between groups, especially in small studies.
* **Matching:** This is primarily used in **Case-Control studies** to control confounding. It is difficult, time-consuming, and can lead to "over-matching." It does not account for unknown confounders, whereas randomization does.
* **Cross-over Study:** This is a study design where the same subject serves as their own control (receiving both treatment and placebo at different times). While each subject is perfectly matched with themselves, it is a **design type**, not a method used to *create* groups in a standard parallel trial.

**High-Yield Pearls for NEET-PG:**

* **Randomization** is the best method to control for **unknown confounders**.
* **Blinding** is used to reduce **ascertainment (observer) bias**.
* **Stratification** is used to control for **known confounders** at the design stage.
* The "Unit of Randomization" in a standard Clinical Trial is the **Individual**, while in a Community Trial, it is the **Group/Community**.
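The two-step procedure described above (group by stratum, then randomize within each stratum) can be sketched in Python. This is a simplified illustration, not a trial-grade allocation scheme: it shuffles each stratum and alternates arms, and the participant records, the field names `sex` and `severity`, and the function name `stratified_randomize` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_randomize(participants, stratum_of, seed=0):
    """Group participants by stratum, then randomize arm assignment within each."""
    rng = random.Random(seed)               # seeded so the sketch is reproducible
    strata = defaultdict(list)
    for person in participants:
        strata[stratum_of(person)].append(person)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)                # random order inside the stratum
        for index, person in enumerate(members):
            # Alternating after the shuffle keeps arm sizes within one of each other
            assignment[person["id"]] = "treatment" if index % 2 == 0 else "control"
    return assignment

# Hypothetical cohort: 2 sexes x 2 severity levels, 5 participants per stratum
combos = [(s, v) for s in ("F", "M") for v in ("mild", "severe") for _ in range(5)]
participants = [{"id": i, "sex": s, "severity": v} for i, (s, v) in enumerate(combos)]

arms = stratified_randomize(participants, lambda p: (p["sex"], p["severity"]), seed=1)
```

In this sketch the arm sizes inside every stratum differ by at most one, which is the kind of balance on known factors that simple randomization may miss in a small trial.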

Q: What is an ogive?

Cumulative frequency curve.

### Explanation

**Correct Answer: C. Cumulative frequency curve**

An **Ogive** (also known as a cumulative frequency polygon) is a graphical representation of the cumulative frequency of a dataset. It is constructed by plotting the cumulative frequencies (either "less than" or "more than" type) against the upper or lower class boundaries.

* **Why it is correct:** In biostatistics, while a frequency polygon shows the distribution of data points, the Ogive specifically tracks the **running total**. It is the primary tool used to determine the **Median**, quartiles, and percentiles of a distribution graphically. The point where the "less than" and "more than" ogives intersect corresponds to the Median on the x-axis.

**Analysis of Incorrect Options:**

* **A. Bar Chart:** Used for **qualitative (categorical)** or discrete data. Bars are separated by spaces.
* **B. Histogram:** Used for **continuous quantitative** data. It consists of adjacent rectangles where the area represents the frequency. It is used to find the **Mode** graphically.
* **D. Frequency Polygon:** A line graph formed by joining the midpoints of the tops of the bars in a histogram. It represents the frequency distribution of continuous data but does not show cumulative totals.

**High-Yield NEET-PG Pearls:**

1. **Median** is determined by the **Ogive**.
2. **Mode** is determined by the **Histogram**.
3. **Mean** cannot be determined graphically; it must be calculated.
4. **Normal Distribution:** In a perfectly symmetrical bell-shaped curve, the Mean, Median, and Mode coincide at the same point.
5. **Scatter Diagram:** Used to show the **correlation** (relationship) between two continuous variables.
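Reading the median off a "less than" ogive amounts to finding where the cumulative frequency reaches half the total. The sketch below does this by linear interpolation between plotted points; the class intervals, frequencies, and function names are hypothetical.

```python
from itertools import accumulate

def less_than_ogive(upper_bounds, freqs):
    """Points of the 'less than' ogive: (upper class boundary, cumulative frequency)."""
    return list(zip(upper_bounds, accumulate(freqs)))

def median_from_ogive(lowest_bound, points):
    """Interpolate the x-value at which cumulative frequency reaches N/2."""
    total = points[-1][1]
    target = total / 2
    prev_x, prev_cf = lowest_bound, 0
    for x, cf in points:
        if cf >= target:
            # Straight-line interpolation inside the class containing the median
            return prev_x + (target - prev_cf) * (x - prev_x) / (cf - prev_cf)
        prev_x, prev_cf = x, cf

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40
points = less_than_ogive([10, 20, 30, 40], [5, 15, 20, 10])
median = median_from_ogive(0, points)
```

For these assumed frequencies the cumulative totals are 5, 20, 40, and 50, so the half-way count of 25 falls in the 20-30 class.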

Q: All of the following are examples of a nominal scale except?

Body weight.

To master Biostatistics for NEET-PG, it is essential to distinguish between the four levels of measurement: **Nominal, Ordinal, Interval, and Ratio.**

### **Why "Body Weight" is the Correct Answer**

**Body weight** is a **Ratio Scale** (a type of quantitative/numerical data). Unlike nominal scales, it has a natural order, equal intervals between values, and a **true zero point** (0 kg means the absence of weight). Because it represents a measurable quantity rather than a descriptive category, it is not a nominal scale.

### **Analysis of Other Options**

* **A. Race:** This is a **Nominal Scale**. It categorizes individuals into groups (e.g., Caucasian, Asian, African) based on names or labels. There is no inherent mathematical order or ranking between these groups.
* **B. Sex:** This is a **Nominal Scale** (specifically a dichotomous/binary scale). Male and female are distinct categories with no quantitative value or rank.
* **D. Socio-economic status:** This is typically an **Ordinal Scale** (e.g., Upper, Middle, Lower class). While it is qualitative like a nominal scale, it has a specific **rank or order**. However, in the context of this question, it is still a "categorical" variable and definitely not a "ratio" scale like body weight, making body weight the most distinct outlier.

### **High-Yield Clinical Pearls for NEET-PG**

* **NOIR Mnemonic:** **N**ominal (Labels), **O**rdinal (Order/Rank), **I**nterval (No true zero, e.g., Temperature in Celsius), **R**atio (True zero, e.g., BP, Pulse, Height).
* **Nominal Data:** The only central tendency measure applicable is the **Mode**.
* **Ordinal Data:** Examples include Pain Scales (VAS), Cancer Staging (TNM), and Likert Scales. The **Median** is the preferred measure of central tendency.
* **Ratio Data:** This is the "highest" level of measurement and allows for the most complex statistical tests.

Question 1

Which of the following study designs does NOT involve hypothesis testing?

Accepted Answer

Descriptive studies

Answer

Analytical studies

Answer

Case control studies

Answer

Cohort studies

Question 2

All of the following are true regarding standard error of the mean, except?

Accepted Answer

It increases with an increased number of samples.

Answer

It is based on the normal distribution curve.

Answer

It measures the confidence limit.

Answer

It is the standard deviation.

Question 3

The correlation between Infant Mortality Rate (IMR) and Socioeconomic Status is best depicted by which of the following values?

Accepted Answer

Correlation coefficient of -0.8

Answer

Correlation coefficient of +1

Answer

Correlation coefficient of +0.5

Answer

Correlation coefficient of -1

Question 4

All of the following are true regarding increasing sample size except?

Accepted Answer

Decreases power of the test

Answer

Standard error of the mean decreases

Answer

Decreases the Confidence Interval

Answer

Decreases alpha error

Question 5

A correlation coefficient of +1 indicates which of the following?

Accepted Answer

A perfect positive correlation

Answer

A very weak positive correlation

Answer

A moderate positive correlation

Answer

A strong positive correlation

Question 6

In the calculation of crude death rate, which point in the year is the population typically considered?

Accepted Answer

July 1st

Answer

March 1st

Answer

April 1st

Answer

August 15th

Question 7

Which of the following methods is ideal to ensure similarity between experimental and control groups?

Accepted Answer

Stratified randomization

Answer

Randomization

Answer

Matching

Answer

Cross-over study

Question 8

What is an ogive?

Accepted Answer

Cumulative frequency curve

Answer

Bar chart

Answer

Histogram

Answer

Frequency polygon

Question 9

All of the following are examples of a nominal scale except?

Accepted Answer

Body weight

Answer

Race

Answer

Sex

Answer

Socio-economic status

Question 10

What type of epidemiological study best establishes the temporal association of a disease?

Accepted Answer

Cohort study

Answer

Case-Control study

Answer

Cross-sectional study

Answer

Descriptive study

Biostatistics — MCQs
