Biostatistics Practice Questions

Q: A randomized trial comparing the efficacy of two drugs showed a difference between the two (p value < 0.05). However, in reality the drugs do not differ. This is an example of

Type I error.

***Type I error***
- A **Type I error** occurs when the **null hypothesis is incorrectly rejected**, leading to the conclusion that a significant difference exists when, in reality, there is no true difference.
- In this scenario, the trial concluded a difference (p < 0.05), but the drugs are truly equivalent, which is precisely the definition of a **Type I error**.

*Both type I and II error*
- It is impossible to commit both a **Type I** and a **Type II error** simultaneously for the same statistical test.
- A **Type I error** involves rejecting a true null hypothesis, while a **Type II error** involves failing to reject a false null hypothesis.

*Random error*
- **Random error** refers to unpredictable fluctuations in measurements or results, which can be minimized but not eliminated.
- While random error can contribute to variability in data, it is not the direct statistical error of concluding a non-existent difference when analyzing the results, which is a **Type I error**.

*Type II error*
- A **Type II error** occurs when the **null hypothesis is incorrectly accepted** (or not rejected), meaning a real difference exists but the study fails to detect it.
- This scenario describes the opposite: a difference was detected and concluded, but it was false.
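
The link between the significance threshold and the Type I error rate can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not part of the original question: both arms are drawn from the same normal distribution (so the null hypothesis is true by construction), the variance is treated as known, and the arm size, number of trials, and seed are arbitrary illustrative choices.

```python
import math
import random


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_type_i_rate(n_trials=2000, n_per_arm=50, alpha=0.05, seed=0):
    """Share of simulated null trials that are (wrongly) declared significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        # Both arms come from the same N(0, 1), so the drugs truly do not differ.
        arm_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        diff = sum(arm_a) / n_per_arm - sum(arm_b) / n_per_arm
        # z-test assuming a known unit variance in each arm.
        z = diff / math.sqrt(2.0 / n_per_arm)
        if two_sided_p_from_z(z) < alpha:
            false_positives += 1  # a Type I error
    return false_positives / n_trials


rate = simulate_type_i_rate()
print(f"Share of null trials with p < 0.05: {rate:.3f}")
```

Under these assumptions the printed share should sit close to the chosen alpha of 0.05, which is the long-run probability of a Type I error when the null hypothesis is true.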

Q: In a village, every fifth house was selected for a study. This is an example of

Systematic random sampling.

***Systematic random sampling***
- This method involves selecting subjects from an **ordered sampling frame** at regular intervals, such as every k-th item.
- In this scenario, selecting every fifth house represents a fixed interval (k=5), which is characteristic of systematic random sampling.

*Simple random sampling*
- This method ensures that every member of the population has an **equal chance of being selected**, often through random number generation.
- It does not involve a predetermined, fixed interval of selection from an ordered list.

*Convenience sampling*
- This technique involves selecting subjects who are **easily accessible or readily available**, without any systematic or random process.
- It is prone to bias as it does not represent the entire population.

*Stratified random sampling*
- This method involves dividing the population into **homogeneous subgroups (strata)** and then conducting simple random sampling within each stratum.
- The scenario does not describe dividing the village households into distinct subgroups before selection.
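
The every-k-th-unit selection described above can be sketched in a few lines. This is an illustrative sketch only: the village of 50 numbered houses, the function name `systematic_sample`, and the random start within the first interval are assumptions, not details given in the question.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1 within the first interval
    return frame[start::k]


houses = list(range(1, 51))  # hypothetical ordered frame: houses 1..50
sample = systematic_sample(houses, k=5, seed=1)
print(sample)
```

Only the starting house is chosen at random; every later selection is fixed by the interval, which is what distinguishes this design from simple random sampling.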

Q: The ability of a test to identify correctly those who do not have the disease is called its

Specificity.

***Specificity***
- **Specificity** is the proportion of people without the disease who test negative: **true negatives** / (true negatives + false positives).
- It measures the ability of a test to correctly identify individuals who **do not have the disease**.

*Sensitivity*
- **Sensitivity** is the proportion of people with the disease who test positive: **true positives** / (true positives + false negatives).
- It measures the ability of a test to correctly identify individuals who **do have the disease**.

*Positive predictive value*
- **Positive predictive value (PPV)** is the probability that a patient with a **positive test result** actually has the disease.
- It depends on the **prevalence** of the disease in the population being tested.

*Negative predictive value*
- **Negative predictive value (NPV)** is the probability that a patient with a **negative test result** actually does not have the disease.
- It also depends on the **prevalence** of the disease in the population.
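
The four measures above can be computed from a 2x2 table of test results against true disease status. The sketch below uses made-up counts (90 true positives, 45 false positives, 10 false negatives, 855 true negatives) purely for illustration; the function names are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # non-diseased who test negative
        "ppv": tp / (tp + fp),          # positives who are diseased
        "npv": tn / (tn + fn),          # negatives who are non-diseased
    }


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV via Bayes' theorem, to show its dependence on prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
ppv_common = ppv_from_prevalence(0.90, 0.95, prevalence=0.10)
ppv_rare = ppv_from_prevalence(0.90, 0.95, prevalence=0.01)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"PPV at 10% prevalence: {ppv_common:.3f}")
print(f"PPV at 1% prevalence:  {ppv_rare:.3f}")
```

With the same sensitivity and specificity, the PPV falls sharply when the disease is rarer, while sensitivity and specificity themselves do not change with prevalence.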

Q: In a normal curve, what percentage of the values will be included in the area between two standard deviations on either side of the mean (X ± 2σ)?

95.4.

***Correct: 95.4***
- According to the **empirical rule** (also known as the 68-95-99.7 rule), approximately 95% of data falls within two standard deviations of the mean in a normal distribution.
- More precisely, the area between **X ± 2σ** encompasses **95.4%** of the values.
- This is a fundamental concept in biostatistics used for calculating confidence intervals and reference ranges.

*Incorrect: 68.3*
- This percentage represents the proportion of data within **one standard deviation** (X ± 1σ) of the mean in a normal distribution.
- It is not the correct value for the range of two standard deviations.

*Incorrect: 90.4*
- This value does not correspond to any standard interval of standard deviations around the mean in a normal distribution.
- It is not part of the empirical rule for common standard deviation ranges.

*Incorrect: 99.7*
- This percentage represents the proportion of data within **three standard deviations** (X ± 3σ) of the mean in a normal distribution.
- It is a larger interval than what is asked in the question (two standard deviations).
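
The percentages in the empirical rule can be checked directly from the normal distribution, since the probability of falling within k standard deviations of the mean equals erf(k / √2). A short sketch using only the standard library (the function name `coverage_within` is illustrative):

```python
import math


def coverage_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Mean ± {k} SD: {100 * coverage_within(k):.2f}%")
```

To two decimals this gives 68.27%, 95.45% and 99.73%, which are the values that the question's options quote as 68.3, 95.4 and 99.7.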

Q: "Sampling error" occurs due to the variation in results

between one sample and another.

***between one sample and another***
- **Sampling error** arises because a sample is not a perfect representation of the entire population from which it is drawn.
- This error quantifies the natural **variability** that occurs when different subgroups (samples) are selected from the same population.

*due to the use of many instruments in the study*
- This scenario describes **inter-instrument variability** or **measurement error**, which is related to the precision and calibration of different tools.
- While it can introduce error, it is distinct from sampling error, which arises from the representativeness of the chosen study subjects.

*due to the multiple readings taken on the same instrument*
- Multiple readings on the same instrument assess **intra-instrument variability** or **repeatability**, indicating how consistent a single instrument is over time.
- This relates to the precision of the measurement device, not the representativeness of the sample itself.

*between the observations of two individuals*
- Differences in observations between two individuals indicate **inter-rater variability** or **observer bias**.
- This type of error is related to subjective interpretation or measurement technique by different observers, rather than the intrinsic variability between selected samples.
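
Sample-to-sample variation can be made concrete by drawing several samples from one fixed population and comparing their means. The sketch below is illustrative only: the population (10,000 simulated values with mean 120 and standard deviation 15), the sample size of 30, and the seed are assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population, e.g. a measurement with mean 120 and SD 15.
population = [rng.gauss(120.0, 15.0) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Five independent samples of 30 drawn from the same population.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"Population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean: {m:.2f} (difference {m - population_mean:+.2f})")
```

The same instrument and the same observer are implied throughout, so the differences between the sample means arise only from which individuals happened to be selected.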

Q: Which is the fertility indicator that gives the approximate magnitude of completed family size?

Total Fertility Rate.

***Total Fertility Rate***
- The **Total Fertility Rate (TFR)** estimates the average number of children a woman would have over her lifetime if she were to experience current age-specific fertility rates.
- It provides a measure of the **completed family size** in a hypothetical cohort of women.

*Age-specific Fertility Rate*
- The **Age-specific Fertility Rate (ASFR)** measures the number of births to women in a particular age group per 1,000 women in that age group.
- It does not directly provide the completed family size but is a component used to calculate the TFR.

*General Fertility Rate*
- The **General Fertility Rate (GFR)** calculates the number of live births per 1,000 women of childbearing age (typically 15-49 years) in a given year.
- While it reflects overall fertility, it does not provide an estimate of the completed family size per woman.

*Gross Reproduction Rate*
- The **Gross Reproduction Rate (GRR)** is similar to the TFR but only considers female births.
- It estimates the average number of daughters a woman would have over her lifetime based on current age-specific fertility rates, without accounting for mortality.
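
How the TFR is built from age-specific rates can be shown with a short calculation. The ASFR values below are invented for illustration, as is the assumed share of female births (0.488) used for the GRR; with five-year age groups, each group's annual rate is multiplied by five because a woman spends five years in it.

```python
# Hypothetical ASFRs: births per 1,000 women per year in each age group.
asfr_per_1000 = {
    "15-19": 40,
    "20-24": 150,
    "25-29": 140,
    "30-34": 70,
    "35-39": 30,
    "40-44": 8,
    "45-49": 2,
}

AGE_GROUP_WIDTH = 5  # years spent in each age group

# TFR: expected children per woman who lives through all age groups.
tfr = AGE_GROUP_WIDTH * sum(asfr_per_1000.values()) / 1000

# GRR: daughters only, using an assumed proportion of female births.
PROPORTION_FEMALE_BIRTHS = 0.488  # illustrative assumption
grr = tfr * PROPORTION_FEMALE_BIRTHS

print(f"TFR: {tfr:.2f} children per woman")
print(f"GRR: {grr:.2f} daughters per woman")
```

With these made-up rates the TFR works out to 2.2 children per woman, which is the sense in which it approximates completed family size.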

Q: In a normal distribution, Mean ± 2 S.D. contains

95.4 % values.

***95.4 % values***
- According to the **empirical rule** (or 68-95-99.7 rule) for normal distributions, approximately **95.4%** of data falls within two standard deviations of the mean.
- This interval covers from (Mean - 2 S.D.) to (Mean + 2 S.D.) and represents the likelihood of a value falling in this range.

*68.3 % values*
- This percentage corresponds to the data contained within **Mean ± 1 S.D.** in a normal distribution, not Mean ± 2 S.D.
- It signifies that roughly two-thirds of all observations lie within one standard deviation from the mean in a bell-shaped curve.

*91.2 % values*
- This value is not a standard percentage associated with common multiples of standard deviations (1, 2, or 3) from the mean in a normal distribution.
- It does not correspond to any universally recognized interval like ±1 S.D., ±2 S.D., or ±3 S.D.

*99.7 % values*
- This percentage represents the data contained within **Mean ± 3 S.D.** in a normal distribution.
- It indicates that almost all (99.7%) of the data points are expected to fall within three standard deviations from the mean.

Q: The appropriate statistical test to find out obesity as a significant risk factor for breast cancer is:

Chi-square test.

***Chi-square test***
- The **chi-square test** is used to determine if there is a **significant association** between two **categorical variables**.
- In this scenario, both obesity (yes/no) and breast cancer (yes/no) are categorical, making chi-square appropriate to assess if obesity is a risk factor.

*Wilcoxon’s signed rank test*
- This is a **non-parametric test** used for comparing two related samples or repeated measurements on a single sample, especially when data are not normally distributed.
- It is not suitable for assessing the association between two independent categorical variables like obesity and breast cancer.

*Student’s paired ‘t’ test*
- The **paired t-test** is used to compare the means of two related groups or measurements from the same subject under different conditions (e.g., before and after an intervention).
- This test is designed for **continuous data** and would not be appropriate for the categorical variables of obesity and breast cancer.

*Student’s unpaired ‘t’ test*
- The **unpaired t-test** (also known as independent samples t-test) is used to compare the means of two independent groups for a **continuous outcome variable**.
- It is not suitable when both the exposure (obesity) and the outcome (breast cancer) are categorical variables.
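
For two yes/no variables, the chi-square statistic compares the observed 2x2 counts with those expected if obesity and breast cancer were unrelated. The sketch below is a hand-rolled illustration with invented counts (100 obese and 100 non-obese women); it computes the Pearson statistic without a continuity correction and, because a 2x2 table has one degree of freedom, obtains the p-value from the normal distribution via `math.erfc`.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, chi2 is a squared standard normal, so
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = obese / non-obese, columns = cancer / no cancer.
table = [[60, 40],
         [40, 60]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

In these made-up data every expected count is 50, giving a chi-square of 8.0 and a p-value below 0.05, so the association would be declared statistically significant. In practice, a library routine with a continuity correction or an exact test may be preferred for small counts.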

Q: Which of the following is/are the measure(s) of dispersion?

1. Mode
2. Median
3. Standard Deviation

Select the correct answer using the code given below:

3 only.

***Correct: 3 only***
- **Standard Deviation** is a direct measure of dispersion that quantifies the amount of variation or spread of data values around the mean.
- It indicates how much individual data points deviate from the average, making it a key statistic for understanding the **spread** within a dataset.
- Other common measures of dispersion include **range, variance, interquartile range, and coefficient of variation**.

*Incorrect: 1, 2 and 3*
- **Mode** and **Median** are measures of **central tendency**, not dispersion.
- They describe the center or typical value of a dataset, not the spread or variability.
- While they provide insight into the data's distribution, they do not quantify how spread out the data points are.

*Incorrect: 2 and 3 only*
- **Median** is a measure of **central tendency** representing the middle value when data is ordered, not a measure of dispersion.
- Only **Standard Deviation** from this option is a measure of dispersion, making this choice incorrect.

*Incorrect: 1 and 2 only*
- Both **Mode** and **Median** are measures of **central tendency**.
- Mode indicates the most frequent value and Median represents the middle value.
- Neither provides information about how **spread out** or dispersed the data points are around the center.

Q: Which one of the following is not a measure of dispersion?

Mean.

***Mean***
- The **mean** is a measure of **central tendency**, representing the average value of a dataset.
- It describes the typical value around which data points cluster, rather than how spread out they are.

*Mean deviation*
- **Mean deviation** is a measure of **dispersion** that calculates the average of the absolute differences between each data point and the mean of the dataset.
- It quantifies the average deviation of data points from the center.

*Standard deviation*
- **Standard deviation** is a widely used measure of **dispersion** that indicates the average amount of variability or spread around the mean.
- A higher standard deviation suggests data points are more spread out from the mean.

*Range*
- The **range** is a simple measure of **dispersion** calculated as the difference between the highest and lowest values in a dataset.
- It provides a basic idea of the total spread of data from its extremes.
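
The contrast between central tendency and dispersion is easy to see on a small dataset. The eight values below are made up for illustration, and the population form of the standard deviation (`statistics.pstdev`) is used; a sample standard deviation would divide by n - 1 instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

# Central tendency: where the data are centred.
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Dispersion: how spread out the data are.
mean_deviation = sum(abs(x - mean) for x in values) / len(values)
std_dev = statistics.pstdev(values)  # population standard deviation
value_range = max(values) - min(values)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"mean deviation={mean_deviation}, SD={std_dev}, range={value_range}")
```

Adding a constant to every value would shift the mean, median and mode by that constant but leave the mean deviation, standard deviation and range unchanged, which is why only the latter three describe spread.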

Question 1

A randomized trial comparing the efficacy of two drugs showed a difference between the two (p value < 0.05). However, in reality the drugs do not differ. This is an example of

Accepted Answer

Type I error

Answer

Both type I and II error

Answer

Random error

Answer

Type II error

Question 2

In a village, every fifth house was selected for a study. This is an example of

Accepted Answer

Systematic random sampling

Answer

Simple random sampling

Answer

Convenience sampling

Answer

Stratified random sampling

Question 3

The ability of a test to identify correctly those who do not have the disease is called its

Accepted Answer

Specificity

Answer

Sensitivity

Answer

Positive predictive value

Answer

Negative predictive value

Question 4

In a normal curve, what percentage of the values will be included in the area between two standard deviations on either side of the mean (X ± 2σ)?

Accepted Answer

95.4

Answer

68.3

Answer

90.4

Answer

99.7

Question 5

"Sampling error" occurs due to the variation in results

Accepted Answer

between one sample and another

Answer

due to the use of many instruments in the study

Answer

due to the multiple readings taken on the same instrument

Answer

between the observations of two individuals

Question 6

Which is the fertility indicator that gives the approximate magnitude of completed family size?

Accepted Answer

Total Fertility Rate

Answer

Age-specific Fertility Rate

Answer

General Fertility Rate

Answer

Gross Reproduction Rate

Question 7

In a normal distribution, Mean ± 2 S.D. contains

Accepted Answer

95.4 % values

Answer

68.3 % values

Answer

91.2 % values

Answer

99.7 % values

Question 8

The appropriate statistical test to find out obesity as a significant risk factor for breast cancer is:

Accepted Answer

Chi-square test

Answer

Wilcoxon’s signed rank test

Answer

Student’s paired ‘t’ test

Answer

Student’s unpaired ‘t’ test

Question 9

Which of the following is/are the measure(s) of dispersion?

1. Mode

2. Median

3. Standard Deviation

Select the correct answer using the code given below:

Accepted Answer

3 only

Answer

1, 2 and 3

Answer

2 and 3 only

Answer

1 and 2 only

Question 10

Which one of the following is not a measure of dispersion?

Accepted Answer

Mean

Answer

Mean deviation

Answer

Standard deviation

Answer

Range
