NEET-PG 2024 — Biostatistics

4 Previous Year Questions with Answers & Explanations

Question 1: A study recorded the survival times (in months) of 8 patients diagnosed with pancreatic cancer who received a new chemotherapy regimen. The survival times were: 2, 3, 4, 4, 5, 6, 7, 8 months. What is the median survival time for these patients?

A. 4.0
B. 4.5 (Correct Answer)
C. 5.0
D. 5.5
E. 3.5

Explanation:

***4.5***
- The survival times are already in ascending order: 2, 3, 4, 4, 5, 6, 7, 8.
- Because the number of observations is **even (n = 8)**, the median is the average of the two middle values, the 4th and 5th: (4 + 5) / 2 = **4.5**.

*3.5*
- This value results from incorrectly averaging the 2nd and 3rd observations: (3 + 4) / 2 = 3.5.
- The error comes from miscounting the middle positions in an even-numbered dataset.

*4.0*
- This value is the **fourth observation** in the ordered list, not the median of an even number of data points.
- Although it is one of the two middle values, the median of an even-sized dataset is the average of both middle values.

*5.0*
- This value is the **fifth observation** in the ordered list, not the median of an even number of data points.
- It would be the median only if the dataset had an odd number of observations with 5 as the middle term.

*5.5*
- This value is the mean of 5 and 6, the 5th and 6th values, which are not the two middle values.
- For n = 8 the middle positions are the 4th and 5th, so this calculation does not give the median.
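
The middle-position rule described above can be sketched in Python. This is a minimal illustration, not part of the original question; the helper name `median_by_hand` is hypothetical, and the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median

# Survival times (months) taken from the question.
survival_months = [2, 3, 4, 4, 5, 6, 7, 8]


def median_by_hand(values):
    """Return the median using the middle-position rule (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value is the median.
        return ordered[mid]
    # Even n: average the two middle values (positions n/2 and n/2 + 1, 1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand(survival_months))  # 4.5
print(median(survival_months))          # 4.5, standard-library cross-check
```

For n = 8, `mid` is 4, so the code averages the values at 0-based indices 3 and 4, which are the 4th and 5th observations.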

Question 2: A group of 80 people is being studied to determine the effect of diet modification on cholesterol levels. To compare the mean cholesterol levels before and after the diet modification in this group, which statistical test should be used?

A. Paired t-test (Correct Answer)
B. McNemar test
C. Chi-square test
D. Wilcoxon signed-rank test
E. Independent t-test

Explanation:

***Paired t-test***
- A **paired t-test** compares the means of two related samples, such as "before" and "after" measurements on the **same individuals**.
- It tests whether the mean of the within-person differences between these **dependent observations** differs significantly from zero.

*Independent t-test*
- The independent t-test compares means between **two separate groups** (unrelated samples).
- It is inappropriate here because the data are **paired**: the same individuals are measured twice, so there are not two independent groups.

*McNemar test*
- The McNemar test compares **paired nominal data**, typically in a 2×2 table, for example a before-after change in a proportion or other categorical outcome.
- It is not suitable for **continuous data** such as cholesterol levels.

*Chi-square test*
- The chi-square test assesses the association between **two categorical variables** or compares observed frequencies with expected frequencies.
- It is not designed for comparing means of **continuous variables** in paired samples.

*Wilcoxon signed-rank test*
- The Wilcoxon signed-rank test is a **non-parametric alternative to the paired t-test**, used when the paired differences are not approximately normally distributed, particularly in small samples.
- It also applies to paired data, but the question asks for a comparison of means, and the paired t-test is generally preferred when parametric assumptions (such as **normality** of the differences) can be met, which is reasonable with a sample size of 80.
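
As a rough sketch of what the paired t-test computes, the snippet below derives the t statistic from the within-person differences using only the Python standard library. The cholesterol values are hypothetical (five people rather than 80, for brevity), and the function name `paired_t_statistic` is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired measurements (hypothetical helper)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on within-person differences, not on the two raw samples.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # t = mean difference / standard error of the mean difference
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical cholesterol levels (mg/dL) for the same five people.
before = [220, 240, 210, 250, 230]
after = [210, 225, 205, 240, 220]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f}, df = {df}")  # t = -6.32, df = 4
```

The resulting statistic is compared with the t distribution on n - 1 degrees of freedom. An independent t-test on the same numbers would ignore the pairing and use a different standard error.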

Question 3: A study was undertaken to establish the relationship between the consumption of a vegetarian or non-vegetarian diet and the presence of diseases. Which statistical test should be used?

A. Chi-square test (Correct Answer)
B. T-test
C. ANOVA
D. Fisher's exact test
E. Mann-Whitney U test

Explanation:

***Chi-square test***
- The **chi-square test** is appropriate for analyzing the relationship between two **categorical variables**. Here, "diet type" (vegetarian/non-vegetarian) and "presence of disease" (yes/no) are both categorical.
- The test determines whether there is a statistically significant association between the frequency counts of the two variables in a contingency table.

*T-test*
- A **t-test** compares the **means** of two groups, typically when the dependent variable is continuous.
- It is unsuitable here because both presence of disease and diet type are categorical, not continuous, variables.

*ANOVA*
- **ANOVA** (Analysis of Variance) compares the **means** of three or more groups, with a continuous dependent variable.
- Like the t-test, ANOVA does not apply because the study involves categorical variables, not a comparison of means across groups.

*Fisher's exact test*
- **Fisher's exact test** serves the same purpose as the chi-square test but is used for **small sample sizes**, when the expected frequency in any cell of the contingency table is less than 5.
- Although it also analyzes categorical data, the chi-square test is the more general and commonly preferred test for larger samples, which is assumed unless otherwise specified.

*Mann-Whitney U test*
- The **Mann-Whitney U test** is a non-parametric test that compares two independent groups when the dependent variable is **ordinal or continuous** but not normally distributed.
- It is not appropriate for the association between two categorical variables, because it requires an outcome that can be ranked.
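
To make the contingency-table idea concrete, here is a minimal sketch that computes the Pearson chi-square statistic (without continuity correction) by hand. The counts are hypothetical and the function name `chi_square_statistic` is an assumption for illustration only.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts: rows = diet, columns = disease present / absent.
observed = [
    [20, 80],  # vegetarian
    [40, 60],  # non-vegetarian
]

chi2, df, expected = chi_square_statistic(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")  # chi-square = 9.52, df = 1

# Fisher's exact test is usually preferred if any expected count is below 5.
print(min(min(row) for row in expected) >= 5)  # True
```

With 1 degree of freedom the 5% critical value is 3.841, so these hypothetical counts would indicate a significant association between diet and disease.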

Question 4: A statistician wants to study the effects of a medicine in three groups: humans, animals, and plants. He then selects randomly from these three groups. Which type of sampling is being performed?

A. Simple random sampling
B. Systematic sampling
C. Stratified random sampling (Correct Answer)
D. Cluster sampling
E. Convenience sampling

Explanation:

***Stratified random sampling***
- This method divides the population into **distinct subgroups (strata)** based on shared characteristics (here humans, animals, and plants) and then draws a simple random sample within each stratum.
- This ensures that every subgroup is represented in the sample (proportionally, if proportional allocation is used), which is appropriate when studying effects across different biological categories.

*Simple random sampling*
- This method selects individuals from the entire population **purely by chance**, without first dividing them into subgroups.
- It does not guarantee representation from all three groups (humans, animals, and plants), which is essential for studying differential effects.

*Systematic sampling*
- This method selects units at **regular intervals** from an ordered list or sequence.
- It does not match the scenario, in which the population is first divided into distinct groups and units are drawn randomly from each, not at fixed intervals from a single list.

*Cluster sampling*
- This method divides the population into **clusters**, randomly selects some of the clusters, and then samples all individuals within the selected clusters.
- Here the groups (humans, animals, plants) are strata, not clusters: the intent is to sample from within every group, not to treat the groups themselves as the primary sampling units.

*Convenience sampling*
- This is a **non-probability sampling method** in which subjects are selected for ease of access, not at random.
- The question explicitly states that selection from each group is random, which rules out convenience sampling.
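
The difference between strata and clusters can be sketched in a few lines of Python. The population below is entirely hypothetical (made-up unit labels and stratum sizes), and `stratified_sample` is an illustrative helper, not a library function. It draws an equal-sized simple random sample from every stratum; proportional allocation would be a variation on the same idea.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` units from every stratum."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return {
        name: rng.sample(units, per_stratum)
        for name, units in strata.items()
    }


# Hypothetical population, already divided into the three strata.
strata = {
    "humans": [f"human_{i}" for i in range(50)],
    "animals": [f"animal_{i}" for i in range(30)],
    "plants": [f"plant_{i}" for i in range(20)],
}

sample = stratified_sample(strata, per_stratum=5, seed=42)
for name, units in sample.items():
    print(name, units)
```

Every stratum contributes units to the sample. Cluster sampling would instead pick some of the groups at random and take all units within them.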