Biostatistics Indian Medical PG Practice Questions and MCQs
Question 731: Two laboratories have developed testing for Zika virus. The statistical method shown below was used to compare the diagnostic performance of these tests. What is this statistical method called?
A. ROC curve (Correct Answer)
B. Lorenz curve
C. Bell curve
D. Gaussian curve
Explanation: ***ROC curve***
- This graph, displaying **true positive rate (sensitivity)** against the **false positive rate (1-specificity)**, is characteristic of a **Receiver Operating Characteristic (ROC) curve**.
- ROC curves are commonly used to evaluate and compare the performance of **diagnostic tests** or predictive models across various threshold settings.
*Lorenz curve*
- A Lorenz curve is used in economics to represent **income inequality** or wealth distribution.
- It plots the proportion of total income/wealth owned by the bottom x% of the population, and does not relate to diagnostic test performance.
*Bell curve*
- A bell curve, or **normal distribution curve**, is a symmetrical graph that shows the distribution of a **continuous variable**.
- It describes how data points are distributed around the mean, not the performance of diagnostic tests.
*Gaussian curve*
- A Gaussian curve is another name for a **normal distribution** or **bell curve**.
- It is used to model random variables in statistics, not to compare the sensitivity and specificity of diagnostic tests in the manner shown.
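To make the construction of an ROC curve concrete, here is a minimal sketch that sweeps a threshold over test scores and records the false positive rate and true positive rate at each step. The scores, labels, and the helper names `roc_points` and `auc` are hypothetical illustrations, not data from the question.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to most lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is called positive when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical results: 1 = infected, 0 = not infected.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
lab_a_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # separates the groups perfectly
lab_b_scores = [0.9, 0.4, 0.7, 0.3, 0.8, 0.5, 0.6, 0.2]  # groups are mixed together

auc_a = auc(roc_points(lab_a_scores, labels))
auc_b = auc(roc_points(lab_b_scores, labels))
print(auc_a, auc_b)
```

The test whose curve lies closer to the top-left corner has the larger area under the curve, which is one common way of comparing two diagnostic tests.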
Question 732: A study was conducted to find out the number of positive lymph nodes in a population of breast cancer patients who underwent axillary dissection. A graph as shown below was plotted between the number and frequency of positive nodes. Which of the following is correct?
A. A: Mean, B: Median, C: Mode
B. A: Median, B: Mean, C: Mode
C. A: Mode, B: Median, C: Mean (Correct Answer)
D. A: Mean, B: Mode, C: Median
Explanation: ***A: Mode, B: Median, C: Mean***
- In a **positively skewed distribution**, the **mode** is the value that appears most frequently (highest peak), which is A.
- The **median** is the middle value when data are ordered, and in a positively skewed distribution, it falls between the mode and the mean (point B).
- The **mean** is the average of all values, and it is pulled towards the tail of the skew, making it the highest value among the three in a positively skewed distribution (point C).
*A: Mean, B: Median, C: Mode*
- This order would be correct for a **negatively skewed distribution**, where the tail extends to the left, and the mean is the smallest.
- However, the given graph clearly shows a **positive skew**, with the peak at the beginning and the tail extending to the right.
*A: Median, B: Mean, C: Mode*
- This arrangement does not correspond to a standard distribution pattern, whether skewed positively or negatively.
- The median, mean, and mode have established relative positions depending on the **skewness** of the data.
*A: Mean, B: Mode, C: Median*
- This option incorrectly places the **mode** in the middle and the **mean** at the beginning of the distribution.
- This order is inconsistent with the characteristics of any common type of data distribution.
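The ordering mode < median < mean in a positively skewed distribution can be checked on a small data set. The node counts below are made up for illustration and are not taken from the study in the question.

```python
from statistics import mean, median, mode

# Hypothetical counts of positive lymph nodes: most patients have few,
# a handful have many, so the tail extends to the right.
positive_nodes = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 6, 9, 15]

node_mode = mode(positive_nodes)      # most frequent value (the peak)
node_median = median(positive_nodes)  # middle value of the ordered data
node_mean = mean(positive_nodes)      # pulled towards the long right tail

print(node_mode, node_median, node_mean)
```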
Question 733: Which of the following vials can be used? (AIIMS Nov 2017)
A. 1,2 can be used (Correct Answer)
B. 3,4 can be used
C. 1,2,3 can be used
D. Only 1 can be used
Explanation: ***1,2 can be used***
- Vials 1 and 2 show the **inner square lighter than the outer circle** on the VVM, indicating that the vaccine has not been exposed to excessive heat and is safe to use.
- The **Vaccine Vial Monitor (VVM)** is a time-temperature indicator that changes color irreversibly when exposed to excessive heat.
- These vials are at **VVM Stage 1 or 2** (usable stages), confirming they have not been exposed to heat that would degrade vaccine potency.
*3,4 can be used*
- This is **INCORRECT** because both vials 3 and 4 show VVM indicators that have reached **discard point**.
- Vial 3's VVM shows the **inner square darker than the outer circle**, indicating heat exposure (VVM Stage 3 or beyond).
- Vial 4's VVM shows an even **darker inner square, nearly merged with the outer circle**, signifying severe heat damage.
- **WHO/EPI guidelines** mandate discarding vaccines when the VVM inner square matches or becomes darker than the outer circle.
*1,2,3 can be used*
- This is **INCORRECT** because vial 3 has crossed the **VVM discard point**.
- The inner square in vial 3 is **darker than the outer circle**, indicating heat exposure that compromises vaccine efficacy.
- Using heat-damaged vaccines leads to **immunization failure** and false sense of protection.
*Only 1 can be used*
- This is **INCORRECT** because vial 2 also shows a **safe VVM status** (inner square lighter than outer circle).
- Both vials 1 and 2 are at usable VVM stages and can be administered safely.
- There is **no visual difference** between the VVM status of vials 1 and 2 that would justify discarding vial 2.
Question 734: A new test (shown as the red line) has been designed to diagnose a disease condition. The test is applied to both normal and diseased populations, and the resulting graph is given below. Which of the following is correct regarding the test?
A. High sensitivity and high specificity
B. High sensitivity and low specificity
C. Low sensitivity and low specificity (Correct Answer)
D. Low sensitivity and high specificity
Explanation: ***Low sensitivity and low specificity***
- In the provided graph, the **red line** curves (new test) for the healthy and diseased populations show substantial **overlap**, meaning the test discriminates poorly between the two groups.
- A test with **low sensitivity** will miss many true positive cases (diseased individuals), and a test with **low specificity** will incorrectly identify many healthy individuals as diseased, both of which are indicated by the extensive overlap.
*High sensitivity and high specificity*
- This would be represented by two curves that are **well-separated**, with minimal overlap, allowing for clear distinction between healthy and diseased individuals.
- Such a test would correctly identify most diseased individuals (**high sensitivity**) and most healthy individuals (**high specificity**).
*High sensitivity and low specificity*
- This would typically show a test that correctly identifies most diseased individuals (high true positive rate), but also incorrectly flags many healthy individuals as diseased (high false positive rate).
- Graphically, this might appear as the diseased curve being mostly captured, but with significant spillover into the healthy range.
*Low sensitivity and high specificity*
- This scenario suggests a test that rarely misidentifies healthy individuals as diseased (low false positive rate), but also misses many diseased individuals (high false negative rate).
- The healthy curve would be well-defined and distinct, but the diseased curve would significantly overlap with the healthy curve, indicating poor detection of disease.
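The link between curve overlap and test performance can be sketched numerically. The example below assumes, purely for illustration, that test values in each population are normally distributed and that the test is positive at or above a cutoff placed midway between the two means. All means, standard deviations, and cutoffs are hypothetical.

```python
from statistics import NormalDist


def sensitivity_specificity(healthy, diseased, cutoff):
    """The test is called positive when the measured value is at or above the cutoff."""
    sensitivity = 1 - diseased.cdf(cutoff)  # diseased people correctly above the cutoff
    specificity = healthy.cdf(cutoff)       # healthy people correctly below the cutoff
    return sensitivity, specificity


healthy = NormalDist(mu=100, sigma=15)
overlapping_diseased = NormalDist(mu=105, sigma=15)  # curves largely overlap
separated_diseased = NormalDist(mu=190, sigma=15)    # curves barely touch

poor = sensitivity_specificity(healthy, overlapping_diseased, cutoff=102.5)
good = sensitivity_specificity(healthy, separated_diseased, cutoff=145)

print("overlap area:", healthy.overlap(overlapping_diseased), "->", poor)
print("overlap area:", healthy.overlap(separated_diseased), "->", good)
```

With heavily overlapping curves, a midpoint cutoff leaves both sensitivity and specificity only slightly above 50%. Moving the cutoff would raise one at the expense of the other, but no cutoff can make both high.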
Question 735: The sample sizes of samples A, B and C are 500, 800 and 1000 respectively. Which sample has the highest margin of error? (NEET Jan 2018)
A. Sample A (Correct Answer)
B. Sample B
C. Sample C
D. None of above
Explanation: ***Sample A***
- For a given confidence level and population variability, the **margin of error** is inversely proportional to the square root of the sample size. Therefore, a smaller sample size leads to a larger margin of error.
- Sample A has the smallest sample size (N=500) among the given options, thus having the **highest margin of error**.
*Sample B*
- With a sample size of 800, Sample B has a **smaller margin of error** than Sample A but a larger margin of error than Sample C.
- As the sample size increases, the precision of the estimate improves, and the margin of error decreases.
*Sample C*
- Sample C has the largest sample size (N=1000), which results in the **smallest margin of error** among all samples.
- A larger sample size generally provides a more accurate representation of the population.
*None of above*
- This option is incorrect because the sample size directly influences the margin of error, and Sample A clearly has the smallest size.
- Based on statistical principles, one of the samples must inherently have the highest margin of error.
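The inverse square-root relationship can be illustrated with the usual margin-of-error formula for a proportion, z * sqrt(p(1 - p) / n). The sketch assumes a 95% confidence level (z = 1.96) and the most conservative proportion p = 0.5. Neither value is given in the question, and the function name is hypothetical.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion (assumed p and z, see lead-in)."""
    return z * sqrt(p * (1 - p) / n)


sample_sizes = {"A": 500, "B": 800, "C": 1000}
margins = {name: margin_of_error(n) for name, n in sample_sizes.items()}

for name, moe in margins.items():
    print(f"Sample {name}: n={sample_sizes[name]}, margin of error = {moe:.4f}")

largest = max(margins, key=margins.get)
print("Highest margin of error:", largest)
```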
Question 736: Which of the following is true about the box plot shown?
A. Mean = median = mode
B. Mean = median, not equal to mode (Correct Answer)
C. Mean = mode, not equal to median
D. Mean, median and mode are not equal
Explanation: ***Mean = median, not equal to mode***
- A **perfectly symmetrical distribution** (represented by a **symmetrical box plot without whiskers extending further in one direction**) indicates that the **mean and median are equal**.
- However, the mode, which is the most frequent value, is not necessarily equal to the mean and median in all symmetrical distributions, especially if it's **not unimodal or not centered at the mean/median**. When depicted using a boxplot, we cannot ascertain the mode simply by looking at where the median lies.
*Mean = median = mode*
- While the **mean and median are equal** in a symmetrical distribution, the **mode is not explicitly represented** in a box plot.
- The mode is the most frequently occurring value, and a box plot primarily shows quartiles and spread, not individual frequencies.
*Mean = mode, not equal to median*
- In a symmetrical distribution, the **mean and median are equal**.
- Therefore, this option is incorrect as it states the mean is not equal to the median.
*Mean, median and mode are not equal*
- The **symmetrical nature** of the box plot strongly suggests that the **mean and median are equal**.
- This option is therefore incorrect, as at least two of the measures of central tendency are equal.
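A small symmetric but bimodal data set shows why the mode need not coincide with the mean and median. The values below are hypothetical and are not read from the box plot in the question.

```python
from statistics import mean, median, multimode

# Symmetric around 3, but the most frequent values sit at the two ends.
values = [1, 1, 1, 2, 3, 4, 5, 5, 5]

centre_mean = mean(values)
centre_median = median(values)
modes = multimode(values)  # all values tied for the highest frequency

print(centre_mean, centre_median, modes)
```

A box plot of these values would look perfectly symmetrical, yet neither mode equals the shared mean and median.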
Question 737: Which is incorrect about the image shown below?
A. Negatively skewed
B. Positively skewed
C. 75% of values are above 25 mg (Correct Answer)
D. Median is 50 mg
Explanation: ***75% of values are above 25 mg***
- This statement is incorrect. In a box plot, the line inside the box marks the **second quartile (Q2)** or **median**, the 50th percentile. Here it lies at 23 mg, so 50% of values are above 23 mg and no more than 50% can be above 25 mg.
- The upper edge of the box (Q3) is at 35 mg, meaning 25% of values are above 35 mg. The proportion of values above 25 mg therefore lies between 25% and 50%, not 75%. It is the first quartile (Q1), which lies below the median, that has 75% of values above it.
*Negatively skewed*
- The long **tail of the distribution** is on the left side, as indicated by the lower whisker extending further from the box than the upper whisker, and the lower half of the box being larger than the upper half.
- In a negatively skewed distribution, the **mean is typically less than the median**, and the bulk of the values are concentrated on the higher end.
*Positively skewed*
- This statement is incorrect. A **positively skewed** distribution would have a longer tail on the right side, meaning the upper whisker would be longer than the lower whisker and the upper box larger than the lower box.
- The provided image shows the opposite, with the longer tail towards the lower values.
*Median is 50 mg*
- The **median** is represented by the line dividing the lower and upper halves of the box. In this box plot, the median line is at approximately **23 mg**, not 50 mg.
- The box itself represents the **interquartile range (IQR)**, with the median dividing it.
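How quartiles translate into "percentage of values above" can be checked on a small data set. The values below are hypothetical, chosen only so that the median is 23 mg as in the explanation. They are not the data behind the image, and `fraction_above` is an illustrative helper.

```python
from statistics import quantiles


def fraction_above(values, cutoff):
    """Share of values strictly greater than the cutoff."""
    return sum(1 for v in values if v > cutoff) / len(values)


doses_mg = [4, 10, 15, 19, 21, 22, 24, 26, 30, 34, 35, 40]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(doses_mg, n=4)

print("Q1, median, Q3:", q1, q2, q3)
print("above Q1:", fraction_above(doses_mg, q1))
print("above median:", fraction_above(doses_mg, q2))
print("above 25 mg:", fraction_above(doses_mg, 25))
```

In this sketch 75% of values lie above Q1 and 50% above the median, so the share above any value higher than the median must be below 50%.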
Question 738: A study was conducted to evaluate the diagnostic accuracy of ECG for detecting myocardial infarction. The results are shown in the 2x2 table below:
| | Myocardial Infarction Present | Myocardial Infarction Absent | Total |
|---|---|---|---|
| ECG Positive | 45 | 8,000 | 8,045 |
| ECG Negative | 5 | 32,000 | 32,005 |
| Total | 50 | 40,000 | 40,050 |
Consider the following statements:
I. Sensitivity is 90%.
II. Specificity is 80%.
Which of the statements given above is/are correct?
A. Both I and II (Correct Answer)
B. Neither I nor II
C. II only
D. I only
Explanation: ***Both I and II***
- **Sensitivity** = True Positives / (True Positives + False Negatives) = 45 / (45 + 5) = 45/50 = **0.90 or 90%** ✓
- **Specificity** = True Negatives / (True Negatives + False Positives) = 32,000 / (32,000 + 8,000) = 32,000/40,000 = **0.80 or 80%** ✓
- Both calculations are correct based on the 2×2 contingency table
- **Sensitivity** measures the ability of the test to correctly identify those with disease (true positive rate)
- **Specificity** measures the ability of the test to correctly identify those without disease (true negative rate)
*I only*
- Incorrect because Statement II (Specificity = 80%) is also correct, not just Statement I
*II only*
- Incorrect because Statement I (Sensitivity = 90%) is also correct, not just Statement II
*Neither I nor II*
- Incorrect because both statements are mathematically correct based on the given data
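The two calculations can be reproduced directly from the cells of the 2×2 table. The function and variable names below are illustrative. The counts are the ones given in the question.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased people the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-diseased people the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Cells of the 2x2 table from the question.
true_pos = 45       # ECG positive, MI present
false_pos = 8_000   # ECG positive, MI absent
false_neg = 5       # ECG negative, MI present
true_neg = 32_000   # ECG negative, MI absent

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

print(f"Sensitivity = {sens:.0%}")
print(f"Specificity = {spec:.0%}")
```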
Question 739: The Sample Registration System (SRS), an important source of health information, consists of continuous enumeration of births and deaths by an enumerator and an independent survey every six months by an investigator-supervisor. Which one of the following terms best describes this system?
A. Triple-record system
B. Double blinding
C. Double data entry
D. Dual-record system (Correct Answer)
Explanation: ***Dual-record system***
- This system involves two independent sources of data collection, such as a continuous enumeration and a periodic survey, to estimate vital events with greater accuracy by **cross-checking and matching the records**
- The independent enumeration by an enumerator and a separate survey by an investigator-supervisor perfectly aligns with the principles of a **dual-record system**, designed to improve data quality and completeness
- The SRS uses this methodology to capture births and deaths that might be missed by a single source
*Triple-record system*
- This system would involve **three independent sources** of data collection, which is more complex and not described in the given scenario
- While potentially offering even higher accuracy, it's not applicable here as only two sources are mentioned
*Double blinding*
- **Double blinding** is a technique used in clinical trials where neither the participants nor the researchers know who is receiving a particular treatment
- This method is used to **prevent bias** in clinical studies and is completely unrelated to vital statistics data collection methodology
*Double data entry*
- **Double data entry** is a process where data is entered twice by two different operators and then compared to **identify and correct errors**
- This technique focuses on improving the accuracy of data input for a single data source, not on combining two independent sources of information for surveillance
Question 740: Consider the following statements about correlation between two variables:
1. The correlation is done between an independent variable X and a dependent variable Y.
2. The coefficient of correlation can range from -1 to ∞.
3. If coefficient of correlation (r) is equal to 1, it indicates there is no association between X and Y.
4. Correlation does not necessarily prove causation.
Which of the statements given above are correct?
A. 1 only
B. 4 only (Correct Answer)
C. 1, 2 and 3
D. 1 and 4 only
Explanation: ***Correct: 4 only***
- **Correlation** measures the strength and direction of a linear relationship between two variables, but it **does not imply that one causes the other**; other factors or confounding variables might be involved.
- This statement is a fundamental principle in statistics, emphasizing that causality requires more rigorous evidence, such as controlled experiments, beyond a simple correlation.
- **Only statement 4 is correct** among all the given statements.
*Incorrect: 1 only*
- While correlation is often explored between dependent and independent variables, it can also be used to assess the relationship between **any two quantitative variables**, whether one is clearly designated as independent or dependent.
- Statement 1 is partially incorrect as correlation isn't exclusively between designated independent and dependent variables.
*Incorrect: 1, 2 and 3*
- Statement 1 is partially incorrect as correlation isn't exclusively between designated independent and dependent variables.
- Statement 2 is incorrect because the **coefficient of correlation (r) ranges from -1 to +1**, not to infinity, with -1 indicating a perfect negative correlation and +1 a perfect positive correlation.
- Statement 3 is incorrect because an **r equal to 1 indicates a perfect positive linear association** between X and Y, meaning they move in the same direction proportionally, not no association.
*Incorrect: 1 and 4 only*
- Statement 1 is incorrect because **correlation can be performed between any two variables** to assess their relationship, not just an explicitly independent and dependent pair.
- While statement 4 is correct, the inclusion of statement 1 makes this option incorrect.
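The properties discussed above can be verified with a short computation of Pearson's r. The function `pearson_r` and the data are a hypothetical sketch. The point is that r stays within -1 and +1, equals +1 for a perfect positive linear relationship, and does not depend on which variable is labelled X or Y.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # perfect positive linear relationship
falling = [10, 8, 6, 4, 2]  # perfect negative linear relationship
noisy = [2, 1, 4, 3, 5]     # positive but imperfect relationship

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_noisy = pearson_r(xs, noisy)
r_swapped = pearson_r(noisy, xs)  # roles of X and Y exchanged

print(r_rising, r_falling, r_noisy, r_swapped)
```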