Biostatistics Practice Questions

Q: The figure below represents a statistical distribution. What type of skewness does it exhibit?

Right skewed.

***Right skewed***
- A distribution is **right-skewed** (positively skewed) when its tail extends longer to the right side, indicating that the majority of the data points are concentrated on the left side of the distribution.
- In a right-skewed distribution, the **mean** is typically greater than the **median**, which is also greater than the **mode** (Mean > Median > Mode).

*Normal distribution*
- A **normal distribution** is **symmetrical**, meaning its left and right sides are mirror images, with no skewness.
- In a normal distribution, the **mean, median, and mode** are approximately equal and located at the center of the distribution.

*Left skewed*
- A **left-skewed** (or negatively skewed) distribution has a tail that extends longer to the left side, with a concentration of data points on the right.
- For a left-skewed distribution, the **mean** is usually less than the **median**, which is less than the **mode** (Mean < Median < Mode).

*Bimodal distribution*
- A **bimodal distribution** has two distinct peaks or modes, indicating two different groups or subpopulations in the data.
- This is different from skewness, which describes the asymmetry of a single-peaked distribution.
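The ordering Mean > Median > Mode can be checked on a small sample. The sketch below uses a made-up right-skewed data set (the values are purely illustrative and are not taken from the question's figure) together with Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mean = statistics.mean(sample)      # pulled toward the long right tail
median = statistics.median(sample)  # middle of the sorted values
mode = statistics.mode(sample)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
print("right-skew ordering holds:", mean > median > mode)
```

The few large values drag the mean to the right, while the median and mode stay near the bulk of the data.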

Q: In a standard normal distribution curve, what percentage of the area under the curve lies between the mean and one standard deviation from the mean?

34%.

***34%***
- In a **standard normal distribution**, approximately 34.1% of the area under the curve lies between the **mean** and one **standard deviation** above the mean, and another 34.1% lies between the mean and one standard deviation below the mean.
- This follows from the **empirical rule (68-95-99.7 rule)**, under which 68% of the data lies within one standard deviation of the mean (34% on each side).

*15%*
- This percentage is too low for the area between the mean and one standard deviation in a **standard normal distribution**.
- About 15.9% of the data falls *beyond* one standard deviation on each side of the mean (that is, in each tail), but that is not the area *between* the mean and one standard deviation.

*68%*
- This value represents the total area under the curve that lies within **one standard deviation** *of the mean* (i.e., from -1 SD to +1 SD).
- It is the sum of the areas between the mean and +1 SD and between the mean and -1 SD: 34% + 34% = 68%. The question asks for the area between the mean and *one* standard deviation, i.e., on one side only.

*95%*
- This value represents the total area under the curve that lies within **two standard deviations** *of the mean* (i.e., from -2 SD to +2 SD).
- According to the **empirical rule**, approximately 95% of data falls within two standard deviations of the mean.
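These areas can be reproduced numerically. The following is a minimal sketch that builds the standard normal cumulative distribution from `math.erf`; the helper name `normal_area_between` is an assumption introduced here for illustration.

```python
import math

def normal_area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(z_high) - cdf(z_low)

mean_to_one_sd = normal_area_between(0.0, 1.0)   # one side only
within_one_sd = normal_area_between(-1.0, 1.0)   # both sides
upper_tail = 0.5 - mean_to_one_sd                # beyond +1 SD

print(f"mean to +1 SD : {mean_to_one_sd:.4f}")
print(f"-1 SD to +1 SD: {within_one_sd:.4f}")
print(f"beyond +1 SD  : {upper_tail:.4f}")
```

The first value is the 34% asked for in the question; doubling it gives the 68% of the empirical rule.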

Q: Case-control study is an example of?

Retrospective study.

***Retrospective study***
- In a **case-control study**, researchers look back in time to identify past exposures that may have led to a disease or outcome.
- They start with an outcome (cases) and then investigate their past exposures, comparing them to a control group free of the outcome.

*Prospective study*
- A **prospective study** follows participants forward in time to observe the development of an outcome after an exposure.
- Examples include cohort studies, where groups are followed over time to see who develops a disease.

*Combined retrospective and prospective study*
- This option refers to study designs that incorporate elements of both backward and forward-looking data collection.
- While some complex study designs can have both components, a pure case-control study is primarily retrospective.

*Study at one point of time*
- This describes a **cross-sectional study**, which measures exposure and outcome simultaneously at a single point in time.
- Case-control studies, by contrast, involve looking back in time to assess past exposures relative to a current outcome.

Q: Which type of chart is best to represent the following data? Year: 1991, 1992, 1993, 1994; Number of LBW babies: 125, 50, 25, 75.

Bar chart.

***Bar chart***
- A **bar chart** is the most appropriate choice for representing categorical data or discrete counts recorded for separate periods.
- Each year (1991, 1992, 1993, 1994) is treated as a distinct category, and the number of LBW babies is the quantitative value associated with each year.

*Histogram*
- A **histogram** represents the frequency distribution of continuous numerical data grouped into class intervals (bins).
- The data provided (counts for individual years) are discrete, not continuous.

*Frequency polygon*
- A **frequency polygon** displays the shape of the distribution of a continuous variable, often by connecting the midpoints of the tops of the bars in a histogram.
- It is not suitable for discrete yearly data, as there are no continuous class intervals to connect.

*Scatter diagram*
- A **scatter diagram** shows the relationship or correlation between two continuous numerical variables.
- Here only one variable is a measured quantity (number of LBW babies); the other (year) acts as a categorical or ordinal label, and the purpose is to show how the count changes from year to year, not to assess a correlation between two continuous variables.

Q: What is the numerator in the formula for calculating Negative Predictive Value (NPV) in diagnostic testing?

True negative.

***True negative***
- In the calculation of **Negative Predictive Value (NPV)**, the numerator represents the number of individuals who are truly disease-free and also test negative for the disease.
- NPV answers the question: "If a patient tests negative, what is the probability that they are actually **disease-free**?"

*True positive*
- **True positives** are individuals who have the disease and also test positive; they are the numerator for **Positive Predictive Value (PPV)**.
- They do not factor into the numerator for NPV, which focuses on negative test results and the absence of disease.

*False positive*
- **False positives** are individuals who do not have the disease but test positive; they are found in the denominator for PPV, but not in the numerator for NPV.
- They represent an incorrect test result and do not contribute to the count of truly healthy individuals with a negative test.

*False negative*
- **False negatives** are individuals who have the disease but test negative; they are in the denominator for **sensitivity** and NPV.
- They represent a missed diagnosis and are not part of the numerator for NPV, which specifically identifies correctly identified healthy individuals.
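Written out, the formulas referred to in this explanation are as follows, where TP, FP, TN, and FN denote the counts of true positives, false positives, true negatives, and false negatives:

$$
\mathrm{NPV} = \frac{TN}{TN + FN}, \qquad
\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}
$$

True negatives appear in the numerator of NPV only, while false negatives appear in the denominators of both NPV and sensitivity.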

Q: Which of the following is an example of a case-control study that investigates the relationship between a risk factor and a disease?

All of the options.

***All of the options***
- **All three scenarios** are classic examples of the case-control approach in epidemiology, where investigators identified cases of disease and compared them to controls to determine past exposure to risk factors.
- Case-control studies are **retrospective** in design, starting with the outcome (disease) and looking backward to identify exposure history.

**Maternal smoking and congenital malformation**
- Cases: Children with congenital malformations
- Controls: Children without malformations
- Exposure assessed: History of maternal smoking during pregnancy
- This exemplifies the typical case-control approach to studying teratogenic exposures.

**Thalidomide exposure and teratogenicity**
- The early reports by **Lenz (1961)** and **McBride (1961)** are commonly cited as examples of the **case-control** approach
- Cases: Infants with phocomelia (limb malformations)
- Controls: Infants without malformations
- The investigators looked backward from the cases to identify thalidomide exposure during pregnancy
- The rapid identification of the thalidomide-phocomelia link demonstrates the power of case-control methodology for rare outcomes.

**Vaginal adenocarcinoma and intrauterine exposure to DES**
- The classic **Herbst et al. (1971)** study was a **case-control study**
- Cases: Young women with clear cell adenocarcinoma of the vagina
- Controls: Age-matched women without the disease
- The investigators examined past exposures and discovered the association with maternal DES use during pregnancy
- This is a textbook example of case-control design for investigating rare diseases with long latency periods.

Q: In a study assessing malnutrition among young children, 100 children were selected from rural and urban areas (50 from each area). Out of these, 30 children from rural areas and 20 children from urban areas were found to be malnourished. Which statistical test is appropriate for comparing the proportions of malnourished children between the two groups?

Chi-square.

***Chi-square***
- The **chi-square test** is used to compare proportions or frequencies between two or more categorical groups. Here, we are comparing the proportion of malnourished children (a categorical outcome) between two living areas (rural vs. urban, also categorical).
- The test determines whether there is a statistically significant association between the two categorical variables.

*Paired t-test*
- A **paired t-test** is used to compare the means of two related groups or samples, such as measurements taken before and after an intervention on the same individuals.
- This scenario involves independent groups (rural vs. urban children) and proportions, not means from paired samples.

*The standard error of mean*
- The **standard error of the mean (SEM)** measures the precision of a sample mean as an estimate of the population mean; specifically, it is the standard deviation of the sampling distribution of the mean.
- It quantifies the variability of sample means and is not itself a hypothesis test for comparing two groups.

*ANOVA*
- **ANOVA (Analysis of Variance)** is used to compare the means of **three or more independent groups**. Because it compares means, it is not appropriate for comparing proportions between two groups.
- If we were comparing the mean weight of children across three or more living areas, ANOVA would be suitable.
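As an illustration, the test can be worked through on the numbers in the question (30 of 50 rural and 20 of 50 urban children malnourished). This is a minimal sketch of the Pearson chi-square statistic without continuity correction, using only the standard library; the function name `chi_square_2x2` is an assumption made for this example, and a statistics package would normally be used instead.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: rural, urban. Columns: malnourished, not malnourished.
observed = [[30, 20],
            [20, 30]]
chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Every expected cell count here is 25, so each cell contributes (5² / 25) = 1 to the statistic, giving a chi-square of 4 on 1 degree of freedom.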

Q: In a study of 200 patients, CA-125 testing was performed. Among the 100 patients who tested positive, 60 had ovarian cancer confirmed by histopathology. Among the 100 patients who tested negative, 20 had ovarian cancer confirmed by histopathology. What is the negative predictive value of this test?

80/100.

***80/100***
- The **negative predictive value (NPV)** is the probability that a patient who tests negative actually does not have the disease.
- In this case, 100 patients tested negative, and 20 of them *did* have ovarian cancer, meaning 80 **did not** have ovarian cancer. Thus, NPV = 80/100.

*20/100*
- This is the proportion of **false negatives** among all patients who tested negative, not the negative predictive value.
- A false negative occurs when the test result is negative, but the disease is actually present.

*40/100*
- This is the proportion of patients who tested positive but **did not** have the disease (false positives), calculated as 100 (total positive tests) - 60 (true positives) = 40.
- This is not the calculation for negative predictive value.

*60/100*
- This is the proportion of **true positives** among all patients who tested positive.
- That proportion is the **positive predictive value**, not the negative predictive value.
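The same arithmetic can be laid out in code. The sketch below fills a 2×2 table from the counts in the question (60 true positives, 40 false positives, 20 false negatives, 80 true negatives); the function name `predictive_values` is an assumption made for illustration.

```python
def predictive_values(tp, fp, fn, tn):
    """Return common diagnostic-test measures from 2x2 table counts."""
    return {
        "ppv": tp / (tp + fp),          # positives that truly have disease
        "npv": tn / (tn + fn),          # negatives that are truly disease-free
        "sensitivity": tp / (tp + fn),  # diseased patients who test positive
        "specificity": tn / (tn + fp),  # disease-free patients who test negative
    }

# Counts derived from the CA-125 question above.
result = predictive_values(tp=60, fp=40, fn=20, tn=80)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The NPV of 0.8 matches the accepted answer of 80/100, and the PPV of 0.6 corresponds to the 60/100 distractor.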

Q: In a normal distribution with mean = 200 and standard deviation = 20, what is the range in which 68% of the values will fall?

180-220.

***180-220***
- In a **normal distribution**, approximately 68% of the data falls within **one standard deviation** of the mean.
- With a mean of 200 and a standard deviation of 20, this range is calculated as 200 ± 20, which equals **180-220**.

*160-240*
- This range covers values within **two standard deviations** of the mean (200 ± 2 × 20 = 160-240).
- Approximately **95%** of the values in a normal distribution fall within this range, not 68%.

*170-230*
- This range corresponds to 1.5 standard deviations from the mean (200 ± 1.5 × 20 = 170-230), which is not an integer multiple of the standard deviation.
- It covers about 87% of values, which is not one of the standard percentages (68%, 95%, or 99.7%) of the empirical rule.

*190-210*
- This range covers half of one standard deviation on either side of the mean (200 ± 0.5 × 20 = 190-210).
- It contains a smaller percentage of values than 68%, approximately **38%**.
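Each option can be checked by computing the interval mean ± k × SD and the share of a normal distribution it covers. This is a minimal sketch using `math.erf`; the helper names `interval` and `coverage` are assumptions introduced for this example.

```python
import math

def interval(mean, sd, k):
    """Range covering k standard deviations on either side of the mean."""
    return mean - k * sd, mean + k * sd

def coverage(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2.0))

mean, sd = 200, 20
for k in (0.5, 1, 1.5, 2):
    low, high = interval(mean, sd, k)
    print(f"k={k}: {low:g}-{high:g} covers {coverage(k):.1%}")
```

Only k = 1 gives the 68% coverage the question asks for, which corresponds to the range 180-220.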

Q: Which type of study is used to determine the cross product ratio?

Case control.

***Case control***
- **Case-control studies** are specifically designed to compare exposure histories between individuals with a disease (cases) and those without (controls), which directly facilitates the calculation of the **odds ratio**.
- The odds ratio is called the **cross-product ratio** because of its calculation method: (a×d)/(b×c), where the products are "crossed" in the 2×2 contingency table.
- This is the **primary measure of association** in case-control studies and serves as an approximation of the relative risk, particularly for rare outcomes.

*Cohort*
- **Cohort studies** follow exposed and unexposed groups over time to determine the incidence of disease, allowing for the direct calculation of **relative risk** and **attributable risk**.
- While odds ratios can be calculated from cohort data, the **relative risk** is the primary and preferred measure of association in cohort studies, not the cross-product ratio.

*Cross sectional*
- **Cross-sectional studies** assess the prevalence of disease and exposure at a single point in time, providing a snapshot of the population's health status.
- They measure **prevalence** rather than incidence and can calculate prevalence ratios, but the term "cross-product ratio" specifically refers to the odds ratio from **case-control** study designs.

*RCT*
- **Randomized controlled trials (RCTs)** are experimental studies where participants are randomly assigned to intervention or control groups to evaluate treatment efficacy.
- They primarily focus on determining the **relative risk** or **risk ratio** of an outcome following an intervention and are not designed for calculating the cross-product ratio (odds ratio) as used in observational case-control studies.
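The cross-product calculation (a×d)/(b×c) can be sketched as follows. The counts are hypothetical and chosen only to illustrate the arithmetic; they do not come from any study mentioned here, and the function name `odds_ratio` is likewise an assumption.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)

# Hypothetical case-control counts, for illustration only.
exposed_cases, exposed_controls = 40, 20
unexposed_cases, unexposed_controls = 60, 80

or_value = odds_ratio(exposed_cases, exposed_controls,
                      unexposed_cases, unexposed_controls)
print(f"odds ratio = {or_value:.2f}")
```

With these made-up counts, the odds of exposure among cases (40/60) are about 2.67 times the odds among controls (20/80).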

