A randomized trial comparing the efficacy of two drugs showed a difference between the two (p value < 0.05). However, in reality the drugs do not differ. This is an example of
In a village, every fifth house was selected for a study. This is an example of
The ability of a test to identify correctly those who do not have the disease is called its
In a normal curve, what percentage of the values is included in the area within two standard deviations on either side of the mean (X ± 2σ)?
"Sampling error" occurs due to the variation in results
Which is the fertility indicator that gives the approximate magnitude of completed family size?
In a normal distribution, Mean ± 2 S.D. contains
The appropriate statistical test to find out obesity as a significant risk factor for breast cancer is:
Which of the following is/are the measure(s) of dispersion? 1. Mode 2. Median 3. Standard Deviation Select the correct answer using the code given below:
Which one of the following is not a measure of dispersion?
Explanation: ***Type I error*** - A **Type I error** occurs when the **null hypothesis is incorrectly rejected**, leading to the conclusion that a significant difference exists when, in reality, there is no true difference. - In this scenario, the trial concluded a difference (p < 0.05), but the drugs are truly equivalent, which is precisely the definition of a **Type I error**. *Both type I and II error* - It is impossible to commit both a **Type I** and a **Type II error** simultaneously for the same statistical test. - A **Type I error** involves rejecting a true null hypothesis, while a **Type II error** involves failing to reject a false null hypothesis. *Random error* - **Random error** refers to unpredictable fluctuations in measurements or results, which can be minimized but not eliminated. - While random error can contribute to variability in data, it is not the direct statistical error of concluding a non-existent difference when analyzing the results, which is a **Type I error**. *Type II error* - A **Type II error** occurs when the **null hypothesis is incorrectly accepted** (or not rejected), meaning a real difference exists but the study fails to detect it. - This scenario describes the opposite: a difference was detected and concluded, but it was false.
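The meaning of a Type I error can be illustrated with a small simulation. This is a minimal sketch, not part of the original question: the function names, the sample sizes, and the assumption of normally distributed outcomes with known unit variance are all illustrative choices. Both arms are drawn from the same distribution, so every "significant" result is a false positive.

```python
import random
import statistics
from math import erf, sqrt

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def false_positive_rate(n_trials=2000, n_per_arm=50, alpha=0.05, seed=0):
    """Share of simulated trials that reject H0 although the arms are identical."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both hypothetical drugs draw outcomes from the same distribution,
        # so the null hypothesis is true in every simulated trial.
        arm_a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        diff = statistics.fmean(arm_a) - statistics.fmean(arm_b)
        # Assumed known unit variance in each arm, so a z-test applies.
        z = diff / sqrt(2 / n_per_arm)
        if two_sided_p_value(z) < alpha:
            rejections += 1  # a Type I error
    return rejections / n_trials

rate = false_positive_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

Under these assumptions the observed rate should land close to the chosen significance level of 0.05.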
Explanation: ***Systematic random sampling*** - This method involves selecting subjects from an **ordered sampling frame** at regular intervals, such as every k-th item. - In this scenario, selecting every fifth house represents a fixed interval (k=5), which is characteristic of systematic random sampling. *Simple random sampling* - This method ensures that every member of the population has an **equal chance of being selected**, often through random number generation. - It does not involve a predetermined, fixed interval of selection from an ordered list. *Convenience sampling* - This technique involves selecting subjects who are **easily accessible or readily available**, without any systematic or random process. - It is prone to bias as it does not represent the entire population. *Stratified random sampling* - This method involves dividing the population into **homogeneous subgroups (strata)** and then conducting simple random sampling within each stratum. - The scenario does not describe dividing the village households into distinct subgroups before selection.
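Selecting every k-th unit can be sketched as follows. The helper name `systematic_sample`, the random start, and the village of 100 numbered houses are assumptions made for illustration; the question itself only states that every fifth house was chosen.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k units, then every k-th unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start index in 0 .. k-1
    return frame[start::k]

houses = list(range(1, 101))  # hypothetical ordered list of house numbers
sample = systematic_sample(houses, k=5, seed=1)
print(len(sample), sample[:4])
```

With 100 houses and k=5 the sample always contains 20 houses, each exactly five positions apart in the frame.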
Explanation: ***Specificity*** - **Specificity** is the proportion of **true negatives** correctly identified by the test. - It measures the ability of a test to correctly identify individuals who **do not have the disease**. *Sensitivity* - **Sensitivity** is the proportion of **true positives** correctly identified by the test. - It measures the ability of a test to correctly identify individuals who **do have the disease**. *Positive predictive value* - **Positive predictive value (PPV)** is the probability that a patient with a **positive test result** actually has the disease. - It depends on the **prevalence** of the disease in the population being tested. *Negative predictive value* - **Negative predictive value (NPV)** is the probability that a patient with a **negative test result** actually does not have the disease. - It also depends on the **prevalence** of the disease in the population.
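The four measures above all come from the same 2×2 table of test result against true disease status. The sketch below uses hypothetical counts (they are not from the question) to show how each one is computed.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard test-accuracy measures from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased people who test positive
        "specificity": tn / (tn + fp),  # non-diseased people who test negative
        "ppv": tp / (tp + fp),          # positive tests that are truly diseased
        "npv": tn / (tn + fn),          # negative tests that are truly disease-free
    }

# Hypothetical screening data: 100 diseased and 900 non-diseased people.
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity use the column totals (true disease status), while PPV and NPV use the row totals (test result), which is why only the latter two shift with prevalence.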
Explanation: ***Correct: 95.4*** - According to the **empirical rule** (also known as the 68-95-99.7 rule), approximately 95% of data falls within two standard deviations of the mean in a normal distribution. - More precisely, the area between **X ± 2σ** encompasses **95.4%** of the values. - This is a fundamental concept in biostatistics used for calculating confidence intervals and reference ranges. *Incorrect: 68.3* - This percentage represents the proportion of data within **one standard deviation** (X ± 1σ) of the mean in a normal distribution. - It is not the correct value for the range of two standard deviations. *Incorrect: 90.4* - This value does not correspond to any standard interval of standard deviations around the mean in a normal distribution. - It is not part of the empirical rule for common standard deviation ranges. *Incorrect: 99.7* - This percentage represents the proportion of data within **three standard deviations** (X ± 3σ) of the mean in a normal distribution. - It is a larger interval than what is asked in the question (two standard deviations).
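The percentages in the empirical rule can be checked numerically. A minimal sketch using only the standard library: for a standard normal variable Z, P(|Z| ≤ k) equals erf(k/√2). The function name `coverage_within` is an illustrative choice.

```python
from math import erf, sqrt

def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {100 * coverage_within(k):.1f}%")
# ±1 SD: 68.3%
# ±2 SD: 95.4%
# ±3 SD: 99.7%
```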
Explanation: ***between one sample and another*** - **Sampling error** arises because a sample is not a perfect representation of the entire population from which it is drawn. - This error quantifies the natural **variability** that occurs when different subgroups (samples) are selected from the same population. *due to the use of many instruments in the study* - This scenario describes **inter-instrument variability** or **measurement error**, which is related to the precision and calibration of different tools. - While it can introduce error, it is distinct from sampling error, which arises from the representativeness of the chosen study subjects. *due to the multiple readings taken on the same instrument* - Multiple readings on the same instrument assess **intra-instrument variability** or **repeatability**, indicating how consistent a single instrument is over time. - This relates to the precision of the measurement device, not the representativeness of the sample itself. *between the observations of two individuals* - Differences in observations between two individuals indicate **inter-rater variability** or **observer bias**. - This type of error is related to subjective interpretation or measurement technique by different observers, rather than the intrinsic variability between selected samples.
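Sample-to-sample variation is easy to see in a simulation. The sketch below assumes a hypothetical population of 10,000 blood pressure readings (mean 120, SD 15) and draws five random samples of 50; none of these numbers come from the question.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of systolic blood pressure values.
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Draw five independent simple random samples and record each sample mean.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(5)]

print("Population mean:", round(statistics.fmean(population), 1))
print("Sample means:   ", [round(m, 1) for m in sample_means])
```

The same instrument and the same observer are implied throughout, yet the sample means still differ from one another and from the population mean. That difference is the sampling error.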
Explanation: ***Total Fertility Rate*** - The **Total Fertility Rate (TFR)** estimates the average number of children a woman would have over her lifetime if she were to experience current age-specific fertility rates. - It provides a measure of the **completed family size** in a hypothetical cohort of women. *Age-specific Fertility Rate* - The **Age-specific Fertility Rate (ASFR)** measures the number of births to women in a particular age group per 1,000 women in that age group. - It does not directly provide the completed family size but is a component used to calculate the TFR. *General Fertility Rate* - The **General Fertility Rate (GFR)** calculates the number of live births per 1,000 women of childbearing age (typically 15-49 years) in a given year. - While it reflects overall fertility, it does not provide an estimate of the completed family size per woman. *Gross Reproduction Rate* - The **Gross Reproduction Rate (GRR)** is similar to the TFR but only considers female births. - It estimates the average number of daughters a woman would have over her lifetime based on current age-specific fertility rates, without accounting for mortality.
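The link between the ASFR and the TFR can be shown with a short calculation. With five-year age groups, the TFR is five times the sum of the age-specific rates, divided by 1,000 when the rates are expressed per 1,000 women. The rates below are hypothetical values chosen for illustration.

```python
# Hypothetical age-specific fertility rates (births per 1,000 women per year).
asfr_per_1000 = {
    "15-19": 40,
    "20-24": 180,
    "25-29": 150,
    "30-34": 80,
    "35-39": 30,
    "40-44": 10,
    "45-49": 2,
}

AGE_GROUP_WIDTH = 5  # each woman spends five years in every age group

# Sum the yearly rates, weight by years spent in each group, convert to per woman.
tfr = AGE_GROUP_WIDTH * sum(asfr_per_1000.values()) / 1000
print(f"TFR: {tfr:.2f} children per woman")
# TFR: 2.46 children per woman
```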
Explanation: ***95.4 % values*** - According to the **empirical rule** (or 68-95-99.7 rule) for normal distributions, approximately **95.4%** of data falls within two standard deviations of the mean. - This interval covers from (Mean - 2 S.D.) to (Mean + 2 S.D.) and represents the likelihood of a value falling in this range. *68.3 % values* - This percentage corresponds to the data contained within **Mean ± 1 S.D.** in a normal distribution, not Mean ± 2 S.D. - It signifies that roughly two-thirds of all observations lie within one standard deviation from the mean in a bell-shaped curve. *91.2 % values* - This value is not a standard percentage associated with common multiples of standard deviations (1, 2, or 3) from the mean in a normal distribution. - It does not correspond to any universally recognized interval like ±1 S.D., ±2 S.D., or ±3 S.D. *99.7 % values* - This percentage represents the data contained within **Mean ± 3 S.D.** in a normal distribution. - It indicates that almost all (99.7%) of the data points are expected to fall within three standard deviations from the mean.
Explanation: ***Chi-square test*** - The **chi-square test** is used to determine if there is a **significant association** between two **categorical variables**. - In this scenario, both obesity (yes/no) and breast cancer (yes/no) are categorical, making chi-square appropriate to assess if obesity is a risk factor. *Wilcoxon’s signed rank test* - This is a **non-parametric test** used for comparing two related samples or repeated measurements on a single sample, especially when data are not normally distributed. - It is not suitable for assessing the association between two independent categorical variables like obesity and breast cancer. *Student’s paired ‘t’ test* - The **paired t-test** is used to compare the means of two related groups or measurements from the same subject under different conditions (e.g., before and after an intervention). - This test is designed for **continuous data** and would not be appropriate for the categorical variables of obesity and breast cancer. *Student’s unpaired ‘t’ test* - The **unpaired t-test** (also known as independent samples t-test) is used to compare the means of two independent groups for a **continuous outcome variable**. - It is not suitable when both the exposure (obesity) and the outcome (breast cancer) are categorical variables.
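For two yes/no variables, the chi-square statistic can be computed directly from the 2×2 table. This is a sketch with hypothetical counts, using the Pearson formula without continuity correction; the function name is an illustrative choice. With one degree of freedom the p-value has a closed form through the complementary error function.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: rows = obese / not obese, columns = breast cancer / no cancer.
chi2, p_value = chi_square_2x2(60, 40, 40, 60)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

In this made-up table the statistic is 8.0, which exceeds the 1 df critical value of 3.84, so the association would be judged significant at the 5% level.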
Explanation: ***Correct: 3 only*** - **Standard Deviation** is a direct measure of dispersion that quantifies the amount of variation or spread of data values around the mean. - It indicates how much individual data points deviate from the average, making it a key statistic for understanding the **spread** within a dataset. - Other common measures of dispersion include **range, variance, interquartile range, and coefficient of variation**. *Incorrect: 1, 2 and 3* - **Mode** and **Median** are measures of **central tendency**, not dispersion. - They describe the center or typical value of a dataset, not the spread or variability. - While they provide insight into the data's distribution, they do not quantify how spread out the data points are. *Incorrect: 2 and 3 only* - **Median** is a measure of **central tendency** representing the middle value when the data are ordered, not a measure of dispersion. - Only **Standard Deviation** from this option is a measure of dispersion, making this choice incorrect. *Incorrect: 1 and 2 only* - Both **Mode** and **Median** are measures of **central tendency**. - Mode indicates the most frequent value and Median represents the middle value. - Neither provides information about how **spread out** or dispersed the data points are around the center.
Explanation: ***Mean*** - The **mean** is a measure of **central tendency**, representing the average value of a dataset. - It describes the typical value around which data points cluster, rather than how spread out they are. *Mean deviation* - **Mean deviation** is a measure of **dispersion** that calculates the average of the absolute differences between each data point and the mean of the dataset. - It quantifies the average deviation of data points from the center. *Standard deviation* - **Standard deviation** is a widely used measure of **dispersion** that indicates the average amount of variability or spread around the mean. - A higher standard deviation suggests data points are more spread out from the mean. *Range* - The **range** is a simple measure of **dispersion** calculated as the difference between the highest and lowest values in a dataset. - It provides a basic idea of the total spread of data from its extremes.
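The difference between central tendency and dispersion shows up clearly when all four quantities are computed on one dataset. The six values below are hypothetical, and the standard deviation shown is the sample version (n − 1 in the denominator).

```python
import statistics

data = [4, 8, 6, 5, 3, 10]  # hypothetical observations

mean = statistics.fmean(data)                                   # central tendency
data_range = max(data) - min(data)                              # dispersion
mean_deviation = statistics.fmean([abs(x - mean) for x in data])  # dispersion
std_dev = statistics.stdev(data)                                # dispersion (sample SD)

print(f"mean = {mean}, range = {data_range}, "
      f"mean deviation = {mean_deviation}, SD = {std_dev:.2f}")
```

Adding a constant to every observation would change the mean but leave the range, mean deviation, and standard deviation untouched, which is why the mean is not a measure of dispersion.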