Interpret the following graph.

A study is performed to assess the intelligence quotient and the crime rate in a neighborhood. Students at a local high school are given an assessment and their criminal and disciplinary records are reviewed. One of the subjects scores 2 standard deviations over the mean. What percent of students did he score higher than?
The principal investigators of both studies recently met at a rheumatology conference. They both expressed an interest in combining data from their individual studies to be analyzed in a single study. A third researcher at the conference, who conducted her own project on the same topic recently, has also indicated she would like to contribute data to a pooled analysis. Which of the following statements regarding their new study design is true?
Which of the following is not a measure of dispersion?
A cohort study follows 500 healthcare workers for 4 years to assess the incidence of occupational tuberculosis. During the study period, 20 workers developed tuberculosis. What is the incidence rate of tuberculosis per 1000 person-years in this cohort?
Accidents occurring on weekends are an example of:
Mean bone density is compared between 2 groups of 50 people each. Which test would be best?
You have diagnosed a patient clinically as having SLE and ordered 6 tests out of which 4 tests have come positive and 2 are negative. Which of the following values are required to determine the probability of SLE at this point?
For a positively skewed curve, which measure of central tendency is largest?
Which is the most appropriate measure of central tendency when the data include extreme values?
Explanation:

### ***Normal, positively skewed, negatively skewed, normal with outliers***
- Boxplot 1 shows a relatively symmetric distribution with the median line close to the center of the box and whiskers of similar length, indicating a **normal distribution**.
- Boxplot 2 has its median shifted towards the lower quartile and a longer whisker/tail on the right side, characteristic of a **positively skewed (right-skewed) distribution**.
- Boxplot 3 has its median shifted towards the upper quartile and a longer whisker/tail on the left side, indicating a **negatively skewed (left-skewed) distribution**.
- Boxplot 4 shows a relatively symmetric distribution, but with individual data points (represented by dots) extending beyond the whiskers, which are considered **outliers** in an otherwise **normal distribution**.

### *Normal, negatively skewed, positively skewed, skewed with outliers*
- This option reverses the skewness of plots 2 and 3. Plot 2 is positively skewed, not negatively, and plot 3 is negatively skewed, not positively.
- Plot 4 does have outliers, but "skewed with outliers" is inaccurate because its central distribution appears symmetric.

### *Skewed with outliers, positively skewed, negatively skewed, normal*
- This option incorrectly identifies plot 1 as "skewed with outliers" when it appears normal.
- It also labels plot 4 as simply "normal", overlooking the outliers plotted beyond its whiskers.

### *Normal, negatively skewed, positively skewed, normal with outliers*
- This option incorrectly identifies the skewness for plot 2, labeling it as negatively skewed instead of positively skewed.
- It also incorrectly labels plot 3 as positively skewed, when it is negatively skewed.
Explanation:

***97.5%***
- This question relates to the **normal distribution (bell curve) and the empirical rule (68-95-99.7 rule)** [1].
- By the empirical rule, about 95% of the data falls within ±2 standard deviations of the mean [2]. The remaining 5% is split evenly between the two tails (2.5% each).
- A score 2 standard deviations above the mean therefore exceeds the central 95% plus the lower 2.5% tail: 95% + 2.5% = **97.5%** of students.

*95%*
- This percentage represents the data that falls within **2 standard deviations of the mean (both sides)**, not the percentage of scores below a score 2 standard deviations above the mean [1].
- It would be correct if the question asked for the percentage of students whose scores fall within two standard deviations of the mean.

*68%*
- This percentage represents the data that falls within **1 standard deviation of the mean** according to the empirical rule [1].
- A score 2 standard deviations above the mean lies well outside this range.

*99.7%*
- This percentage represents the data that falls within **3 standard deviations of the mean (both sides)**, according to the empirical rule [2].
- It applies to the ±3 standard deviation range, whereas the question specifies a score 2 standard deviations above the mean.
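As a quick check on the arithmetic above, the following minimal sketch compares the empirical-rule figure with the exact proportion of a standard normal distribution lying below z = 2. The helper name `normal_cdf` is illustrative and not part of the original material; the exact value comes out slightly above the rounded 97.5%.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Empirical-rule approximation: central 95% plus the lower 2.5% tail.
empirical_below_2sd = 0.95 + 0.025

# Exact proportion of a normal distribution lying below z = 2.
exact_below_2sd = normal_cdf(2.0)

print(f"empirical rule: {empirical_below_2sd:.3f}")
print(f"exact CDF:      {exact_below_2sd:.4f}")
```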
Explanation:

***The results are more precise in comparison to individual studies***
- Combining data from multiple studies in a **pooled analysis** or meta-analysis generally increases the sample size, leading to **narrower confidence intervals** and more precise estimates of treatment effects or associations.
- Increased precision is a key advantage, making it more likely to detect a true effect if one exists, and providing a more stable estimate of that effect.

*It overcomes limitations in the quality of individual studies*
- A pooled analysis or meta-analysis **does not inherently improve the methodological quality** of the individual studies included. If individual studies have significant biases or design flaws, these flaws will likely be carried over into the combined analysis.
- The quality of the pooled results is highly dependent on the quality of the contributing studies, often making a **sensitivity analysis** based on quality a crucial step.

*It is unable to resolve differences in outcomes between individual studies*
- One of the primary goals of a meta-analysis is to **investigate and explain heterogeneity** (differences in outcomes) among individual studies through subgroup analyses or meta-regression, providing insights into variations.
- By exploring factors that might explain differing results, such as patient characteristics, intervention specifics, or study designs, it can **identify reasons for disparate findings**.

*It has a lower level of clinical evidence than an individual cohort study*
- Pooled analyses and **meta-analyses of high-quality studies**, especially randomized controlled trials (RCTs), are generally considered a **higher level of evidence** than individual cohort studies.
- By synthesizing evidence from multiple studies, they provide a more comprehensive and robust estimate of an effect, thus ranking higher in most **hierarchies of evidence**.
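The gain in precision can be illustrated with a fixed-effect, inverse-variance pooled estimate. This is one common pooling method chosen here as an assumption; the explanation above does not say which method the investigators would use. The function name and the three study estimates are hypothetical.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from three studies.
estimates = [0.5, 0.7, 0.4]
std_errors = [0.1, 0.2, 0.2]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, pooled SE = {pooled_se:.3f}")
print(f"smallest single-study SE = {min(std_errors):.3f}")
```

Under this model the pooled standard error is always smaller than the smallest individual standard error, which is what produces the narrower confidence interval. It does nothing to correct bias in the contributing studies.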
Explanation:

***Mean***
- The **mean** is a measure of **central tendency**, representing the average value of a dataset.
- It describes where the center of the data lies, not how spread out the data points are.

*Range*
- The **range** is a measure of **dispersion** that indicates the difference between the **maximum** and **minimum** values in a dataset.
- It quantifies the overall spread of the data from its lowest to highest points.

*Variance*
- **Variance** is a measure of **dispersion** that quantifies the **average squared deviation** of each data point from the mean.
- It provides insight into how much the individual data points in a distribution deviate from the central tendency.

*Standard error*
- The **standard error** measures the **precision and sampling variability** of a sample statistic (e.g., sample mean) as an estimate of the population parameter.
- It quantifies how much a sample statistic varies across different samples, rather than the dispersion of individual observations within a dataset.
- For this question it is grouped with the measures of dispersion, although strictly it measures sampling variability.
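The distinction between these quantities can be seen by computing each one on the same sample. The sketch below uses Python's standard `statistics` module on a hypothetical dataset chosen only for illustration.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # central tendency, not dispersion
value_range = max(data) - min(data)     # dispersion: maximum minus minimum
variance = statistics.pvariance(data)   # dispersion: mean squared deviation from the mean
sample_sd = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(len(data))  # sampling variability of the mean

print(f"mean = {mean}, range = {value_range}, variance = {variance}")
print(f"standard error of the mean = {standard_error:.3f}")
```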
Explanation:

***10 per 1000 person-years***
- The **incidence rate** is calculated by dividing the number of new cases by the total person-time at risk in the population.
- Total person-years = 500 workers × 4 years = **2000 person-years** (taking each worker to contribute the full 4 years of follow-up, as the question implies)
- Incidence rate = 20 cases / 2000 person-years = **0.01 per person-year**
- To express this per 1000 person-years: 0.01 × 1000 = **10 per 1000 person-years**
- This is the correct calculation following the standard epidemiological formula for incidence rate.

*5 per 1000 person-years*
- This value would be obtained if the total person-years at risk were 4000 (e.g., 500 workers followed for 8 years instead of 4 years).
- It underestimates the true incidence rate by using an incorrect denominator.

*7.5 per 1000 person-years*
- This result would occur if the person-years at risk were approximately 2667 person-years (20/2667 × 1000 ≈ 7.5).
- This reflects an incorrect calculation of the **denominator** (person-years at risk).

*12.5 per 1000 person-years*
- This value incorrectly assumes a denominator of 1600 person-years (20/1600 × 1000 = 12.5).
- This could result from miscalculating the total follow-up time or the number of participants, leading to an overestimation of the incidence rate.
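The calculation above can be written as a short function. This is a minimal sketch that assumes every participant contributes the full follow-up period; the function name is illustrative.

```python
def incidence_rate_per_1000(new_cases: int, persons: int, years: float) -> float:
    """Incidence rate per 1000 person-years.

    Assumes every person contributes the full follow-up time, so that
    person-years = persons * years.
    """
    person_years = persons * years
    return new_cases / person_years * 1000


# Values from the question: 20 cases among 500 workers followed for 4 years.
rate = incidence_rate_per_1000(new_cases=20, persons=500, years=4)
print(f"{rate:.1f} per 1000 person-years")
```

In a real cohort, workers who develop tuberculosis or leave the study stop contributing person-time, so the denominator would normally be summed per individual rather than multiplied out.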
Explanation:

***Cyclic trends***
- Accidents happening during weekends represent a **regular, recurrent pattern** over a short period (weekly), which is characteristic of a cyclic trend.
- These trends show peaks and troughs that occur at **predictable intervals**, such as every week or month.

*Point source epidemic*
- A **point source epidemic** refers to an outbreak where exposure to the causative agent is brief and simultaneous, resulting in a sharp rise and fall in cases, often from a single event or source.
- This typically describes disease outbreaks following a contamination event, not recurring patterns of accidents over weekends.

*Secular trends*
- **Secular trends** describe long-term changes over many years or decades, showing a gradual increase or decrease in prevalence or incidence.
- This concept is used for gradual shifts in health indicators over long periods, not for short-term weekly fluctuations.

*Seasonal trends*
- **Seasonal trends** refer to patterns that recur annually, often linked to changes in seasons, such as influenza outbreaks in winter or agricultural accidents in summer.
- While weekends are a recurring interval, the pattern is weekly, not yearly, which distinguishes it from seasonal trends.
Explanation:

***Student t-test***
- The **Student's t-test** is the appropriate statistical test for comparing the **means of two independent groups** when the data are continuous and normally distributed.
- Bone density is a **continuous variable**, and the scenario involves comparing the mean bone density between two distinct groups.

*Fisher exact test*
- The **Fisher exact test** is used for analyzing **categorical data** in a 2×2 contingency table, especially when sample sizes are small.
- It is not suitable for comparing continuous variables like bone density.

*McNemar test*
- **McNemar's test** is used to analyze paired nominal data, typically when comparing two related proportions from the same subjects before and after an intervention.
- This scenario involves **independent groups**, not paired data.

*Chi-square test*
- The **chi-square test** is primarily used to compare **categorical variables** to see if there is a significant association between them.
- It is not appropriate for comparing the means of continuous data like bone density.
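To make the comparison of two independent means concrete, the sketch below computes the pooled-variance Student's t statistic by hand with the standard library. It assumes equal variances in the two groups; the function name and the bone density values are hypothetical, and the samples are kept small for brevity.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical bone density values (g/cm^2) for two independent groups.
group_a = [1.02, 0.98, 1.10, 1.05, 0.95]
group_b = [0.90, 0.88, 0.97, 0.93, 0.85]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df}")
```

The statistic is then compared against a t distribution with `df` degrees of freedom to obtain a p-value; that lookup is omitted here.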
Explanation:

***Prior probability of SLE, sensitivity and specificity of each test***
- To determine the **post-test probability** of a disease like SLE, you need the **prior probability** (pre-test probability) of the disease in the patient.
- Additionally, the **sensitivity** (true positive rate) and **specificity** (true negative rate) of *each* diagnostic test are crucial for calculating how much each positive or negative test result alters that prior probability, often using **Bayes' theorem**.

*Relative risk of SLE in the patient*
- **Relative risk** is a measure of association between exposure and disease, typically used in **epidemiological studies** to compare risk in exposed vs. unexposed groups.
- It does not directly help determine an individual patient's post-test probability of SLE based on their specific test results.

*Incidence and prevalence of SLE*
- **Incidence** refers to the rate of new cases in a population over a specific period, while **prevalence** refers to the proportion of individuals in a population who have the disease at a specific time.
- While prevalence can contribute to the **prior probability** for a general population, it is not sufficient on its own, nor does it incorporate the results of individual diagnostic tests.

*Incidence of SLE and the predictive value of each test*
- Although **predictive values (positive and negative)** are important for interpreting test results, they are *derived from* sensitivity, specificity, and prevalence.
- To *determine* the probability of SLE using multiple tests, you need the fundamental properties of the tests (sensitivity and specificity) and the prior probability, rather than just the incidence and already-calculated predictive values.
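The way these three inputs combine can be sketched with likelihood ratios, which is Bayes' theorem expressed in odds form. The sketch assumes the tests are conditionally independent given disease status, which real SLE tests may not be. The prior and every sensitivity and specificity below are invented for illustration and are not real test characteristics.

```python
def post_test_probability(prior, results):
    """Update a prior probability with a sequence of test results.

    results: iterable of (sensitivity, specificity, is_positive) tuples.
    Assumes the tests are conditionally independent given disease status.
    """
    odds = prior / (1 - prior)
    for sensitivity, specificity, is_positive in results:
        if is_positive:
            likelihood_ratio = sensitivity / (1 - specificity)   # LR+
        else:
            likelihood_ratio = (1 - sensitivity) / specificity   # LR-
        odds *= likelihood_ratio
    return odds / (1 + odds)


# Hypothetical values: 4 positive and 2 negative results, as in the question.
hypothetical_results = [
    (0.95, 0.80, True),
    (0.70, 0.95, True),
    (0.60, 0.90, True),
    (0.50, 0.85, True),
    (0.80, 0.70, False),
    (0.65, 0.75, False),
]
probability = post_test_probability(prior=0.30, results=hypothetical_results)
print(f"post-test probability = {probability:.3f}")
```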
Explanation:

***Mean***
- In a **positively skewed distribution**, the tail of the distribution extends towards higher values, pulling the **mean** in that direction, making it the largest among the three measures of central tendency.
- The presence of **outliers** with large values in the tail disproportionately increases the mean.

*Mode*
- The **mode** represents the most frequently occurring value in the data set.
- In a positively skewed distribution, the mode will be located at the **peak of the distribution**, which is typically the smallest value among the three measures of central tendency.

*All are equal*
- This statement is characteristic of a **perfectly symmetrical distribution** (e.g., a normal distribution), where the **mean, median, and mode** are all equal.
- A positively skewed curve is asymmetrical, meaning these measures will not be equal.

*Median*
- The **median** is the middle value in an ordered data set, dividing the data into two equal halves.
- In a positively skewed distribution, the median will be shifted towards the right of the mode but will still be to the left of the mean, meaning it is **smaller than the mean**.
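The ordering mode < median < mean can be checked on a small right-skewed sample. The data below are hypothetical and were chosen so that a few large values form the right tail.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
skewed_data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode_value = statistics.mode(skewed_data)      # most frequent value
median_value = statistics.median(skewed_data)  # middle of the ordered data
mean_value = statistics.mean(skewed_data)      # pulled towards the long right tail

print(f"mode = {mode_value}, median = {median_value}, mean = {mean_value}")
```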
Explanation:

***Median***
- The **median** is less affected by **extreme values** or **outliers** because it represents the middle value in an ordered dataset.
- It provides a more robust measure of central tendency when the data distribution is **skewed**.

*Mode*
- The **mode** represents the most frequently occurring value in a dataset; it does not account for the magnitude of other values.
- While it is not influenced by extreme values, it may not accurately represent the central tendency of a continuous dataset, especially if there are **multiple modes** or if the most frequent value is not central.

*Mean*
- The **mean** is calculated by summing all values and dividing by the number of values, making it highly susceptible to **extreme values** or **outliers**.
- A single very large or very small value can significantly distort the mean, pulling it away from the true center of most data points.

*Geometric mean*
- The **geometric mean** is primarily used for data that are **multiplicative** in nature, such as rates of change, or for positively skewed distributions.
- While it can be less sensitive to extreme values than the arithmetic mean for certain types of data, it is not the most appropriate general measure of central tendency when outliers are present, unless the data are multiplicative.
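The robustness of the median can be demonstrated by adding a single extreme value to an otherwise tight sample. The values below are hypothetical.

```python
import statistics

# Hypothetical sample without and with one extreme value.
baseline = [10, 11, 12, 13, 14]
with_outlier = baseline + [100]

mean_shift = statistics.mean(with_outlier) - statistics.mean(baseline)
median_shift = statistics.median(with_outlier) - statistics.median(baseline)

print(f"mean shifts by   {mean_shift:.2f}")
print(f"median shifts by {median_shift:.2f}")
```

One outlier moves the mean by more than 14 units here, while the median moves by half a unit.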