This study found a correlation coefficient of +0.7 between self-reported work satisfaction and life expectancy in a random sample of 5,000 corporate workers, with a p-value of 0.01. This means that:
For calculation of sample size for a prevalence study, all of the following are necessary except:
What does specificity in a diagnostic test measure?
Which of the following is not a method of random sampling?
After applying a statistical test, an investigator gets a p-value of 0.01. What does this indicate about the null hypothesis?
Which of the following statements about the normal distribution curve is true?
What is the definition of literacy according to census standards?
In the context of medical statistics, which graphical representation is used to show the proportion of different categories of a variable in a dataset?
Which of the following is obtained by joining the midpoints of histogram blocks in statistics?
Which of the following statements about screening tests is correct?
Explanation: ***Strong statistically significant (+) association between work satisfaction and life expectancy.***
- A **correlation coefficient** of **+0.7** indicates a strong positive linear relationship between two variables.
- A **p-value of 0.01** (which is less than 0.05) indicates that the observed association is **statistically significant**, meaning it's unlikely to have occurred by chance.

*Correlation does not imply that 70% of people who enjoy work shall live longer.*
- A **correlation coefficient** is a measure of the strength and direction of a linear relationship, not a percentage of a population.
- Saying "70% of people" implies a proportional relationship, which is an incorrect interpretation of a correlation coefficient.

*Correlation coefficient of +0.7 indicates a moderate positive relationship, not a percentage.*
- A correlation coefficient of **+0.7** is generally considered a **strong positive relationship**, rather than moderate.
- This statement correctly clarifies that a correlation coefficient is not a percentage, but mischaracterizes the strength of the given correlation.

*Work satisfaction is moderately associated with life expectancy.*
- A **correlation coefficient of +0.7** signifies a **strong positive association**, not a moderate one.
- The term "moderately" underestimates the strength of the relationship indicated by a correlation coefficient of 0.7.
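To make the point that a correlation coefficient is not a percentage of people, the sketch below computes Pearson's r by hand. The function name `pearson_r` and the five data points are made up for illustration (they are not data from the study); they were chosen so that r comes out at 0.7.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores, invented for this example only.
satisfaction = [1, 2, 3, 4, 5]
outcome = [1, 4, 2, 3, 5]

r = pearson_r(satisfaction, outcome)  # 0.7 for these values
r_squared = r ** 2                    # about 0.49
```

The squared coefficient (about 0.49 here) is the share of variance in one variable accounted for by a linear relationship with the other. Neither r nor r squared says that 70% of individuals behave in a particular way.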
Explanation: ***Power of the study***
- The **power of a study** is primarily relevant when calculating sample sizes for **hypothesis testing** (e.g., comparing two groups) to detect a statistically significant difference if one exists.
- In a prevalence study, the goal is to estimate a proportion or prevalence with a certain level of precision, rather than to test a hypothesis.

*Prevalence of disease in population*
- An **estimated prevalence** is crucial for sample size calculation in prevalence studies, as it directly influences the variability of the proportion being estimated.
- A higher or lower estimated prevalence affects the required sample size to achieve a desired level of precision.

*Significance level*
- The **significance level (alpha)** defines the probability of rejecting the null hypothesis when it is true (Type I error).
- While essential for hypothesis testing, it is still used in prevalence studies to define the **confidence level** for the estimated prevalence (e.g., 95% confidence interval corresponds to an alpha of 0.05).

*Desired precision*
- **Desired precision**, often expressed as the **margin of error**, is a fundamental component of sample size calculation for prevalence studies.
- It specifies how close the sample estimate should be to the true population prevalence.
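The three inputs named above (expected prevalence, significance level, desired precision) can be combined in the commonly used single-proportion formula n = Z² p (1 − p) / d². The sketch below assumes that formula, with no finite-population correction or design effect. The function name `prevalence_sample_size` is illustrative. Power does not appear anywhere in it.

```python
import math

def prevalence_sample_size(expected_prevalence, margin_of_error, z=1.96):
    """Sample size to estimate a proportion within +/- margin_of_error.

    z is the standard normal quantile for the chosen confidence level
    (1.96 corresponds to 95% confidence, i.e. alpha = 0.05).
    """
    p = expected_prevalence
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Round up: a fractional participant is not possible.
    return math.ceil(n)

# Expected prevalence 50%, precision +/- 5 percentage points, 95% confidence.
n_required = prevalence_sample_size(0.5, 0.05)  # 385
```

Halving the margin of error roughly quadruples the required sample, because the margin enters the formula squared.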
Explanation: ***True negative***
- Specificity measures the **proportion of true negatives** correctly identified by the test.
- It indicates the test's ability to correctly identify individuals **without the disease** who test negative.
- **Formula: Specificity = TN / (TN + FP)** where TN = True Negatives, FP = False Positives.

*True positive*
- **True positives** are measured by **sensitivity**, not specificity.
- Sensitivity measures the proportion of people with the disease who test positive.

*False positive*
- **False positives** reduce specificity but are not what specificity measures.
- High specificity means fewer false positives (more specific for the disease).

*False negative*
- **False negatives** are related to **sensitivity**, not specificity.
- A test with low sensitivity will have a higher rate of false negatives.
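The formula above, together with the corresponding one for sensitivity, can be written as a minimal sketch. The 2×2 counts are hypothetical and serve only to show which cells feed which measure.

```python
def specificity(true_negatives, false_positives):
    """Proportion of disease-free individuals who test negative."""
    return true_negatives / (true_negatives + false_positives)

def sensitivity(true_positives, false_negatives):
    """Proportion of diseased individuals who test positive."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts: 100 people with the disease, 200 without.
tp, fn = 90, 10
tn, fp = 160, 40

spec = specificity(tn, fp)  # 160 / 200 = 0.8
sens = sensitivity(tp, fn)  # 90 / 100 = 0.9
```

Specificity uses only the disease-free column (TN and FP), and sensitivity uses only the diseased column (TP and FN).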
Explanation: ***Quota sampling***
- **Quota sampling** is a non-probability sampling method where researchers select a sample based on pre-defined characteristics to match the population's proportions.
- It does not involve random selection at any stage, making it a non-random sampling technique.

*Cluster sampling*
- **Cluster sampling** is a probability (random) sampling technique where the population is divided into clusters, and then a random sample of these clusters is selected.
- All units within the selected clusters are then included in the sample, or a random sample is taken from within the selected clusters.

*Stratified sampling*
- **Stratified sampling** is a probability (random) sampling method that involves dividing the population into homogeneous subgroups (strata) and then taking a random sample from each stratum.
- This method ensures representation from all important subgroups within the population.

*Simple random*
- **Simple random sampling** is a basic probability (random) sampling technique where every member of the population has an equal chance of being selected for the sample.
- This method is considered the most fundamental type of random sampling.
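The contrast between the three probability methods and quota sampling can be sketched with Python's standard `random` module. All function names and the toy population are hypothetical. The quota version deliberately takes the first available units to stand in for selection by convenience, which is where the randomness is missing.

```python
import random

def simple_random_sample(population, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Random sample drawn separately from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then include all of their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def quota_sample(strata, quota_per_stratum):
    """Non-random: fill each quota with whoever is available first."""
    return [unit for members in strata.values()
            for unit in members[:quota_per_stratum]]

rng = random.Random(42)  # Seeded only so the example is reproducible.
strata = {"ward_a": list(range(0, 50)), "ward_b": list(range(50, 100))}

srs = simple_random_sample(list(range(100)), 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(strata, 1, rng)
quota = quota_sample(strata, 5)  # Always the same units, no chance involved.
```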
Explanation: ***There is a 1% probability of observing the data, or something more extreme, if the null hypothesis is true.***
- A **p-value** is defined as the probability of obtaining the observed results (or results more extreme) assuming that the **null hypothesis is true**.
- A p-value of 0.01 means there is a **1% chance** of observing data at least as extreme as those obtained if there truly is no effect or no difference.

*There is a 1% probability of incorrectly rejecting the null hypothesis when it is true.*
- This statement describes the **Type I error rate (alpha level)**, which is set *before* the experiment, usually at 0.05 or 0.01.
- The p-value is calculated from the observed data and compared against alpha; it is not itself the probability of having made a Type I error.

*The null hypothesis is likely to be rejected.*
- A p-value of 0.01 is **statistically significant** at the common alpha level of 0.05, which would lead to rejection of the null hypothesis. However, this option is about the *action* taken, not the *interpretation* of the p-value itself.
- The decision to reject or not reject depends on comparing the p-value to a pre-defined **alpha level**.

*The test has a 99% chance of detecting a true effect if it exists.*
- This statement describes the **power of the study (1 - beta)**, which is the probability of correctly rejecting a false null hypothesis.
- Power is a separate concept from the p-value and is influenced by factors like sample size, effect size, and alpha level.
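As an illustration of "the data, or something more extreme", the sketch below computes a two-sided p-value from a standard normal test statistic. The question does not say which test was applied, so the z-test here is an assumption, and the function name `two_sided_p_value` is illustrative.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. under the null hypothesis."""
    # erfc(x / sqrt(2)) is the combined area in both tails beyond |x|.
    return math.erfc(abs(z) / math.sqrt(2))

p_at_2_576 = two_sided_p_value(2.576)  # close to 0.01
p_at_1_96 = two_sided_p_value(1.96)    # close to 0.05

alpha = 0.05  # Chosen before looking at the data.
reject_null = p_at_2_576 < alpha
```

The p-value is a tail area computed assuming the null hypothesis holds, while alpha is a threshold fixed in advance. Comparing the two gives the decision, but they are different quantities.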
Explanation: ***Mean = Median***
- In a **normal distribution curve**, the data is perfectly symmetrical around its center.
- This symmetry ensures that the **mean, median, and mode** all coincide at the peak of the curve.
- This is a characteristic property of the **Gaussian (normal) distribution**.

*Mean = 2 Median*
- This statement is incorrect; in a **normal distribution**, the mean and median are equal, not a multiple of each other.
- A mean well above the median would suggest a **positively skewed distribution**, which is not characteristic of a normal distribution.

*Median = Variance*
- The **median** is a measure of **central tendency**, representing the middle value of the data set.
- **Variance** is a measure of **data dispersion** (how spread out the data is), measured in squared units.
- These two measures are fundamentally different concepts and generally not equal.

*Standard Deviation = 2 Variance*
- **Standard deviation** is the **square root of the variance** (SD = √Variance), not twice the variance.
- The stated relationship is not a property of the normal distribution; it could hold only in the special case where SD = 0.5 (or 0).
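These properties can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The parameters 100 and 15 are arbitrary values picked for the example.

```python
import math
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)

# Mean, median, and mode coincide for a normal distribution.
center = (dist.mean, dist.median, dist.mode)

# Half of the probability mass lies below the mean (symmetry).
mass_below_mean = dist.cdf(dist.mean)

# Standard deviation is the square root of the variance, not twice it.
sd_is_root_of_variance = math.isclose(dist.stdev, math.sqrt(dist.variance))
sd_is_twice_variance = math.isclose(dist.stdev, 2 * dist.variance)
```

With these parameters the variance is 225 and the standard deviation is 15, so the "SD = 2 Variance" option fails.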
Explanation: ***Ability to read and write***
- According to most census standards, **literacy** is fundamentally defined as the ability of an individual to **read and write a simple message** in any language.
- This definition focuses on the basic functional capacity to engage with written communication, rather than advanced proficiency.

*Participation in a literacy program*
- While participating in a literacy program indicates an effort towards improving literacy, it does not, by itself, define the current **literacy status** according to census standards.
- An individual might attend such a program without yet acquiring the functional ability to **read and write**.

*Ability to read and write fluently*
- **Fluency** implies a high level of proficiency and speed in reading and writing, which goes beyond the basic definition of literacy used in census data collection.
- Census standards typically only require the **basic capacity** to read and write.

*Ability to write a simple sentence*
- This option only covers the **writing aspect** of literacy and omits the crucial component of being able to **read**.
- Census definitions require both reading and writing capabilities to be considered literate.
Explanation: ***Option A: Pie***
- A **pie chart** is ideal for displaying **proportions** of different categories within a whole, where each slice represents a percentage of the total.
- It clearly shows how each category contributes to the **overall dataset**, making it easy to visualize relative frequencies and the **part-to-whole relationship**.

*Option B: Bar*
- A **bar graph** is typically used to compare the **magnitudes** or frequencies of different categorical variables, rather than their proportion of a whole.
- While it can show counts for categories, it doesn't directly represent the **part-to-whole relationship** as effectively as a pie chart.

*Option C: Histogram*
- A **histogram** is used to represent the **distribution of continuous numerical data**, grouping values into bins and displaying their frequencies.
- It is not suitable for showing proportions of different **categorical variables**.

*Option D: Pictogram*
- A **pictogram** uses **pictures or symbols** to represent data, often in a simplified and engaging way.
- While it can represent frequencies or counts, it is less precise for showing exact **proportions** of categories compared to a pie chart.
Explanation: ***Frequency polygon***
- A **frequency polygon** is constructed by plotting a point at the midpoint of the top of each histogram bar and then connecting these points with straight lines.
- It is used to display the **frequency distribution** of continuous data, similar to a histogram, but can also compare multiple distributions on one graph.

*Pictogram*
- A **pictogram** uses images or symbols to represent data, where each symbol represents a certain quantity.
- It simplifies data for broader audiences but is not derived directly from histogram blocks.

*Bar chart*
- A **bar chart** uses rectangular bars of varying heights or lengths to represent data for different categories.
- Unlike a histogram, bar charts typically represent **categorical data** and have gaps between bars.

*Pie chart*
- A **pie chart** is a circular statistical graphic divided into slices to illustrate numerical proportion.
- Each slice represents a category's proportion of the whole and is not related to histogram blocks or their midpoints.
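The construction described above reduces to pairing each class midpoint with its frequency. The sketch below assumes the histogram is given as bin edges and counts. The function name `frequency_polygon_points` and the example values are hypothetical.

```python
def frequency_polygon_points(bin_edges, frequencies):
    """Return (midpoint, frequency) pairs, one per histogram block."""
    if len(bin_edges) != len(frequencies) + 1:
        raise ValueError("need exactly one more bin edge than frequencies")
    # Midpoint of each class interval, e.g. 0-10 gives 5.0.
    midpoints = [(low + high) / 2
                 for low, high in zip(bin_edges, bin_edges[1:])]
    return list(zip(midpoints, frequencies))

# Hypothetical class intervals 0-10, 10-20, 20-30, 30-40 with their counts.
points = frequency_polygon_points([0, 10, 20, 30, 40], [3, 7, 5, 2])
# points == [(5.0, 3), (15.0, 7), (25.0, 5), (35.0, 2)]
```

Joining these points in order with straight lines gives the frequency polygon. Plotting is left out to keep the example dependency-free.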
Explanation: ***Sensitivity is 1 - False negative rate***
- **Sensitivity** refers to the proportion of **true positive results** among all individuals with the disease.
- The **false negative rate** is the proportion of individuals with the disease who test negative, so **1 - false negative rate** correctly defines sensitivity.

*Sensitivity is 1 - False positive rate*
- The false positive rate (1 - specificity) is related to the proportion of individuals without the disease who test positive.
- This statement incorrectly defines sensitivity, confusing it with concepts related to specificity.

*Post-test probability is only influenced by pre-test probability*
- **Post-test probability** is influenced by both the **pre-test probability** and the **likelihood ratio** of the diagnostic test.
- The **likelihood ratio** incorporates the test's sensitivity and specificity, making it a critical factor in modifying the probability of disease after testing.

*None of the options is correct.*
- The first statement, "Sensitivity is 1 - False negative rate," is a correct definition of sensitivity.
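How the likelihood ratio carries a pre-test probability to a post-test probability can be sketched with the standard odds form of Bayes' theorem. The function names and the input values (sensitivity 0.9, specificity 0.8, pre-test probability 0.1) are assumptions chosen for the example.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / false positive rate."""
    return sensitivity / (1 - specificity)

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

false_negative_rate = 0.1
sens = 1 - false_negative_rate  # Sensitivity = 1 - false negative rate.
spec = 0.8

lr_pos = positive_likelihood_ratio(sens, spec)  # about 4.5
post = post_test_probability(0.1, lr_pos)       # about 0.33
```

With the same pre-test probability, a test with a different likelihood ratio would give a different post-test probability, which is why pre-test probability alone does not determine the result.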