Statistical significance indicates whether the results of a study are unlikely to have arisen by chance alone and are therefore statistically defensible. Commonly used significance tests look for differences in the means of data sets or differences in the variances of data sets. The test that is applied depends on the type of data being analyzed. It is up to the researchers to decide how significant they require the results to be, in other words, how much risk they are willing to take of being wrong. Typically, researchers accept a risk level of 5 percent.
Type I Error: Wrongly Rejecting the Null Hypothesis
Experiments are conducted to test specific hypotheses, or experimental questions with an expected result. A null hypothesis states that there is no difference between the two sets of data being compared. In a medical trial, for example, the null hypothesis might be that there is no difference in improvement between patients receiving the study drug and patients receiving the placebo. If the researcher wrongly rejects this null hypothesis when it is in fact true, in other words if they "detect" a difference between the two sets of patients when there really was no difference, then they have committed a Type I error. Researchers determine ahead of time how much risk of committing a Type I error they are willing to accept. This accepted risk is called alpha, and it serves as the threshold for the test: the null hypothesis is rejected only when the p-value falls below alpha.
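To make the meaning of alpha concrete, the following minimal sketch simulates many trials in which the null hypothesis is true by construction, so every "detected" difference is a Type I error. It assumes normally distributed scores with a known standard deviation of 1 and uses a simple two-sample z-test. The function names, sample sizes and seed are illustrative choices, not part of any particular study.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means.

    Assumes both groups share a known standard deviation `sigma`
    (an assumption made only to keep this sketch dependency-free).
    """
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Fraction of simulated trials that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same distribution: there is no real difference.
        drug = [rng.gauss(0, 1) for _ in range(n)]
        placebo = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(drug, placebo) < alpha:
            false_positives += 1
    return false_positives / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

Under these assumptions the observed rate should land close to the chosen alpha, which is what "accepting a 5 percent risk" means in practice.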
Type II Error: Wrongly Failing to Reject the Null Hypothesis
An alternate hypothesis states that there is a difference between the two sets of data being compared. In the case of the medical trial, you would expect to see different levels of improvement in patients receiving the study drug and patients receiving the placebo. If researchers fail to reject the null hypothesis when they should, in other words if they "detect" no difference between the two sets of patients when there really was a difference, then they have committed a Type II error.
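The risk of a Type II error depends on how large the real difference is and how many patients are in each group. As a rough sketch, the function below computes that risk for a two-sided, two-sample z-test, assuming normally distributed scores with a known common standard deviation. The function name and the example numbers (a true difference of 0.5, groups of 30 and 100) are hypothetical.

```python
from statistics import NormalDist


def type_ii_error_rate(true_diff, n_per_group, sigma=1.0, alpha=0.05):
    """Probability of failing to reject the null when the true difference is `true_diff`.

    Assumes a two-sided z-test with a known common standard deviation `sigma`.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)      # rejection cutoff for |z|
    standard_error = sigma * (2 / n_per_group) ** 0.5
    shift = true_diff / standard_error              # where the z statistic is centered
    # Chance that the z statistic falls inside the non-rejection region.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


for n in (30, 100):
    beta = type_ii_error_rate(true_diff=0.5, n_per_group=n)
    print(f"n = {n:>3} per group: Type II risk = {beta:.3f}, power = {1 - beta:.3f}")
```

Under these assumptions, enlarging the groups lowers the Type II risk while alpha, the Type I risk, stays fixed at the level the researchers chose.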
Determining the Level of Significance
When researchers perform a test of statistical significance and the resulting p-value is less than alpha, the level of risk deemed acceptable, then the test result is considered statistically significant. In this case, the null hypothesis, which states that there is no difference between the two groups, is rejected. In other words, the results indicate that there is a difference in improvement between patients receiving the study drug and patients receiving the placebo.
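The decision rule described above can be sketched in a few lines. To produce a p-value without any external library, this example uses a permutation test, which is not one of the tests named in this article; it is swapped in here only for illustration. The improvement scores are invented, and the function names are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, resamples=5000, seed=0):
    """Approximate two-sided p-value for a difference in means.

    Repeatedly shuffles the pooled scores and counts how often a random
    split produces a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (resamples + 1)


def is_significant(p_value, alpha=0.05):
    """Reject the null hypothesis only when the p-value is below alpha."""
    return p_value < alpha


# Made-up improvement scores, for illustration only.
drug = [12, 15, 14, 16, 13, 17, 15, 14]
placebo = [8, 9, 7, 10, 9, 8, 11, 9]

p = permutation_p_value(drug, placebo)
verdict = "reject" if is_significant(p) else "fail to reject"
print(f"p-value = {p:.4f}: {verdict} the null hypothesis at alpha = 0.05")
```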
Choosing a Significance Test
There are several different statistical tests to choose from. A standard t-test compares the means from two independent data sets, such as our study drug data and our placebo data. A paired t-test is used when the two sets of measurements come from the same subjects, such as a before-and-after study. A one-way Analysis of Variance (ANOVA) can compare the means from three or more data sets, and a two-way ANOVA compares the means of data sets grouped by two different independent variables, such as the strength of the study drug and the age group of the patient. A linear regression tests whether a response changes steadily along a gradient of treatments or time. Each statistical test produces a p-value, which is compared against alpha to interpret the test results.
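As a rough guide to how these tests look in practice, the sketch below runs four of them with SciPy, assuming the `scipy` package is installed. All of the numbers are invented for illustration. Two-way ANOVA is omitted because it typically needs a separate model-fitting library.

```python
from scipy import stats

# Hypothetical improvement scores, invented purely for illustration.
placebo = [10.2, 11.5, 10.9, 12.1, 11.0, 10.4, 11.8, 11.3]
drug = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.5, 15.0]
high_dose = [14.0, 15.9, 15.1, 16.4, 14.8, 16.0, 15.5, 16.2]

# Hypothetical before-and-after readings from the same eight patients.
before = [140, 152, 138, 147, 160, 155, 149, 143]
after = [135, 147, 136, 141, 152, 150, 146, 139]

# Hypothetical dose gradient and the average response at each dose.
doses = [0, 10, 20, 30, 40, 50]
responses = [10.1, 11.0, 12.2, 12.8, 14.1, 15.0]

results = {
    # Standard t-test: two independent groups.
    "t-test (drug vs placebo)": stats.ttest_ind(drug, placebo).pvalue,
    # Paired t-test: the same subjects measured twice.
    "paired t-test (before vs after)": stats.ttest_rel(before, after).pvalue,
    # One-way ANOVA: three or more groups, one independent variable.
    "one-way ANOVA (three groups)": stats.f_oneway(placebo, drug, high_dose).pvalue,
    # Linear regression: is the slope along the dose gradient nonzero?
    "linear regression (dose vs response)": stats.linregress(doses, responses).pvalue,
}

alpha = 0.05
for name, p_value in results.items():
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"{name}: p = {p_value:.4g} ({verdict} at alpha = {alpha})")
```

In each case the test returns a p-value, and alpha is supplied by the researcher rather than computed by the test.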