The chi-square test, more properly known as Pearson's chi-square test, is a means of statistically evaluating data. It is used when categorical data from a sample are compared with expected or "true" results. For example, if we believe 50 percent of all jelly beans in a bin are red, a sample of 100 beans from that bin should contain approximately 50 red ones. If our count differs from 50, Pearson's test tells us whether our 50 percent assumption is suspect or whether the difference can be attributed to normal random variation.
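The test statistic is the sum, over all categories, of (observed - expected)^2 / expected. The minimal Python sketch below computes it for the jelly bean example. The counts of 58 red and 42 non-red beans are hypothetical, and the function name chi_square_statistic is illustrative rather than taken from any library.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical sample: 58 red and 42 non-red beans out of 100,
# compared with the 50/50 split expected under the 50 percent assumption.
observed = [58, 42]
expected = [50, 50]
stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 2.56
```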
Interpreting Chi-Square Values
You will need:
- Table of chi-square distribution values
- Chi-square test statistic for your data
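Remember that any conclusion based on this test can still be wrong. If you reject the expected values whenever the p value is at or below your critical p value, you will wrongly reject values that are actually correct in roughly that fraction of samples (about 5 percent for a critical p value of 0.05).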
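A rough simulation can show how often the test rejects an assumption that is actually true. The sketch below assumes a bin that really is 50 percent red, samples of 100 beans, and the tabulated 1-degree-of-freedom critical value of 3.841 for a critical p value of 0.05; the function name rejection_rate and its parameters are hypothetical.

```python
import random


def rejection_rate(trials=10_000, n=100, p_red=0.5, critical=3.841, seed=1):
    """Fraction of samples in which a true red proportion is rejected.

    critical=3.841 is the tabulated chi-square value for 1 degree of
    freedom at a critical p value of 0.05.
    """
    rng = random.Random(seed)
    expected_red = n * p_red
    expected_other = n - expected_red
    rejections = 0
    for _ in range(trials):
        # Draw n beans from a bin whose true red proportion is p_red.
        red = sum(rng.random() < p_red for _ in range(n))
        stat = ((red - expected_red) ** 2 / expected_red
                + ((n - red) - expected_other) ** 2 / expected_other)
        if stat >= critical:
            rejections += 1
    return rejections / trials


print(rejection_rate())  # prints the simulated rejection rate
```

Because bean counts are whole numbers, the long-run rate for these particular settings is a little under 6 percent rather than exactly 5.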
The expected count for each category in the sample should be at least 5 for the results to be valid.
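The usual rule of thumb applies to the expected count in each category, which is the sample size multiplied by the hypothesized proportion. A small sketch of that check follows; the helper names are hypothetical and the four equally common colors are an assumed example.

```python
def expected_from_proportions(n, proportions):
    """Expected counts for a sample of size n under hypothesized proportions."""
    return [n * p for p in proportions]


def expected_counts_ok(expected, minimum=5):
    """Return True if every expected category count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical: four jelly bean colors assumed to be equally common.
print(expected_counts_ok(expected_from_proportions(100, [0.25] * 4)))  # True
print(expected_counts_ok(expected_from_proportions(12, [0.25] * 4)))   # False
```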
Determine the degrees of freedom for your test. If you are comparing a single sample across several categories, the degrees of freedom equal the number of categories minus 1. For example, if you were evaluating the distribution of colors in a jar of jelly beans containing four colors, the degrees of freedom would be 3. If you are comparing tabular data, the degrees of freedom equal (number of rows - 1) multiplied by (number of columns - 1).
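Both degrees-of-freedom rules can be written as one-line functions. The function names are hypothetical, and the 3-by-4 table is an assumed example.

```python
def df_goodness_of_fit(num_categories):
    """Single sample with several categories: categories - 1."""
    return num_categories - 1


def df_contingency(rows, columns):
    """Tabular data: (rows - 1) * (columns - 1)."""
    return (rows - 1) * (columns - 1)


print(df_goodness_of_fit(4))  # 3, as in the four-color jelly bean jar
print(df_contingency(3, 4))   # 6, for a hypothetical 3-by-4 table
```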
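Choose the critical p value (also called the significance level) that you will use to evaluate your data; 0.05 is a common choice. The p value for your data is the probability, expressed as a decimal (the percent probability divided by 100), of obtaining a chi-square value at least as large as yours by chance alone if the expected values are correct. Another way of thinking about p is that it is the probability that random variation in the sampling process alone would make your observed results deviate from the expected results by at least as much as they did. The critical p value is your cutoff: a p value at or below it is treated as too unlikely to be due to chance.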
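Instead of a table, the p value can be computed directly as the upper-tail probability of the chi-square distribution. The sketch below uses only the Python standard library and evaluates that tail through the power series of the regularized incomplete gamma function; the function name chi2_p_value is hypothetical. If SciPy is available, scipy.stats.chi2.sf(stat, df) gives the same quantity.

```python
import math


def chi2_p_value(stat, df, tol=1e-12, max_terms=10_000):
    """P(chi-square with `df` degrees of freedom >= stat).

    Computed as 1 - P(df/2, stat/2), where P is the regularized lower
    incomplete gamma function, evaluated with its power series.
    """
    if stat <= 0:
        return 1.0
    s = df / 2.0
    z = stat / 2.0
    term = 1.0 / s  # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= z / (s + n)
        total += term
        if term < tol * total:
            break
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


print(chi2_p_value(2.56, 1))   # about 0.11 (hypothetical jelly bean sample)
print(chi2_p_value(7.815, 3))  # about 0.05
```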
Look up the p value associated with your chi-square test statistic in the chi-square distribution table. Go to the row corresponding to your degrees of freedom and find the value in that row closest to your test statistic. Follow that value's column up to the top row and read off the p value. If your test statistic falls between two values in the row, its p value lies between the two corresponding p values in the top row, so you can read off an approximate intermediate value. Larger test statistics correspond to smaller p values.
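The table lookup can be mimicked in code by bracketing the statistic between two entries in its row. The sketch below hard-codes a few rows of a standard chi-square table (critical values for right-tail p values of 0.10, 0.05, 0.025 and 0.01); the names TABLE, P_VALUES and bracket_p_value are hypothetical.

```python
# A few rows of a standard chi-square table: critical values for
# right-tail p values of 0.10, 0.05, 0.025 and 0.01.
P_VALUES = [0.10, 0.05, 0.025, 0.01]
TABLE = {
    1: [2.706, 3.841, 5.024, 6.635],
    2: [4.605, 5.991, 7.378, 9.210],
    3: [6.251, 7.815, 9.348, 11.345],
}


def bracket_p_value(stat, df):
    """Return (low, high) bounds on the p value from one table row.

    None means that bound lies off the edge of the table.
    """
    row = TABLE[df]
    if stat < row[0]:
        return (P_VALUES[0], None)  # p is above 0.10
    for i in range(len(row) - 1):
        if row[i] <= stat < row[i + 1]:
            return (P_VALUES[i + 1], P_VALUES[i])
    return (None, P_VALUES[-1])  # p is at most 0.01


print(bracket_p_value(4.5, 1))  # (0.025, 0.05)
print(bracket_p_value(7.0, 3))  # (0.05, 0.1)
```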
Compare the p value obtained from the table with the critical p value you chose earlier. If the tabular p value is above the critical value, conclude that any deviation between the sample category values and the expected values can be attributed to random variation and is not significant. For example, if you chose a critical p value of 0.05 (5 percent) and found a tabular value of 0.20, you would conclude there was no significant variation. If the tabular p value is at or below the critical value, the deviation is significant and the expected values are suspect.
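The final comparison is a single inequality. The sketch below applies it to the 0.20 versus 0.05 example and to a hypothetical jelly bean sample of 58 red and 42 non-red beans; it relies on the fact that for 1 degree of freedom the p value equals erfc(sqrt(stat / 2)). The function name is_significant is hypothetical.

```python
import math


def is_significant(p_value, critical_p=0.05):
    """True if the deviation from the expected values is significant."""
    return p_value <= critical_p


# Example from the text: critical p value 0.05, tabular p value 0.20.
print(is_significant(0.20, 0.05))  # False: no significant variation

# Hypothetical sample: 58 red and 42 non-red beans, 1 degree of freedom.
stat = (58 - 50) ** 2 / 50 + (42 - 50) ** 2 / 50
p = math.erfc(math.sqrt(stat / 2))  # exact tail probability for df = 1
print(is_significant(p))  # False: p is about 0.11
```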
About the Author
Michael Judge has been writing for over a decade and has been published in "The Globe and Mail" (Canada's national newspaper) and the U.K. magazine "New Scientist." He holds a Master of Science from the University of Waterloo. Michael has worked for an aerospace firm where he was in charge of rocket propellant formulation and is now a college instructor.