Experiments test predictions. These predictions are often numerical, meaning that, as scientists gather data, they expect the numbers to break down in a certain way. Real-world data rarely match predictions exactly, so scientists need a test to tell them whether the difference between observed and expected numbers is due to random chance or to some unforeseen factor that will force them to adjust the underlying theory. A chi-square test is a statistical tool that scientists use for this purpose.

## The Type of Data Required

You need categorical data to use a chi-square test. An example of categorical data is the number of people who answered a question "yes" versus the number of people who answered the question "no" (two categories), or the numbers of frogs in a population that are green, yellow or gray (three categories). You cannot use a chi-square test on continuous data, such as might be collected from a survey asking people how tall they are. From such a survey, you would get a broad range of heights. However, if you divided the heights into categories such as "under 6 feet tall" and "6 feet tall and over," you could then use a chi-square test on the data.
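As a rough illustration of turning continuous measurements into categories, the sketch below sorts a handful of heights into the two groups mentioned above. The height values, the `bin_heights` function name and the use of inches are all assumptions made for this example; they are not data from any real survey.

```python
# Hypothetical heights in inches; illustrative values, not real survey data.
heights_in = [62.5, 70.0, 74.2, 68.1, 72.0, 65.3, 76.8, 71.9]

THRESHOLD_IN = 72.0  # 6 feet expressed in inches


def bin_heights(heights, threshold=THRESHOLD_IN):
    """Count how many heights fall into each of two categories."""
    counts = {"under 6 feet": 0, "6 feet and over": 0}
    for height in heights:
        key = "under 6 feet" if height < threshold else "6 feet and over"
        counts[key] += 1
    return counts


counts = bin_heights(heights_in)
print(counts)  # {'under 6 feet': 5, '6 feet and over': 3}
```

The resulting counts are categorical, so they are the kind of data a chi-square test can work with.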

## The Goodness-of-Fit Test

A goodness-of-fit test is a common, and perhaps the simplest, test performed using the chi-square statistic. In a goodness-of-fit test, the scientist makes a specific prediction about the numbers she expects to see in each category of her data. She then collects real-world data, called observed data, and uses the chi-square test to see whether the observed data match her expectations.

For example, imagine a biologist is studying the inheritance patterns in a species of frog. Among 100 offspring of a set of frog parents, the biologist's genetic model leads her to expect 25 yellow offspring, 50 green offspring and 25 gray offspring. What she actually observes is 20 yellow offspring, 52 green offspring and 28 gray offspring. Is her prediction supported or is her genetic model incorrect? She can use a chi-square test to find out.
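One way to set up the frog example is to store the observed counts next to the expected counts. In the sketch below, the proportions 0.25, 0.50 and 0.25 are inferred from the expected counts of 25, 50 and 25 out of 100; the variable names are assumptions chosen for illustration.

```python
total_offspring = 100

# Proportions implied by the expected counts in the example (25, 50, 25 of 100).
expected_proportions = {"yellow": 0.25, "green": 0.50, "gray": 0.25}

# Counts the biologist actually observed.
observed = {"yellow": 20, "green": 52, "gray": 28}

# Expected count per category = proportion * total number of offspring.
expected = {
    color: proportion * total_offspring
    for color, proportion in expected_proportions.items()
}

# Sanity check: the observed counts should add up to the total.
assert sum(observed.values()) == total_offspring

for color in observed:
    print(color, "observed:", observed[color], "expected:", expected[color])
```

Observed and expected counts must cover the same categories and add up to the same total before they can be compared.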

## Calculating the Chi-Square Statistic

Begin calculating the chi-square statistic by subtracting each expected value from its corresponding observed value and squaring each result. The calculation for the example of the frog offspring would look like this:

yellow = (20 - 25)^2 = 25

green = (52 - 50)^2 = 4

gray = (28 - 25)^2 = 9

Now divide each result by its corresponding expected value.

yellow = 25 ÷ 25 = 1

green = 4 ÷ 50 = 0.08

gray = 9 ÷ 25 = 0.36

Finally, add together the answers from the previous step.

chi-square = 1 + 0.08 + 0.36 = 1.44
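The three steps above (subtract and square, divide by the expected value, add up) can be sketched as a short Python function. The name `chi_square_statistic` is an assumption for this example and is not taken from any particular library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Frog offspring counts in the order yellow, green, gray.
observed = [20, 52, 28]
expected = [25, 50, 25]

chi_square = chi_square_statistic(observed, expected)
print(round(chi_square, 2))  # 1.44
```

If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` performs the same calculation and also returns a p-value.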

## Interpreting the Chi-Square Statistic

The chi-square statistic tells you how different your observed values were from your predicted values. The higher the number, the greater the difference. To decide whether the difference is large enough to cast doubt on your prediction, check whether your chi-square value is below a certain **critical value** on a chi-square distribution table. This table matches chi-square values with probabilities, called **p-values**. A p-value is the probability of seeing a difference at least as large as the one you observed if your prediction were correct and only random chance were at work. For a goodness-of-fit test, the conventional cutoff is 0.05: if the p-value is 0.05 or less, you reject your prediction.

You must determine the **degrees of freedom** (df) in your data before you can look up the critical chi-square value in a distribution table. For a goodness-of-fit test, degrees of freedom are calculated by subtracting 1 from the number of categories in your data. There are three categories in this example, so there are 2 degrees of freedom. A chi-square distribution table shows that, for 2 degrees of freedom, the critical value for a 0.05 probability is 5.99. This means that as long as your calculated chi-square value is less than 5.99, the observed data are consistent with your expected values, and thus with the underlying theory. Since the chi-square statistic for the frog offspring data was 1.44, well below 5.99, the biologist has no reason to reject her genetic model.
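The table lookup can also be reproduced numerically. The sketch below relies on the fact that, for exactly 2 degrees of freedom, the chi-square p-value has the closed form exp(-x / 2). That shortcut does not hold for other degrees of freedom, and the function names are assumptions made for this example.

```python
import math


def p_value_df2(chi_square):
    """P-value for a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-chi_square / 2)


def critical_value_df2(alpha):
    """Chi-square value whose p-value equals alpha, for exactly 2 df."""
    return -2 * math.log(alpha)


num_categories = 3
df = num_categories - 1  # 2 degrees of freedom for the frog example

chi_square = 1.44  # statistic calculated for the frog offspring data
critical = critical_value_df2(0.05)
p = p_value_df2(chi_square)

print(df, round(critical, 2), round(p, 3))  # 2 5.99 0.487

if chi_square < critical:
    print("No reason to reject the prediction at the 0.05 level")
else:
    print("Reject the prediction at the 0.05 level")
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 always give the same verdict. For other degrees of freedom, a statistics library such as SciPy (`scipy.stats.chi2.sf` for p-values, `scipy.stats.chi2.ppf` for critical values) is the usual route.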