Winning the science fair means standing out from the competition.

Don’t get us wrong: creating an awesome baking soda volcano might turn a few heads. But you need to do something a bit more robust than that if you want to take the top prize, whether at your school or at the Google Science Fair.

Besides a sensible and well-designed experiment, one of the most important things you need when you’re trying to draw a firm conclusion is an accurate analysis of your results. You might not want to hear it (this isn’t most people’s *favorite* part of doing science), but this means doing some basic statistics to see whether any differences you observe are **statistically significant** or possibly just due to chance.

Don’t worry, though: performing statistical tests isn’t difficult, and it’s one of the best ways to make your project stand out to the judges.

## Why Use Statistics

If you pick any variable – for example, height, spelling test scores or the number of successfully germinated seeds – there will always be some variation by chance alone. There is generally a distribution of results around some central value. This makes it a little bit difficult to really *know* whether or not an apparent difference between two results is actually important, or just due to this intrinsic variation. That’s what you use statistics for.

Statistical tests like the *t*-test and Pearson’s correlation coefficient give you the tools to separate genuine effects from the variation you’d expect by chance alone. For example, if you want to know if boys are taller than girls, you wouldn’t just compare the averages (more on that in a moment); you’d need to look at how the differences *within* a group compare to the differences *between* the groups.

## Basic Statistical Measures

To use statistical tests for your science project, you’ll need to know a couple of basic things first. The first is pretty simple: the concept of a “mean,” which is what most people are talking about when they say “average.” This is simply the sum of a set of values divided by the number of values. So if you have five test scores: 20, 13, 18, 22 and 16, the mean is:

$$
\mu = \frac{20 + 13 + 18 + 22 + 16}{5} = \frac{89}{5} = 17.8
$$
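If you’d rather let a computer do the arithmetic, the same calculation takes a couple of lines of Python. This is a minimal sketch; the helper name `mean` is our own choice (Python’s standard library also offers `statistics.mean`, which does the same job).

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)

# The five test scores from the example above.
scores = [20, 13, 18, 22, 16]
print(mean(scores))  # 17.8
```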

The other important concept is the **standard deviation**. This is a measure of the spread of values around the mean, and it’s used as part of many statistical tests. The formula for standard deviation is:

$$
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
$$

This might look scary, but it’s pretty easy to calculate: start by working out the mean *μ*, then subtract it from each of the individual results (the $x_i$ in the equation) and square each difference. Now add up all of these squared values, divide by the number of results (*N*), and finally take the square root of the answer.
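The steps above translate directly into a short Python function. This is a minimal sketch: the function name `standard_deviation` and its `sample` flag are hypothetical illustrations. With `sample=False` it divides by *N* exactly as described; with `sample=True` it divides by *N* − 1, which gives the “sample” standard deviation. The standard library’s `statistics.pstdev` (divides by *N*) and `statistics.stdev` (divides by *N* − 1) are used only to cross-check the result.

```python
import math
import statistics

def standard_deviation(values, sample=False):
    """Follow the steps in the text: find the mean, square each deviation,
    add them up, divide, and take the square root.

    sample=False divides by N (as described above);
    sample=True divides by N - 1 (the "sample" standard deviation)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    mu = sum(values) / n                          # step 1: the mean
    squared = [(x - mu) ** 2 for x in values]     # step 2: square each difference
    total = sum(squared)                          # step 3: add them up
    divisor = n - 1 if sample else n              # step 4: divide by N (or N - 1)
    return math.sqrt(total / divisor)             # step 5: take the square root

scores = [20, 13, 18, 22, 16]
print(round(standard_deviation(scores), 2))               # 3.12
print(round(standard_deviation(scores, sample=True), 2))  # 3.49

# Cross-check against Python's standard library.
assert math.isclose(standard_deviation(scores), statistics.pstdev(scores))
assert math.isclose(standard_deviation(scores, sample=True), statistics.stdev(scores))
```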

## Testing for a Difference: The t-Test

If you want to test for a difference in a certain variable between two groups (for example, the average height of boys vs. girls, or the test scores of students who’ve taken a recap course vs. those who haven’t), the *t*-test is one of the most commonly used statistical tests. It assumes that your data are normally distributed (they follow a bell curve; many everyday measurements roughly do, so you don’t have to worry about this too much), that the squares of the standard deviations (the “variances”) of the two groups are the same, and that the observations are independent of each other.

To perform a *t*-test, you use the formula:

$$
t = \frac{\mu_1 - \mu_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$

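Now, all you need to know is what each of the symbols means. The *μ* symbols are the means of the two samples, the *n* values are the number of results in each group, and $s_p$ is the “pooled” standard deviation, which combines the standard deviations of both samples. This is a little more complicated and has a separate formula:

$$
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
$$

Here $s_1$ and $s_2$ are the standard deviations of the two groups, calculated by dividing by *n* − 1 rather than *n* in the standard deviation formula.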
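Both formulas are easy to check in code. The sketch below assumes you already have each group’s mean, sample standard deviation (divided by *n* − 1) and size. It implements the standard pooled-variance formula $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and then $t = (\mu_1 - \mu_2) / (s_p\sqrt{1/n_1 + 1/n_2})$. The function names and the summary numbers (means of 20 and 16, standard deviations of 2 and 3, five results per group) are hypothetical.

```python
import math

def pooled_variance(n1, s1, n2, s2):
    """s_p^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2),
    where s1 and s2 are sample standard deviations (divided by n - 1)."""
    if n1 + n2 <= 2:
        raise ValueError("need more than two results in total")
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2))."""
    sp = math.sqrt(pooled_variance(n1, s1, n2, s2))  # take the square root of s_p^2
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Hypothetical summary numbers for two groups of five results each.
print(pooled_variance(5, 2.0, 5, 3.0))                     # 6.5
print(round(t_statistic(20.0, 2.0, 5, 16.0, 3.0, 5), 2))   # 2.48
```

Working out $s_p^2$ first and only then taking its square root mirrors the “in pieces” approach, and makes each intermediate number easy to check by hand.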

It’s generally easier to calculate this in pieces: work out the $s_p^2$ value first, then put its square root into the equation for *t*. The final step is comparing your *t* with the critical value in a table (see Resources) for your chosen significance level, which is usually 5% (the 0.95 column of a one-sided table). If you’re testing for a difference in both directions, i.e. higher or lower, either use a table for a “two-sided” test or use the 0.975 column. Check the row for your number of degrees of freedom (your total sample size minus 2, i.e. $n_1 + n_2 - 2$), and if your *t* value (ignoring any minus sign) is higher than the value in the table, you have found a statistically significant difference.
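The whole procedure, from raw results to the table lookup, can be sketched in Python. This is a minimal sketch, not a replacement for a proper statistics package: the function `two_sample_t_test` and the made-up recap-course scores are hypothetical, and the small dictionary hard-codes the standard two-sided 5% critical values (the 0.975 column) for 1 to 10 degrees of freedom only. For larger samples, look the value up in a full table.

```python
import math
import statistics

# Two-sided critical t values at the 5% significance level (0.975 column),
# for 1-10 degrees of freedom only. Use a full table for other sizes.
T_CRIT_TWO_SIDED_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def two_sample_t_test(group1, group2):
    """Pooled-variance t-test (equal variances assumed).
    Returns (t, degrees_of_freedom, significant_at_5_percent)."""
    n1, n2 = len(group1), len(group2)
    s1_sq = statistics.variance(group1)   # sample variance: divides by n - 1
    s2_sq = statistics.variance(group2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    if sp_sq == 0:
        raise ValueError("both groups have zero spread; t is undefined")
    t = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(
        sp_sq * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2                      # total sample size minus 2
    if df not in T_CRIT_TWO_SIDED_05:
        raise ValueError(f"no table entry for {df} degrees of freedom")
    significant = abs(t) > T_CRIT_TWO_SIDED_05[df]  # ignore the minus sign
    return t, df, significant

# Hypothetical data: test scores with and without a recap course.
with_recap = [20, 13, 18, 22, 16]
without_recap = [14, 12, 15, 11, 13]
t, df, significant = two_sample_t_test(with_recap, without_recap)
print(f"t = {t:.2f}, df = {df}, significant: {significant}")
# t = 2.80, df = 8, significant: True
```

If SciPy is installed, `scipy.stats.ttest_ind(with_recap, without_recap)` runs the same equal-variance test by default and reports a p-value instead of a table comparison, which is a handy way to double-check your hand calculation.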

Of course, this is really just the beginning: What do you do with the result when you’ve found it? The next part of this article covers interpreting your results in depth.

Resources

About the Author

Lee Johnson is a freelance writer and science enthusiast, with a passion for distilling complex concepts into simple, digestible language. He's written about science for several websites including eHow UK and WiseGeek, mainly covering physics and astronomy. He was also a science blogger for Elements Behavioral Health's blog network for five years. He studied physics at the Open University and graduated in 2018.