Statistical tests such as the t-test intrinsically depend on the concept of a standard deviation. Any student of statistics or science will use standard deviations regularly and will need to understand what they mean and how to find them from a set of data. Thankfully, the only thing you need is the original data. The calculations can be tedious when you have a lot of data, so in those cases you should use built-in functions or spreadsheet software to do them automatically. However, to understand the key concept, all you need is a basic example you can easily work out by hand. At its core, the sample standard deviation uses your sample to estimate how much the quantity you’ve chosen varies across the whole population.
TL;DR (Too Long; Didn't Read)
Using n to mean sample size, μ for the mean of the data, $x_i$ for each individual data point (from i = 1 to i = n), and Σ as a summation sign, the sample variance ($s^2$) is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}$$
And the sample standard deviation is:
$$s = \sqrt{s^2}$$
Standard Deviation vs. Sample Standard Deviation
Statistics revolves around making estimates for whole populations based on smaller samples from the population, and accounting for any uncertainty in the estimate in the process. Standard deviations quantify the amount of variation in the population you’re studying. If you’re measuring people’s heights, for example, the results cluster around the mean (the average) value, and the standard deviation describes the width of that cluster, that is, how widely the heights are spread across the population.
The “sample” standard deviation estimates the true standard deviation for the whole population based on a small sample from the population. Most of the time, you won’t be able to sample the whole population in question, so the sample standard deviation is often the right version to use.
Finding the Sample Standard Deviation
You need your results and the number (n) of people in your sample. First, calculate the mean of the results (μ) by adding up all of the individual results and then dividing this by the number of measurements.
As an example, the heart rates (in beats per minute) of five men and five women are:
71, 83, 63, 70, 75, 69, 62, 75, 66, 68
This gives a mean of:
$$\mu = \frac{71 + 83 + 63 + 70 + 75 + 69 + 62 + 75 + 66 + 68}{10} = \frac{702}{10} = 70.2$$
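As a quick check, the same mean can be computed in a few lines of Python. This is only a minimal sketch; the variable names `heart_rates`, `n` and `mean` are illustrative choices rather than anything from the original example.

```python
# Heart rates (beats per minute) from the example data.
heart_rates = [71, 83, 63, 70, 75, 69, 62, 75, 66, 68]

# Mean = sum of all results divided by the number of measurements.
n = len(heart_rates)
mean = sum(heart_rates) / n

print(n, sum(heart_rates), mean)  # 10 702 70.2
```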
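The next stage is to subtract the mean from each individual measurement, and then square the result. As an example, for the first data point:
$$(71 - 70.2)^2 = 0.8^2 = 0.64$$
And for the second:
$$(83 - 70.2)^2 = 12.8^2 = 163.84$$
You continue in this fashion through the data, and then add these results up. So for the example data, the sum of these values is:
$$0.64 + 163.84 + 51.84 + 0.04 + 23.04 + 1.44 + 67.24 + 23.04 + 17.64 + 4.84 = 353.6$$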
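The subtract-and-square step is repetitive by hand, so it may help to see it as a short Python sketch. The names `squared_deviations` and `sum_of_squares` are hypothetical labels chosen here for illustration.

```python
# Heart rates (beats per minute) from the example data.
heart_rates = [71, 83, 63, 70, 75, 69, 62, 75, 66, 68]
mean = sum(heart_rates) / len(heart_rates)

# Subtract the mean from each measurement, then square the result.
squared_deviations = [(x - mean) ** 2 for x in heart_rates]

# Add the squared deviations together.
sum_of_squares = sum(squared_deviations)

# Rounding hides tiny floating-point errors in the printed values.
print([round(d, 2) for d in squared_deviations])
print(round(sum_of_squares, 2))  # 353.6
```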
The next stage distinguishes between the sample standard deviation and the population standard deviation. For the sample standard deviation, you divide this result by the sample size minus one (n − 1). In our example, n = 10, so n − 1 = 9.
This result gives the sample variance, denoted by $s^2$, which for the example is:
$$s^2 = \frac{353.6}{9} \approx 39.29$$
The sample standard deviation (s) is just the positive square root of this number:
$$s = \sqrt{39.29} \approx 6.27$$
If you were calculating the population standard deviation (σ), the only difference is that you would divide by n rather than n − 1.
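For larger data sets, a library function can do the same work. As one possible approach, Python's standard `statistics` module provides `stdev` (which divides by n − 1) and `pstdev` (which divides by n); the variable names below are illustrative.

```python
import statistics

# Heart rates (beats per minute) from the example data.
heart_rates = [71, 83, 63, 70, 75, 69, 62, 75, 66, 68]

# Sample standard deviation: sum of squares divided by n - 1.
sample_sd = statistics.stdev(heart_rates)

# Population standard deviation: sum of squares divided by n.
population_sd = statistics.pstdev(heart_rates)

print(round(sample_sd, 2), round(population_sd, 2))  # 6.27 5.95
```

The sample version is always slightly larger for the same data, because it divides the same sum by a smaller number.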
The whole formula for sample standard deviation can be expressed using the summation symbol Σ, with the sum being over the whole sample, and $x_i$ representing the i-th result out of n. The sample variance is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}$$
And the sample standard deviation is simply:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$
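The summation formula translates fairly directly into a reusable function. The sketch below is one way to write it; `sample_variance` and `sample_std` are hypothetical helper names, and the check for fewer than two data points is an added assumption (the formula would otherwise divide by zero).

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        # Assumption: refuse samples where n - 1 would be zero.
        raise ValueError("need at least two data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


heart_rates = [71, 83, 63, 70, 75, 69, 62, 75, 66, 68]
print(round(sample_variance(heart_rates), 2))  # 39.29
print(round(sample_std(heart_rates), 2))       # 6.27
```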
Mean Deviation vs. Standard Deviation
The mean deviation differs slightly from the standard deviation. Instead of squaring the differences between the mean and each value, you take the absolute difference (ignoring any minus signs) and then find the average of those. For the example in the previous section, the first and second data points (71 and 83) give:
$$71 - 70.2 = 0.8 \quad \text{and} \quad 83 - 70.2 = 12.8$$
The third data point gives a negative result:
$$63 - 70.2 = -7.2$$
But you just remove the minus sign and take this as 7.2.
The sum of all of these absolute differences, divided by n, gives the mean deviation. In the example:
$$\frac{0.8 + 12.8 + 7.2 + 0.2 + 4.8 + 1.2 + 8.2 + 4.8 + 4.2 + 2.2}{10} = \frac{46.4}{10} = 4.64$$
This is noticeably smaller than the sample standard deviation calculated before: squaring the differences gives extra weight to values far from the mean, whereas the mean deviation weights every difference equally.
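To compare the two measures side by side, the mean deviation can be sketched in Python alongside the standard library's `statistics.stdev`. The function name `mean_deviation` is a hypothetical helper introduced here for illustration.

```python
import statistics


def mean_deviation(values):
    """Average absolute difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


heart_rates = [71, 83, 63, 70, 75, 69, 62, 75, 66, 68]

print(round(mean_deviation(heart_rates), 2))    # 4.64
print(round(statistics.stdev(heart_rates), 2))  # 6.27
```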