Error. The very word resonates with regret and remorse, at least if you happen to be a baseball player, an exam-taker or a quiz-show participant. For statisticians, errors are simply one more thing to keep track of as part of the job description — unless, of course, the statistician's own errors are at issue.
The term margin of error is common in everyday language, including many media articles about scientific topics and opinion polls. It is a way to report the reliability of a value (such as the percentage of adults who favor a particular political candidate). It depends on several factors, including the size of the sample taken and the presumed standard deviation of the variable of interest in the population.
To understand margin of error, you must first have working knowledge of basic statistics, in particular the concept of a normal distribution. As you read, pay special attention to the difference between the mean of a sample and the mean of a large number of these sample means.
Population Statistics: The Basics
If you have a sample of data, like the weights of 500 randomly chosen 15-year-old boys in Sweden, you can compute the mean, or average, by dividing the sum of the individual weights by the number of data points (500). The standard deviation of this sample is a measure of the spread of that data about that mean, showing how widely values (such as weights) tend to cluster.
- What most likely has a greater standard deviation: The average weight in pounds of the aforementioned Swedish boys, or the total years of school they have completed at age 15?
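As a quick sketch of these definitions, Python's standard statistics module computes both quantities directly. The five weights below are made-up illustration values, not real data from any study:

```python
from statistics import mean, stdev

# Hypothetical sample of weights in pounds (a real study would use all 500 values)
weights = [128.0, 141.5, 135.2, 150.3, 132.0]

sample_mean = mean(weights)   # sum of the values divided by how many there are
sample_sd = stdev(weights)    # spread of the values about the sample mean

print(round(sample_mean, 2))  # 137.4
print(round(sample_sd, 2))    # 8.74
```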
The Central Limit Theorem of statistics states that if you take repeated samples of a given size from a population in which the variable of interest is distributed about a mean, the means of those samples will themselves be approximately normally distributed, and the average of those sample means will approach the population mean as the number of samples averaged grows toward infinity.
In sample statistics, the mean and standard deviation are represented by x̄ and s, which are true statistics computed from the data, whereas μ and σ are parameters of the population and cannot be known with 100 percent certainty. The following example illustrates the difference, which comes into play when computing margins of error.
If you repeatedly sampled the heights of 100 randomly selected women in a large country where the average height of an adult woman is 64.25 inches, with a standard deviation of 2 inches, you might collect successive x̄ values of 63.7, 64.9, 64.5 and so on, with standard deviations s of 1.7, 2.3, 2.2 inches and the like. In each case, μ and σ remain unchanged at 64.25 and 2 inches respectively.
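This sampling behavior is easy to simulate. The sketch below (assuming normally distributed heights with μ = 64.25 and σ = 2, as in the example) draws three samples of 100 and shows that x̄ and s vary from sample to sample while μ and σ stay fixed:

```python
import random
from statistics import mean, stdev

random.seed(1)                 # fixed seed so the run is repeatable
MU, SIGMA = 64.25, 2.0         # population parameters: these never change

def sample_stats(n=100):
    """Draw one random sample of n heights and return its x-bar and s."""
    heights = [random.gauss(MU, SIGMA) for _ in range(n)]
    return mean(heights), stdev(heights)

for _ in range(3):
    xbar, s = sample_stats()
    print(f"x-bar = {xbar:.2f}, s = {s:.2f}")   # different every sample
```

Averaging many such sample means brings you steadily closer to μ = 64.25, just as the Central Limit Theorem predicts.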
What Is a Confidence Interval?
If you picked a single person at random and gave her a 25-question general science quiz, it would be foolish to use the result as the average for any larger population of test-takers. However, if the population mean score for this quiz happens to be known, then the power of statistics can be used to determine the confidence you can have that a range of values (in this case scores) will contain that single person's score.
A confidence interval is a range of values built around a sample estimate; its defining property is that if a large number of such intervals were randomly created, using the same sample sizes from the same larger population, the stated percentage of them would contain the true value. There is always some uncertainty about whether a particular confidence interval of less than 100 percent actually contains the true value of the parameter; most of the time, a confidence interval of 95 percent is used.
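That defining property can be checked by simulation. In this sketch (a made-up population with μ = 50 and σ = 10, chosen purely for illustration), roughly 95 percent of the intervals built from repeated samples end up containing μ:

```python
import random
from statistics import NormalDist, mean

random.seed(0)
MU, SIGMA, N = 50.0, 10.0, 30
z = NormalDist().inv_cdf(0.975)            # about 1.96 for a 95% interval

def interval_covers_mu():
    """Build one 95% interval from a fresh sample; does it contain MU?"""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    e = z * SIGMA / N ** 0.5               # margin of error with known sigma
    xbar = mean(sample)
    return xbar - e <= MU <= xbar + e

coverage = sum(interval_covers_mu() for _ in range(2000)) / 2000
print(coverage)                            # close to 0.95
```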
Example: Assume your quiz-taker scored 22/25 (88 percent), and that the population mean score is 53 percent with a standard deviation of 10 percent. Is there a way to know how this score relates to the mean in percentile terms, and what the margin of error involved is?
What Are Critical Values?
Critical values are based on normally distributed data, which is the sort that's been discussed here so far. This is data that is symmetrically distributed about a central mean, such as height and weight tend to be. Other population variables, such as age, do not show normal distributions.
Critical values are used to determine confidence intervals. They rest on the principle that population means are very reliable estimates, in effect assembled from a practically limitless number of samples. Critical values are denoted by z, and because their value depends on the chosen confidence interval, you need a chart like the one in the Resources to work with them.
One reason you need z-values (or z-scores) is to determine the margin of error of a sample mean or of a population mean. These calculations are handled in somewhat different ways.
Standard Error vs. Standard Deviation
The standard deviation of a sample s differs for every sample; the standard error of the mean of a number of samples depends on the population standard deviation σ and is given by the expression:

SE = σ/√n

where n is the sample size.
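For the heights example above (σ = 2 inches), this expression gives:

```python
from math import sqrt

sigma = 2.0                      # population standard deviation, in inches
for n in (25, 100, 400):
    se = sigma / sqrt(n)         # standard error of the mean, SE = sigma/sqrt(n)
    print(f"n = {n:>3}: SE = {se:.2f} inches")   # 0.40, 0.20, 0.10
```

Quadrupling the sample size halves the standard error, which is why larger samples produce tighter estimates of the mean.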
Margin of Error Formula
To continue the above discussion about z-scores, they are derived from the chosen confidence interval. To use the associated table, convert the confidence interval percentage to a decimal, subtract this quantity from 1.0, and divide the result by two (because the confidence interval is symmetrical about the mean).
The quantity (1 − CI), where CI is the confidence interval expressed in decimal notation, is called the level of significance and is denoted by α. For example, when CI = 95% = 0.95, α = 1.0 − 0.95 = 0.05.
Once you have this value, you find where it appears on the z-score table and determine the z-score from the relevant row and column. For example, when α = 0.05, you look up the value 0.05/2 = 0.025 on the table; this critical value is written Z(α/2). The value 0.025 sits in the row labeled −1.9 and the column labeled 0.06, giving a z-score of −1.96.
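Instead of a printed table, the same lookup can be sketched with Python's standard-library NormalDist, whose inverse CDF plays the role of the z-score table:

```python
from statistics import NormalDist

def z_critical(ci):
    """Return the lower-tail critical value Z(alpha/2) for a two-sided interval."""
    alpha = 1.0 - ci                        # level of significance
    return NormalDist().inv_cdf(alpha / 2)  # negative, left-tail value

for ci in (0.90, 0.95, 0.99):
    print(f"CI = {ci:.0%}: Z = {z_critical(ci):.3f}")
# 90% -> -1.645, 95% -> -1.960, 99% -> -2.576, matching the table values
```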
Margin of Error Calculations
Now, you are ready to perform some margin of error calculations. As noted, these are done differently depending on what exactly you are finding the margin of error of.
The formula for the margin of error for a sample mean is:

E = Z(α/2) × s/√n
and that for the margin of error of a population mean is:

E = Z(α/2) × σ/√n

where n is the sample size in each case.
Example: Assume you know that the number of online shows people in your city binge-watch per year is normally distributed with a population standard deviation σ of 3.2 shows. A random sample of 29 townsfolk was taken, and the sample mean is 14.6 shows/year. Using a 90% confidence interval, what is the margin of error?
You see that you will use the second of the above two equations to solve this problem, since σ is given. First, compute the standard error σ/√n:

σ/√n = 3.2/√29 = 3.2/5.385 = 0.594
Now, you use the value of Z(α/2) for α = 0.10. Locating the value 0.050 on the table, you see that this corresponds to a value of z between −1.64 and −1.65, so you can use −1.645. For the margin of error E, this gives:

E = −1.645 × 0.594 = −0.98

Note that you could have started on the positive z-score side of the table and found the value corresponding to 0.95 instead of 0.05, since this represents the corresponding critical point on the opposite (right) side of the graph. This would have given E = +0.98, which makes sense since the error is the same magnitude on each side of the mean.

In summary, then, the number of shows binged per year by the sample of 29 of your neighbors is 14.6 ± 0.98 shows per year.
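The whole binge-watching calculation can be reproduced in a few lines, using NormalDist in place of the printed table (same σ = 3.2, n = 29, x̄ = 14.6 from the example):

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar, ci = 3.2, 29, 14.6, 0.90

se = sigma / sqrt(n)                            # standard error sigma/sqrt(n)
z = abs(NormalDist().inv_cdf((1.0 - ci) / 2))   # critical value, about 1.645
e = z * se                                      # margin of error

print(f"SE = {se:.3f}")                         # 0.594
print(f"E  = {e:.2f}")                          # 0.98
print(f"{xbar} ± {e:.2f} shows per year")
```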
Resources

- OpenStax Introductory Business Statistics: The Central Limit Theorem for Sample Means
- Investopedia.com: Standard Error of the Mean vs. Standard Deviation: The Difference
- Georgia Southern University: Critical Values of z
- LibreTexts Statistics: Confidence Intervals
- Boston University School of Public Health: Variance and Standard Deviation
- Stat Trek: Margin of Error