In statistics, random sampling of data from a population often produces a bell-shaped curve with the mean centered on the peak of the bell. This is known as a normal distribution. The central limit theorem states that as the sample size increases, the sample mean tends to be normally distributed about the population mean, and the spread of that distribution becomes narrower. The central limit theorem can be used to estimate the probability of finding a particular value within a population.
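The narrowing described by the central limit theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (an arbitrary, deliberately non-bell-shaped choice), and the function name `spread_of_sample_means`, the trial count and the seed are hypothetical names and values chosen for illustration.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=2000, seed=0):
    """Return the standard deviation of the means of many random samples.

    Assumption: the population is exponential with mean 1. Any population
    with a finite variance should show the same narrowing trend.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


# Larger samples should give sample means that cluster more tightly.
for n in (4, 25, 100):
    print(n, round(spread_of_sample_means(n), 3))
```

The printed spread should shrink roughly in proportion to one over the square root of the sample size, which is the same relationship used for the standard error.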
Collect samples and then determine the mean. For example, assume you want to calculate the probability that a male in the United States has a cholesterol level of 230 milligrams per deciliter or above. Start by collecting samples from 25 individuals and measuring their cholesterol levels. After collecting the data, calculate the mean of the sample. The mean is obtained by summing the measured values and dividing by the total number of samples. In this example, assume that the mean is 211 milligrams per deciliter.
Calculate the standard deviation, which is a measure of the data "spread". This can be done in a few easy steps:
- Subtract each data point from the mean.
- Square each result, and sum the squared values over all points.
- Divide by the total number of samples.
- Take the square root.
In this example, assume that the standard deviation is 46 milligrams per deciliter.
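The mean and standard deviation steps can be sketched in a few lines of Python. The five readings below are invented purely for illustration (they are not the 25 measurements from the example), and the function names are hypothetical. The sketch divides by the total number of samples, as the steps describe; many textbooks divide by one less than that number when working with a sample.

```python
import math


def mean(values):
    """Sum the measured values and divide by the number of samples."""
    return sum(values) / len(values)


def standard_deviation(values):
    """Follow the listed steps: subtract, square, sum, divide, square root."""
    m = mean(values)
    squared_sum = sum((m - x) ** 2 for x in values)
    return math.sqrt(squared_sum / len(values))


# Invented cholesterol readings in milligrams per deciliter.
readings = [180, 250, 205, 160, 260]
print(mean(readings))
print(round(standard_deviation(readings), 1))
```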
Calculate the standard error by dividing the standard deviation by the square root of total sample number:
Standard error = 46 / sqrt(25) = 46 / 5 = 9.2
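The same calculation can be checked in code. This is a minimal sketch using the example's numbers; the function name `standard_error` is a hypothetical choice.

```python
import math


def standard_error(std_dev, sample_size):
    """Divide the standard deviation by the square root of the sample size."""
    return std_dev / math.sqrt(sample_size)


print(standard_error(46, 25))  # 9.2
```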
Draw a sketch of the normal distribution and shade in the appropriate probability. Following the example, you want to know the probability that a male has a cholesterol level of 230 milligrams per deciliter or above. To find the probability, work out how many standard errors 230 milligrams per deciliter lies from the mean (the Z-value):
Z = (230 - 211) / 9.2 = 2.07
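A short sketch of the Z-value calculation with the example's numbers follows. The function name `z_value` is hypothetical, and the subtraction is grouped before the division.

```python
def z_value(x, mean, standard_error):
    """Number of standard errors that x lies above (or below) the mean."""
    return (x - mean) / standard_error


z = z_value(230, 211, 9.2)
print(round(z, 2))  # 2.07
```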
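Look up the probability of obtaining a value 2.07 standard errors or more above the mean. A standard normal table of the kind used here lists the probability of falling below a given z-value. Because the normal distribution is symmetric, the probability of a value more than 2.07 standard errors above the mean equals the probability of a value below -2.07, so use z = -2.07 for the lookup. If you instead need the probability of a value below 230 milligrams per deciliter, look up z = +2.07.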
Look up the z-value on a standard normal probability table. The first column on the left-hand side shows the whole number and first decimal place of the z-value. The row along the top shows the second decimal place of the z-value. Following the example, since the z-value to look up is -2.07, first locate -2.0 in the left-hand column, then scan the top row for the 0.07 entry. The point at which this row and column intersect is the probability. In this case, the value read off the table is 0.0192, and thus the probability of finding a male that has a cholesterol level of 230 milligrams per deciliter or above is 1.92 percent.
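If no printed table is at hand, the lookup can be reproduced with Python's standard library. This is a sketch rather than a replacement for the table: `NormalDist` is part of the `statistics` module in Python 3.8 and later, while the helper name `upper_tail_probability` is hypothetical.

```python
from statistics import NormalDist


def upper_tail_probability(z):
    """Probability of a value at least z standard errors above the mean.

    By symmetry this equals the cumulative probability below -z, which is
    the figure read from the table.
    """
    return NormalDist().cdf(-z)


p = upper_tail_probability(2.07)
print(round(p, 4))  # 0.0192
```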
About the Author
Samuel Markings has been writing for scientific publications for more than 10 years, and has published articles in journals such as "Nature." He is an expert in solid-state physics, and during the day is a researcher at a Russell Group U.K. university.