Calculating a sample proportion is one of the more straightforward tasks in statistics. The calculation is a handy tool in its own right, and it is also a useful way to illustrate how sample size affects the standard deviation of a sample statistic when the sampling distribution is approximately normal.
Say that a baseball player is batting .300 over a career that includes many thousands of plate appearances, meaning that the probability he will get a base hit any time he faces a pitcher is 0.3. From this, it is possible to determine how close to .300 he will hit in a smaller number of plate appearances.
Definitions and Parameters
For these problems, it is important that the sample size be large enough to produce meaningful results. The product of the sample size n and the probability p of the event in question occurring must be greater than or equal to 10, and similarly, the product of the sample size and one minus the probability of the event occurring must also be greater than or equal to 10. In mathematical language, this means that np ≥ 10 and n(1 - p) ≥ 10.
The sample proportion p̂ is simply the number of observed events x divided by the sample size n, or p̂ = (x/n).
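As a minimal sketch of the two definitions above, the following Python snippet checks the sample-size conditions and computes p̂. The function names and the figure of 27 hits are hypothetical, chosen only for illustration.

```python
def large_sample_ok(n, p, threshold=10):
    """Return True when both np and n(1 - p) are at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


def sample_proportion(x, n):
    """Return p-hat: the number of observed events x divided by the sample size n."""
    return x / n


n, p = 100, 0.3   # sample size and event probability
hits = 27         # hypothetical observed count, not taken from the article

print(large_sample_ok(n, p))       # both products (30 and 70) are at least 10
print(sample_proportion(hits, n))  # observed proportion of hits
```

With a much smaller sample, such as n = 20, the product np is only 6, so the check fails and the methods below should not be relied on.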
Mean and Standard Deviation of the Variable
The mean of x is simply np, the number of elements in the sample multiplied by the probability of the event occurring. The standard deviation of x is √(np(1 - p)).
Returning to the example of the baseball player, assume he has 100 plate appearances in his first 25 games. What are the mean and standard deviation of the number of hits he is expected to get?
np = (100)(0.3) = 30 and √(np(1 - p)) = √((100)(0.3)(0.7)) = 10√0.21 ≈ 4.58.
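Both 25 and 35 hits lie roughly one standard deviation from the mean of 30 (30 ± 4.58 spans about 25.4 to 34.6). This means that the player getting as few as 25 hits in his 100 plate appearances or as many as 35 would not be considered statistically anomalous.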
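The arithmetic for the hit count can be reproduced with a short Python sketch. The function name count_mean_sd is an illustrative choice, and the "one standard deviation" range is only a rough yardstick for what counts as unremarkable.

```python
import math


def count_mean_sd(n, p):
    """Return the mean np and standard deviation sqrt(np(1 - p)) of the count x."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


mean_x, sd_x = count_mean_sd(100, 0.3)

print(f"mean = {mean_x:.2f}, sd = {sd_x:.2f}")
# Range covering one standard deviation on either side of the mean
print(f"one-sd range: {mean_x - sd_x:.1f} to {mean_x + sd_x:.1f}")
```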
Mean and Standard Deviation of the Sample Proportion
The mean of any sample proportion p̂ is just p. The standard deviation of p̂ is √(p(1 - p)/n), which is the same as √(p(1 - p))/√n.
For the baseball player, with 100 tries at the plate, the mean is simply 0.3 and the standard deviation is √((0.3)(0.7)/100), or (√0.21)/10, or about 0.0458.
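The same calculation for the sample proportion can be sketched in Python. The function name proportion_sd is hypothetical; the snippet also confirms that the result equals the standard deviation of the count divided by n.

```python
import math


def proportion_sd(n, p):
    """Return the standard deviation of p-hat, sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


n, p = 100, 0.3
sd_phat = proportion_sd(n, p)
sd_count = math.sqrt(n * p * (1 - p))  # standard deviation of the count x

print(f"sd of p-hat = {sd_phat:.4f}")
print(f"sd of x / n = {sd_count / n:.4f}")  # same value, on the proportion scale
```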
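Note that the standard deviation of p̂ is the standard deviation of x divided by n (4.58/100 = 0.0458), so it is far smaller in absolute terms. Because it is proportional to 1/√n, it also shrinks as the sample size grows.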
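To see how the sample size affects the spread of p̂, the sketch below evaluates √(p(1 - p)/n) for a few sample sizes. The sizes 100, 400 and 1600 are arbitrary illustrative choices, picked so that each is four times the previous one.

```python
import math

p = 0.3
sample_sizes = [100, 400, 1600]  # illustrative values, each four times the last

# Standard deviation of p-hat for each sample size
sds = [math.sqrt(p * (1 - p) / n) for n in sample_sizes]

for n, sd in zip(sample_sizes, sds):
    print(f"n = {n:5d}  sd of p-hat = {sd:.4f}")
```

Each fourfold increase in n cuts the standard deviation in half, which is the 1/√n behavior in the formula.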