For sports fans, March Madness is one of the highlights of the year. Starting in the middle of March, the annual event pits the best teams in NCAA college basketball against each other, in a huge knockout tournament consisting of 64 teams.
This is where things get interesting. The knockout aspect means there’s always a chance for upsets and unexpected glory. Who is going to win the tournament? Will there be upsets as a “Cinderella” team progresses further than you’d expect, or will they all crash out in the early rounds? Can you predict the whole bracket?
To look any deeper, we’re going to have to use some math, and learn about how statistics apply to March Madness.
The Basics of Probabilities
Before we get into the application of statistics and probability to March Madness, it’s important to cover the basics of probabilities.
The probability of something occurring is simply:
Probability = number of desired outcomes ÷ number of possible outcomes
This applies only to situations in which every possible outcome is equally likely. For example, a throw of a standard six-sided die has a 1/6 probability of turning up the number six, because there is only one outcome you want and six possible outcomes. Probabilities are always numbers (expressed as fractions or decimals) between 0 and 1, with 0 meaning no chance whatsoever of the event happening and 1 meaning that it is a certainty.
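The rule of dividing desired outcomes by possible outcomes can be sketched in a few lines of Python. The function name `probability` is just an illustrative choice, and the sketch assumes every outcome is equally likely, as with a fair die.

```python
from fractions import Fraction

def probability(desired_outcomes: int, possible_outcomes: int) -> Fraction:
    """Probability of an event when every outcome is equally likely."""
    if possible_outcomes <= 0 or not 0 <= desired_outcomes <= possible_outcomes:
        raise ValueError("need possible_outcomes > 0 and 0 <= desired_outcomes <= possible_outcomes")
    return Fraction(desired_outcomes, possible_outcomes)

# Rolling a six on a standard six-sided die: one desired outcome out of six.
p_six = probability(1, 6)
print(p_six, float(p_six))
```

Using `Fraction` keeps the answer as an exact fraction (1/6) and still lets you convert it to a decimal when that is easier to read.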
But if you’re considering something more complicated, like a game of basketball, there is a lot more to think about. You could say the probability of any team beating any other is 1/2, but a game between Duke and Pittsburgh is hardly a coin-flip. This is where the NCAA’s seeding system and statistics come into play.
March Madness Probabilities
So how do you tackle the problem of applying probability to March Madness? First, you need some way of estimating the actual likelihood that any one team will beat another. This is a very challenging task, but the seeding system devised by the NCAA essentially separates the teams into “tiers” based on how good they are.
For example, in games since 1985 where a No. 1 seed has played a No. 16 seed, the No. 1 seed has won 99 percent of the time. This means that out of any 100 such games (because percent is “per hundred”), you can expect the No. 16 seed to win about one of them.
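Look at the basic formula again:
Probability = number of desired outcomes ÷ number of possible outcomes
Out of 100 games (the possible outcomes), only one is a win for the No. 16 seed (the outcome we’re interested in). This immediately gives a probability of 1/100, or 0.01, for the upset.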
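The same arithmetic can be checked with a short Python sketch. It takes the 99 percent figure quoted above as given and uses the fact that the two teams' win probabilities must add up to 1; the variable names are illustrative only.

```python
from fractions import Fraction

# Win rate for No. 1 seeds against No. 16 seeds, as quoted in the text.
top_seed_win_rate = Fraction(99, 100)

# One of the two teams has to win, so the probabilities sum to 1.
upset_probability = 1 - top_seed_win_rate

# Expected number of No. 16 seed wins in 100 such games.
expected_upsets_per_100 = upset_probability * 100

print(upset_probability, float(upset_probability), expected_upsets_per_100)
```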
You can take this further by looking at how far teams with different seeds have progressed in past tournaments. In 32 of the last 34 tournaments, at least one No. 1 seed has made it to the Final Four, so the probability that at least one No. 1 seed gets there this year is roughly 32/34 (or 16/17). Note that this is the chance for the No. 1 seeds as a group, not for each one individually. At least one No. 1 seed has made it to the championship game in 26 of 34 tournaments, giving a probability of 13/17. For No. 2 seeds, these figures drop to 22/34 (or 11/17) for the Final Four and 13/34 for the championship game. Finally, a No. 1 seed has won the title in 21 of 34 tournaments, and the winner has been among the top three seeds in 30 of 34 (15/17).
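To see how these counts turn into probabilities, here is a minimal Python sketch that reduces each one to its simplest fraction and a decimal. The counts are the ones quoted above; the dictionary labels are my own shorthand, and treating a historical frequency as this year's probability is an assumption rather than a guarantee.

```python
from fractions import Fraction

TOURNAMENTS = 34  # number of tournaments covered by the figures in the text

# Number of tournaments in which each event happened, as quoted in the text.
counts = {
    "No. 1 seed in the Final Four": 32,
    "No. 1 seed in the championship game": 26,
    "No. 2 seed in the Final Four": 22,
    "No. 2 seed in the championship game": 13,
    "No. 1 seed wins the title": 21,
    "winner is a top-three seed": 30,
}

# Fraction reduces each ratio automatically, e.g. 32/34 becomes 16/17.
frequencies = {event: Fraction(n, TOURNAMENTS) for event, n in counts.items()}

for event, freq in frequencies.items():
    print(f"{event}: {freq} (about {float(freq):.2f})")
```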
You can also use these same statistics to think about teams with essentially no chance of winning. Analysis of the tournaments since 1985 shows that no seeds from No. 9 to No. 16 have ever reached the final, so choosing one of these as your winner would probably be a huge mistake.
When it comes to trying to choose a whole bracket, the same statistics show that there is an average of eight upsets each year. This doesn’t help you say where they will be, but if you’ve predicted a lot more or fewer upsets than this, you might want to re-think your choices.
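One way to run this sanity check is to count the upsets in your own picks, as in the sketch below. The picks are made-up placeholders, and the sketch assumes an "upset" is any game where the higher-numbered (lower-ranked) seed wins; the statistics above may rest on a stricter definition.

```python
AVERAGE_UPSETS = 8  # historical average per tournament, as quoted in the text

def count_upsets(picks):
    """Count picks where the predicted winner has the higher-numbered seed."""
    return sum(1 for winner_seed, loser_seed in picks if winner_seed > loser_seed)

# Hypothetical picks, written as (predicted winner's seed, predicted loser's seed).
picks = [(1, 16), (12, 5), (2, 15), (10, 7), (3, 14), (4, 13), (11, 6), (8, 9)]

upsets = count_upsets(picks)
print(f"{upsets} upsets predicted so far; the historical average is about {AVERAGE_UPSETS} per tournament")
```

A full bracket has many more games than this short list, so the comparison with the average only makes sense once every pick has been entered.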
Is This Enough to Pick a Winner?
So a basic analysis of probabilities based on seed number can get you pretty far when it comes to predicting who’s going to win March Madness, but is it really enough to make your choice?
It seems pretty obvious that there is more to a basketball game than the teams’ rankings or even their previous performance. Other key stats matter too, such as a team’s percentage of successful free throws, its average number of turnovers per game and its field goal success percentage, along with many other factors.
Coming up with an explicit formula for a win probability based on all of this would be complicated, but this gives you an idea of the sort of thing you’d need to take into account to fill out your bracket as well as possible.
For example, if a No. 2 seed leads the pack in field goal percentage and has very few turnovers per game, it’s a solid pick as a winner even though an analysis on the basis of seeds alone would suggest it isn’t the ideal choice. The best advice is to base your initial picks on seeds, and then use other statistics to mentally tweak those picks until you settle on a team you’re happy with.