While statistics can be applied to any field, one aspect of
American life relies on them far more heavily than any other: sports. These
billion-dollar industries are of course sustained by athleticism and the
entertainment it provides, but all of that is supported by statistics. Baseball
is a prime example: many executives in Major League Baseball rely on advanced
statistics and complicated algorithms to evaluate talent and make personnel
decisions. The Oakland Athletics famously used sabermetrics to great effect a
little over ten years ago, as analyzed in Moneyball.
Of course, statistics don’t necessarily have to be that
complicated. Any casual fantasy football player follows statistics on a weekly
basis: rushing yards, touchdowns, and all sorts of other valuable numbers. In
fact, that’s all fantasy football is: lots and lots of statistics. The
objective is simply to make your numbers come out of the system better than
your opponent’s.
We can easily use statistics in any sport, but today, we’ll
be looking at college basketball. When I started working on this project, I
knew that there were two major events I wanted to cover and predict using
statistics: the presidential election and the NCAA men’s basketball
tournament. The reasoning behind the election should be obvious. I chose the
NCAA tournament not only because statistics play such a large role in sports,
but also because there is nothing Americans like to predict more than the
bracket. When March rolls around, it’s
bracket-mania in the United States. There are brackets for literally everything
imaginable. So why not fill out my own bracket(s) using a purely statistical
model?
_________________________________________________________________________________________
I decided to fill out four brackets this year: Two of them
will be based on statistics, and two of them will be my control brackets to
gauge the success of my statistical models.
BRACKET #1: Statistics Bracket
For my first bracket I took multiple statistical categories
for each team in the tournament and multiplied them by their strength of
schedule. Some teams (Duke, for example) simply have tougher schedules than
others (Southern University, for example). I then ranked the teams in each
category and assigned each a score from 1 to 68. The sum of these sub-scores
would be their composite score, which I would use to make my predictions.
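For the curious, here is a minimal sketch of that scoring in Python. It is an illustration under stated assumptions, not the exact calculation behind this bracket: the team names, categories, and numbers are made up, the function name is hypothetical, and it assumes that a higher adjusted value is better in every category and that the best team in a category gets the highest score.

```python
def composite_scores(stats, sos):
    """Sum per-category rank scores for every team.

    stats: {team: {category: raw value}}  (hypothetical layout)
    sos:   {team: strength-of-schedule multiplier}
    """
    teams = list(stats)
    categories = sorted({cat for team in teams for cat in stats[team]})
    totals = {team: 0 for team in teams}

    for cat in categories:
        # Adjust each raw value by the team's strength of schedule.
        adjusted = {team: stats[team][cat] * sos[team] for team in teams}
        # Assumption: higher is better, so the worst team gets 1 and the
        # best gets len(teams) (68 in a full tournament field).
        ordered = sorted(teams, key=lambda team: adjusted[team])
        for score, team in enumerate(ordered, start=1):
            totals[team] += score

    return totals


# Made-up example data for three placeholder teams.
example_stats = {
    "A": {"points": 80, "rebounds": 40},
    "B": {"points": 70, "rebounds": 35},
    "C": {"points": 75, "rebounds": 30},
}
example_sos = {"A": 1.0, "B": 1.2, "C": 0.9}

print(composite_scores(example_stats, example_sos))
```

Note that teams with identical adjusted values in a category still receive different scores here, in input order. A real version would need to decide how to handle that, and how to treat categories where lower is better.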
Final Four: Louisville, New Mexico, Kansas, Indiana
Champion: Indiana over Louisville
Notes:
· Despite composite scores ranging from 118 to 786, there were ties. I broke ties by siding with the lower seed. Upsets happen and are a trademark of the tournament, so if a game is close enough to come out as a tie in this model, it is one of the likeliest to produce an upset.
· No seed lower than 10 won its first-round matchup. This is good, because my model didn’t output any ridiculous upset winners, but also bad, because those ridiculous upsets will happen sooner or later.
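The tie-break rule from the first note can be sketched in a few lines of Python. This is an illustration only: the function name and the (name, seed, composite) tuple layout are hypothetical, and it reads "lower seed" as the underdog, meaning the team with the larger seed number. What to do when two tied teams share a seed is not covered above, so the sketch leaves that case undecided.

```python
def pick_by_composite(team_a, team_b):
    """Pick a winner from two (name, seed, composite_score) tuples."""
    name_a, seed_a, score_a = team_a
    name_b, seed_b, score_b = team_b

    # The higher composite score wins outright.
    if score_a != score_b:
        return name_a if score_a > score_b else name_b

    # Tie: side with the lower seed, read here as the underdog
    # (the larger seed number).
    if seed_a != seed_b:
        return name_a if seed_a > seed_b else name_b

    # Same score and same seed: not specified, so leave it undecided.
    return None


# Made-up example: a 5-seed and a 12-seed with equal composite scores.
print(pick_by_composite(("Favorite", 5, 400), ("Underdog", 12, 400)))
```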
BRACKET #2: Points Bracket
This bracket was based simply on points scored and points
allowed per game, again adjusted for strength of schedule. In a matchup between
Team A and Team B, Team A’s score is the average of A’s points scored and B’s
points allowed, and Team B’s score is the average of B’s points scored and A’s
points allowed. Whichever team has the higher score wins the matchup.
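As a rough Python sketch of that matchup rule: the function and field names below are hypothetical, the numbers are made up, and the inputs are assumed to be per-game averages that have already been adjusted for strength of schedule. How a tie is settled in this bracket is not stated above, so the sketch simply reports no winner in that case.

```python
def matchup_scores(team_a, team_b):
    """Return (score_a, score_b) for one matchup.

    Each team is a dict with 'points_scored' and 'points_allowed',
    assumed to be schedule-adjusted per-game averages.
    """
    score_a = (team_a["points_scored"] + team_b["points_allowed"]) / 2
    score_b = (team_b["points_scored"] + team_a["points_allowed"]) / 2
    return score_a, score_b


def pick_by_points(name_a, team_a, name_b, team_b):
    """Return the name of the team with the higher matchup score."""
    score_a, score_b = matchup_scores(team_a, team_b)
    if score_a == score_b:
        return None  # tie handling is not specified for this bracket
    return name_a if score_a > score_b else name_b


# Made-up example: A scores 75 and allows 60; B scores 70 and allows 65.
team_a = {"points_scored": 75, "points_allowed": 60}
team_b = {"points_scored": 70, "points_allowed": 65}

print(matchup_scores(team_a, team_b))            # (70.0, 65.0)
print(pick_by_points("A", team_a, "B", team_b))  # A
```

In the example, A’s score is (75 + 65) / 2 = 70 and B’s is (70 + 60) / 2 = 65, so A advances.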
Final Four: Louisville, Gonzaga, Florida, Indiana
Champion: Indiana over Gonzaga
Notes:
· All of the 6-seeds lost their first-round matchups. While this model did pick more upsets, I don’t expect all of the 11-seeds to win… especially since Middle Tennessee, which this bracket had going to the Sweet 16, didn’t even win its play-in game. When St. Mary’s was substituted back into the bracket… nothing changed. St. Mary’s also made it to the Sweet 16. Belmont went even further as an 11-seed, making it all the way to the Elite Eight.
· I was a bit surprised to see Indiana as the champion in both of the above brackets. I guess the numbers are with Indiana this year. But will that be enough to get them to the championship?
BRACKET #3: Seeds Bracket
Control bracket 1: The higher seed wins. If my models
outperform this bracket, I’ll consider them a success.
BRACKET #4: Mascot Bracket
Control bracket 2: The fiercer mascot wins. (Which one would
win in a fight?) I fill out one bracket this way every year, and if a nearly
random bracket like this one outperforms my statistical models, then there’s a
problem.
I’ll check back in after the tournament to measure my
success. Hopefully I’ll be a little closer on these predictions than the one I
made for the Pope.