Wednesday, March 20, 2013

Who Will Win The NCAA Tournament?


While statistics can be applied to any field, one aspect of American life relies on them more heavily than any other: sports. The billion-dollar sports industries are, of course, driven by athleticism and the entertainment it provides, but all of that is supported by statistics. Baseball is a prime example: many Major League Baseball executives rely on advanced statistics and complicated algorithms to evaluate talent and make personnel decisions. The Oakland Athletics famously used sabermetrics to great effect a little over ten years ago, as chronicled in Moneyball.

Of course, statistics don’t necessarily have to be that complicated. Any casual fantasy football player follows statistics weekly, from rushing yards to touchdowns to all sorts of other valuable numbers. In fact, that’s all fantasy football is: lots and lots of statistics. The objective is simply to have your numbers come out better than your opponent’s.

We can use statistics in almost any sport, but today we’ll be looking at college basketball. When I started working on this project, I knew there were two major events I wanted to cover and predict using statistics: the presidential election and the NCAA men’s basketball tournament. The reasoning behind the election should be obvious. I chose the NCAA tournament not only because statistics play such a large role in sports, but also because there is nothing Americans like to predict more than the bracket. When March rolls around, it’s bracket-mania in the United States, and there are brackets for literally everything imaginable. So why not fill out my own bracket(s) using a purely statistical model?
_________________________________________________________________________________________

I decided to fill out four brackets this year: two based on statistics, and two control brackets to gauge the success of my statistical models.

BRACKET #1: Statistics Bracket
For my first bracket, I took multiple statistical categories for each team in the tournament and multiplied them by the team’s strength of schedule, since some teams, such as Duke, simply face tougher schedules than others, such as Southern University. I then ranked the teams in each category, assigning each team a score from 1 to 68. The sum of a team’s category scores became its composite score, which I used to make my predictions.
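To make the scoring concrete, here is a minimal sketch of a rank-sum composite like the one described above. It assumes a hypothetical data layout (a dictionary of per-team stats and a strength-of-schedule multiplier per team), assumes a larger value is better in every category, and assumes the best team in a category receives the top rank (68 in a full field). The team names, categories, and numbers are made up for illustration.

```python
def composite_scores(stats, sos, categories):
    """Rank-sum composite score for each team.

    stats: {team: {category: value}}  (hypothetical layout)
    sos:   {team: strength-of-schedule multiplier}  (hypothetical scale)
    Assumes a larger value is better in every category.
    """
    # Multiply every raw stat by the team's strength of schedule.
    adjusted = {
        team: {c: values[c] * sos[team] for c in categories}
        for team, values in stats.items()
    }
    totals = {team: 0 for team in stats}
    for c in categories:
        # Sort worst to best so the best team gets rank len(stats), e.g. 68.
        ordered = sorted(stats, key=lambda team: adjusted[team][c])
        for rank, team in enumerate(ordered, start=1):
            totals[team] += rank
    return totals


# Made-up numbers for three hypothetical teams.
stats = {
    "A": {"ppg": 80, "rpg": 30},
    "B": {"ppg": 70, "rpg": 35},
    "C": {"ppg": 75, "rpg": 40},
}
sos = {"A": 1.0, "B": 1.2, "C": 1.0}
print(composite_scores(stats, sos, ["ppg", "rpg"]))  # {'A': 3, 'B': 6, 'C': 3}
```

Note that team B, with the raw stats of a middling team, finishes first once its tougher schedule is factored in, and that A and C end up tied, which is exactly the situation the first note below deals with. Categories where smaller is better (turnovers, for instance) would need their order reversed, and because Python's sort is stable, teams with identical adjusted values in a category get consecutive ranks in input order.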

Final Four: Louisville, New Mexico, Kansas, Indiana
Champion: Indiana over Louisville

Notes:
·      Despite composite scores ranging from 118 to 786, there were ties. I broke ties by siding with the lower seed. Upsets happen and are a trademark of the tournament; if a game is close enough that this model scores it as a tie, it is one of the games most likely to produce an upset.
·      No team seeded lower than 10th won its first-round matchup. This is good, because my model didn’t output any ridiculous upset winners, but it’s also bad, because those ridiculous upsets will happen sooner or later.
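Picking a winner from the composite scores, including the tie-break described in the first note, could look roughly like the sketch below. It assumes a higher composite is better and that seeds run from 1 (top seed) to 16, so "the lower seed" means the larger seed number. The team names are hypothetical, and since the post doesn't say what happens when two teams with the same seed also tie, this sketch simply falls back to the second team in that case.

```python
def pick_winner(team_a, team_b, composite, seed):
    """Return the predicted winner of a matchup.

    composite: {team: composite score}, assumed higher is better.
    seed:      {team: tournament seed}, where 1 is the top seed.
    """
    if composite[team_a] != composite[team_b]:
        return team_a if composite[team_a] > composite[team_b] else team_b
    # Tie: side with the lower seed (the larger seed number), i.e. the underdog.
    # If the seeds are also equal, this sketch arbitrarily returns team_b.
    return team_a if seed[team_a] > seed[team_b] else team_b


composite = {"Team X": 500, "Team Y": 500}
seed = {"Team X": 7, "Team Y": 10}
print(pick_winner("Team X", "Team Y", composite, seed))  # Team Y
```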

BRACKET #2: Points Bracket
This bracket was based simply on points scored and points allowed per game, again adjusted for strength of schedule. In a matchup between Team A and Team B, Team A’s score is the average of A’s points scored and B’s points allowed, and Team B’s score is the average of B’s points scored and A’s points allowed. Whichever team has the higher score wins the matchup.
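The averaging rule above fits in a few lines. This is a sketch that assumes the per-game averages passed in have already been adjusted for strength of schedule (the post doesn't spell out how that adjustment was done), and it uses made-up numbers. The post also doesn't say how exact ties were handled in this bracket, so here they go to the second team.

```python
def predict_matchup(team_a, team_b):
    """Points-model prediction for one game.

    Each team is a (name, points_scored, points_allowed) tuple of per-game
    averages, assumed to be already adjusted for strength of schedule.
    """
    name_a, scored_a, allowed_a = team_a
    name_b, scored_b, allowed_b = team_b
    # Blend each team's offense with the opponent's defense.
    score_a = (scored_a + allowed_b) / 2
    score_b = (scored_b + allowed_a) / 2
    # Exact ties go to team_b in this sketch.
    winner = name_a if score_a > score_b else name_b
    return winner, score_a, score_b


print(predict_matchup(("Team A", 80.0, 65.0), ("Team B", 70.0, 60.0)))
# ('Team A', 70.0, 67.5)
```

Here Team A scores (80 + 60) / 2 = 70 and Team B scores (70 + 65) / 2 = 67.5, so Team A advances. Because each score mixes one team's offense with the other's defense, a stingy defense lowers the opponent's projected score rather than raising its own.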

Final Four: Louisville, Gonzaga, Florida, Indiana
Champion: Indiana over Gonzaga

Notes:
·      All of the 6-seeds lost their first-round matchups. While this model did pick more upsets, I don’t expect all of the 11-seeds to win… especially since Middle Tennessee, which this bracket had going to the Sweet 16, didn’t even win its play-in game. When St. Mary’s, the team that beat them, was substituted into the bracket… nothing changed. St. Mary’s also made it to the Sweet 16. Belmont went even further as an 11-seed, making it all the way to the Elite Eight.
·      I was a bit surprised to see Indiana as the champion in both of the above brackets. I guess the numbers are with Indiana this year. But will that be enough to win them the championship?

BRACKET #3: Seeds Bracket
Control bracket 1: The higher seed wins. If my models outperform this bracket, I’ll consider them a success.

BRACKET #4: Mascot Bracket
Control bracket 2: The fiercer mascot wins (which one would win in a fight?). I fill out one bracket this way every year, and if a nearly random bracket like this one outperforms my statistical models, then there’s a problem.

I’ll check back in after the tournament to measure my success. Hopefully I’ll be a little closer on these predictions than the one I made for the Pope.
