Wednesday, March 20, 2013

Who Will Win The NCAA Tournament?


While statistics can be applied to any field, one aspect of American life relies far more heavily on them than any other: sports. These billion-dollar industries are of course perpetuated through athleticism and the entertainment it provides, but all of that is supported by statistics. Baseball is a prime example: many executives in Major League Baseball rely on advanced statistics and complicated algorithms to evaluate talent and make personnel decisions. The Oakland Athletics famously used sabermetrics to great effect a little over ten years ago, as analyzed in Moneyball.

Of course, statistics don’t necessarily have to be that complicated. Any casual fantasy football player follows statistics on a weekly basis: rushing yards, touchdowns, and all sorts of other valuable numbers. In fact, that’s all fantasy football is: lots and lots of statistics. The objective is just to make your numbers come out of the system better than your opponent’s.

We can easily use statistics in any sport, but today we’ll be looking at college basketball. When I started working on this project, I knew that there were two major events I wanted to cover and predict using statistics: the presidential election and the NCAA men’s basketball tournament. The reasoning behind the election should be obvious, and I chose the NCAA tournament not only because statistics play such a large role in sports, but also because there is nothing Americans like to predict more than the bracket. When March rolls around, it’s bracket-mania in the United States. There are brackets for literally everything imaginable. So why not fill out my own bracket(s) using a purely statistical model?
_________________________________________________________________________________________

I decided to fill out four brackets this year: Two of them will be based on statistics, and two of them will be my control brackets to gauge the success of my statistical models.

BRACKET #1: Statistics Bracket
For my first bracket I took multiple statistical categories for each team in the tournament and multiplied them by the team’s strength of schedule. Some teams (Duke, for example) simply have tougher schedules than others (Southern University, for example). I then compared all of the teams in each category and assigned them scores from 1 to 68 in that category. The sum of a team’s sub-scores is its composite score, which I used to make my predictions.
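The scoring described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet I actually used: the function name, the data layout, and the sample numbers are all made up, and it assumes that a higher adjusted value is better in every category (a category like turnovers would need to be flipped first).

```python
def composite_scores(teams, categories):
    """Return {team name: composite score}.

    teams: {name: {"sos": strength of schedule, "stats": {category: value}}}
    Assumption: higher is better in every category.
    """
    totals = {name: 0 for name in teams}
    for cat in categories:
        # Weight each raw statistic by the team's strength of schedule.
        adjusted = {name: t["stats"][cat] * t["sos"] for name, t in teams.items()}
        # Rank from worst to best: the worst team gets 1 point, the best
        # gets len(teams) points (68 for the full tournament field).
        ordered = sorted(adjusted, key=adjusted.get)
        for points, name in enumerate(ordered, start=1):
            totals[name] += points
    return totals


# Hypothetical three-team field with made-up numbers.
sample_teams = {
    "A": {"sos": 1.2, "stats": {"ppg": 70, "rpg": 35}},
    "B": {"sos": 1.0, "stats": {"ppg": 80, "rpg": 30}},
    "C": {"sos": 0.8, "stats": {"ppg": 75, "rpg": 40}},
}
print(composite_scores(sample_teams, ["ppg", "rpg"]))  # {'A': 6, 'B': 3, 'C': 3}
```

Even in this tiny example two teams end up with the same composite score, so ties are not as unlikely as the wide range of scores might suggest.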

Final Four: Louisville, New Mexico, Kansas, Indiana
Champion: Indiana over Louisville

Notes:
·      Despite composite scores ranging from 118 to 786, there were ties. I broke ties by siding with the lower seed. Upsets happen and are a trademark of the tournament. If a game is so close that it receives a tie from this model, it is one of the most likely ones to have an upset.
·      No seed lower than 10 won their first round matchup. This is good, because my model didn’t output any ridiculous upset winners, but also bad, because those ridiculous upsets will happen sooner or later.
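The tie-breaking rule from the notes above could look something like this. It is a rough sketch with hypothetical names, and it reads “lower seed” as the underdog, meaning the team with the larger seed number.

```python
def pick_winner(team_a, team_b):
    """Pick the winner of one matchup.

    Each team is a (name, seed, composite_score) tuple.
    Assumption: "lower seed" means the larger seed number (the underdog).
    """
    name_a, seed_a, score_a = team_a
    name_b, seed_b, score_b = team_b
    if score_a != score_b:
        # The higher composite score wins outright.
        return name_a if score_a > score_b else name_b
    # Tie on composite score: side with the lower-seeded team (the upset).
    # If the seeds are also equal, this sketch arbitrarily returns team_b.
    return name_a if seed_a > seed_b else name_b


# Hypothetical matchups with made-up composite scores.
print(pick_winner(("Favorite", 5, 500), ("Underdog", 12, 500)))  # Underdog
print(pick_winner(("Favorite", 5, 510), ("Underdog", 12, 500)))  # Favorite
```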

BRACKET #2: Points Bracket
This bracket was based simply on points scored and points allowed per game, adjusted again for strength of schedule. For example, Team A’s score is the average of A’s points scored and B’s points allowed, and Team B’s score is the average of B’s points scored and A’s points allowed. Whichever team has the higher score wins the matchup.
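Here is a minimal sketch of that matchup calculation. The function names and the sample numbers are invented for illustration, and the per-game figures are assumed to be adjusted for strength of schedule already. The description above doesn’t say what happens on an exact tie, so the sketch just gives it to Team B.

```python
def projected_score(points_scored, opponent_points_allowed):
    """Average of a team's points scored and its opponent's points allowed."""
    return (points_scored + opponent_points_allowed) / 2


def points_matchup(team_a, team_b):
    """Return (winner name, Team A's score, Team B's score).

    Each team is a dict with "name", "scored", and "allowed" keys holding
    per-game averages (assumed to be schedule-adjusted beforehand).
    """
    score_a = projected_score(team_a["scored"], team_b["allowed"])
    score_b = projected_score(team_b["scored"], team_a["allowed"])
    # Ties are not covered by the description; this sketch picks Team B.
    winner = team_a["name"] if score_a > score_b else team_b["name"]
    return winner, score_a, score_b


# Hypothetical teams with made-up averages.
team_a = {"name": "Team A", "scored": 75, "allowed": 60}
team_b = {"name": "Team B", "scored": 70, "allowed": 68}
print(points_matchup(team_a, team_b))  # ('Team A', 71.5, 65.0)
```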

Final Four: Louisville, Gonzaga, Florida, Indiana
Champion: Indiana over Gonzaga

Notes:
·      All of the 6-seeds lost their first round matchups. While this model did pick more upsets, I don’t expect all of the 11-seeds to win… especially since Middle Tennessee, which this bracket had going to the Sweet 16, didn’t even win its play-in game. When St. Mary’s was substituted back into the bracket… nothing changed. St. Mary’s also made it to the Sweet 16. Belmont went even further as an 11-seed, making it all the way to the Elite Eight. 
·      I was a bit surprised to see Indiana as the champion in both of the above brackets. I guess the numbers are with Indiana this year. But will that be enough to get them to the championship?

BRACKET #3: Seeds Bracket
Control bracket 1: The higher seed wins. If my models outperform this bracket, I’ll consider them a success.

BRACKET #4: Mascot Bracket
Control bracket 2: The fiercer mascot wins. (Which one would win in a fight?) I fill out one bracket this way every year, and if a nearly random bracket like this one outperforms my statistical models, then there’s a problem.

I’ll check back in after the tournament to measure my success. Hopefully I’ll be a little closer on these predictions than the one I made for the Pope.

Tuesday, March 12, 2013

Who will be the next Pope?


For the next few (days/weeks/hopefully not months) the eyes of the world, especially the Catholic world, will be on Vatican City as the conclave of cardinals decides who will be the next Pope, the next leader of the Catholic Church. Unfortunately, I know next to nothing about the Catholic Church, or the process of choosing a Pope, or almost anything else. Luckily, you would have had to try incredibly hard to go anywhere on the Internet recently without learning about the Pope, so I’ve been able to pick up a few things.

In simplest terms, you must meet two requirements to become Pope:
·      Be Catholic
·      Be a man
Unfortunately, there are about 1 billion Catholics in the world, which means there are probably about 500 million people who could become Pope if we only used these two requirements. That’s a lot. You can add another requirement to the list to narrow down the field, which is
·      Be a cardinal
That takes out many of those 500 million possibilities, and a few more when we take out the St. Louis Cardinals. The cardinals (not those cardinals) elect the next pope, and generally they elect someone from amongst themselves. Since this trend has been ongoing for the last 600 years, I doubt this will change any time soon.

Of course, becoming a cardinal has its own special set of requirements and unless you’re already one, chances are you won’t become one before the next Pope is chosen. Sorry. 

The cardinals are probably looking for certain qualities when they elect the next Pope. Most of these are qualitative, and since I, again, know nothing about the Catholic Church, I can’t begin to make an accurate prediction of the next Pope based on those. However, I can at least take a shot at some of the quantitative aspects of Popes; specifically, age and birthplace.

Only cardinals under 80 years old are invited to the conclave, so age can be considered an important factor. You don’t want a Pope who is too old, because you want continuity in the position. But every cardinal right now is over 50 (as it just so happens), so there’s a fairly small range of ages for possible Pope-elects. Surprisingly (or not), the average age for a new Pope since 1700 is right in the middle of that range: 65.

Birthplace probably also plays a role in the election of Popes, even though most cardinals would probably deny it. 19 of the 23 Popes since 1700 have been born in Italy; the remaining Popes were born in countries fairly close to Italy: Germany, Poland, and Austria (2).

Thus, without further ado, I present

THE POPE CALCULATOR
What are your chances of being Pope?

You can calculate the odds that any one person would have of becoming Pope based on their age and birthplace. Simply input the values into this equation:





Where     d = distance of birthplace from Rome, in miles
And       a = age, in years
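If you want to work out d for a given birthplace, one option is the great-circle distance. I didn’t specify how the distance should be measured, so treat the snippet below as just one reasonable way to get a number: it uses the haversine formula, and the function name, the coordinates for Rome, and the Earth radius are approximate values I’m assuming here.

```python
import math

ROME_LAT, ROME_LON = 41.9028, 12.4964  # approximate coordinates of Rome
EARTH_RADIUS_MILES = 3958.8            # approximate mean radius of the Earth


def miles_from_rome(lat, lon):
    """Great-circle (haversine) distance from Rome in miles.

    lat and lon are the birthplace's coordinates in decimal degrees.
    """
    phi1, phi2 = math.radians(ROME_LAT), math.radians(lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon - ROME_LON)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Example: Vienna (approximate coordinates) is a bit under 500 miles away.
print(round(miles_from_rome(48.2082, 16.3738)))
```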

The closer a P value is to 1, the greater the chance is of a given person being elected Pope.

What does this equation tell us about who is most likely to be Pope? I didn’t have time to apply the equation to all of the cardinals, and I certainly don’t have time to apply it to every Catholic male, but I was able to identify some cardinals on my own who fit the parameters of the equation well. CNN also had a handy list of twelve potential popes. Let’s apply my equation to those twelve candidates first:



Of this group, Christoph Schonborn, the Archbishop of Vienna, has the best chance of becoming Pope. I wouldn’t find this terribly surprising given that CNN probably understands the papal contenders better than I do. Peter Turkson, who is from Ghana and would be the first modern African Pope, is a close second behind Schonborn.

However, I also encountered ten more cardinals with larger values of P. Seven of these had a value of P > 0.9:



Giuseppe Betori, Archbishop of Florence, is the clear leading candidate to become Pope, at least from a statistical perspective. However, I’m sure the members of the papal conclave will weigh qualitative factors much more heavily than quantitative ones, so I don’t expect my predictions here to be correct. Still, having a good score through my equation certainly wouldn’t hurt. Regardless, all eyes will be on Rome in the coming days as we anticipate the election of the next Pope.