Category Archives: playoffPredictor.com

Elo predictions for college football – base and divisor

Ever thought about chess ratings? The highest chess rating ever achieved by a human is Magnus Carlsen’s 2882. The lowest, 100, is shared by many people. The United States Chess Federation initially aimed for an average club player to have a rating of 1500. There is no theoretical highest or lowest possible Elo rating: the best computers are at ~3500, and negative ratings are theoretically possible, but those people would get kicked from tournaments, so 100 was arbitrarily set as the lowest possible rating.

This particular range of numbers, from 100 to 3500, is a consequence of the base and divisor that Arpad Elo chose. The Elo rating system for chess uses a base of 10 and a divisor of 400. Why? According to Wikipedia, the 400 was chosen because Elo wanted a 200-point rating difference to mean that the stronger player would win about 75% of the time, and people assume that he used 10 as a base because we live in a base 10 world. Interestingly, if he had used base 9, then 200 points would be exactly a 75% chance of winning, whereas base 10 gives more like 76%. The weaker player’s chances are:

1/(1 + 9^(200/400)) = 0.25

1/(1 + 10^(200/400)) ≈ 0.2403
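The two calculations above can be checked with a few lines of Python. This is only a sketch of the standard Elo expected-score formula; the function name `elo_expected` and its parameter names are my own, not anything from playoffPredictor.com.

```python
def elo_expected(rating_diff, base=10.0, divisor=400.0):
    """Chance of winning for the side that is `rating_diff` points ahead.

    Standard Elo expected-score formula with a configurable base and divisor.
    """
    return 1.0 / (1.0 + base ** (-rating_diff / divisor))


# A 200-point favorite under base 9 and under base 10 (divisor 400 in both).
fav_base9 = elo_expected(200, base=9)
fav_base10 = elo_expected(200, base=10)

print(f"base 9:  favorite {fav_base9:.4f}, underdog {1 - fav_base9:.4f}")
print(f"base 10: favorite {fav_base10:.4f}, underdog {1 - fav_base10:.4f}")
```

The underdog values are the .25 and roughly .24 shown above; the favorite's chance is one minus that.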

But enough on chess. I like the playoffPredictor mathematical formula, which puts the teams between roughly 0 and 1 instead of 100 and 3500. But what base and divisor do I use with that system?

Last year I went with a base equal to the week number (1-17) and a divisor of .4. The reason for the week number is that small bases like 2, 3, and 4 don’t blow up into extreme probabilities, so in week 2, when there is very little data, the model does not make such drastic judgments. The divisor I picked empirically: it seemed to fit the data, and it was also a callout to Elo’s 400.

Really though I have been doing some fiddling and I think a base of 1000 and a divisor of 1 will work better. Here are some spreadsheet results.

Elo probabilities for different bases with constant divisor of 1

To read the above table, look at the row for 1000. It says that if a team is better by .2 (like computer ratings of .9 and .7 for the 2 teams), then the .9 team has a 79.9% chance of winning. I need to model these against real life, but I feel this is a start.

Further, when choosing a base and divisor, the following are all equivalent pairs:

equivalent pairs for bases/divisors

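So 1000 and 1 give the same result as 10000 and 1.3333, or 100 and .6666. In the same way, 1000 and 3 give the same result as 100 and 2, or 10 and 1. Log math: base^(x/divisor) depends only on log(base)/divisor, so any two pairs with the same ratio produce identical probabilities.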
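Here is a small sketch that checks the equivalent pairs listed above. The helper names `steepness` and `win_probability` are made up for this illustration; the only thing assumed is the Elo-style formula 1/(1 + base^(-gap/divisor)).

```python
import math


def steepness(base, divisor):
    """ln(base) / divisor: the single number that fixes the whole curve."""
    return math.log(base) / divisor


def win_probability(gap, base, divisor):
    """Chance that the team rated `gap` higher wins."""
    return 1.0 / (1.0 + base ** (-gap / divisor))


# The two families of pairs mentioned in the text.
pairs_a = [(1000, 1), (10000, 4 / 3), (100, 2 / 3)]
pairs_b = [(1000, 3), (100, 2), (10, 1)]

for label, group in (("A", pairs_a), ("B", pairs_b)):
    for base, divisor in group:
        print(
            label,
            base,
            round(divisor, 4),
            round(steepness(base, divisor), 4),
            round(win_probability(0.2, base, divisor), 4),
        )
```

Within each family the last two printed columns come out the same, which is all "equivalent" means here.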

So I’m going to move to a base of 1000 and a divisor of 1. The divisor of 1 makes sense: take that out of the equation. Then, what base should you use? Empirically 1000 seems to fit well. I need to backtest with data from weeks 13-14, when ratings are pretty established, to see how the percentages mesh with Vegas. To do.

At a base of 1000 and divisor of 1, a team with a rating +.16 more than an opponent will have a 75% chance of winning. So in this sense +.16 corresponds to 200 points in chess Elo.
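The +.16 figure comes from inverting the formula: solve 1/(1 + base^(-gap/divisor)) = p for the gap. A minimal sketch follows; `gap_for_probability` is a name I made up for this example.

```python
import math


def gap_for_probability(p, base=1000.0, divisor=1.0):
    """Rating gap at which the better team wins with probability p."""
    return divisor * math.log(p / (1.0 - p)) / math.log(base)


football_gap = gap_for_probability(0.75)                       # base 1000, divisor 1
chess_gap = gap_for_probability(0.75, base=10, divisor=400)    # chess Elo scale

print(round(football_gap, 3))
print(round(chess_gap, 1))
```

On the football scale this gives about .159, and on the chess scale about 191 points, which is the "roughly 200" that Elo was aiming for.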

In my rating the teams will be normally distributed with a mean at 0.5 and a standard deviation around 0.25. That means when two teams are separated by 1 standard deviation, the better team has an 85% chance of winning. For example, the following teams are all about 1 sigma apart:

  • #1 Georgia (~1)
  • #20 BYU (~.75)
  • #70 Illinois (~.5)
  • #108 Tulane (~.25)
  • #129 1AA (FCS) (~0)

So Georgia has an 85% chance of beating BYU, BYU has an 85% chance of beating Illinois, Illinois has an 85% chance of beating Tulane, and Tulane has an 85% chance of beating an FCS school. Is that right? Not sure.

Keeping with the logic, Georgia would have a 97% chance of beating Illinois [1/(1+1000^(-.5))]. BYU would have a 97% chance of beating Tulane. Are those right? I think so. Would need to check against Vegas.

tl;dr: PlayoffPredictor.com used to use week number and .4 for the base and divisor, but now uses 1000 for the base and 1 for the divisor.

Final weekend – CFP chances

Heading into Saturday morning here are the playoff chances for each team:

Georgia – 100% – lock
Alabama – 83% – In with a victory vs Georgia, or any Michigan, Cincinnati, or OK State loss
Cincinnati – 70% – In with a win over Houston
Michigan – 65% – In with a win over Iowa
Oklahoma State – 51% – In with a win over Baylor EXCEPT if Alabama, Michigan, and Cincinnati all win, or in if Georgia, Michigan, and Cincy all lose.
Ohio State – 17% – In with Georgia & Michigan win, Cincy & OK State loss, OR Georgia win, Michigan & Cincy loss, OR Georgia & Cincy & OK State loss, OR Georgia, Michigan, Cincy and OK State all lose.
Baylor – 13% – In with: Baylor beats OK State, Iowa beats Michigan


College Football week 11 probabilities

I have been messing around with playoffpredictor.com. I still have a long way to go, but I thought I would look at the data for this week’s games and see how it compares to the moneyline betting available at DraftKings. I am looking to exploit situations where the moneyline payout is mispriced compared to the predicted winning probabilities from playoffpredictor.com.

When the expected return on a moneyline bet is greater than 100% of the stake, I want to bet that game/team. In this week’s top 11 games, there are 6 games where the computer believes you can make a bet and get an expected payout in excess of 100% of the bet.

The most appealing, from the computer’s standpoint, is taking Purdue to beat Ohio State, with an expected payout of $2.89 on a $1.00 bet. This intuitively makes sense, as Purdue is ranked #19 by the computer (and also #19 by the committee), and #19 beating #4, especially when played in #19’s stadium, is very reasonably possible. Probable? No, but the payout at +750 is a huge incentive to bet on Purdue.

The least appealing bet by the computer is Penn State over Michigan, with an expected payout of 57 cents on a $1.00 bet.

Georgia vs Tennessee

Georgia (-1250)
Total return on $1 bet if bet is successful (Georgia wins) = $1 * 1350/1250 = $1.08
P(Georgia wins) = 91%
Expected return of $1 bet on Georgia = $1.08 * 91% = $0.98

Tennessee (+750)
Total return on $1 bet if Tennessee wins = $1 * 850/100 = $8.50
P(Tennessee wins) = 9%
Expected return of $1 bet on Tennessee = $8.50 * 9% = $0.765

Georgia – Tennessee is what is expected: each bet is expected to have a negative return, so the house wins both ways.
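The arithmetic above generalizes to any American moneyline. Below is a rough sketch; the function names are mine, and the probabilities are simply the playoffpredictor.com numbers quoted in this post.

```python
def total_return(moneyline, stake=1.0):
    """Stake plus winnings paid back if the bet wins (American odds)."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return stake * (abs(moneyline) + 100) / abs(moneyline)
    # Underdog: risk 100 to win `moneyline`.
    return stake * (moneyline + 100) / 100


def expected_return(moneyline, win_probability, stake=1.0):
    """Average amount returned per bet; above `stake` means a favorable bet."""
    return total_return(moneyline, stake) * win_probability


georgia = expected_return(-1250, 0.91)
tennessee = expected_return(+750, 0.09)
purdue = expected_return(+750, 0.34)

for name, value in (("Georgia", georgia), ("Tennessee", tennessee), ("Purdue", purdue)):
    verdict = "bet" if value > 1.0 else "pass"
    print(f"{name}: ${value:.2f} per $1 -> {verdict}")
```

Anything above $1.00 is a bet the computer likes; Georgia and Tennessee both fall short, while Purdue at +750 does not.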

But sometimes the computer spots something it likes

New Mexico State vs Alabama

No moneyline offered on New Mexico State vs Alabama

Cincinnati vs South Florida

Cincinnati = $1.01
South Florida = $0.36

Michigan vs Penn State

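| Team | Moneyline odds (DraftKings) | Unbiased probability (playoffpredictor.com) | Total expected return on $1 |
|---|---|---|---|
| Georgia | -1250 | 0.91 | 0.98 |
| Tennessee | +750 | 0.09 | 0.77 |
| Alabama | (none offered) | 0.98 | |
| New Mexico State | (none offered) | 0.02 | |
| Oregon | -630 | 0.78 | 0.90 |
| Washington State | +450 | 0.22 | 1.21 |
| Ohio State | -1250 | 0.66 | 0.71 |
| Purdue | +750 | 0.34 | 2.89 |
| Cincinnati | -2200 | 0.97 | 1.01 |
| South Florida | +1100 | 0.03 | 0.36 |
| Michigan | -115 | 0.71 | 1.33 |
| Penn State | -105 | 0.29 | 0.57 |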

Oklahoma vs Baylor

Mississippi State vs Auburn

Northwestern vs Wisconsin

Utah vs Arizona

Purdue vs Ohio State

Minnesota vs Iowa

Southern Miss vs UTSA

Maryland vs Michigan State

Texas A&M vs Ole Miss

Notre Dame vs Virginia

NC State vs Wake Forest

Arkansas vs LSU

TCU vs Oklahoma State

Washington State vs Oregon

Nevada vs San Diego State

Results will be posted next week!

First CFP prediction of 2019 – It’s Georgia, not Alabama

Today is a big day in the life of my college football playoff predictor site (playoffpredictor.com). Today is the second CFP committee ranking for the 2019 season, which means it is the first prediction week for the computer model.
What is in store for tonight? According to the model we will have Ohio State, LSU, and Clemson in three of the four spots. No surprises there. One surprise from the playoffPredictor that differs from the AP committee poll: Georgia, not Alabama, is in the fourth slot.

Personally, I think they will put Oregon in that slot. I think they will consider a last-second loss to Auburn on a neutral field much superior to a loss to South Carolina on Georgia’s home field. The problem with the first prediction of the season is that there is not much bias information; those biases tend to smooth out as the season goes on.

If I were a voting member of the committee I would advocate for exactly what the computer says, which is Minnesota and Wisconsin in spots three and four. No Clemson, no Alabama, no Georgia. Minnesota is obviously unbeaten, but I just don’t see the committee changing on a dime from voting them 17 to voting them number three. I hope it happens, but I’m not holding my breath. As far as Wisconsin? Well, they were destroyed by Ohio State, but Ohio State looks fantastic. Other than that, just a one point loss to a decent Illinois team. That is certainly just as good as or better than anybody else’s one loss, and they have some quality wins to go along with it. Alabama has nothing in terms of quality wins. Their best win is Texas A&M, followed by Tennessee, Southern Mississippi, and Duke. Yes, Alabama’s second best win was over a team that also lost to an FCS level team this year at home. Ouch.

Stay tuned for 7 PM tonight, when we see if the first prediction is 75% correct or 100% correct.

 

Habitually Over-rated! clap, clap, clap clap clap!

5 years of college football data are in the books and I have enough data now to look at the playoffPredictor biases and make some determinations about habitually overrated and underrated teams that the playoff committee loves or snubs.

A little primer if you need it — each week of the college football season the computer assigns a rating and ranking to each top 25 team. During weeks 9-15 the playoff committee also assigns each team a ranking. Each week we can compare the committee rankings to the computer rankings and make an objective determination about over-ranking or under-ranking.

Using final season average rating biases, here is what we have after 5 years.

Conclusion? The perennially over-ranked teams are also the teams that most often make the college football playoff. Alabama, Clemson, Washington and Baylor were over-ranked in 3 out of 5 years; Oklahoma and Mississippi State in 4 out of 5 years.

Interestingly, Ohio State (the only team besides Alabama, Clemson and Oklahoma to make multiple playoff appearances) has zero seasons over-rated or under-rated by the committee.

But there you have it. Conclusive proof that the rich in college football get richer, not because they are better, but because we humans are biased to think of the bluebloods as better.

Changes to playoffPredictor.com formula for 2018

I am changing the computer rankings formula on playoffPredictor.com to reflect margin of victory starting with 2018. This is a big change to the core beliefs of the playoffPredictor.com model, which have always been based on simplicity. To this point the model only considered wins and losses, with no regard to margin of victory, away/home/neutral site for the game, offensive or defensive stats, or the month when the game was played. A model that is this simple, this mathematical, and has excellent correlation to the final AP rankings year after year should not be tinkered with lightly.

By making this change to include margin of victory I am hoping to accomplish 2 things:

First, this change should make early season rankings more in line with human polls starting from about weeks 3-4. Currently, since margin of victory does not matter, the formula cannot really distinguish between a 3-0 Baylor team and a 3-0 Alabama team. It is only later in the season, when there is more connectedness between Baylor’s and Alabama’s opponents or opponents’ opponents, that the model can see Alabama’s wins to be superior to Baylor’s. Now, with margin of victory, the model will be able to reward a 60-0 Alabama win vs an average Vanderbilt team earlier in the season.

The 2nd goal deals with Auburn and the final 2017 committee prediction. After 3 very successful years of nailing the playoff committee rankings before they came out, last year was a bust for the playoffPredictor methodology when it came to Ohio State / Alabama and the final rankings. The model put Ohio State at #4 in the final rankings, while the playoff committee had them at #5. So what happened? A lot of it has to do with Auburn. Even after Auburn lost to Georgia in the SEC championship game, the computer did not punish Auburn much. Going into the game the computer had them at #11, and after the game the computer had them at #12, so they only dropped one spot in the eyes of the computer. But the humans dropped them from #2 pre-game to #7 post-game. The formula uses this week’s computer rankings plus last week’s average bias, and Auburn’s bias was so high (9 spots between the computer at #11 and the committee at #2) that when the computer only dropped them from #11 to #12, it expected the committee would similarly drop them from #2 to about #3. What actually happened is that the computer had been right about Auburn before the committee saw it.
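To make the Auburn arithmetic concrete, here is a simplified sketch of "computer ranking plus bias". It is an illustration only: the real site works from an average bias built up over the season, while this example uses the single week-13 bias as a stand-in, and both function names are hypothetical.

```python
def rank_bias(computer_rank, committee_rank):
    """How many spots higher the committee has a team than the computer does."""
    return computer_rank - committee_rank


def predict_committee_rank(computer_rank, bias):
    """Shift this week's computer rank by the bias carried over from earlier weeks."""
    return max(1, computer_rank - bias)


# Auburn, 2017, old formula (numbers from the paragraph above).
week13_bias = rank_bias(computer_rank=11, committee_rank=2)
week14_prediction = predict_committee_rank(computer_rank=12, bias=week13_bias)
actual_week14_committee_rank = 7

print("bias after week 13:", week13_bias)
print("predicted committee rank, week 14:", week14_prediction)
print("miss:", actual_week14_committee_rank - week14_prediction, "spots")
```

A large bias combined with a small move in the computer ranking is exactly the case where this kind of prediction misses.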

Let’s take a closer look. Here are the week 13 computer and human rankings for 2017. Week 13 is post Auburn-Alabama game (where Auburn beat Alabama) but pre SEC championship game. Note that under the old formula (which does not take in margin of victory) Auburn is #11 in the computer and #2 in the humans.

Now here are the week 14 computer and human rankings. Week 14 is post SEC championship game, where Georgia solidly beat Auburn by 21 points. Again, under the old formula Auburn has moved from #11 only to #12 in the computer, and moved from #2 to #7 in the humans.

Clearly Auburn did not deserve to move from #11 in the computer to say #20 just because they lost to Georgia. Yes, they had 3 losses, but the losses were to Clemson (the #1 team in the final estimation of the committee), Georgia (played for the national championship) and LSU (average team), balanced with wins against Georgia and Alabama, who both played for the national championship.  Clearly that is a team resume that should have been right where the computer said (around 10) and not around 20.  So there is no fault in the computer here — it is the fault of the committee for not seeing what the computer saw earlier.

 

Now let’s look at how 2017 would have played out if margin of victory was part of the computer formula all along. At week 13 Auburn is #4 in the computer. Of course they will still be #2 in the humans — so their bias will be a lot lower – only a 2 spot bias.

At week 14 with the new formula, Auburn moves to #11 in the computer. That, coupled with the more normal team bias, would have put them squarely out of the final top 4 in the model’s calculus, accomplishing the stated goals.

 

The other goal that adding margin of victory will accomplish is getting a more accurate computer ranking earlier in the season. Back to 2017, here are the old model computer rankings for week 4:

and here is what it would have been with the new margin of victory components included:

and finally here is what the AP poll was at that time:

 

Note details like Wisconsin being #7 in the new method but outside of the top 15 in the old. Alabama is at #3 instead of #5. Looking at the top 10 in all 3 lists, the average delta of old to AP is 5.0 and the average delta of new to AP is 4.1, indicating about an 18% improvement in computer-to-human agreement by week 4. The correlation of the top 15 improves from .65 to .67.



Now, the method by which I am incorporating margin of victory is: 1 win is given for games where the final margin of victory is 16 points or fewer, 2 wins are given for 17-32 points, and 3 wins are given for a margin of victory of 33 points or more. I don’t like this, but it is a crude way to start this process and get the desired effect. I feel there is a differentiation between a team down 16 and down 17. At 16 points down, even late in the 4th quarter, that’s just a two-score game. Anything is possible in one play, so even if the offense has the ball and a 16 point lead, a pick six followed by a two point conversion makes a compelling game, and that is always one play away. However, at 17 points (3 scores) down, I feel the other team will tend to give up a little bit more: you have really beaten a team when you are winning by 17 points with just 5 minutes left to play and you control the ball.

The ideal formula will take all these into consideration. If I have a 1st down, I am up by 9 points, the other team has no timeouts, and there are 3 minutes on the clock, that should all come into play. I may use ESPN’s in-game probabilities as the margin of victory component: when ESPN says team A has a 99.9% chance of winning, call the game then, and whether that happens at 45:00 of game time or 59:40 of game time is how the team earns margin of victory. But I may wait till next year to implement that. I’m all for suggestions! Drop me a line at neville@agafamily.com or on reddit under /r/cfbanalysis
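The 1/2/3-win rule described above is simple enough to sketch in a few lines. The function name `win_credit` is hypothetical, and how the credits feed into the rest of the ranking formula is not shown here.

```python
def win_credit(margin):
    """Number of wins credited to the winner for a given margin of victory.

    Thresholds follow the post: 1-16 points -> 1 win, 17-32 -> 2 wins, 33+ -> 3 wins.
    """
    if margin <= 0:
        raise ValueError("margin must be the winner's positive point margin")
    if margin <= 16:
        return 1
    if margin <= 32:
        return 2
    return 3


# Two-score game, three-score game, and a blowout.
for margin in (16, 17, 32, 33, 60):
    print(margin, "->", win_credit(margin))
```

Under this rule a 21-point win earns 2 wins and a 60-0 result earns 3.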

College Football thoughts on the eve of the 1st rankings

playoffpredictor.com has as the computer top 4:

[Screenshot of the computer top 4, taken 2017-10-29]

Which really makes sense, if you think about it. Georgia has a super-quality win over ND. Bama’s best win is over #38 Texas A&M. Wisconsin looks solid, and Penn State’s one loss (1 point on the road) is a much better loss than tOSU’s (15 points at home) or OU’s (7 points at home).

I am expecting the committee to come out with:

  • Bama #1
  • Georgia #2
  • Wisconsin #3
  • Ohio State or ND  #4

Right now there are 2 other unbeatens: UCF and Miami. Miami and Notre Dame play each other on Nov 11. What I am really interested to see is how the committee treats UCF. My computer has them at #5. They have some very good wins; in fact, their best win is much better than Bama’s or Wisconsin’s best win. However, I suspect the committee will put them at about #20 in the initial poll. The way this season is shaking out, UCF could be the only unbeaten in college football. I would really like to see if a rematch with Memphis and them winning the AAC would be enough to get a mid-major in.

NCAA bracket simulator

Need to fill out a bracket? Use my NCAA bracket simulator! Building on last year’s attempt to come up with a perfect bracket and win Warren Buffett’s $1B (by the way, he didn’t offer the prize again this year), I am launching http://ncaa_bracket_simulator.agafamily.com The underlying idea is to give each team a number of virtual “Ping-Pong balls” based on your expectation of how well they will play. Give Kentucky 100 balls and the #16 seed 1 ball: that means 100 out of every 101 times Kentucky will win that contest, and 1 time in 101 Hampton will pull the upset.

Say on the other side of the bracket you give Cincinnati and Purdue 50 balls each: that means the Cincinnati / Purdue game is a 50/50 proposition, and the winner will beat Kentucky 50 out of 150 (50+100), or 1 out of every 3 times. Think that’s too much? Then give Kentucky a higher rating or drop Cincinnati down.
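The Ping-Pong ball idea can be sketched in a few lines of Python. This is not the code behind the simulator site, just a minimal illustration of the same idea; the function names, the four-team bracket, and the seed value are all made up for the example.

```python
import random


def win_chance(balls_a, balls_b):
    """Chance that the team holding `balls_a` balls beats the team holding `balls_b`."""
    return balls_a / (balls_a + balls_b)


def play(team_a, team_b, balls, rng):
    """Draw one ball from the two teams' combined pool; the owner of the ball wins."""
    return team_a if rng.random() < win_chance(balls[team_a], balls[team_b]) else team_b


def simulate(bracket, balls, rng):
    """Play out a single-elimination bracket (team count must be a power of 2)."""
    teams = list(bracket)
    while len(teams) > 1:
        # Adjacent teams meet; winners advance in the same order.
        teams = [play(teams[i], teams[i + 1], balls, rng) for i in range(0, len(teams), 2)]
    return teams[0]


balls = {"Kentucky": 100, "Hampton": 1, "Cincinnati": 50, "Purdue": 50}
bracket = ["Kentucky", "Hampton", "Cincinnati", "Purdue"]

rng = random.Random(2015)  # fixed seed so a run can be repeated
champion = simulate(bracket, balls, rng)
print("Champion of this simulated bracket:", champion)
print("Cincinnati's chance vs Kentucky:", round(win_chance(50, 100), 3))
```

Run it many times with different seeds and Kentucky should come out on top roughly two times in three, matching the 100-vs-50 arithmetic above.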

 

The computer will use the ratings you give and randomly simulate an entire tournament for you.   Fancy graphics? Well, no. Not yet. Give it a try for your bracket:   ncaa_bracket_simulator.agafamily.com

 

playoffPredictor.com Nailed It!

As if ever there was a doubt…

Couldn’t have got it more right, and got it right the instant the games went final.

In fact, the playoffPredictor.com site got every single top 4 for the entire season (with the exception of the initial poll, of course — that is needed to build the bias file between the computer ranking and the committee ranking).

 

playoffPredictor.com correctly predicted a lot of things the committee did this year, including:

  • TCU over Alabama in the October poll
  • Ohio State over TCU and Baylor in the final poll

Score 1 for big data and analytics!

Looking forward to keeping the site going next year.  I’ll probably also do some other stuff now that the math is coded — such as comparing all teams in the BCS era. Stay tuned!