Bloops: A method for determining the probability that a given team was the true best team in some particular year
Posted by Neil Paine on April 25, 2011
In January 2004, Tangotiger posted this at his site:
http://www.tangotiger.net/archives/stud0268.shtml
It linked to a very cool mathematical method for determining the probability that any team was the "true" best team in a season. Unfortunately, though, webpages sometimes have a tendency to disappear over the course of 7 years. That's just what happened here -- the original link is now dead.
However, I contacted the creator of the methodology, Dr. Jesse Frey (Professor of Mathematical Science at Villanova), and he was gracious enough to re-upload the original study to his current site:
http://www19.homepage.villanova.edu/jesse.frey/BestTeam/forprimer.htm
Now sabermetricians can once again estimate the probability of any team truly being baseball's best. And for what it's worth, here are the current results from a very simplified version:
Rank | Tm | Lg | W | L | WPct | Stdev (WPct) | BayesW% | Stdev (BayesW%) | p(best) |
---|---|---|---|---|---|---|---|---|---|
1 | PHI | NL | 15 | 6 | 0.714 | 0.099 | 0.558 | 0.051 | 16.4% |
2 | COL | NL | 14 | 7 | 0.667 | 0.103 | 0.542 | 0.052 | 10.8% |
3 | TEX | AL | 14 | 7 | 0.667 | 0.103 | 0.542 | 0.052 | 9.9% |
4 | NYY | AL | 12 | 6 | 0.667 | 0.111 | 0.538 | 0.053 | 9.1% |
5 | FLA | NL | 13 | 7 | 0.650 | 0.107 | 0.536 | 0.052 | 7.9% |
6 | CLE | AL | 13 | 8 | 0.619 | 0.106 | 0.529 | 0.052 | 6.2% |
7 | LAA | AL | 12 | 10 | 0.545 | 0.106 | 0.511 | 0.052 | 3.5% |
8 | STL | NL | 12 | 10 | 0.545 | 0.106 | 0.511 | 0.052 | 3.2% |
9 | KCR | AL | 12 | 10 | 0.545 | 0.106 | 0.511 | 0.052 | 3.2% |
10 | DET | AL | 12 | 10 | 0.545 | 0.106 | 0.511 | 0.052 | 3.1% |
11 | MIL | NL | 11 | 10 | 0.524 | 0.109 | 0.506 | 0.053 | 2.6% |
12 | LAD | NL | 12 | 11 | 0.522 | 0.104 | 0.505 | 0.052 | 2.4% |
13 | OAK | AL | 11 | 11 | 0.500 | 0.107 | 0.500 | 0.052 | 2.3% |
14 | TBR | AL | 11 | 11 | 0.500 | 0.107 | 0.500 | 0.052 | 2.2% |
15 | WSN | NL | 10 | 10 | 0.500 | 0.112 | 0.500 | 0.053 | 2.1% |
16 | CIN | NL | 11 | 11 | 0.500 | 0.107 | 0.500 | 0.052 | 1.8% |
16 | SFG | NL | 10 | 11 | 0.476 | 0.109 | 0.494 | 0.053 | 1.8% |
18 | BOS | AL | 10 | 11 | 0.476 | 0.109 | 0.494 | 0.053 | 1.8% |
19 | ATL | NL | 11 | 12 | 0.478 | 0.104 | 0.495 | 0.052 | 1.7% |
20 | CHC | NL | 10 | 11 | 0.476 | 0.109 | 0.494 | 0.053 | 1.6% |
21 | TOR | AL | 9 | 12 | 0.429 | 0.108 | 0.483 | 0.052 | 1.0% |
22 | PIT | NL | 9 | 12 | 0.429 | 0.108 | 0.483 | 0.052 | 1.0% |
23 | MIN | AL | 9 | 12 | 0.429 | 0.108 | 0.483 | 0.052 | 0.8% |
24 | NYM | NL | 9 | 13 | 0.409 | 0.105 | 0.478 | 0.052 | 0.8% |
25 | BAL | AL | 8 | 12 | 0.400 | 0.110 | 0.477 | 0.053 | 0.8% |
26 | ARI | NL | 8 | 12 | 0.400 | 0.110 | 0.477 | 0.053 | 0.7% |
27 | HOU | NL | 8 | 14 | 0.364 | 0.103 | 0.465 | 0.052 | 0.5% |
28 | CHW | AL | 8 | 14 | 0.364 | 0.103 | 0.465 | 0.052 | 0.3% |
28 | SDP | NL | 8 | 14 | 0.364 | 0.103 | 0.465 | 0.052 | 0.3% |
30 | SEA | AL | 8 | 15 | 0.348 | 0.099 | 0.459 | 0.051 | 0.3% |
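
For readers who want to experiment, here is a minimal Python sketch of how a simplified calculation like the one in the table could work. It is a reconstruction under stated assumptions, not Dr. Frey's method and not necessarily the exact code behind the table: it assumes a normal prior on true winning percentage centered at .500 with a standard deviation of 0.060 (a value chosen because it appears to reproduce the BayesW% column, not one stated in the post), and it ignores strength of schedule. The names `posterior` and `p_best` are hypothetical.

```python
import math
import random

PRIOR_MEAN = 0.500  # assumed: league-average true talent
PRIOR_SD = 0.060    # assumed: spread of true talent (not stated in the post)


def posterior(wins, losses, prior_mean=PRIOR_MEAN, prior_sd=PRIOR_SD):
    """Normal-normal update: return (mean, sd) of a team's estimated true W%."""
    games = wins + losses
    wpct = wins / games
    obs_var = wpct * (1 - wpct) / games  # binomial variance of observed W%
    prior_var = prior_sd ** 2
    # Weight on the observed record; the rest regresses toward the prior mean.
    weight = prior_var / (prior_var + obs_var)
    mean = prior_mean + weight * (wpct - prior_mean)
    sd = math.sqrt(prior_var * obs_var / (prior_var + obs_var))
    return mean, sd


def p_best(records, iterations=10_000, seed=0):
    """Monte Carlo estimate of each team's chance of having the top true W%.

    records maps a team label to a (wins, losses) tuple.
    """
    rng = random.Random(seed)
    post = {team: posterior(w, l) for team, (w, l) in records.items()}
    counts = dict.fromkeys(records, 0)
    for _ in range(iterations):
        # Draw one plausible true W% per team and credit the highest draw.
        draws = {team: rng.gauss(m, s) for team, (m, s) in post.items()}
        counts[max(draws, key=draws.get)] += 1
    return {team: c / iterations for team, c in counts.items()}


if __name__ == "__main__":
    sample = {"PHI": (15, 6), "COL": (14, 7), "SEA": (8, 15)}
    for team, (w, l) in sample.items():
        mean, sd = posterior(w, l)
        print(team, round(mean, 3), round(sd, 3))
    print(p_best(sample))
```

Under these assumptions, `posterior(15, 6)` comes out to roughly 0.558 with a standard deviation of about 0.051, in line with the Phillies' row. The `p(best)` values depend on the random draws, so they wobble from run to run: at 10,000 iterations, a probability near 1.7% carries a standard error of roughly 0.13 percentage points, so teams separated by a tenth of a point are effectively tied.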
April 25th, 2011 at 2:17 pm
Strength of schedule? Important with such a small sample size here, and with unbalanced schedules.
April 25th, 2011 at 2:21 pm
Right, my simple version didn't take that into account but Dr. Frey's definitely does.
April 25th, 2011 at 3:56 pm
This method conclusively demonstrates that the 11-12 Braves are in actuality not as good as the 10-11 Giants.
April 25th, 2011 at 4:06 pm
I think the odds that Seattle is the best team are the same as the odds of drawing a royal fizzbin: Mr. Spock has never calculated them.
April 25th, 2011 at 4:44 pm
#3 - Ha, well the difference there is actually noise in the Monte Carlo because I didn't run enough iterations. Given enough simulations, the Braves will be ranked higher because their Bayesian rating is higher with a smaller standard deviation. I just wanted to give everyone a feel for what kind of results you can get from a method like this.
In fact, the ideal playoff system would involve running this method and eliminating all teams we're 95% certain aren't the best (since the playoffs should only include teams that have a plausible case for being the best).
April 25th, 2011 at 9:26 pm
Neil,
Very interesting!
The results are only comparable within one year, aren't they? For example, Oakland's 54.3% in 1990 can't be compared to Atlanta's 45.2% in 1997?
April 25th, 2011 at 10:24 pm
How many iterations did you run?
April 25th, 2011 at 11:32 pm
For the sake of time I ran 10,000 -- which sounds like a lot, but Dr. Frey's stabilized after 100,000.
April 25th, 2011 at 11:35 pm
#6 - Right, the probabilities aren't really intended to be compared across seasons -- although they do give an indication of how dominant a team's W-L record was relative to the other top teams of that season.
April 26th, 2011 at 7:14 am
#5 - So in the ideal playoff system, the 2001 Mariners would have been handed the WS title after the regular season? Not to be argumentative, but...
April 26th, 2011 at 8:08 am
I have to side with Tbone here. All depends on what you mean by ideal. It would appear, especially given the playoff formats in other sports, that the ideal for most people is a system where lots of teams have a chance to win the championship (including the team for which they root).
But wouldn't that be a blast if something like that were implemented, where if you didn't make that 95% certainty cut you'd be out of the playoff picture, and the greatest Cubs team ever was calculated as a false negative, resulting in another century of futility?
If we're going to institute a criterion where we eliminate things of which we're 95% sure that they aren't the best, let's start with the Hall of Fame and not the playoffs.
April 26th, 2011 at 8:29 am
#10 - Well, no. Oakland had a 9.1% chance of being the best according to his 2001 sub-page:
http://www19.homepage.villanova.edu/jesse.frey/BestTeam/post2001.htm
On the main methodology page he listed all teams who either had a 10% chance or won the real-life WS. But the cutoff I mentioned earlier was 5% (and you could change that depending on which significance level you wanted to use).
My point was that the postseason only exists to settle the "best team" question if there is doubt after the regular season. It shouldn't include teams that didn't take advantage of regular-season opportunities to make their case for #1.
That said, it's going to be hard to find a scenario where you would need to hand the WS to a team without playing any playoff games, if a season like 2001 or 1998 can't produce that outcome.
April 26th, 2011 at 8:32 am
#12 - And I realize this statement sounds a lot like something advocates of the BCS would say:
"...the postseason only exists to settle the "best team" question if there is doubt after the regular season. It shouldn't include teams that didn't take advantage of regular-season opportunities to make their case for #1."
The problem with the BCS is the rigidity of a 2-team format that doesn't allow for the possibility of more than 2 teams having a legitimate chance at being #1. If the BCS were to adopt a methodology like this, where the size of the playoff changed yearly depending on the probabilities of each team being the "true" #1, I wouldn't have any problem with it.
April 26th, 2011 at 5:03 pm
Where are the 1969 rankings? Let's lift this albatross from the Cubs' neck once and for all, and see if they were -- as all Cubs fans know in their hearts -- truly better than the hated Mets.
April 26th, 2011 at 5:21 pm
@14
Wine, 1969 was a looong time ago and the Miracle Mets are everybody's darling. Let's not let a Bloops calculation get in the way of history.
April 27th, 2011 at 12:21 am
@14
Wait -- what?!? The '69 Mets beat the Cubs by 8 games in the division race and by 10-8 in the season series. Then they blazed through the postseason at 7-1, flattening the 109-win Orioles.
Run differential is a wonderful tool. But the fact that the Cubs' pythagorean win total was 1 more than the Mets' is scant support for your claim.
If the Cubs were truly better than the Mets, they probably wouldn't have folded like a Mad Magazine cover in September. Great teams don't go 8-18 with the pennant on the line.
April 27th, 2011 at 3:59 am
They were about the same, but are poor or even great records down the stretch necessarily the result of play under pressure? When statisticians look at individual examples of players' clutch play, it almost never exists. All here are likely to know about the vagaries of performance due to random variation & sample size.
Why is it likely to be different for teams? We know that a team or individual with ANY sort of record will have times when they are much better or worse than their average performance, certainly over 162 games. If no team EVER performed better or worse due to pressure (assume a perfect world where we magically knew the cause of everything), some would, through the law of averages, still have records like 8-18 down the stretch.
That does not even consider the possible causes: some luck, close games (since whether runs are scored efficiently or bunched up is mostly random), or the strength of the schedules. And when things like injuries occur, any team may do badly by normal standards.
April 27th, 2011 at 2:44 pm
@17, Mike Felber -- Yeah, OK, my last point @16 was just a gratuitous dig at a long-suffering Cubs fan. (Given the stretch collapses by my Mets in 2007-08, I don't mind focusing on someone else's pain once in a while.)
I don't think there's a universally acknowledged scientific way to objectively compare two teams. For the '69 Mets and Cubs, their pythagorean win totals were almost the same. When I weigh that fact among the others I cited -- actual records, head-to-head records, and the fact that the Mets went 7-1 in the postseason -- I just can't see any reason to think that the Cubs were "truly" a better club. And isn't the burden of proof on the other side?