Bloops: Book of Odds
Posted by Neil Paine on December 8, 2009
Slightly off-topic, but mostly on, here's a site I think many of our readers (math-oriented and numerophobic alike) will be interested in: it's called the Book of Odds, and it "explores the world through the lense of probability, writing articles and publishing statistical information." Of particular note for our purposes is the series of baseball articles they rolled out for this past World Series:
Behind the Numbers: 2009, a Baseball Season of Oddities
World Series Wishes (Exploring the odds a fan has seen their team win a World Series -- Sorry, Cubs fans)
Odds Are a Baseball Player Is Superstitious
David Gassko, who many of you may know from his work at The Hardball Times, writes for the site as well, so they do have some baseball people over there. Anyway, give it a look -- like I said, it's not just for people who like math, but instead I think it's aimed at getting everyone to start thinking about the world in a more probabilistic way.
December 8th, 2009 at 1:24 pm
I guess this is as good a place as any to risk the wrath of Joe DiMaggio fans everywhere by asserting that his record 56-game hitting streak -- while certainly extremely unlikely ever to be broken -- was more of a statistical anomaly than a 'splendid' achievement.
The math is roughly as follows:
Assumptions:
- MLB's top hitters for average in a given season ~ .350;
- Hitters average four at-bats per game
Calculations:
- A .350 hitter who gets four at-bats per game has a greater than 82% chance of getting one or more hits in each game;
- The inverse of this is that a .350 hitter who gets four at-bats per game has a less than 18% chance of going hitless in a game;
Therefore:
- While a .350 hitter has only a 0.0094% chance of getting at least one hit in 56 consecutive games;
- A .350 hitter's chance of going hitless in only six consecutive games is a mere 0.0032%;
- Further, a .350 hitter's chance of getting a hit in nine consecutive at-bats is less than 0.0079%.
While all three events obviously are extremely unlikely to occur, my guess is that most fans would consider either of the latter two achievements as not only less unlikely, but also more quirky; i.e., more a statistical anomaly than an unbreakable achievement. I would be among them.
On a far more subjective note, I would also guess that if it were, say, Harry Heilmann or Al Simmons -- both of whom had many seasons with higher BAs than DiMaggio's .357 in '41 -- who held this record for so long, it would be considered less monumental than it is now...probably more akin to George Sisler's 257 hits in a season, since broken by Ichiro Suzuki.
I hope this is seen more as fun than sacrilege.
December 8th, 2009 at 3:06 pm
I liked the Wishes column, but it would be nice if they based their answers on historical numbers of fans. For example, they say 1 in 4.3 Indians fans were alive to see the game in 1948, but how many of those people were Indians fans back then? Does it really count if a 61-year-old Indians fan was "alive" when they won if he probably didn't even see it or remember it happening? Adding another layer, if you were to break down the Indians' demographics you'd find that a lot more of their fans are recent fans. As a general rule, teams have more fans when they're good, and the Indians were so bad from the '60s through the '80s that they lost a lot of fans, then picked up a bunch when they got good in the '90s. So a majority of the fans were probably born no earlier than 1980, which means they were 35 years away from seeing that championship. In other words, I would expect that if you could somehow interview every single Indians fan the number would be a lot worse than 1 in 4.3.
The same is probably true for a lot of these teams, I just picked the Indians as an example because it's what I know.
December 8th, 2009 at 3:31 pm
JDV, here's an interesting article the New York Times ran in 2008: http://www.nytimes.com/2008/03/30/opinion/30strogatz.html?_r=1
It states that Joe DiMaggio's record was thoroughly average in a series of 10,000 simulations of baseball history. I've got nothing against Joltin' Joe, and he had a great career and a fine season. As a major Ted Williams fan, though, I'm understandably annoyed that his accomplishment cost Teddy Ballgame the MVP award he richly deserved!
December 8th, 2009 at 3:54 pm
BLT...thanks for the link. That was very interesting.
December 8th, 2009 at 3:58 pm
JDV, I don't think I agree with your math (or at least your arithmetic).
Using your assumptions, I agree that a .350 hitter has about an 82% chance of getting a hit in any given game. 1 - (.65 ^ 4) = .8215. However, I don't see how you came up with: "- While a .350 hitter has only a 0.0094% chance of getting at least one hit in 56 consecutive games;"
Wouldn't P (56 consecutive games with a hit) be equal to .8215 ^ 56, or .0000165?
If so, then it is much less likely to occur than either of the two feats you named. Or, if you disagree with my calculation, let me know why.
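For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It assumes exactly four at-bats per game and treats every at-bat as an independent trial at the hitter's average, which is a simplification. The function names are my own and purely illustrative.

```python
def p_hit_in_game(avg, at_bats=4):
    """Chance of at least one hit in a game, treating at-bats as independent."""
    return 1 - (1 - avg) ** at_bats


def p_streak(avg, games, at_bats=4):
    """Chance of at least one hit in every one of `games` consecutive games."""
    return p_hit_in_game(avg, at_bats) ** games


per_game = p_hit_in_game(0.350)   # 1 - (.65 ^ 4)
streak_56 = p_streak(0.350, 56)   # per-game chance raised to the 56th power

print(f"per-game chance of a hit: {per_game:.4f}")
print(f"chance of a 56-game streak: {streak_56:.7f}")
```

Note that this is the chance of a streak across one specific run of 56 games, not the chance that such a streak turns up somewhere in a season or a career.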
December 8th, 2009 at 4:38 pm
I've always loved this particular stat and analysis, as well as the observation that Pete Rose's 44-game streak was statistically more unlikely. Per Jksesq1's post above, I also have DiMaggio at .0000165, but Rose at .000007175 through 44 games, with Rose's feat becoming more unlikely at the 42nd game. I used Rose's lifetime average of .303 while noting he hit .302 the year of the streak (1978). DiMaggio .325/.357 lifetime/1941 respectively. The 4 averages for 56 games each yield:
.303 .000000283725
.325 .000002193158
.350 .000016513486
.357 .000027593809
Naturally the odds change a bit given the number of at-bats actually available in each game, and IIRC Rose had 5 the night his streak ended, but these figures give you something for comparison.
Last but not least, do I remember correctly that the game after the one that broke DiMaggio's streak marked the beginning of a separate 19 game streak? 😉
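The figures above can be reproduced with a short Python sketch, under the same assumptions (four at-bats every game, each an independent trial at the listed average). The last loop is an illustration of how much the result moves when the at-bats per game change. The helper name `p_streak` is hypothetical.

```python
def p_streak(avg, games, at_bats=4):
    """Chance of at least one hit in every one of `games` consecutive games."""
    p_game = 1 - (1 - avg) ** at_bats
    return p_game ** games


# 56-game figures for the four averages listed above.
table = {avg: p_streak(avg, 56) for avg in (0.303, 0.325, 0.350, 0.357)}
for avg, p in table.items():
    print(f"{avg:.3f}  {p:.12f}")

# Rose at .303 through 44 games.
rose_44 = p_streak(0.303, 44)
print(f"Rose, 44 games: {rose_44:.9f}")

# Sensitivity to at-bats per game, holding .350 and 56 games fixed.
by_at_bats = {ab: p_streak(0.350, 56, at_bats=ab) for ab in (3, 4, 5)}
for ab, p in by_at_bats.items():
    print(f"{ab} at-bats per game: {p:.3e}")
```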
December 8th, 2009 at 5:59 pm
Balburgh, I think you're right. Also I believe he had a 70-odd game streak with SF in the PCL which is one of the longest streaks in MINOR league history. Safe to say, Joe D would definitely qualify as a "streak hitter" 🙂
Speaking of Joe DiMaggio, does anyone know why it took him 3 ballots to get into the Hall of Fame? Was it his general demeanor, his frosty relationship with the press (not that those kept Ted Williams or He finished 9th (!) in his first year of eligibility, and was behind Rabbit Frickin Maranville for each of the first two years. What was the voters' fascination with good-but-not-great players like Maranville and Dazzy Vance in the early '50s, and why would anyone think they were remotely as deserving of HOF consideration as DiMaggio?
December 9th, 2009 at 12:05 am
I think there was something of a "wait your turn" mentality among the writers in the late 1940s, early 1950s - let's first vote in all the deserving guys from the 1920s and 1930s, and when we're done with that we can have a look at more recent players.
Technically, it took 4 ballots for Joe D to get in - he got a vote in 1945. Eligibility requirements hadn't been formalized yet.
Concerning the number of Indians fans, it may be relevant that Cleveland's population is much smaller now than it was in the 1940s and 1950s. It's true that, like most teams, they have more fans (and higher attendance) when they're good than when they're bad, but you also have to take into account the changing population, and the rise in attendance all over major league baseball over the last 20 years or so.
December 9th, 2009 at 1:41 pm
jksesq1 / BalBurgh,
You're both right. I had run the numbers with several different BAs, and I inadvertently put the odds for a .375 hitter in that part of my post. That obviously skewed my comparisons as well -- one of them in a direction to support my assertion, but the other in a direction to weaken it. My "therefore" should have looked like this:
- While a .375 hitter has only a 0.0094% chance of getting at least one hit in 56 consecutive games;
- A .375 hitter's chance of going hitless in only FIVE consecutive games is a mere 0.0083%;
- Further, a .375 hitter's chance of getting a hit in TEN consecutive at-bats is less than 0.0055%.
Thanks for the correction. Oh...for those who saw huge differences at a glance, my figures are percentages rather than decimals.
December 9th, 2009 at 2:01 pm
Then a friend asked what the correct equivalent comparisons at .350 are:
- While a .350 hitter has only a 0.0017% chance of getting at least one hit in 56 consecutive games;
- A .350 hitter's chance of going hitless in only 6.5 consecutive games (26 at-bats) is a mere 0.0014%;
- Further, a .350 hitter's chance of getting a hit in eleven consecutive at-bats is less than 0.0010%
...after which he pointed out that both of these new numbers weakened my assertion. Uh...yup...but I'll still call it an anomaly.
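A quick Python sketch for checking the three .350 comparisons above, expressed as percentages. It assumes independent at-bats and four at-bats per game, so 6.5 hitless games is treated as 26 hitless at-bats. The function names are illustrative only.

```python
def p_hit_streak(avg, games, at_bats=4):
    """Chance of at least one hit in every one of `games` consecutive games."""
    return (1 - (1 - avg) ** at_bats) ** games


def p_hitless_at_bats(avg, n):
    """Chance of making an out in each of n consecutive at-bats."""
    return (1 - avg) ** n


def p_consecutive_hits(avg, n):
    """Chance of getting a hit in each of n consecutive at-bats."""
    return avg ** n


avg = 0.350
streak = p_hit_streak(avg, 56)
hitless = p_hitless_at_bats(avg, 26)   # 6.5 games at four at-bats each
hits = p_consecutive_hits(avg, 11)

# Multiply by 100 to match the percentage figures quoted in the comment.
print(f"56-game hitting streak:  {100 * streak:.4f}%")
print(f"26 hitless at-bats:      {100 * hitless:.4f}%")
print(f"11 hits in 11 at-bats:   {100 * hits:.4f}%")
```

Changing `avg` to 0.375 and adjusting the counts to 20 hitless at-bats and 10 straight hits gives the .375 versions of the same comparison.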
December 9th, 2009 at 5:28 pm
Gerry is right -- there was a de facto 5-year waiting period, but it was not codified, and obviously not all voters observed it. I believe DiMaggio actually got elected quicker (three years after retiring) than anyone except Clemente and Gehrig, yet for some reason fans always repeat how he was "snubbed" a couple times.
I recall that Stephen Jay Gould calculated that DiMaggio's streak was actually _not_ within the bounds of statistical likelihood. I don't remember what numbers he used to conclude that.
I have always thought the streak (and most streaks) was overrated. Though it's exceedingly unlikely, one could get one hit a day for 60 games in a row. That player would have set a new record without being a particularly good hitter over that period. Plus there are all the stories about questionable scoring decisions which kept the streak going. Anything so reliant on simple accounting decisions loses a lot of luster in my book. It's an interesting feat but to me not as remarkable as records like Cy Young's wins or something.
December 9th, 2009 at 6:00 pm
Good point about how quickly DiMaggio was elected. I think you have to add Ruth to the list of those elected quicker than DiMaggio. Also Stengel, if you include managers, and Connie Mack was inducted more than a decade before he retired.
December 9th, 2009 at 6:47 pm
I believe that Gould concluded that DiMaggio's streak was the greatest statistical streak in sports, the only one beyond probability.
December 9th, 2009 at 7:07 pm
I remember reading several years ago (but already in the 2000s or oughts or whatever you call this decade) a book published around 1975 in which the author (Peter, with a last name I have forgotten how to spell, who has written several team history books), or someone the author was quoting, said there were three records that would never be broken:
1. Cy Young's 511 career wins
2. Joe DiMaggio's 56-game hitting streak
3. Lou Gehrig's 2,130 consecutive games streak
One down and two to go.
December 12th, 2009 at 4:47 am
511 is a pretty safe number for wins ... at least until training gets so good that guys can play into their 80s.