Pre-1910 Batter Strikeout Data
Posted by Neil Paine on April 13, 2011
A B-R user recently wondered about the source for our pre-1910 batter strikeout data (example), given that those stats were not officially tracked until 1913 in the AL and 1910 in the NL. I posed the question to Pete Palmer, stat legend and season-data provider to Baseball-Reference, and here was his reply:
"The strikeout data came from Jonathan Frankel, who did a tremendous amount of work with a number of helpers checking box scores in various newspapers. He identified about 90% of NL batters and 80% of AL batters from 1897-1909. The results were then prorated for the remainder of the season. Work is continuing on digging up more boxes and also on 1910-12 AL.
I was surprised that Jonathan was able to find so much data. What happened is that the local papers often carried the strikeouts for their games, so it required volunteers all over the country to check the papers, plus some inter-library loans. It was a terrific undertaking."
It turns out that Jonathan has a blog where he posts updates about the progress of his batter strikeout research. He says the 1910 AL is 89% complete right now, and that he has begun work on the 1912 AL as well.
April 13th, 2011 at 1:32 pm
And I was just going to watch TV tonight.
April 13th, 2011 at 1:57 pm
"The results were then prorated for the remainder of the season."
I really think this *has* to be clearly noted on the site. Otherwise you are presenting estimated stats as if they were actual. It's similar to the old catcher stats which have SB/CS based on prorated team totals, not actual player totals. It is very misleading.
April 13th, 2011 at 4:48 pm
Not sure how to explain this... 3 at bats and 4 strikeouts?
http://www.baseball-reference.com/players/g/gibsoch01.shtml
April 13th, 2011 at 5:00 pm
Owen, see above. The numbers are being estimated based on incomplete data and it can end up in screwy results like that.
April 13th, 2011 at 5:07 pm
Retrosheet is updated back to 1918 now and they've had the isolated 1911 NL season done for a while now too. It'll be interesting to see what happens when they get back far enough to encounter this issue. I guess the more eyes looking at things the better.
April 13th, 2011 at 5:12 pm
This is a great thing (even with sometimes strange results)... the 1900s strikeout machine Billy Maloney!
http://www.baseball-reference.com/players/m/malonbi01.shtml
April 14th, 2011 at 2:59 pm
After going over his blog, I'll hold out as to why Charlie Gibson struck out four times in three at bats. His extrapolations are based on strikeouts/game. The at bat total may well be wrong.
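To illustrate how a per-game extrapolation can produce more strikeouts than at bats, here is a minimal sketch. It assumes a simple linear proration (strikeouts found in the covered games, scaled by total games played); the exact method used in the research is not spelled out in this post, and the function name and the player numbers below are hypothetical, not Gibson's actual record.

```python
def prorate_strikeouts(known_so, games_covered, games_played):
    """Scale strikeouts found in covered box scores up to a full season.

    Assumed method: simple linear proration by games, as a stand-in for
    whatever procedure the researchers actually used.
    """
    if games_covered <= 0:
        raise ValueError("need at least one covered game to prorate")
    return round(known_so * games_played / games_covered)


# Hypothetical player: appeared in 4 games but had only 3 at bats,
# and the one box score that was found shows 1 strikeout.
at_bats = 3
estimate = prorate_strikeouts(known_so=1, games_covered=1, games_played=4)

# Scaling by games ignores at bats, so the estimate can exceed them.
print(estimate, estimate > at_bats)
```

Because the scaling factor here is games rather than at bats, a player with many appearances and few plate trips can end up with an estimate that is impossible on its face.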