
Pre-1910 Batter Strikeout Data

Posted by Neil Paine on April 13, 2011

A B-R user recently wondered about the source for our pre-1910 batter strikeout data (example), given that those stats were not officially tracked until 1910 in the NL and 1913 in the AL. I put the question to Pete Palmer, stat legend and season-data provider to Baseball-Reference, and here is his reply:

"The strikeout data came from Jonathan Frankel, who did a tremendous amount of work with a number of helpers checking box scores in various newspapers. He identified about 90% of NL batters and 80% of AL batters from 1897-1909. The results were then prorated for the remainder of the season. Work is continuing on digging up more boxes and also on 1910-12 AL.

I was surprised that Jonathan was able to find so much data. What happened is that the local papers often carried the strikeouts for their games, so it required volunteers all over the country to check the papers, plus some inter-library loans. It was a terrific undertaking."
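
Palmer doesn't spell out the proration arithmetic, so what follows is only a rough sketch of one plausible reading: take the strikeouts found in the box scores that were located and scale them up by the share of games those box scores cover. The function name, its parameters, and the sample numbers are all hypothetical; this is not Frankel's actual method or data.

```python
def prorate_strikeouts(known_so, games_covered, games_played):
    """Estimate a season strikeout total from partial box-score coverage.

    Assumption (not confirmed by the source): the strikeout rate per game
    in the covered games is applied to every game the batter played.
    """
    if games_covered <= 0:
        raise ValueError("need at least one covered game")
    if games_covered > games_played:
        raise ValueError("covered games cannot exceed games played")
    # Scale the known strikeouts by the ratio of total to covered games.
    return round(known_so * games_played / games_covered)


# Hypothetical batter: strikeouts were found for 90 of his 100 games.
season_estimate = prorate_strikeouts(known_so=45, games_covered=90, games_played=100)

# Hypothetical cup-of-coffee batter: 2 games played, 3 at bats in total,
# and 2 strikeouts found in the one game that has a surviving box score.
short_stint_estimate = prorate_strikeouts(known_so=2, games_covered=1, games_played=2)
```

Because an estimate like this scales with games rather than at bats, nothing stops it from exceeding a batter's at-bat total when the sample is tiny: in the second hypothetical case the estimate is 4 strikeouts against only 3 at bats.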

It turns out that Jonathan has a blog where he posts updates about the progress of his batter strikeout research. He says the 1910 AL is 89% complete right now, and that he has begun work on the 1912 AL as well.

8 Responses to “Pre-1910 Batter Strikeout Data”

  1. Sean Forman Says:

    And I was just going to watch TV tonight.

  2. Johnny Twisto Says:

    The results were then prorated for the remainder of the season.

    I really think this *has* to be clearly noted on the site. Otherwise you are presenting estimated stats as if they were actual. It's similar to the old catcher stats which have SB/CS based on prorated team totals, not actual player totals. It is very misleading.

  3. Owen23 Says:

    Not sure how to explain this...3 at bats and 4 strikeouts ?

    http://www.baseball-reference.com/players/g/gibsoch01.shtml

  4. Johnny Twisto Says:

    Owen, see above. The numbers are being estimated based on incomplete data and it can end up in screwy results like that.

  5. DavidRF Says:

    Retrosheet is updated back to 1918 now and they've had the isolated 1911 NL season done for a while now too. It'll be interesting to see what happens when they get back far enough to encounter this issue. I guess the more eyes looking at things the better.

  6. Owen23 Says:

    This is a great thing (even with sometimes strange results)...the 1900s strikeout machine Billy Maloney !
    http://www.baseball-reference.com/players/m/malonbi01.shtml

  7. Pre-1910 Batter Strikeout Data » Stathead » Blog Archive Says:

    [...] Neil Paine talks about where the site’s pre-1910 batter strikeouts came from. Link Posted on Wednesday, April 13th, 2011 at 5:19 pm, Category: Baseball, Tags: bref, data, history, [...]

  8. Charles Saeger Says:

    After going over his blog, I'll hold out as to why Charlie Gibson struck out four times in three at bats. His extrapolations are based on strikeouts/game. The at bat total may well be wrong.