Pre-1952 stolen base data
Posted by Sean Forman on May 28, 2010
I got this note from Pete Palmer clarifying how the catcher SB/CS data is computed for pre-1952 seasons.
The pre-retro data in my files is estimated. What I did was take the count of sb and cs allowed by each team and spread it among the catchers based on assists, and I also estimated innings based on batting and fielding stats. The estimated innings are pretty accurate, probably off by 10 or 20 innings out of 1000. The sb-cs data is reasonably close. Of course, there are no team cs for NL 1926-50 and for most years before 1920, so I had to estimate that from catcher assists, outs on base and stolen bases. I gave up on seasons before 1890 because the box scores left out a lot of stolen bases. Aside from some 1880s AA rbi, which are 90% complete, that is the only estimated data in there. Jon Frankel is working hard on strikeouts for batters 1897-1909 (1912 AL), which will include some estimated data. We hope to have that by the end of the year or next year at the latest.
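To make the "spread it among the catchers based on assists" step concrete, here is a minimal sketch. It assumes a simple proportional split by each catcher's share of team catcher assists; Pete's note does not spell out the exact weighting, so this is one plausible reading and not his actual code. The function name `spread_by_assists`, the catcher names, and all the numbers are made up for illustration.

```python
def spread_by_assists(team_total, assists):
    """Split a team total among catchers in proportion to their assists.

    Assumption: a plain proportional split. The real estimate may weight
    things differently.
    """
    total_assists = sum(assists.values())
    if total_assists == 0:
        # No assists recorded: nothing to base the split on.
        return {name: 0.0 for name in assists}
    return {name: team_total * a / total_assists for name, a in assists.items()}


# Hypothetical team; these figures are invented, not historical data.
catcher_assists = {"Starter": 90, "Backup": 30}
team_sb, team_cs = 120, 80

est_sb = spread_by_assists(team_sb, catcher_assists)
est_cs = spread_by_assists(team_cs, catcher_assists)

for name in catcher_assists:
    attempts = est_sb[name] + est_cs[name]
    cs_pct = est_cs[name] / attempts if attempts else 0.0
    print(f"{name}: SB {est_sb[name]:.1f}, CS {est_cs[name]:.1f}, CS% {cs_pct:.0%}")
```

Under this particular assumption, every catcher on a team ends up with the same estimated CS%, because both totals are split by the same shares. If the real method works anything like this, the estimates would say more about the team than about any one catcher.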
May 28th, 2010 at 9:50 am
Hmmm. The information is no doubt interesting, but I think there is a problem with presenting estimated numbers as if they are true. At the very least, you should make it plainly obvious on player pages that those numbers are not necessarily accurate. I didn't know that until I read this, and most people won't read this.
May 28th, 2010 at 10:10 am
I agree with Johnny. Didn't know that either. Maybe a little footnote indicating they are estimated?
The estimated data is useful for evaluating a catcher who starts most of his team's games, but no good for determining if one catcher is better than his teammate. I recently read a catcher in Glory of Their Times saying he made a throw to 2B, the shortstop got there too late, and the ball went into center field.
He thought the shortstop (Honus) was trying to make him (a rookie) look bad, to protect the veteran catcher on the team. Honus explained he just wasn't used to the throws getting to 2B that fast. Of course, I then had to check to see what his CS% was compared to the veteran catcher's, and they were close to equal. Made me wonder how true the story was if the strong-armed rookie was actually no better than the weak-armed vet. But I now see we can't be certain about how they stack up to each other.
May 28th, 2010 at 11:30 am
This is exactly as I suspected, right down to the OCS estimate being based on their Outs-On-Base formula. These data should be presented for teams only until Retrosheet gets us better data for the catchers. The situation is similar for other data from non-PBP sources: we should always list the team total for, say, doubles allowed, even though the total we can actually assign to individual pitchers is much less.