Volunteer with RetroSheet and learn some baseball history
Posted by Sean Forman on January 7, 2010
Tom Ruane sent out this note to the RetroSheet mailing list. Sometime this winter we'll have box scores for every game from 1920-1939, and we are really close to having a complete run of box scores from 1920-2010. This will mean complete career game logs for Gehrig, Grove, Ott, Greenberg, Foxx, Dickey, DiMaggio, Williams. We'll be able to compute Williams' longest on-base streaks, Gehrig's best RBI games, or Foxx's home/road splits.
As we wrap up the 1930s and turn our attention to the 1940s, I would like to mention that we are always looking for volunteers to help digitize the Hall of Fame player dailies. Depending upon your skill with a spreadsheet, a team's worth of batting dailies should take you anywhere from 5 to 20 hours of work. I won't lie to you - it can be pretty tedious stuff, but the end product (box scores, player dailies/splits, top performance pages, and so on) hopefully makes it all worthwhile.
If this sounds like something you might be interested in, please let me know (off-list at tjruane who has email at gmail.com).
Thanks.
Tom Ruane
January 7th, 2010 at 1:58 pm
With volunteer help, is there any oversight? I'd be willing to volunteer to help, but if I make a mistake in entering data and it doesn't get noticed, will I have left an error in retrosheet forever or will someone have the tedious job of checking over the entries?
January 7th, 2010 at 2:26 pm
Wow, it is going to be exciting to have 1920-2009 box scores. Everything from the post-deadball era will be available day by day. The searches I can do on the Play Index will keep my attention for decades.
January 7th, 2010 at 3:09 pm
#1: I volunteered the last go-round. What I got was images (TIF files) of a scorebook that contained hand-written batting dailies. I just had to enter them into Excel. There were totals at the bottom of the scorebook images so I could perform sums in Excel to double-check the totals. That was all I had to do. It was pretty tedious stuff so I did only 1-2 players per sitting. They do give you quite a bit of time to do it, though.
I was told there are many other checks downstream. Once the data is digitized into Excel, they can use computer scripts to check team totals per game, player totals per season, and team totals per season, and to clean up double-header confusion. The lineups and box scores were collected later, so there are more checks then. There are even more checks from fans like us after they get access to the data. And the hand-written scorebooks are always available for double-checking.
So, the volunteers are just the leading edge in the "digitizing" process. I wouldn't worry about creating errors that last for very long. (I did the best I could of course :-)) The digitization actually makes it much easier to check for errors.
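The totals check described above (summing the entered dailies and comparing them with the totals written at the bottom of the scorebook page) can be sketched in a few lines of Python. This is only an illustration, not Retrosheet's actual tooling: the column names in `STAT_COLUMNS` and the function `find_total_mismatches` are hypothetical, and the sample numbers are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical stat columns; a real daily sheet may use different ones.
STAT_COLUMNS = ["AB", "R", "H", "RBI"]


def find_total_mismatches(
    dailies: List[Dict[str, int]],
    scorebook_totals: Dict[str, int],
) -> Dict[str, Tuple[int, int]]:
    """Compare summed daily lines with the totals copied from the scorebook.

    Returns {stat: (computed_sum, scorebook_total)} for every stat that
    disagrees; an empty dict means the entered rows match the totals.
    """
    mismatches = {}
    for stat in STAT_COLUMNS:
        computed = sum(row[stat] for row in dailies)
        expected = scorebook_totals[stat]
        if computed != expected:
            mismatches[stat] = (computed, expected)
    return mismatches


# Invented example: two game lines for one player.
sample_dailies = [
    {"AB": 4, "R": 1, "H": 2, "RBI": 1},
    {"AB": 3, "R": 0, "H": 1, "RBI": 0},
]
sample_totals = {"AB": 7, "R": 1, "H": 3, "RBI": 1}

print(find_total_mismatches(sample_dailies, sample_totals))
```

A non-empty result only says that the entered rows and the written totals disagree; either one could hold the error, so the scorebook image still has to be consulted.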
January 7th, 2010 at 8:58 pm
Sounds like something I would enjoy. "please let me know (off-list at tjruane who has email at gmail.com)"
I'm not sure what this means. What do I have to do to get more info on how to get involved in this?
January 7th, 2010 at 9:16 pm
Send an email to Tom Ruane at retrosheet.
They are being a bit careful with the email address so that it doesn't get picked up by spammers. The email address starts with "tjruane" and ends with "gmail.com", and there is an @-symbol in between.
Retrosheet's official site is http://www.retrosheet.org.
January 7th, 2010 at 9:58 pm
Makes sense now. Thanks.
January 8th, 2010 at 10:21 am
I was nervous about volunteering as well, but I took the plunge and it's not bad at all. The instructions Tom sends are very clear. I've been assigned the 1944 Dodgers and I'm looking forward to getting to know them better.
And I've now officially outed myself as a total baseball nerd.