Help B-R. Enter Data.
Posted by Sean Forman on August 13, 2009
Mark Armour, founder of the SABR BioProject, allowed me to repost this message he sent to the SABR BioProject e-mail list. I wanted to repost it because it is well written and gives you a way to help Retrosheet and, in turn, Baseball-Reference.com. Much of the data that appears on this site comes from the heroic volunteers at Retrosheet, so anything that helps them out will eventually find its way onto this site and into our databases. Besides, you know you would love to enter 1942 Detroit Tigers game logs. Anyway, here is Mark's note.
If you are like me, you have spent a lot of time with Retrosheet, most likely on their amazing website looking at play-by-play, box scores, and hundreds of other things. I recently finished a long research project on Joe Cronin, and I used Retrosheet many times a day for a few years. It is indispensable, and I cringe to think of the laborious research people did before it came to be. In fact, I cringe to think of things that I spent weeks working on that I could now do in minutes.
Retrosheet is the work of volunteers: it is a non-profit organization, just like SABR, and everything you see is due to the labor of people who feel like helping out. Most of these volunteers are SABR members, people you know. In the world of baseball research, these people are heroes.
I was sitting in the Retrosheet meeting a couple of weeks ago, listening to Tom Ruane talk about the latest project they needed help with. And I thought, "OK, it is time for me to step up." I helped with a Retrosheet project years ago, but in reality my use of the site far exceeds the small amount of labor I had put into it. My Cronin project is finished, and this is a good time for me to do a little payback.
A lot of the recent Retrosheet effort has gone into creating box scores for the many seasons for which they will not have much play-by-play. Retrosheet now has box scores for every game played in the 1920s, for example. Now they are working on the 1930s, and this is where they need help.
For many years the league offices would compile something called "dailies" for each player. Each player would have their seasonal record, day-by-day, entered into a big log book. If you look at Babe Ruth's pages for 1930, for example, they would have a single line for each game, which would contain the date, at bats, runs, hits, doubles, etc. A player who played regularly would have four or five pages of data for a season. A player who played one game would have one page and one line. A team would end up with 60 or 70 pages.
Retrosheet has copies of all of these dailies, scanned in and readable as images on your computer. What they want to do is digitize them: simply type these handwritten numbers into spreadsheets. Same information, but more usable. Once the data is in this form, one can write programs to do a million different things with it.
The work that is needed is very simple. Type the numbers into a spreadsheet. I recently finished doing the batters for the 1935 White Sox. The work was easy. Hey, let's be honest: it was also boring and laborious. But it was not hard; anyone on this list can do this. You just need to be able to type numbers into boxes. I just put the image on the left side of my computer screen and the spreadsheet on the right, and got to work. For the White Sox, it was 1,643 lines of numbers. Here is the first line (probably unreadable to you):
Appling 4 17 3 1 1 1 1 2 1 SS 3 3 1
Zeroes are omitted, which is why you see a lot of blank spaces. Once I got into it, I found many ways to save time.
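To make the "write programs" point a little more concrete, here is a minimal Python sketch of what someone might do once a team's dailies are in a spreadsheet: add up each player's game lines into season totals. It assumes the sheet has been saved as CSV with a hypothetical column layout (player, date, AB, R, H, 2B) and that blank cells stand for zeroes, as on the original pages. The layout and the sample rows are made up for illustration; Retrosheet's actual spreadsheet template may well differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column layout for a transcribed dailies sheet (one row per game).
# This is an assumption for illustration, not Retrosheet's actual template.
COLUMNS = ["player", "date", "AB", "R", "H", "2B"]
STATS = COLUMNS[2:]


def to_int(cell: str) -> int:
    """Treat an empty cell as zero, since zeroes are omitted on the dailies."""
    cell = cell.strip()
    return int(cell) if cell else 0


def season_totals(csv_text: str) -> dict[str, dict[str, int]]:
    """Sum each player's game lines into season totals, keyed by player name."""
    totals: defaultdict[str, dict[str, int]] = defaultdict(
        lambda: {stat: 0 for stat in STATS}
    )
    # fieldnames is given explicitly, so the first row is treated as data.
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    for row in reader:
        for stat in STATS:
            totals[row["player"]][stat] += to_int(row[stat])
    return dict(totals)


# Made-up sample rows (not real data); blank cells mean zero.
SAMPLE = """Player A,4/16,4,1,2,
Player A,4/17,3,,,
Player B,4/16,4,,1,1
"""

print(season_totals(SAMPLE))
```

The same per-game rows could just as easily be sliced by month, by opponent, or by position; the point is that once the numbers are typed in, questions like these take a few lines of code instead of a trip through the log books.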
If you are between projects, like I was/am, and you feel that Retrosheet is an invaluable resource that you have not contributed enough to, like I do, this is your chance to step up. It will likely take you a few weeks to finish a team, working a little bit every day. It is not glamorous work. But it is necessary and valuable. Having finished a team-year, I now feel a little more worthy to use Retrosheet. Not quite worthy enough: I will be doing more.
To contribute, please (and I do mean please) contact Tom Ruane at tjruane@gmail.com.