Splits/Boxes/Gamelog Suggestions
Posted by Sean Forman on March 3, 2010
I've been working on getting the latest and greatest data from RetroSheet onto the site and will be making a few additions to the affected pages as well, which I'll go into more later when they launch.
Building all of this is a five-day process: our server runs continuously, building the 120,000 box scores, the 9m rows of play-by-play, the 5m rows of gamelogs, and the 10m rows of splits. So adding a little thing here and there just isn't worth it. I've got about two windows a year to get things added, and this is one of them. So if you want to suggest a split, gamelog, or boxscore feature, now would be a good time to do so.
One idea I've had since we'll be adding a lot of data from 1920-1939 (no pbp, just boxes) is to add a split for vs. RHstarter and vs. LHstarter. We won't know Lou Gehrig's exact splits, but we'll know what he did when a lefty started the game and when a righty started the game.
Others you would like to see?
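As a rough illustration of the starter-handedness split described above, here is a minimal Python sketch. It assumes each box-score line has already been tagged with the opposing starter's throwing hand; the tuple layout, the function name, and the sample values are hypothetical and are not the site's actual schema or data.

```python
from collections import defaultdict

# Hypothetical box-score lines: (opposing starter's hand, AB, H).
# The values are made up for illustration only.
games = [
    ("L", 4, 2),
    ("R", 5, 1),
    ("R", 3, 1),
    ("L", 4, 0),
]

def split_by_starter_hand(game_lines):
    """Sum AB and H per opposing-starter hand and compute a batting average."""
    totals = defaultdict(lambda: {"AB": 0, "H": 0})
    for hand, ab, h in game_lines:
        totals[hand]["AB"] += ab
        totals[hand]["H"] += h
    result = {}
    for hand, t in totals.items():
        # Guard against a zero-AB split (e.g. all walks) before dividing.
        avg = round(t["H"] / t["AB"], 3) if t["AB"] else None
        result[f"vs. {hand}H starter"] = {"AB": t["AB"], "H": t["H"], "AVG": avg}
    return result

splits = split_by_starter_hand(games)
```

Because only whole-game lines are needed, a split like this works for seasons with box scores but no play-by-play.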
Note: We also had a Twitter outage after our blog update, but things are back up and running now.
March 3rd, 2010 at 11:07 am
so the game, event, b vs. p, and streak finders will be updated with the 1920-1939 stats (babe ruth's 1927 streaks will be there...)?
March 3rd, 2010 at 11:08 am
Yes, that is the plan. It is going to be a bit hard with the 12 year gap, but I think we can make it work.
March 3rd, 2010 at 11:09 am
Actually, let me clarify that. We won't have b vs. p or events because we don't have pbp for those years. We will have streak and game finders for those years.
Splits will also be limited to complete game splits like day/night, home/road, by opp, by lineup slot, etc.
March 3rd, 2010 at 11:34 am
I'd like to see the split "DP Situation" (1st base occupied; less than 2 outs) both in the splits sections and as an option in the event finder.
And this is probably pie in the sky, but an option on the splits page to display the splits as home only, road only, and perhaps versus lefty or righty. IOW, a double split like "RISP at home" or "After 0-1 counts versus lefties".
March 3rd, 2010 at 1:50 pm
1) Make groundball/flyball and power/finesse splits be equal to a set percentage of the league for each split. Say, have the 25% of all pitchers with the most walks and strikeouts per batter faced be power pitchers, and the 25% with the lowest be finesse pitchers, rather than having a set minimum.
2) For seasons without full PBP data, do display opposition lefty/righty splits on the team level only. It will help to know how many lefties a team faced, even if we can't peg them to a specific pitcher.
3) Same story with opposition records -- even if we cannot peg a double allowed or a stolen base to a specific pitcher, knowing the team total will be helpful.
4) For games wherein multiple fielders played a position, Retrosheet doesn't give an innings total for the individual fielders. It would help to have an estimate.
5) An opponents' fielding line, which would let us know where the errors fell and give us some idea of how often this team hit groundballs.
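Suggestion 1 above (a league-relative power/finesse split) could be sketched roughly as follows. This is only an illustration under stated assumptions: the rate is (BB + SO) per batter faced, the cutoff is a simple rank-based top and bottom 25%, and the function name, input layout, and sample pitchers are all hypothetical.

```python
def classify_pitchers(stats, fraction=0.25):
    """Label the top `fraction` of pitchers by (BB + SO) / BF as "power"
    and the bottom `fraction` as "finesse"; everyone else is "neutral".

    stats: dict mapping pitcher name -> (BB, SO, BF).
    """
    rates = {
        name: (bb + so) / bf
        for name, (bb, so, bf) in stats.items()
        if bf > 0  # skip pitchers with no batters faced
    }
    ranked = sorted(rates, key=rates.get)  # lowest rate first
    k = int(len(ranked) * fraction)        # group size at each end
    labels = {name: "neutral" for name in ranked}
    for name in ranked[:k]:
        labels[name] = "finesse"
    for name in ranked[len(ranked) - k:]:
        labels[name] = "power"
    return labels

# Made-up league of eight pitchers: (BB, SO, BF).
league = {
    "P1": (10, 30, 100), "P2": (5, 10, 100), "P3": (8, 20, 100),
    "P4": (3, 9, 100),   "P5": (12, 35, 100), "P6": (7, 15, 100),
    "P7": (6, 18, 100),  "P8": (9, 22, 100),
}
labels = classify_pitchers(league)
```

Since the cutoffs move with the league, each season would get the same share of power and finesse pitchers regardless of how common strikeouts were in that era.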
March 3rd, 2010 at 2:05 pm
I'd like to be able to cross-index the various splits. I'd like to be able to see what field players hit to in particular parks, for instance, or be able to divide the opponent splits into home and away. How well has Derek Jeter hit against the Red Sox in Yankee Stadium? How many opposite field home runs did Carlos Delgado hit at Shea?
March 3rd, 2010 at 2:57 pm
In Play Index, don't just list the top 8 or 10 choices for a filter, list them all (if they are teams or parks, at least). It's annoying to try to check a player's PA's at another park, and if it's not one of the top ones he played in, you have to worm your way there by a long, difficult route...
March 3rd, 2010 at 3:12 pm
I've written a couple of times previously about a minor glitch in this site's fielding statistics. I hope this is an appropriate forum to try again.
Using this link (http://www.baseball-reference.com/leagues/MLB/2009-standard-fielding.shtml) as an example, the problem is this...for multi-team, multi-position players, the season totals aren't all accurate.
To illustrate, switch the view from the default (alphabetical) to the 'GS' column.
- The new display will start with Prince Fielder, who was the only major leaguer to start all 162 games in the field.
- As you scan down the page, you'll find Orlando Cabrera at Line # 8. He played for two teams in the same league, but all at the same position. His entry is correct.
- Scroll further to Matt Holliday at Line # 20. He played for two teams in different leagues, but all at the same position. His entry is also correct.
- You'll find Victor Martinez at Line # 45. He played for two teams (same league) at two different positions. His totals are also correct.
- You'll find Mark DeRosa at Line # 107. He played for two teams in different leagues, and started games at four different positions for each team. Still, his totals are correct.
- Eventually, you'll find Nyjer Morgan at Line # 261. He played for two teams in the same league, but played two positions for one team and only one position for the other. That may be the key because his totals are wrong. He should be found at Line # 143 with 115 GS. For some reason, only his Pirates totals appear at Line # 261. Morgan later appears at Line # 310, showing his combined total of GS at only one position.
That was long-winded, but it must be a simple problem with a simple fix.
March 3rd, 2010 at 3:25 pm
While seconding every request mentioned up to this point, I'd really love to see:
* All of the pitch-type statistics (e.g., bb-ref.com/leagues/MLB/2009-pitches-pitching.shtml), especially in the player leaderboards, display at least one decimal place (Or at least make that available when you view in CSV or PRE form, if space is a major issue.) B-R offers some really great information on those pages -- information you can't get elsewhere -- but I'd love to see a little more specificity in the rankings and the numbers displayed.
March 3rd, 2010 at 4:31 pm
Not sure if you're touching team pages, but if you are, a few suggestions:
When you're looking at a team's stats page it would be nice if there was some kind of indicator for which players were traded/acquired that season. Maybe a symbol next to the name like '+' for acquired and '-' for traded/dropped.
A more convoluted but kind of interesting addition would be a table at the bottom of the page listing player movements. A 3-column table with 'Name', 'Date', and 'Movement'. The movement column would include things like 'picked up in trade', 'lost in trade', 'demoted to minors', 'promoted from minors', 'picked up off of scrap heap for midseason playoff push following injury to useful player', etc.
March 3rd, 2010 at 4:54 pm
I cosign Charles Saeger's first suggestion. The way it's set up now, almost no one from the '50s falls under the "power pitcher" split, because there were many fewer strikeouts then.
March 3rd, 2010 at 5:26 pm
I'd like to be able to sort players on Black Ink, Gray Ink, etc. I'd like to be able to sort pitchers on pitching and batting stats simultaneously, e.g., who hit the most home runs of any pitcher who struck out 200 batters in a season. I'd like to be able to sort on single season and career stats simultaneously - what's the record for most hits in a season by a player who had fewer than 1000 hits in his career?
March 3rd, 2010 at 5:34 pm
@4:Gary
I've added DP situation. Double splits can be done sort of with the event finders.
@5:Charlie
I'll look at the GB/FB suggestion, but that probably isn't going to happen.
re: 2) do you mean the pitching stats of the LH and RH pitchers facing a team? How are 2 and 3 different?
4) how would you suggest estimating it?
5) interesting idea, I'll look at adding a cumulative fielding line for the opponents by position.
@6: Zach
The PI event finder will give you the Jeter info.
@7: Greg, I'll work on that.
@8 JDV, I'll fix that before the season starts
@9: Ryan, I'll see about adding a digit.
@10: not the focus right now, but a good idea
@12, please see the description of this blog entry. 🙂
March 3rd, 2010 at 7:05 pm
Are there home & away splits, and I'm just blind?
March 3rd, 2010 at 9:32 pm
@12:
Gerry, I get the following:
Johnny Hadopp 225/880
Beau Bell 218/806
Dale Alexander 215/811
Dustin Pedroia 213/580
Hanley Ramirez 212/771
Benny Kauff 211/961
March 3rd, 2010 at 9:52 pm
DavidRF is referring to Jonny Hodapp who had 225 hits in 1930. (I wouldn't ordinarily correct this, but the b-r search tool returns nothing for the name as typed.)
http://www.baseball-reference.com/players/h/hodapjo01.shtml
March 3rd, 2010 at 9:57 pm
Sorry. Typo. My script returns:
H_N H_Sum H_Max playerID
9 880 225 hodapjo01
7 806 218 bellbe01
5 811 215 alexada01
4 580 213 pedrodu01
5 771 212 ramirha01
8 961 211 kauffbe01
... and I tried to make it more readable. I got a little lysdexic.
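The script itself isn't shown, so the following is only a guess at the kind of aggregation that could produce columns like H_N, H_Sum, and H_Max: group season rows by player, keep players whose career hit total is under 1000, and sort by best single season. The row layout, function name, and sample player IDs are hypothetical, and the numbers are made up.

```python
from collections import defaultdict

# Hypothetical season rows: (playerID, year, hits). Values are made up.
seasons = [
    ("playera01", 2001, 150),
    ("playera01", 2002, 210),
    ("playerb01", 2001, 190),
    ("playerb01", 2002, 200),
    ("playerb01", 2003, 195),
    ("playerb01", 2004, 205),
    ("playerb01", 2005, 215),
    ("playerc01", 2001, 120),
]

def best_season_under_career_cap(rows, career_cap=1000):
    """Return per-player summaries (seasons, career hits, best season)
    for players with fewer than `career_cap` career hits, best season first."""
    per_player = defaultdict(list)
    for player_id, _year, hits in rows:
        per_player[player_id].append(hits)
    summary = [
        {"H_N": len(h), "H_Sum": sum(h), "H_Max": max(h), "playerID": pid}
        for pid, h in per_player.items()
        if sum(h) < career_cap
    ]
    return sorted(summary, key=lambda r: r["H_Max"], reverse=True)

result = best_season_under_career_cap(seasons)
```

The same pattern (aggregate per player, filter on the career total, sort on the season value) covers most "season and career simultaneously" questions.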
March 3rd, 2010 at 10:45 pm
@14:Jeff
Splits are linked just above the player and team batting and pitching stats.
March 3rd, 2010 at 10:48 pm
I've asked for this a couple of times over the last couple of years, I'll try again. I'd like the ability to sort by Batting Runs. The stat appears on the players stat page, but it's not a sorting option. This is only for the season finder.
March 3rd, 2010 at 10:50 pm
Since we're answering Gerry's examples, here are the single-season home run leaders among pitchers with 200+ strikeouts. (I found this using just PI and Excel)
7 - Jack Stivetts (1890 and 1891), Don Drysdale (1965) & Earl Wilson (1966)
6 - John Clarkson (1887), Fergie Jenkins (1971) & Carlos Zambrano (2006)
5 - Jim Whitney (1883) & Bob Gibson (1965 and 1972)
March 4th, 2010 at 12:01 am
[...] exact numbers for GIDP opportunities isn’t available right now, though it could be coming. For now what we can do is work with an estimated number. Clearly, we can narrow down opportunities [...]
March 4th, 2010 at 1:02 am
-xFIP
-manager ages
-the ability to do year-by-year sorting for a specific team when a player splits a year between teams. I.e., if Player A spends 01-07 with Boston and then splits 08 between Boston and Baltimore, I'd like to be able to get the 06-08 Boston numbers if I wanted. With your AWESOME yearly splits selection tool, I wouldn't be able to get just Boston 08; it'd be the 08 collective.
March 4th, 2010 at 7:03 am
Thanks Sean, DavidRF, Raphy. I wasn't particularly interested in the exact searches I mentioned; I'd like to be able to do that *kind* of search on my own, instead of having to rely on the kindness of strangers. But I'm sure you knew that.
March 4th, 2010 at 10:54 am
SF@13: #1 is a bigger deal for power/finesse anyways. Right now, if you move past the steroid era, the numbers are meaningless because so few pitchers qualify to be power pitchers. Really, you're best off just ditching the split as things stand.
#2 and #3 are related. I'm just making sure there's a full opposition line, and for pitchers' splits as well.
#4 is going to be a project for someone(s). My solution was to use plate appearances -- the relief fielder gets the minimum number of innings needed to have batted so many times. If that isn't clear, use plate appearances as a split, any excess going to the starter -- in an 8.2-inning game, if both shortstops have 2 PA, we'd grant 5 innings to the starter and 4.2 to the reliever. If someone isn't doing it by hand, which is probably too time-consuming, have a computer assign by plate appearances percentage.
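One possible reading of the plate-appearance method in #4, sketched in Python. It assumes the fielding change happens between innings, so the starter's share of the game is rounded up to whole innings and the relief fielder gets whatever is left. The function names and the treatment of "8.2" as 8 innings plus 2 outs are assumptions made for this sketch, not anything Retrosheet or the site provides.

```python
def innings_to_outs(innings: str) -> int:
    """Convert baseball notation such as "8.2" (8 innings, 2 outs) to outs."""
    whole, _, frac = innings.partition(".")
    return int(whole) * 3 + int(frac or 0)

def outs_to_innings(outs: int) -> str:
    """Convert a number of outs back to baseball notation."""
    return f"{outs // 3}.{outs % 3}"

def estimate_innings(game_innings: str, starter_pa: int, reliever_pa: int):
    """Split a game's defensive innings between two fielders by PA share.

    The starter's share is rounded up to whole innings ("any excess going
    to the starter"); the relief fielder receives the remainder.
    """
    total_pa = starter_pa + reliever_pa
    if total_pa == 0:
        raise ValueError("need at least one plate appearance to split on")
    total_outs = innings_to_outs(game_innings)
    # Integer ceiling of (starter share * total innings), avoiding floats.
    starter_whole = -(-starter_pa * total_outs // (3 * total_pa))
    starter_outs = min(starter_whole * 3, total_outs)
    return outs_to_innings(starter_outs), outs_to_innings(total_outs - starter_outs)

starter, reliever = estimate_innings("8.2", 2, 2)
```

For the 8.2-inning game with 2 PA each, this credits the starter with 5 innings and the relief fielder with the remaining 3.2, so the two always add up to the length of the game.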
March 4th, 2010 at 1:58 pm
The only down side to the "game event" multi-split feature is that you can't get something unless it's in the top amount.
A player that has 5 of something may not show up in the first list because the larger amounts are listed first (in the NFL TD log, there is a "show full list" message that allows you to expand the list to show all from the first page...not just from the top amounts)
March 4th, 2010 at 3:30 pm
How about a season streak feature for players, pitchers, and teams
(most consecutive years/seasons getting X amount of something...)
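A season-streak search of this kind could be sketched as below. This is a minimal illustration, assuming one stat value per season and treating a missing year as breaking the streak; the function name and the sample numbers are hypothetical.

```python
def longest_season_streak(season_values, threshold):
    """Find the longest run of consecutive seasons with value >= threshold.

    season_values: dict mapping year -> stat total for that season.
    Returns (length, first_year, last_year), or (0, None, None) if no
    season qualifies. A missing year ends the current run.
    """
    best = (0, None, None)
    run_start = None
    prev_year = None
    for year in sorted(season_values):
        if season_values[year] >= threshold:
            # Start a new run unless this year directly follows the last hit.
            if prev_year is None or year != prev_year + 1:
                run_start = year
            prev_year = year
            length = year - run_start + 1
            if length > best[0]:
                best = (length, run_start, year)
        else:
            prev_year = None
    return best

# Made-up home run totals by season.
hr_by_year = {2001: 30, 2002: 35, 2003: 12, 2004: 31, 2005: 40, 2006: 33, 2008: 36}
streak = longest_season_streak(hr_by_year, 30)
```

With these sample numbers the longest run of 30-plus seasons is the three years from 2004 through 2006.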
March 4th, 2010 at 4:38 pm
CS@24 #4: I meant starter gets 5 innings, relief fielder gets 3.2 innings.
I'm thinking there has to be a reason, but why are there no individual pages showing ballpark data? Say, a page showing the doubles hit in Wrigley Field each year and similar such. Someone down on Tango's site mentioned lefty/righty ballpark data, but unless I'm a complete dipstick, I can't even find a page dedicated to each park.
March 4th, 2010 at 4:58 pm
Now I'm just piling on, but didn't James make a similarity score for seasons too? Would that be displayable?
March 5th, 2010 at 10:38 am
Charles,
You can get ballpark totals on the league splits pages. I hope to add a comprehensive ballpark page at some point.
I've added manager ages to their outputs.
Season streaks are a long-time request.
March 5th, 2010 at 11:11 am
One semi-related thought:
Let's say I click on (random example here) 2009 AL offensive splits. Then I click on the red for April batting info for all teams. Boom - all team info for April pops up. Cool.
Can I click on the header rows so that the info is organized by that column? I used to be able to do that, but now I can't. It comes up by sOPS+ (I think) but I can't get anything else up.
March 5th, 2010 at 11:14 am
Chris,
You can't sort within a popup, but if you click on the permalink option you can then sort to your heart's content.
March 5th, 2010 at 11:49 am
Maybe you can already do this, and I just don't know how, but with players who are traded partway through a season, it'd be nice to be able to isolate sub-splits within that season for each team (so that we could check for differing situational usage on the two rosters, for example).
March 5th, 2010 at 11:59 am
Vlad:
Look for this text
2008 Season Splits: Season Total / Boston Red Sox / Pittsburgh Pirates
on this page
http://www.baseball-reference.com/players/split.cgi?n1=bayja01&year=2008&t=b
March 5th, 2010 at 2:02 pm
not sure if this has been mentioned:
Stadium Splits!
LHP/RHP vs. LHB/RHB in Yankee Stadium, Fenway Park, etc.
March 5th, 2010 at 3:40 pm
SF@29 -- yeah, I was looking for the multi-year data, akin to the current pages from Retrosheet on, well, steroids.
March 5th, 2010 at 9:30 pm
1. Batting Runs and Batting Wins added to the PI!
2. Baserunning information for the event finders (e.g., so I can find who has stolen home the most times, etc.)
March 6th, 2010 at 12:35 am
This might be a bit difficult to put together, but what about linking box scores to Google Archives articles about the games? For example, the box for this 1974 Pirates-Cubs game could link to the game's coverage in the Pittsburgh Press.
March 6th, 2010 at 8:56 am
One last thing I'd like is a better way of distinguishing players with the same name (like the two Alex Gonzalez').
This may be required soon as (for now) both Ramon Ramirez' are members of the Red Sox!
March 6th, 2010 at 5:42 pm
Would it be possible to get the statistics of players for a team and a league on a specific date like June 15, 1930?
Would it be possible to get statistics for the LAST x games of the season? Presently we can get statistics for players for the Cubs for the first 135 games of the 1969 season; would it be possible to get the last 27?
Lastly, would it be possible to get LIFETIME statistics for a team instead of just the cumulative totals?
I know this is not related to games but....
Keep up the great work sir....
March 8th, 2010 at 6:05 pm
Something I would like to see is a team leaderboard, where you could see things like "most innings pitched by a team in a single season" or "most runs scored by a team in a single season" - this seems pretty basic, but I can't find it anywhere on the site as is.
March 9th, 2010 at 6:21 pm
Parting request: most common lineup/defensive alignment against lefty/righty pitchers. You could also do most common by month, to get an idea how teams shift things around during the year, but it isn't as important.
March 10th, 2010 at 9:59 am
I swear this is my last suggestion: In the streak finder, can there be some sort of indicator to show that the streak is still "active"?
March 10th, 2010 at 10:08 am
"We won't know Lou Gehrig's exact splits, but we'll know what he did when a lefty started the game and when a righty started the game.
Others you would like to see?"
I know you won't be incorporating pbp data. However, I would be interested in seeing any pbp data available for innings in which Lou Gehrig hit a grand slam. Maybe it could be done as a separate special feature linked to Gehrig's BB-REF player page, akin to the log available for Ripken's streak. Thank you.