In-season consistency
Posted by Andy on October 15, 2007
Following up a bit on last night's post, here is a brand new stat for you folks. This isn't actually the career consistency score that I'm working on, but rather a way to look at consistency within one season for a given player.
Before I get into the actual data, let me talk about an example. This one has to do with home runs, and the seasons of two hypothetical players: Player A and Player B.
- Player A hits 30 homers in a season with 600 plate appearances. Like clockwork, he hits one homer every 20 plate appearances. This means that every PA streak in between homers is 19 PAs long.
- Player B also hits 30 homers in a season with 600 plate appearances. However, he has a different pattern. He hits 2 homers in consecutive plate appearances, then goes 38 PAs with no homers.
So, imagine that I wanted to report to you the average length of PA streak that each player went without homering. It turns out that they are the same. Player A had 30 streaks each 19 PAs long, for an average of 19 PAs. Player B, though, has 15 streaks that are 38 PAs long, and another 15 streaks that are 0 PAs long (those are the streaks in between the back-to-back homers). The average of 38+0+38+0+38+0... is also 19.
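For anyone who wants to check the arithmetic, here is a minimal Python sketch of the streak counting described above. The function name `homerless_streaks` and the way each season is laid out (a homerless streak comes before every homer, and any PAs after the last homer are ignored) are my own assumptions for illustration, not how the PI Batting Event Finder reports things.

```python
from statistics import mean


def homerless_streaks(plate_appearances):
    """Return the number of homerless PAs preceding each home run.

    `plate_appearances` is a sequence of booleans, True meaning a home run.
    PAs after the final homer are ignored (an assumption of this sketch).
    """
    streaks = []
    current = 0
    for is_homer in plate_appearances:
        if is_homer:
            streaks.append(current)  # close the streak that led up to this homer
            current = 0
        else:
            current += 1
    return streaks


# Player A: 19 homerless PAs, then a homer, repeated 30 times (600 PAs).
player_a = ([False] * 19 + [True]) * 30
# Player B: 38 homerless PAs, then back-to-back homers, repeated 15 times (600 PAs).
player_b = ([False] * 38 + [True, True]) * 15

streaks_a = homerless_streaks(player_a)
streaks_b = homerless_streaks(player_b)

print("Player A:", len(player_a), "PAs,", len(streaks_a), "streaks, average", mean(streaks_a))
print("Player B:", len(player_b), "PAs,", len(streaks_b), "streaks, average", mean(streaks_b))
```

Both players come out with 30 streaks and the same average, even though the two lists of streaks look nothing alike.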
How can we differentiate these players, then? Well, let's take a look at the standard deviation of the homerless streaks. The standard deviation of Player A's streaks is 0, since there is no variation. That means that you know he homered exactly every 20 plate appearances. The standard deviation of Player B's streaks is 19.3, which is the STD of 38, 0, 38, 0, ....
Think of this 19.3 STD as a sort of plus-minus of the streak length. In the case of Player B, his average streak length is 19, and an STD of roughly 19 tells you that he might have had some streaks as long as 19+19 = 38 (which he did) and other streaks as short as 19-19 = 0 (which he did).
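Here is a short sketch of the standard deviation calculation using Python's built-in `statistics` module. It shows both flavors of STD: the sample version (`stdev`), which gives the 19.3 quoted above, and the population version (`pstdev`), which gives the rounder 19 used in the plus-minus explanation. The two streak lists are simply typed in from the hypothetical example.

```python
from statistics import mean, pstdev, stdev

# Homerless streak lengths for the two hypothetical players.
streaks_a = [19] * 30      # Player A: thirty streaks of 19 PAs
streaks_b = [38, 0] * 15   # Player B: alternating streaks of 38 and 0 PAs

for name, streaks in (("Player A", streaks_a), ("Player B", streaks_b)):
    print(
        name,
        "average:", mean(streaks),
        "sample STD:", round(stdev(streaks), 1),
        "population STD:", round(pstdev(streaks), 1),
    )
```

The gap between the two versions shrinks as the number of streaks grows, so for a 40+ homer season the choice makes little practical difference.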
This is a simplified explanation, but it gives you an idea of the power of the math.
Now, which player is more valuable? To know definitively, we'd need to know the context of all the homers. But generally, Player A probably had a much more valuable season, since he homered in many more games than Player B did. It's possible that Player A's homers ended up being worth a few more wins than Player B's homers, which is a lot.
Let's look at some real data for all the 40+ home run hitters in 2007. I got all this data from each player's PI Batting Event Finder for Plate Appearances. For example, here is Adam Dunn's.
Player      HR   Average   STD
Rodriguez   54   12.1      14.6
Fielder     50   12.6      15.1
Howard      47   12.8      11.1
Pena        46   12.3      10.6
Dunn        40   14.8      15.4
Pretty interesting, huh? As you look at this list, you can see that Carlos Pena's homers were much better spread out than any of the other guys'. His average streak length of 12.3 plate appearances is low because he had only 612 plate appearances this season (imagine if he had 700+ like A-Rod). But look at his standard deviation of just 10.6, much less than his average streak length. This tells you that his HRs were pretty well spread out, and that he wasn't a very streaky home run hitter.
The only other guy on this list with a smaller STD than his Average is Ryan Howard, who also spaced his homers fairly well. A-Rod, Fielder, and Dunn are on the other side, with STD even larger than their Average, suggesting bunches of games with 2 or more homers, or several homers clustered in a few games followed by large homerless streaks.
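One way to put the STD-versus-Average comparison on a single scale is to divide one by the other. The sketch below does that with the five rows listed above. The ratio, the `spread_ratio` and `implied_pa` helper names, and the idea of backing out plate appearances as HR x (Average + 1) are my own illustration rather than anything the Batting Event Finder reports, and the implied PA figure is only approximate because it ignores any PAs after a player's last homer.

```python
# (player, HR, average homerless streak, STD of streaks), copied from the list above.
rows = [
    ("Rodriguez", 54, 12.1, 14.6),
    ("Fielder", 50, 12.6, 15.1),
    ("Howard", 47, 12.8, 11.1),
    ("Pena", 46, 12.3, 10.6),
    ("Dunn", 40, 14.8, 15.4),
]


def spread_ratio(average, std):
    """STD divided by average streak; below 1 suggests evenly spaced homers."""
    return std / average


def implied_pa(home_runs, average):
    """Rough PA total: each homer accounts for its streak plus the homer itself."""
    return home_runs * (average + 1)


# Sort from most evenly spaced (lowest ratio) to streakiest (highest ratio).
ranked = sorted(rows, key=lambda row: spread_ratio(row[2], row[3]))

for player, hr, average, std in ranked:
    print(
        f"{player:<10} HR={hr} ratio={spread_ratio(average, std):.2f} "
        f"implied PA={implied_pa(hr, average):.0f}"
    )
```

Pena and Howard land below 1 and the other three land above it, which is the same split described above. Pena's implied PA total also rounds to the 612 mentioned earlier.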
Anyway, what do you all think about this kind of analysis? It can be expanded to lots and lots of stats, including hits, run-scoring hits, etc. I'd like your input.
By the way, in case you're curious where I got this idea, it came from Barry Sanders, the football player. As you football fans know, an average of 4 yards/carry is good for a running back. The really good backs get 0-2 yards fairly often, but also crank out 5-8 yards on a lot of their carries, getting an average of 4 or so. But I always felt that Sanders got 0-2 yards much more often, and then sprinkled in the occasional 80-yard run. If you know Sanders, you know he had a lot of extremely long runs. At the end of the season, Sanders might have 1300 or 1400 yards, but did it really compare well to other backs with similar totals? How many first downs did the Lions lose by Sanders' inability to get 5-6 yards when needed?
October 15th, 2007 at 12:27 pm
This is very, very close to what I was trying to say last night, except this is during the season and before was the career. I think this is a good idea, obviously. Maybe consistency can now be a stat... that's for you, Joe Morgan
October 15th, 2007 at 12:32 pm
mlbfan30--I did read your comment last night on the other post, and I'm considering a lot of those things in my other metric. I think it's a bit overdoing it to try to normalize for age and stuff, but for example to give each season a score of consistency with the player's career averages makes it much easier to pick out peak years, and to see to what degree those years are different from his non-peak years. You'll see what I mean when I get around to rolling that out.
By the way, this is totally unrelated, but I found something cool right here that folks might like to check out.
October 15th, 2007 at 1:43 pm
To me, the most interesting part of that list is that all of those guys are already out of the playoffs (or never made it). Tangential to that, there was a long run of time during which no team won the WS with one or more hitters with more than 30 HR on their team.
Which brings to mind the question of using the stat you're working on also to measure the balance of any given line-up. Is it really better to have a balanced line-up?
And Barry? oh yeah... The Lions did improve when he retired.
October 15th, 2007 at 3:14 pm
Didn't the Rangers get better when A-Rod left, too?
October 15th, 2007 at 5:03 pm
vonhayes: true, although for different reasons
Speaking of A-Rod, though, doesn't he only have a few days left to opt out of his contract (assuming he wants to do so)? I'm surprised we haven't heard more about this.
October 15th, 2007 at 5:50 pm
It's 10 days after the end of the World Series
http://www.newsday.com/sports/baseball/yankees/ny-spken145412427oct14,0,783072.column
October 15th, 2007 at 8:56 pm
I've often thought about "consistency" and what it's worth, though I've never tried to measure it.
In the mid-90s, I ranked Emmitt Smith as the slightly better back than Sanders, because he was more consistent. Every season, Sanders would have a game or 2 in which he'd only gain something like 8 or 20 yards. Smith seemed to get his 80-120 yards every game. And on the per-carry level, you're right, I think Sanders would have a lot more runs of negative yardage than other top backs.
However, is that really a bad thing? Obviously consistency is nice to some extent in football. A team could average only 3 yards per play, but if they get those 3 yards EVERY play, they will never lose the ball. But in reality, that doesn't happen. If Sanders was breaking off 80-yard runs, those must have been TDs pretty much every time. And the runs when he was losing yardage are probably when he was looking for the big gainer; I doubt he was dancing around in the backfield if Detroit just needed 1 yard for a 1st down.
So, back to baseball. We often see stats saying Player X leads the league with 72 multi-hit games this year, or conversely, Player Y has hit safely in 37 of his last 38 games. Both interesting, but what do they mean? Is it better to hit in more games, or get more hits in a game? Above, Andy seems to conclude that homering in more games is better. Is it? Getting 2 HR in one game seems to be a strong foundation to winning that game. But I don't know the best way to measure that. It can't be as simple as checking team winning % when a player hits 2 HR, or 4 hits, etc., because maybe the other pitcher just stinks and the whole team is hitting that day.
On a seasonal basis, I'm also conflicted. From the standpoint of building a team, I think it's nice to have a player you can rely on. Obviously you never really "know" ahead of time what a player is going to do, and by the time you think you see a pattern after several seasons, maybe the player then starts getting hurt. But I think it's nice to have players who you "know" will hit 30 HR, or pitch 200 IP, as opposed to a Saberhagen who might be great, or mediocre, or hurt. On the other hand, it's those unexpected super seasons which push teams into the playoffs. hmmm....