Game Scores as a predictive tool
Posted by Andy on February 7, 2011
Our very own John Autin commented on Game Scores on a recent post:
Consider two pitchers:
-- Alfa goes 7 innings, allowing 0 runs on 6 hits, with 0 Ks and 2 walks. Game Score: a modest 64.
-- Bravo also goes 7 innings, allowing 3 runs (2 earned) on 1 hit, with 13 Ks and 4 walks. Game Score: 74.
I personally don't accept that Bravo's game is better from a "baseball outcome" angle -- but more importantly, I don't think anyone outside the sabermetric world would buy that notion. And while these example games may not be everyday occurrences, they are far from the most extreme examples one could produce.
JA's example neatly and simply illustrates a limitation of Game Score. Pitcher Bravo allowed more walks and more runs, and in this particular game put his team in a worse position than Pitcher Alfa did, but Bravo got the higher Game Score.
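For anyone who wants to check the two scores, here is a minimal sketch. It assumes the commonly cited version of Bill James's original formula (start at 50, +1 per out, +2 per inning completed after the fourth, +1 per strikeout, -2 per hit, -4 per earned run, -2 per unearned run, -1 per walk). The function and argument names are illustrative and not taken from any Baseball-Reference tool.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Game Score under the commonly cited Bill James formula (an assumption here)."""
    full_innings = outs // 3
    score = 50                              # every start begins at 50
    score += outs                           # +1 per out recorded
    score += 2 * max(0, full_innings - 4)   # +2 per inning completed after the 4th
    score += strikeouts                     # +1 per strikeout
    score -= 2 * hits                       # -2 per hit
    score -= 4 * earned_runs                # -4 per earned run
    score -= 2 * unearned_runs              # -2 per unearned run
    score -= walks                          # -1 per walk
    return score

# The two lines from JA's example (7 IP = 21 outs).
alfa = game_score(outs=21, hits=6, earned_runs=0, unearned_runs=0, walks=2, strikeouts=0)
bravo = game_score(outs=21, hits=1, earned_runs=2, unearned_runs=1, walks=4, strikeouts=13)
print(alfa, bravo)
```

Under that assumed formula Bravo's line comes out to the quoted 74, while Alfa's comes out to 63, one point below the 64 quoted. Either way, Bravo scores roughly ten points higher.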
However, I think there's an additional angle we can look at here.
Yeah, Bravo didn't really deserve a higher Game Score. (If we think of Game Score as an indicator of who did more to help his team win, then it seems fairly flawed.) However, a pitcher who strikes out 13 in a game is, odds on, a good pitcher whom you want on your team. Bravo didn't do better than Alfa in the particular games listed above, but if Bravo can routinely rack up double-digit strikeouts, then his team is going to win a lot of the games he starts.
Here's what I'm talking about.
Since 1920, there have been 48 games that roughly meet the criteria of Bravo's games. In this case I searched for games with 13 strikeouts, 4 walks, and at least 7 IP. The average Game Score in these performances is 79.2, and the teams went 39-9 (.813) in these games.
Over the same period, Alfa-style games (0 strikeouts, 2 walks, at least 7 IP) turn out to be far more common. The average Game Score from just the last 300 such games is 55.3, quite a bit lower than the 79.2 in the other case. The teams' overall record in all 1,274 such games since 1920 is 703-563 (.552).
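The winning percentages are simple to reproduce. This is a rough sketch with a helper name of my own choosing; it rounds halves up, the way baseball percentages are usually printed, which is why it uses `Decimal` rather than Python's default float rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

def win_pct(wins, games):
    """Winning percentage to three places, rounding halves up."""
    pct = Decimal(wins) / Decimal(games)
    return str(pct.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP))

bravo_style = win_pct(39, 39 + 9)         # 13-K, 4-BB games
alfa_decisions = win_pct(703, 703 + 563)  # Alfa-style games, wins plus losses only
alfa_all_games = win_pct(703, 1274)       # Alfa-style games, all games as denominator
print(bravo_style, alfa_decisions, alfa_all_games)
```

This prints 0.813 0.555 0.552. Note that 703 + 563 is 1,266, eight short of the 1,274 games cited, and the quoted .552 matches wins divided by all 1,274 games rather than by wins plus losses. The numbers above don't say what the other eight games were.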
A direct comparison of these two groups isn't really fair. With so many more games falling into the Alfa category, it isn't surprising that the cumulative Alfa performance is so much closer to the mean (i.e. a Game Score of 50 and a team W-L% of .500). However, there is quite a large discrepancy between the two groups, and we see in general that 13-strikeout games are quite a bit rarer.
My point here is that while I agree that Game Score isn't always great at telling us what happened in a particular game (i.e. which pitcher was really better or more valuable to his team), I do feel that it might give us a pretty good indicator of what we might expect in the future from those pitchers. In other words, a starter like Bravo who gets a high Game Score thanks to a slew of strikeouts, but also gives up some runs thanks to some walks, probably has a higher ceiling in terms of average future Game Score than the guy who doesn't strike out anybody. By "probably" I am only talking averages here. Obviously there will be lots of exceptions. All I'm trying to say is that Game Score has some value in telling us what happened in the game, and may also have some value in telling us what we might expect in the future.
February 7th, 2011 at 9:03 am
Game score is a fairly flawed metric, but IMO, this isn't really the flaw.
We know that defense-independent stats are more repeatable than BABIP. A strikeout, walk or home run is *purely* the product of the pitcher vs. batter confrontation. Nobody on the defensive team affects those results except the pitcher and, to a much lesser extent, the catcher. Whereas on balls hit in play, the rest of the defense matters as much as or more than the pitcher, and plain luck plays a significant role. So a pitcher who has 13 strikeouts and 4 walks really has, in some sense, done better than a pitcher who has zero strikeouts and 2 walks, all else equal.
In the lines given above, I would suggest that Alfa's zero earned runs as opposed to 2-3, are much more likely a product of luck and good defense than some special pitcher skills that Alfa had, and Bravo did not. OTOH, it's almost certain that Bravo's 13 Ks while maintaining an excellent K/BB ratio are the result of him pitching better than Alfa in some way.
I think you can find plenty of lines where the game score does not jibe with a reasonable intuition of who pitched better, but this isn't one of them. If you really believe that Alfa pitched better than Bravo here, then you either have access to more data (pitch F/X, GB/FB, TZ/UZR, etc.) which suggests such, or you just don't get how much more DIPS data reflects pitcher performance over short periods vs. ER/R.
In the long run, if you can adjust for defense and park factors, Runs gets you more data, but in the short run, there is too much fluctuation. Over one game, I think it's reasonable to consider DIP data a much more reliable indicator of how a pitcher pitched than run and hit data.
February 7th, 2011 at 10:41 am
"Game score is a fairly flawed metric"
Agreed.
"Since 1920, there have been 48 games that roughly meet the criteria of Bravo's games. In this case I searched for games with 13 strikeouts, 4 walks, and at least 7 IP. The average Game Score in these performances is 79.2, and the teams went 39-9 (.813) in these games."
Who were the pitchers, and in how many games did the starter get a decision?
In all likelihood, the pitchers in this scenario were all "stars"; you probably won't see a Tim Wakefield or Jamie Moyer on the list.
So, the idea of seeing a repeat performance going forward should not be a surprise or unexpected.
February 7th, 2011 at 11:10 am
Andy -- I agree that Game Score has some value as a predictive tool, if properly controlled. Given two pitchers in a similar league and park context, over a substantial number of starts, if one has a significantly higher average Game Score, he's likely in the future to have the better ERA+ (or whatever sophisticated pitching measure you choose).
On Michael Sullivan's post @1 -- Valid points. I do grasp the likely defensive contribution in my Alfa example (7 IP, 0 Ks, 0 runs). I grant that my Alfa/Bravo example was simplistic. On the other hand, there are many more flaws in Game Score than I mentioned in that post, including:
-- Strikeouts and walks are valued equally. A line of 7 IP, 5 Ks, no walks has the same Game Score as 7 IP, 12 Ks, 7 walks (all other stats being equal) -- which is, frankly, just dumb.
-- It takes no account of groundball/flyball rates; a pitcher with a high GB rate may tend to allow a higher BABIP, but those hits are less likely to be extra-base hits, especially HRs, and that pitcher is likely to get more GIDP (another thing not counted by Game Score).
I just think the whole concept is half-baked. I get that it's meant to be a fairly simple formula that a normal person could just about figure in his head, and that if the various factors in Game Score were weighted more precisely, the formula would lose that virtue. But in my opinion, the combined weight of all that imprecision, not to mention the things it completely ignores, almost defeats the purpose of having a single number that purports to give a snapshot measure of game performance.
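The first flaw listed above (strikeouts and walks valued equally) is easy to see in a couple of lines. This sketch assumes the standard formula's +1 per strikeout and -1 per walk, and folds everything else (outs, hits, runs) into a single hypothetical `base` value that is held equal for both lines. The names and the base value are illustrative only.

```python
def k_bb_component(strikeouts, walks):
    """Net contribution of strikeouts and walks, assuming +1 per K and -1 per BB."""
    return strikeouts - walks

def score_with(base, strikeouts, walks):
    """Total score given a shared base for all the other stats."""
    return base + k_bb_component(strikeouts, walks)

base = 65  # hypothetical: any value works, as long as both lines share it

line_a = score_with(base, strikeouts=5, walks=0)   # 7 IP, 5 Ks, no walks
line_b = score_with(base, strikeouts=12, walks=7)  # 7 IP, 12 Ks, 7 walks
print(line_a, line_b, line_a == line_b)
```

Both lines net out to +5 from strikeouts and walks, so they land on the same score no matter what the shared base is.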
February 7th, 2011 at 11:13 am
I thought it was just a "toy stat". Invented primarily to rank great pitching games in order of dominance... for fun.
There's really no "sabermetric justification" behind some of the weightings of its contributions. 2 UER == 1 ER? 1 K == 1 UER? Nobody is ever planning on turning Game Score into a serious metric. We should just have fun with it.
February 7th, 2011 at 11:44 am
@4, DavidRF -- I'm OK on having fun with Game Score. I've only criticized the stat when someone has tried to pass it off as having significant meaning.
February 7th, 2011 at 1:00 pm
@5 John, mathematically and sabermetrically it may not be very significant, but I feel like it is a decent gauge of "dominance" within an individual game, if that makes sense.
If you were to look at the highest Game Scores in history (like Kerry Wood's 20 K, 1 hit game back in 1998), your thought process would probably be something like "wow, what a dominant performance."
But when it comes to comparing Game Scores, it may be tough to decide what is in fact the "better" performance, especially if the scores are arrived at from two different angles, as highlighted in the original post.
February 7th, 2011 at 1:47 pm
@6, Joe C -- I'm not sure that I'm following your point.
Clearly, I don't need Game Score to provoke a "Wow!" over Kerry Wood's 20-K game (or those of Roger Clemens or Randy Johnson).
Nor do I need Game Score to provoke an "Ugh!" over Jamie Moyer's 9-runs-in-1-inning disaster against Boston last season.
Does its usefulness lie in making subtle distinctions between similar performances? Certainly not; it does not pretend to that level of precision.
Probably it has some value as a predictive tool, if used to compare contemporary pitchers in terms of their average game score. But then, we already have predictive tools at least as good as Game Score.
What, then, do I gain by referring to Game Scores?
February 7th, 2011 at 2:05 pm
@7 -- I guess my point (in terms of comparing pitchers/games) is that it is a stat that is better observed individually by game, as opposed to as an average GmSc. I feel like it gives us a gauge of a more "dominant" start compared to a peer, as opposed to a "better" or more successful start.
I don't know if that holds water; maybe there is a hole in my thinking, but that is how I have always looked at it since discovering the stat.
February 7th, 2011 at 6:56 pm
[...] The Baseball-Reference.com Blog had a story about game scores today talking about how very different games can result in somewhat confusing outcomes providing the following example: Consider two pitchers: – Alpha goes 7 innings, allowing 0 runs on 6 hits, with 0 Ks and 2 walks. Game Score: a modest 64. – Bravo also goes 7 innings, allowing 3 runs (2 earned) on 1 hit, with 13 Ks and 4 walks. Game Score: 74. [...]
February 7th, 2011 at 10:52 pm
I can't really agree that a Game Score is a good predictor.
I ran a check of players with the most Game Scores above 75.
My guess that Ryan would top the list was correct.
He had 178 such games, winning 150 of them.
My point is, 178 Game Scores of 75+ sounds amazing, but that is only 23% of Ryan's starts. Those games account for over 46% of his wins but just under 1% of his losses.
Ryan was a low hit, high K guy for 27 years. Custom made for high Game Scores, which he obviously has a lot of, but he was mostly a .500 pitcher.
I'd take a lower Game Score type, say a Glavine or Maddux, a hundred times before Ryan.
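The percentages in this comment can be sanity-checked with a few lines. The comment supplies the 178 games and 150 wins; the career totals below (773 starts, 324 wins, 292 losses) are Ryan's well-known career figures and are added here as outside inputs, not taken from the comment.

```python
# Career totals for Nolan Ryan (outside inputs, not stated in the comment).
RYAN_STARTS = 773
RYAN_WINS = 324
RYAN_LOSSES = 292

# Figures from the comment: starts with a Game Score of 75+, and wins among them.
high_score_games = 178
high_score_wins = 150

share_of_starts = high_score_games / RYAN_STARTS
share_of_wins = high_score_wins / RYAN_WINS

# Largest whole number of losses that is still strictly under 1% of career losses.
max_losses_under_1pct = (RYAN_LOSSES - 1) // 100

print(f"{share_of_starts:.1%} of starts, {share_of_wins:.1%} of wins")
print(f"under 1% of losses means at most {max_losses_under_1pct} losses")
```

With those inputs, the 23% and "over 46%" figures check out, and "less than 1% of his losses" would mean no more than two losses in the 178 games, with the remainder being no-decisions.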
February 8th, 2011 at 12:24 am
I don't know that Game Scores have much predictive value for individual pitchers (the Red Sox lost eight games in 2010 in which John Lackey had Game Scores of 54 or higher), but we also find that the 57-105 Pirates won 65% of the 60 games in which their starters had Game Scores of 51+.
I think if you go through recent seasons you'll find the break-even point for Game Scores to be somewhere in the 40s, suggesting that the metric could stand to be tweaked.
February 8th, 2011 at 8:27 am
Andy, if I read you right, one of your premises is that high strikeout totals should lead to higher wins (by the team).
I lack the skill to work the stats on this site, but I'd sure like to see some numbers to back that up.
Nolan Ryan only won something like 45% of his starts. Randy Johnson's much higher. Even Blyleven would produce a low number. This is, of course, hack job anecdotal analysis. Something a little meatier would be appreciated.
ps. If nothing else, most scouting departments show this thinking- going after high strikeout pitchers over any other type of prospect
February 8th, 2011 at 8:31 am
Duke: Ryan was a .500 pitcher because he played on a lot of teams with poor run support, not because he was average.
He's not the best pitcher ever, but given his style and longevity, it seems plausible that he may have pitched dominant games in greater number than anyone else. I wouldn't take him over Maddux, but that's because mixed in with the dominant games were some high BB stinkers, while Maddux has very few of those. If I went into a game with a suspect defense, and magically knowing for certain that I absolutely had to have a lights out dominant performance in order to win, Ryan might be a better choice.
On average -- I'd take Maddux, but Glavine (or pretty much any other pitcher who isn't in or at the edges of the "best ever" conversation) would be debatable.
February 8th, 2011 at 11:00 am
Mike E @ 13
I know Ryan was the more dominant pitcher and could light up any game. Every time out you could anticipate something dynamic, like 15+ SO or a No-No, but if I were starting a team and had the hypothetical/'fantasy' choice of a starter to anchor my rotation for twenty years, I'd pick Glavine as more consistent, not better.
I'd liken it to a Canseco vs a Boggs. In '87, average fans would prefer to see Canseco - all-or-nothing - than a disciplined Boggs.
Someone mentioned earlier that a Game Score is a good 'toy' and I agree somewhat there, but I don't think Game Scores can be compared to each other to say X performance was better than Y performance.
In a way, it is similar to a quarterback rating. A couple of years back, Chad Pennington led the NFL in QB rating with Miami. We all know he was not the best QB, not even top ten.
February 8th, 2011 at 1:00 pm
I know I'm going to catch hell for this, but I hate Nolan Ryan. I think he's one of the most overrated pitchers ever. HOF great- sure. Other than that.
I think he was the Dominique Wilkins of baseball.
He invented the idea of a pitcher having a personal catcher, because he was the most selfish pitch caller in the world.
And this ain't even late night/high BAC barkfart talking.
February 8th, 2011 at 4:58 pm
@13, Michael E. Sullivan said: "I wouldn't take [Ryan] over Maddux ... because mixed in with the dominant games were some high BB stinkers, while Maddux has very few of those."
Well, if you do mean "high BB" stinkers specifically, you're probably right.
But in terms of plain old stinkers, then it totally depends where you set the line. At the ugliest end of the spectrum, Maddux is actually worse:
GSc <= 20: Maddux 23, Ryan 13.
GSc <= 30: Ryan 68, Maddux 65.
For GSc >= 71, it's Ryan 246, Maddux 147.
Ryan does have a big edge at the very high end: for GSc >= 86, it's Ryan 57, Maddux 10.
But Ryan also has a big edge in the moderate range of 71-85: Ryan 189, Maddux 137.
Just so we're clear -- I'm neither denigrating nor praising either one of these pitchers. And I think it's crystal clear that Maddux had the better career. And most of you know that I'm not a big fan of Game Score in the first place. I just wanted to point out that our natural assumptions about which one was better able to avoid the "total stinker" are not necessarily true.
February 8th, 2011 at 11:43 pm
Re: my #16, something happened with that list of Game Scores in the middle of the post -- the 3rd one is garbled, but I don't have my data here. (Not that anyone's reading anyway....)
February 8th, 2011 at 11:47 pm
@15, Barkie -- You're not alone in your Nolan Ryan enmity. My family was at Tiger Stadium for my brother's 12th birthday on 7/15/73, when Ryan no-hit the Tigers with 17 Ks. I've never forgiven him for breaking my 9-year-old heart....
February 9th, 2011 at 8:18 am
@ 15
Nolan Ryan did *not* invent the personal catcher. He was *not* the most selfish pitcher in the world. Sometimes I wonder if you write the things you write just to troll, or whether you are incapable of making statements without ridiculous hyperbole. It's almost like you've been posting exclusively on ESPN.com for years and are unable to re-acquire a posting style that is not primarily meant to degenerate every topic into a flame war.
February 9th, 2011 at 5:11 pm
I am surprised to see most saying it's not a valid predictor. I would venture a guess that if you added -IPS to the end of it or called it something like GAS (Game Account Score) that more people would say it's amazing.
It seems like a fair indicator of consistency (though I'm not sure), but it definitely captures the 'sexiness of a start'.
In reality all organizations would take the Bravo pitcher and I don't think that's an over-reaching statement. A ball missing a bat significantly decreases possible outcomes that can benefit an offense. If you strike out 48% of your batters faced (as the Bravo pitcher did- assuming an error creates a 27th BFP) you're usually in a better position to win than a pitcher that leaves 93% of his success up to his in/outfield. But this is all known stuff.
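The 48% and 93% figures can be reproduced from the two example lines. This is a rough sketch that counts batters faced as outs + hits + walks (+ one batter reaching on an error for Bravo, as assumed above) and ignores hit batters, double plays and runners caught on the bases. The function names are illustrative.

```python
def batters_faced(outs, hits, walks, reached_on_error=0):
    """Approximate batters faced; ignores HBP, double plays, caught stealing, etc."""
    return outs + hits + walks + reached_on_error

# Bravo: 7 IP (21 outs), 1 hit, 4 walks, 13 Ks, plus one assumed error.
bravo_bf = batters_faced(outs=21, hits=1, walks=4, reached_on_error=1)
bravo_k_rate = 13 / bravo_bf

# Alfa: 7 IP (21 outs), 6 hits, 2 walks, 0 Ks.
alfa_bf = batters_faced(outs=21, hits=6, walks=2)
alfa_balls_in_play = alfa_bf - 0 - 2  # everything that was not a strikeout or walk
alfa_bip_rate = alfa_balls_in_play / alfa_bf

print(f"Bravo: {bravo_bf} BF, K rate {bravo_k_rate:.0%}")
print(f"Alfa: {alfa_bf} BF, balls in play {alfa_bip_rate:.0%}")
```

Under those simplifications Bravo strikes out 13 of 27 batters (48%), while 27 of Alfa's 29 plate appearances (93%) end with the ball in play.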
February 10th, 2011 at 12:17 am
@ 20,
Of course the logic of more Ks, fewer errors is indisputable. But many high K pitchers - high Game Score guys - like Ryan, Wood, Feller, McDowell had very high walk rates.
So what is the difference between yielding one error due to being a contact pitcher and walking 5 guys in a game? In the end, a base runner is a base runner, regardless of how he got there.
Maddux and Ryan have very similar IP. Ryan struck out 1,600 more than Maddux. For those 1,600 extra outs, Maddux had to lie at the mercy of his fielders. Do you think they made 1,500 errors, which is what it would take to account for Maddux and Ryan's BB difference?
BTW: Pitcher Bravo is not going to SO that many each time out. Just check out AJ Burnett.
February 10th, 2011 at 8:56 am
@21
I don't understand what you're trying to say. Game Score accounts for SO/BB ratio as they are weighted evenly.
I can't see how you can say a baserunner is a baserunner when obviously that isn't the case. Rickey Henderson on first base after a walk is a bit different than Darrell Evans being on first base after a walk. Intentionally walking Barry Bonds which creates a baserunner is different than walking J.T. Snow.
In any event, can one Game Score predict anything? Obviously not without a huge margin of error. But... look at the game logs for two pitchers from 1988 (both of whom are left-handers):
Alpha (Allan Anderson): solid, end of year statistics, game scores aligned with the Alpha example. Low strikeout, low walk rates.
Bravo (Al Leiter): some absolute stinker games, was hurt, but showed a propensity for 70 game scores.
This type of quick assessment can be used to back up a hypothesis that a team is more likely to covet pitchers who produce a higher percentage of 70 game scores than a pitcher that produces 60 game scores.
Can it predict an individual outcome? No.
Can it be used as a comparison between games? No.
Can it be used to predict a pitcher who is likely to be coveted by a team because of the appearance of a higher ceiling? Yes.
February 10th, 2011 at 1:15 pm
NoChance -- I agree with most of your points @20/22.
On this one, I'm not sure what you mean: "Game Score accounts for SO/BB ratio as they are weighted evenly."
To me, weighting them evenly is a big flaw in the Game Score method. Two ways of seeing this:
(1) Do any starting pitchers in today's game succeed with a SO/BB ratio of 1?
(2) By Game Score's accounting, the combination (1 BB + 1 SO) has exactly the same value as (2 outs on balls in play). That's nuts.
I agree that, all other things being equal, a higher K rate correlates with better future performance -- mainly because a higher K rate correlates with a lower hit rate. But in this sense, Game Score double-counts the strikeouts: The reason Ks are valuable is that they never become hits. To both reward the Ks and penalize the hits is, in essence, double-counting.
February 10th, 2011 at 2:24 pm
@23
Game score does indeed double count strikeouts and there is a definitive flaw in it for that reason. I can say, "I think it should be there" and I think it should, but I can't give logical reasoning why.
Here is my assessment/commentary on Game Score:
1. Bill James tossed this off very quickly, partly, probably to say: "Hey, look what I came up with on the toilet one evening. I have a publishing contract and here it is. It's fantastic."
2. It's great in its simplicity (OPS is great in its simplicity too). If your kids were to say, "How can I gauge the value of a pitcher" you probably would reach for this before ERA+, FIPS, GIPS, MIPS or SIPS and generally the point would get across.
3. The real question is whether there is validity in a Game Score and whether it is worthwhile advancing the metric? I think it is and I think it can and should be expanded on (adding GIDP as an example). That will serve to paint a better picture of performance in a game whereas currently it is a good rough sketch.
I think the original post is brilliant in that it states a clear problem but then offers a viewpoint that Game Score has value as a predictor. Advancing the discussion to an advanced Game Score (where perhaps you actually start at a number (say 27) and reduce it, rather than start at 0 and add to it; add GIDP, etc.) seems like a logical step.