Thursday, July 30, 2009

Taking One For the Team

In 1887, baseball began officially recording the hit by pitch as a statistic. The all-time leader in hit by pitches, Hughie Jennings, started his career 4 years later and compiled 287 over a career of 5639 plate appearances (Jennings also leads the all-time list in HBP/PA, minimum 1000 PAs, by a very wide margin). Over 100 years later, Braves and Red Sox second baseman Mark Lemke put together a solid 11 year career with 3664 trips to the plate without ever once being hit by a pitch. At the same time, Craig Biggio was in the midst of a 20 year career in which he fell just 2 hit by pitches short of Jennings's record, but in over twice as many plate appearances. How can players vary so much in this category, when there is seemingly no skill involved in being hit? This question seems especially important considering that hit by pitches raise a player's on base percentage, one of the most important statistics that we look at today. Consider that if Mark Lemke had been hit as often as Craig Biggio, his career OBP would have been around 23 points higher (.340 instead of .317). Or if he had been hit as often as Jennings, his career OBP would have been all the way up around .368! Why should Jennings and Biggio get so much credit for something that they don't control, while Lemke is stuck with a below average .317 career OBP? The answer may be that there is some link between a player and his HBP rate.
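The Lemke figures above can be roughly reproduced with a simple shortcut. This is only a sketch, not necessarily the calculation used for the post: it assumes OBP's denominator is approximately plate appearances and that every added hit by pitch replaces an out, so the new OBP is just the old OBP plus the HBP rate. The function name `obp_with_hbp_rate` is made up for illustration.

```python
def obp_with_hbp_rate(obp, hbp_rate):
    """Approximate OBP for a never-hit player if he were hit at hbp_rate.

    Assumption: OBP's denominator is roughly PA, and each added HBP
    replaces an out, so each one adds 1/PA to OBP.
    """
    return obp + hbp_rate


lemke_obp = 0.317            # Lemke's career OBP, as quoted in the post
jennings_rate = 287 / 5639   # Jennings: 287 HBP in 5639 PA

print(round(100 * jennings_rate, 2))                          # 5.09 (HBP %)
print(round(obp_with_hbp_rate(lemke_obp, jennings_rate), 3))  # 0.368
```

Under the same shortcut, the 23 point bump quoted for Biggio corresponds to an HBP rate of about 2.3%.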

In order to test this, I split up all active players' careers into odd and even seasons, odd seasons being the player's 1st season, 3rd season, and so on, and even seasons being his 2nd, 4th, and so on. I used only players with at least 2000 plate appearances in both odd and even seasons and compared their hit by pitch percentage (HBP/PA × 100) in odd and even seasons. If a hit by pitch is just a random occurrence regardless of batter, there should be no correlation between a player's odd and even HBP percentage. The results of the 71 players in the data set paint a clearly different picture:
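A minimal sketch of the odd/even split described above might look like the following. The function name, the `(pa, hbp)` tuple layout, and the sample career are all hypothetical; only the 2000 PA cutoff and the HBP/PA × 100 formula come from the post.

```python
def split_half_hbp_pct(seasons, min_pa=2000):
    """Split a career into odd and even seasons and return HBP % for each half.

    seasons: list of (plate_appearances, hit_by_pitches) in career order.
    Returns (odd_pct, even_pct), or None if either half has under min_pa PA.
    """
    odd = seasons[0::2]    # 1st, 3rd, 5th, ... seasons
    even = seasons[1::2]   # 2nd, 4th, 6th, ... seasons
    odd_pa = sum(pa for pa, _ in odd)
    even_pa = sum(pa for pa, _ in even)
    if odd_pa < min_pa or even_pa < min_pa:
        return None        # not enough playing time in one of the halves
    odd_hbp = sum(hbp for _, hbp in odd)
    even_hbp = sum(hbp for _, hbp in even)
    return 100 * odd_hbp / odd_pa, 100 * even_hbp / even_pa


# Made-up eight-season career, for illustration only.
career = [(600, 10), (650, 12), (700, 9), (620, 11),
          (710, 13), (640, 10), (500, 8), (550, 9)]
print(split_half_hbp_pct(career))
```

Running this over every qualifying player would give one (odd, even) pair per player, which is what the comparison needs.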

There is an obvious relationship here between the odd and even HBP percentage. Garret Anderson has the lowest HBP rate of all players with at least 8000 career plate appearances, having been hit just 6 times in his 16 year career, including only twice over the past 11 seasons. He has been hit 3 times in 4328 odd PAs (0.069%) and 3 times in 4441 even PAs (0.068%). On the other side of the coin, David Eckstein has the highest HBP percentage (2.66) in this group, and both his odd and even HBP % are higher than any other player's odd or even HBP %. There are no players in the data set with half of an Eckstein career and half of a Garret Anderson career, and the points form a linear pattern. The slope estimate for the regression line is near 1, as it should be if getting hit by a pitch is some sort of repeatable skill; a 1 percentage point increase in even HBP % should correspond with approximately a 1 percentage point increase in odd HBP %.
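For readers who want to try the same kind of check, here is a small sketch of a least-squares slope and correlation computed from paired odd/even percentages. Only the first pair (Garret Anderson's 0.068 and 0.069) comes from the post; the other four points are invented placeholders, not the actual data set, and `slope_and_r` is a hypothetical helper name.

```python
from math import sqrt


def slope_and_r(x, y):
    """Return (least-squares slope of y on x, Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# x = even-season HBP %, y = odd-season HBP %.
# First pair is Anderson's from the post; the rest are made up.
even_pct = [0.068, 0.5, 1.0, 1.6, 2.7]
odd_pct = [0.069, 0.6, 0.9, 1.7, 2.6]

slope, r = slope_and_r(even_pct, odd_pct)
print(round(slope, 2), round(r, 2))
```

A slope close to 1 together with a correlation close to 1 is the pattern a repeatable trait would produce; if being hit were pure chance, the correlation would sit near 0.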

The next question to consider is whether this correlation is due to actual skill by the player, or some other factor outside the player's control. Note that skill in this case does not mean ability to move into the way of the pitch. More accurately, it means having certain abilities or traits that increase the likelihood of getting hit by a pitch. Maybe Garret Anderson can't handle the outside pitch, for instance, so pitchers usually try to hit the outside corner, thus decreasing the chance that he gets hit. David Eckstein might have an unusual willingness to let a pitch hit him and Jose Guillen (2nd of the 71 players in HBP rate) might just be such a nasty guy that pitchers like to throw at him. Or, in what I thought was the most likely case before I started this research, good hitters are more likely to get hit simply because they are more likely to get on base, so the pitcher is not worried as much about hitting them. That would not explain why Eckstein, career .658 OPS, tops the list, but I checked for relationships between HBP percentage and batting average, on base percentage, slugging percentage, and weighted on base average.

As you can see, there is no relationship between HBP percentage and batting average. There are slight positive relationships between SLG, OBP, wOBA and HBP percentage, with R-squareds ranging from 0.6% to 1.7%, but they are not statistically significant. Therefore, though on some level batters may tend to be hit more if they hit more home runs or walk more, that only explains a very small amount of the variation in HBP %. Another thought that occurred to me was that, with the hit by pitch being a relatively rare occurrence, perhaps all the variation in HBP % was due to the fact that the sample size for individual players was not large enough. This would be shown by the convergence of HBP % as plate appearances increased.

I'll spare you the statistical details, but a convergence like the one we were looking for would show up as a negative association here between the residuals and total plate appearances. That is, the graph above would show a decreasing pattern, but instead there is no statistically significant relationship between the residuals and plate appearances. Thus, the variation in HBP % is not the result of a sample size that is too small.
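A different, back-of-the-envelope way to look at the sample size question (not the residual analysis described above) is to ask how much HBP % would bounce around by chance alone. The sketch below assumes, purely for illustration, that every batter shares the same true HBP rate of 1% and treats each plate appearance as an independent trial; the 1% figure and the function name are assumptions, while the 4000 PA figure is the smallest career the data set allows (2000 PA in each half).

```python
from math import sqrt


def hbp_pct_standard_error(true_pct, pa):
    """Binomial standard error of an observed HBP %, in percentage points."""
    p = true_pct / 100
    return 100 * sqrt(p * (1 - p) / pa)


assumed_true_pct = 1.0   # hypothetical league-wide rate, not from the post
min_career_pa = 4000     # 2000 PA in each half, per the post's cutoff

se = hbp_pct_standard_error(assumed_true_pct, min_career_pa)
print(round(se, 3))                               # 0.157
print(round((2.66 - assumed_true_pct) / se, 1))   # Eckstein's gap, in SEs
```

Under these assumptions chance alone moves a player's HBP % by only a few tenths of a point, so a spread from 0.069 to 2.66 is far too wide to be noise.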

That leaves us with seemingly one conclusion: getting hit by a pitch is the result of skill, whether it is the inability to hit the inside pitch, a willingness to take one for the team, or even a personality that rubs the pitcher the wrong way. However, despite the strong evidence in support of this conclusion, there are other factors that have not been controlled for, such as the quality of the other hitters on the team or the pitchers a hitter generally faces under the uneven schedule. Nonetheless, the evidence suggests that there is something a hitter controls about getting plunked, and thus the statistic belongs as a component of on base percentage.



  1. That means I had a lot of skill back in little league softball and you can't even discount it as part of my stellar OBP... (No, I don't remember the numbers but that doesn't mean I don't remember it was amazing.)

  2. That's totally different. The ball is much bigger in softball, meaning the skill is probably much less repeatable.

    And I also don't understand Eckstein's hit-by-pitch ability. He's so diminutive, and you're only doing the other team a favor by possibly injuring him; I'd probably want to bean him a lot more if he were on my team.

    In all seriousness, it's pretty clear that there is a reason some guys are hit a lot and some guys aren't hit at all, but it's tough to discern what that is. The most likely of the ones you mentioned seems to be an inability to hit inside pitches, which causes the pitcher to throw more in that area. The "nasty guy" idea, while a humorous observation, would be a very unlikely reason for such an occurrence (though A.J. Pierzynski, the poster boy for such players, has a very high HBP percentage of 1.89).

  3. Oh, and I thought of another viable reason for some players being hit more: they may crowd the plate, giving the pitchers much less room for error when they go for the inside part of the plate.

