Monday 9 January 2012

Everything you always wanted to know about: SIERA

The world of advanced baseball statistics can be an intimidating place for those of us who slept our way through advanced algebra or haven't been a follower of the Bill James revolution from the beginning.

 

Still, that doesn't mean that we should feel left out when it comes to another way of understanding and appreciating the game we all love. With that in mind, BLS stat doctor Alex Remington will explore a new advanced statistic each week during the offseason, as he did last year, providing a simple primer for the uninitiated.

Today's statistic: SIERA

What it stands for: Skill-Interactive Earned Run Average, developed by Matt Swartz and Eric Seidman (first at Baseball Prospectus, and then later revised by the same authors at Fangraphs).

How to calculate SIERA: You're going to need a lot more than a calculator for this one, so it's better to explain it conceptually. SIERA is an estimator rather than a true run average. It's based on a pitcher's component rates (strikeout rate, walk rate, and ground-ball rate) rather than on how many runs he actually gave up, and as such it shares a lot of insights with FIP, which I wrote about last year. Two of those insights have been shaking up baseball statistics for the past decade:

• Pitchers cannot control their Batting Average on Balls in Play (BABIP)
• Pitchers cannot control their Home Run per Flyball rate (HR/FB)

BABIP and HR/FB have become the two statistics used as shorthand for "luck" — when a pitcher has a low BABIP and HR/FB, we generally assume that he has simply been benefiting from "good luck," which is unlikely to continue, and when he has a high BABIP and HR/FB, we generally assume that he has simply been victimized by "bad luck," which is likely to turn around. These are two of the central assumptions I use when writing my Slumpbot .200 and Streaking! columns during the regular season, analyzing streaking and slumping players.
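For anyone who wants to see what those two "luck" stats actually measure, here is a minimal Python sketch using the commonly published definitions: BABIP counts non-homer hits per ball put in play, and HR/FB counts homers per fly ball. The function names and the season line are made up for illustration; they don't belong to any real pitcher.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: non-homer hits per ball put in play."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def hr_per_fb(home_runs, fly_balls):
    """Share of fly balls that left the park."""
    if fly_balls <= 0:
        raise ValueError("no fly balls")
    return home_runs / fly_balls


# Hypothetical season line, invented for this example.
line = {"hits": 180, "home_runs": 20, "at_bats": 750,
        "strikeouts": 200, "sac_flies": 5}

print(f"BABIP: {babip(**line):.3f}")
print(f"HR/FB: {hr_per_fb(line['home_runs'], fly_balls=200):.1%}")
```

A pitcher whose numbers sit well above or below the league norm in either stat is the kind of pitcher we usually expect to drift back toward it.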

But those assumptions aren't 100 percent true, as Swartz and Seidman found while developing SIERA. In fact, high-strikeout pitchers tend to have a lower BABIP and HR/FB rate than low-strikeout pitchers. So SIERA essentially tries to compare apples with apples, by putting pitchers with high strikeouts and high walks on the same footing as pitchers with low strikeouts and low walks. It also adjusts for park effect, so the analysis is truly neutral.

What SIERA is good for: As I mentioned above, SIERA looks like ERA, but it reflects a very different underlying analysis. Comparing the two can reveal discrepancies between what a pitcher did and what he's likely to do in the future. That's because ERA measures what happened, while SIERA tries to measure what the pitcher individually did.

For example, last year, Zack Greinke led the major leagues with a 2.66 SIERA, because he struck out a ton of guys and didn't walk many people, even though he had a 3.83 ERA. In my opinion, the discrepancy was a product of bad luck, though luck may not explain why he got lit up in the playoffs.

The major-league ERA leader was Clayton Kershaw at 2.28. However, his SIERA was 2.81. That's still terrific, good for fourth in the majors, but it may be a sign that he won't have a 2.28 ERA again next year. (Hey, there's no shame if he doesn't. No starting pitcher has put up two seasons in a row with an ERA of 2.28 or below since Pedro Martinez in 2002-2003.)
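The comparison is simple subtraction, and a quick Python sketch makes the direction of each gap easy to see. The ERA and SIERA figures are the ones quoted above; the helper name and the "may fall" / "may rise" labels are my own shorthand, not part of the stat.

```python
# ERA and SIERA figures as quoted in the text above.
pitchers = {
    "Zack Greinke": {"era": 3.83, "siera": 2.66},
    "Clayton Kershaw": {"era": 2.28, "siera": 2.81},
}


def era_minus_siera(era, siera):
    """Positive gap: ERA looks worse than the skills suggest. Negative: better."""
    return round(era - siera, 2)


for name, stats in pitchers.items():
    gap = era_minus_siera(stats["era"], stats["siera"])
    outlook = "ERA may fall" if gap > 0 else "ERA may rise"
    print(f"{name}: ERA - SIERA = {gap:+.2f} ({outlook})")
```

The sign only hints at a direction. It says nothing about how far or how fast a pitcher's ERA will actually move.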

Swartz summarized, in a series of bullet points, the statistical findings that he and Seidman made as they developed and tweaked SIERA. (It was part of a five-part series that Swartz wrote on Fangraphs to explain the stat.) Basically, the fundamental insight is: Skills matter.

Pitchers who are good at striking guys out are so good at getting zero contact that when batters do put wood on the ball, the contact is often weak at best. Therefore, they tend to have a lower BABIP and HR/FB, and they tend to get more double plays.

Relief pitchers also tend to have a lower BABIP and HR/FB: that's because they don't have to worry about pacing themselves to pitch a lot of innings, so they pitch at maximum effort and therefore also tend to get weaker contact.

Baserunners tend to cluster: the more baserunners you have, the more of them are likely to score. Therefore, pitchers who are good at not walking many people are not as hurt by the walks they give up as pitchers who yield a lot of free passes. (That said, a pitcher who gives up a lot of walks and singles will be in a lot more double play situations, and is therefore comparatively more likely to get a double play.)
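To get a feel for why clustering matters, here is a deliberately crude toy model in Python. It is not how SIERA is computed. It assumes every batter either walks (with probability p_walk) or makes an out, so a run scores only when a walk comes with the bases already loaded. Under that assumption, the number of walks before the third out follows a negative binomial distribution, and we can work out the expected runs exactly. All names and walk rates here are invented for illustration.

```python
from math import comb


def expected_runs_walks_only(p_walk, max_walks=200):
    """Expected runs per inning when batters can only walk or make an out.

    The first three walks load the bases; every walk after that forces in a run.
    The sum is truncated at max_walks, which is plenty for realistic walk rates.
    """
    if not 0.0 <= p_walk < 1.0:
        raise ValueError("p_walk must be in [0, 1)")
    three_outs = (1.0 - p_walk) ** 3
    total = 0.0
    for walks in range(4, max_walks + 1):
        # Negative binomial: chance of exactly `walks` walks before the third out.
        prob = comb(walks + 2, 2) * p_walk**walks * three_outs
        total += (walks - 3) * prob
    return total


def expected_walks(p_walk):
    """Mean number of walks before the third out in the same toy model."""
    return 3.0 * p_walk / (1.0 - p_walk)


for p_walk in (0.05, 0.10, 0.20):
    runs = expected_runs_walks_only(p_walk)
    share = runs / expected_walks(p_walk)
    print(f"walk rate {p_walk:.2f}: {runs:.4f} runs/inning, "
          f"{share:.2%} of runners score")
```

In this toy world, doubling the walk rate much more than doubles the runs allowed, and a larger share of the runners come around to score. Real innings include hits, steals, and double plays, so the real effect is far less extreme, but it runs in the same direction.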

You can apply similar logic to ground-ball and fly-ball pitchers. Of the three main batted-ball types — fly balls, ground balls, and line drives — fly balls are least likely to turn into hits, ground balls are slightly more likely, and line drives are much more likely. Of course, of the three, fly balls are the only ones that turn into home runs. (Some homers are hit on a line, but they're still classified as fly balls.) Therefore, a pitcher is more likely to give up a hit on a ground ball than on a fly ball, but infinitely more likely to give up a home run on a fly ball.

Because ground-ball pitchers are so good at giving up ground balls, their ground balls are easier to field than the average grounder. And because fly ball pitchers are good at giving up balls that die in the air, writes Swartz, "Pitchers who have higher fly ball rates allow fewer home runs per fly ball."

These insights are interesting on their own, as a window into how pitching skills are interrelated. They also point toward a better way of judging a pitcher's innate talent. That's what a pitcher's SIERA is, and that's how it complements his regular ERA: SIERA is a measure of how good he is, not just what wound up happening.

When SIERA doesn't work: Because it relies on a lot of complicated statistical comparisons, SIERA breaks down when the number of innings is very small. So it isn't used to analyze pitchers with fewer than 40 innings pitched.

Others, such as Colin Wyers at Baseball Prospectus (where SIERA was originally developed), have asked a more basic question: Why do we need such a complicated stat? As it happens, most pitchers have a SIERA that is within about 0.10 of their xFIP, which is a fancy version of FIP that normalizes for a league-average home run rate. And as the product of a complicated series of regressions, SIERA is anything but back-of-the-envelope: If you don't have access to www.fangraphs.com, you'll probably have a hard time calculating it yourself.* So, ultimately, what's the point?

* In his article, Wyers did some complicated math and argued that SIERA's improvements over previous stats were illusory; Swartz did some complicated math to rebut Wyers' claims. The discussion is here.
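For comparison, xFIP really is back-of-the-envelope. Here is a minimal Python sketch of one commonly published form of FIP and xFIP. The league HR/FB rate and the league constant change every season, so the values below are placeholders I made up, as is the pitcher's stat line.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant):
    """Fielding Independent Pitching, in one commonly published form."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    weighted = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted / innings + constant


def xfip(fly_balls, league_hr_per_fb, walks, hit_by_pitch, strikeouts,
         innings, constant):
    """xFIP: FIP with the pitcher's homers replaced by fly balls times league HR/FB."""
    expected_home_runs = fly_balls * league_hr_per_fb
    return fip(expected_home_runs, walks, hit_by_pitch, strikeouts,
               innings, constant)


# Hypothetical pitcher and placeholder league values (assumptions, not real data).
LEAGUE_HR_PER_FB = 0.10
LEAGUE_CONSTANT = 3.10

actual = fip(home_runs=30, walks=50, hit_by_pitch=5, strikeouts=200,
             innings=200.0, constant=LEAGUE_CONSTANT)
normalized = xfip(fly_balls=200, league_hr_per_fb=LEAGUE_HR_PER_FB, walks=50,
                  hit_by_pitch=5, strikeouts=200, innings=200.0,
                  constant=LEAGUE_CONSTANT)
print(f"FIP {actual:.2f} vs. xFIP {normalized:.2f}")
```

SIERA's regression has many more terms, including interactions among strikeouts, walks, and ground balls, which is why you can't reproduce it this easily.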

Why we care about SIERA: What's the point? The point is that it does a good job of telling us not just who had a good year, but who pitched well. It's both a bit more accurate than xFIP and a bit more methodologically sound, taking a pitcher's ground-ball tendencies into consideration while xFIP uses batted balls only to estimate home runs. It isn't the perfect stat, obviously. Nothing is. But it's produced a lot of interesting analytic insight, it's a good window into how good a pitcher really is, and it's one of the best stats we have, at least until someone develops the next one. That's how baseball stats have always worked.

Previous lessons: BABIP, OPS+, FIP, wOBA, WPA, WAR, UZR, J-HOFFA, Win Shares, ERA+/ERA-


