Untitled
Milestones

There are certain milestones in the baseball universe that can ice a player’s immortality.  Among these milestones for hitters are 500 home runs and 3000 hits.  The key milestone for pitchers is 300 wins, and to a lesser extent 3000 strikeouts.  Year in and year out, fans closely monitor their favorite player’s progress toward these hallowed milestones and guess whether he will ever make it.  As a player nears a milestone, the world takes notice to such an extreme that every at bat is televised on ESPN.  This post is devoted to determining the probability that players will reach a certain milestone.  Before going further, I have to acknowledge that a much smarter man than me came up with these calculations.  This particular method is one that Bill James calls “career assessments.”  However, the numbers presented represent my own work utilizing James’ method.  Also of note is that for this exercise I am assuming that the 2009 season is complete, meaning that the probabilities shown will be slightly lower than what they will be when the season ends at the end of the month.

I am going to start with who I consider to be the best hitter in the game.  A book could be written about the contributions that Albert Pujols has made to the baseball community.  Over the next few years there is the potential that Alex Rodriguez will surpass Barry Bonds’ home run record of 762.  To be exact, his chances are 27.4%.  However, the projected number of home runs that Alex Rodriguez will hit in his career is 721, which would be good for third all time.  Therefore we will assume that the record of 762 is still standing if Pujols approaches that many.  So, what are Pujols’ chances of setting the new home run record?  According to James’ formula, they are 12.5%.  It pains me to say that Barry Bonds’ record just might stand a little longer than I want it to.

Below I am going to post a few more career assessment numbers that I ran and found interesting, and then I will share the method with you so that you can try some out for yourself.

Rickey Henderson’s career stolen base record looks safe.  The two most legitimate competitors to the throne that I could think of are Carl Crawford and Jose Reyes.  Both of them came up with a zero percent chance of surpassing Henderson.  Yes, that is how good Henderson was.

It seems likely that we will see the fourth player to reach the 2000 RBI plateau.  Currently, Alex Rodriguez has an 83.5% chance and Manny Ramirez has a 37.9% chance.

Derek Jeter is all but a lock for 3000 hits with a 95.3% chance.  Vladimir Guerrero will need to pick it up a little bit with only a 16.2% chance.

C.C. Sabathia looks at first glance like the pitcher most likely to reach 300 wins, and his chances are 18.1%.

The two best closers ever were both part of this generation in Mariano Rivera and Trevor Hoffman.  Hoffman holds the record for career saves, but what are the chances that Rivera surpasses him?  According to the formulas Hoffman will finish his career with 599 saves.  Rivera needs 80 saves to beat that mark, and the formulas say that he has a 19.4% chance of making that happen.

So now hopefully you are curious enough to find out where these numbers came from.  I will walk you through the method to show you how this is done.  For this example I will calculate Mark Reynolds’ chances of beating Reggie Jackson’s mark of 2597 strikeouts.

Step One:  Calculate how many more he needs.  Reynolds needs 2598 strikeouts to pass Jackson and currently has 513.  2598 - 513 = 2085

Step Two:  Estimate the years remaining in Reynolds’ career.  Use the age that Reynolds was as of June 30 of the current year (in this case 2009, when he was 25).  The estimated time remaining in a player’s career is (42 - age) / 2.  (42 - 25) / 2 = 8.5

Step Three:  Use a weighted average of the last three years to determine the level at which he is playing.  Weight the earliest year as one, the middle year as two, and the most recent year as three, then divide the total by 6.  Reynolds has 180 strikeouts this year, 204 last year, and 129 the year before.  (180*3 + 204*2 + 129) / 6 = 179.5  If this result is less than 75% of the most recent year, then use 75% of the most recent year instead.

Step Four:  Multiply the results from steps two and three together.  This is the projected total of whatever stat you are measuring over the rest of the player’s career.  8.5 * 179.5 = 1525.75

Step Five:  Divide step four by step one and subtract .5.  (1525.75 / 2085) - .5 = .2318, or a 23.18% chance.  Note that if this number is less than zero, then your answer is zero.  It is also possible for this number to be greater than one.  In that case use this formula instead:  .97 ^ (step one result / step three result)
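For anyone who would rather not do this by hand, the five steps can be sketched in a few lines of Python.  This is only my reading of the steps as written above, not anything official from James.  The function name and argument names are made up for illustration, and returning 1.0 when the player has already passed the mark is my own assumption, since the steps do not cover that case.

```python
def career_assessment(current, target, age, last_three):
    """Estimate the chance of reaching `target` using the five steps above.

    last_three holds season totals, oldest first:
    (two years ago, last year, most recent year).
    """
    need = target - current                       # Step One
    if need <= 0:
        return 1.0                                # assumption: already there

    years_left = (42 - age) / 2                   # Step Two

    oldest, middle, latest = last_three           # Step Three
    level = (oldest + 2 * middle + 3 * latest) / 6
    level = max(level, 0.75 * latest)             # floor at 75% of latest year

    projected = years_left * level                # Step Four

    chance = projected / need - 0.5               # Step Five
    if chance < 0:
        return 0.0
    if chance > 1:
        return 0.97 ** (need / level)
    return chance


# Mark Reynolds: 513 career strikeouts, needs 2598 to pass Reggie Jackson
reynolds = career_assessment(513, 2598, 25, (129, 204, 180))
print(f"{reynolds:.2%}")  # 23.18%
```

Swap in any other player’s totals, target, and age as of June 30 to run your own assessment.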

There you have it.  Not very difficult and an excellent basis for discussion.  The only reservation that I have with this formula is that I think it underestimates the amount of service time that a player has left in him.  I suspect, but do not know, that players who make the pros and play for a year or two but don’t stick bring the average career length down.  If I could devote a full time job to baseball statistics, I would try to come up with a conditional probability for the length of a player’s career given that he has already played five seasons in the majors.  Alas, there are flaws in everything, but until you create a better system this is what we have to work with.

AL MVP Race

Every year, some of the most interesting baseball discussions concern who is going to win the MVP in each league.  Today I set out with a goal to create a methodology to help determine who should be the MVP.  Since I am an AL kind of guy, this article will focus on the 2009 American League MVP through the end of August.  Without further ado, the methodology:

1.  To begin with, I decided to take a look at the 100 players with the most at bats in the American League.  For each of these 100 players I obtained several basic stats.

2. For each of these 100 players, I calculated OPS+.  This is a very handy stat for comparing players to each other and to the leagues that they play in.  The formula for OPS+ is:

100*[(OBP / lgOBP) + (SLG / lgSLG) - 1]

If a player has a score equal to 100 that means that he has a league average OPS.  Anything above 100 indicates a better than average player, and below 100 indicates a below average player.  OPS+ is also a great way to compare players from different eras, because it shows how good a player is relative to the league that he played in.

NOTE:  Typically OPS+ will then be park adjusted.  While I agree with the concept of park factors, there are several flaws that leave me less than enthralled with the stat.  Thus, I chose not to adjust for park factors.
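As a quick illustration, the unadjusted OPS+ formula above fits in a one-line Python function.  The function name is my own, and the league and player numbers in the example are made up purely to show the arithmetic; they are not actual 2009 figures.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """Unadjusted OPS+: 100 * (OBP / lgOBP + SLG / lgSLG - 1), no park factor."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# Hypothetical league averages and a hypothetical hitter (not real data)
league_obp, league_slg = 0.320, 0.400
print(round(ops_plus(0.320, 0.400, league_obp, league_slg)))  # 100, league average
print(round(ops_plus(0.400, 0.500, league_obp, league_slg)))  # 150
```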

3. Then I eliminated all players with an OPS+ that was less than 100.  This left me with 63 American League players.

4.  From this point, I calculated runs created for all remaining players.  In my opinion, runs created is one of the most telling stats, because it is a team’s ultimate goal to score as many runs as possible.  A player’s job is to create runs, whether by scoring them or hitting them in.  A player’s job is not to have a high on base percentage or to hit doubles.  I should also note that there are many different runs created formulas floating around.  For my analysis I chose Bill James’ 2002 version (his most recent version).  I should also note that I did not make the home run with men on base adjustment or the average with runners in scoring position adjustment.  This is due to lack of data.

5. To get the field down to a more reasonable size, I only considered the top 30 players in runs created.

6. Next, I calculated secondary average and isolated power for each player.  Secondary average is designed to reflect everything a player does outside of batting average.  To perform well in this measure, hitters need to hit for power, steal bases, and take walks.  Isolated power essentially takes singles out of slugging percentage.  This gives a better idea of raw power than total bases does by limiting the score of singles hitters.  For isolated power I chose to use the PECOTA equation rather than the standard equation.  The theory behind the PECOTA equation is that doubles and triples should be weighted the same, because a triple is more likely the product of extra speed than of extra power.  The equations for both of these measures are listed below.

Secondary average = (TB - H + BB + SB - CS) / AB

PECOTA Isolated Power = (2B + 3B + 3*HR) / AB
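Here is a small Python sketch of the two equations above.  The function names and the stat line are hypothetical and only there to show the arithmetic.  I also included the standard isolated power equation, (2B + 2*3B + 3*HR) / AB, for comparison, so you can see that the only difference is the weight on triples.

```python
def secondary_average(tb, h, bb, sb, cs, ab):
    """Secondary average = (TB - H + BB + SB - CS) / AB."""
    return (tb - h + bb + sb - cs) / ab


def pecota_iso(doubles, triples, hr, ab):
    """PECOTA isolated power: doubles and triples carry the same weight."""
    return (doubles + triples + 3 * hr) / ab


def standard_iso(doubles, triples, hr, ab):
    """Standard isolated power (SLG minus AVG), shown only for comparison."""
    return (doubles + 2 * triples + 3 * hr) / ab


# Hypothetical stat line (not a real player): 500 AB, 150 H, 30 2B, 5 3B,
# 25 HR, 60 BB, 10 SB, 4 CS.  Singles = 150 - 30 - 5 - 25 = 90.
total_bases = 90 + 2 * 30 + 3 * 5 + 4 * 25            # 265
print(secondary_average(total_bases, 150, 60, 10, 4, 500))  # 0.362
print(pecota_iso(30, 5, 25, 500))                            # 0.22
print(standard_iso(30, 5, 25, 500))                          # 0.23
```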

7. From there, I looked up wOBA, WPA, Clutch, and RAR for each of these players.  Most casual fans will not be familiar with these measures, so I will give a brief explanation of each.

wOBA - a linear weight formula presented as a rate statistic.  This measure is very similar to OBP.  wOBA values each potential outcome a hitter can have relative to other possible outcomes.  For instance, a home run is weighted as slightly more than twice as valuable to a team as a single.

WPA - Win probability added.  This stat shows how much a player contributes to the chances of a team winning.  It is situation specific, meaning that a home run in a tie ballgame is worth more in WPA than a home run in a ballgame where your team is down by six runs.  Here is a practical example borrowed from baseball-reference.com:

For example, in the top of the eighth, the visiting team might be down five with one out and runners on first and second. The batter then hits a home run to bring the visiting team to within two runs, still with one out, but now with no runners on base. Prior to the home run, the batting team had about a 3% chance of winning which improved to 10% following the home run. This change of 7% is credited to the batter and debited to the pitcher. Compute these for every play and every game from 1956 on and you have win probability added stats.

Clutch - I think that some measure of clutch play should be a part of the MVP vote.  The problem is that it is difficult to quantify and is marred by misguided perceptions.  The stat I used was obtained from fangraphs.com.  They define their number for clutch as “how much better or worse a player does in high leverage situations than he would have done in a context neutral environment.”

RAR - This is derived from the concept of VORP (value over replacement player).  It estimates the number of runs a particular player creates over the average replacement player.

8. I ranked all 30 players from best to worst in RC, secondary average, ISO, wOBA, WPA, Clutch, and RAR.  The best player in each category got a 30, the next best got a 29, etc.

9. I summed each player’s rank scores across the seven categories.

10.  I ranked who my methodology has determined as the top 30 in the AL MVP race.  Below are the results.

—————————————————————————————————

1. Mark Teixeira

2. Joe Mauer

3. Ben Zobrist

4. Kevin Youkilis

5. Kendry Morales

6. Jason Bay

7. Johnny Damon

8. Miguel Cabrera

9. Chone Figgins

10. Carlos Pena

11. Adam Lind

12. Derek Jeter

13. Justin Morneau

14. Michael Young

15. Jason Bartlett

16. Shin-Soo Choo

17. Victor Martinez

18. Ichiro Suzuki

19. Ian Kinsler

20. Brian Roberts

21. Russell Branyan

22. Bobby Abreu

23. Curtis Granderson

24. Marco Scutaro

25. Carl Crawford

26. Aaron Hill

27. Nick Markakis

28. Denard Span

29. Dustin Pedroia

30. Robinson Cano

———————————————————————————————-
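For anyone who wants to try this with their own set of stats, the rank-and-sum scoring from steps 8 through 10 can be sketched in Python as follows.  This is a rough sketch rather than the exact spreadsheet I used.  The function names, the players “A”, “B”, and “C”, and all of the numbers are made up for illustration.  The sketch assumes that a higher value is better in every category and breaks ties by input order, which is my own simplification.

```python
def rank_points(values):
    """Best value gets len(values) points, the next best one fewer, and so on."""
    order = sorted(values, key=values.get, reverse=True)
    n = len(order)
    return {name: n - i for i, name in enumerate(order)}


def composite_ranking(stats):
    """stats maps category -> {player: value}; returns (player, total) pairs,
    highest total first."""
    totals = {}
    for values in stats.values():
        for name, points in rank_points(values).items():
            totals[name] = totals.get(name, 0) + points
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical three-player, three-category example (not real data)
stats = {
    "RC":   {"A": 100, "B": 90, "C": 80},
    "wOBA": {"A": 0.380, "B": 0.400, "C": 0.350},
    "RAR":  {"A": 30, "B": 40, "C": 20},
}
print(composite_ranking(stats))  # [('B', 8), ('A', 7), ('C', 3)]
```

With 30 players and seven categories, the best possible total is 210 and the worst is 7.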

My thoughts:  I am extremely pleased with the way the results turned out.  To give an idea of how the scores cluster, there are big dropoffs between Zobrist and Youkilis, Bay and Damon, Morneau and Young, Roberts and Branyan, and almost every pair after Scutaro.  Before this exercise, I was inclined to pick Joe Mauer as my MVP.  It’s still possible that I would vote for him despite this methodology because of the premium on playing catcher.  The most surprising top ten placement was Johnny Damon.  I just didn’t expect him to be there.  The three through six spots were ranked exactly as I had them before.  One player that I had much higher was Carl Crawford.  Initially I thought that he was a top ten pick.  Overall, I am inclined to stick to my ranking system, with only two minor tweaks.  First, I would rank Mauer ahead of Teixeira, and second, I would move Morneau up a few slots to number 10.  With one month to go, though, anything can happen.

Your thoughts?