My computer program implementing the Mills approach needs digitized play-by-play data of the games, and so I would like to give a huge thanks to David W. Smith of Retrosheet for his tremendous interest and generosity, both of which were needed and greatly appreciated by me. And so I would like to officially say:

==================================================================================
The play-by-play information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
==================================================================================

EXPLANATION

When I was a 21-year-old senior (math major) at MIT in the spring of 1970, I saw a small ad in the back of The Sporting News and soon bought a little paperback book entitled "Player Win Averages" by brothers Harlan Mills and Eldon Mills. I could instantly see that they had gotten to the very heart of the matter of rating batters and pitchers, and it has been in me ever since to implement their concepts. I wrote my first version of a baseball simulation on the computer in the spring of 1980, and of course I have fiddled with it over the years to make it better.

Two other books have inspired me in this vein: Earnshaw Cook's "Percentage Baseball" (1964) and "Percentage Baseball and the Computer" (1971). And finally, so has the empirical research into the probabilities of winning from different situations by the Royal Canadian Air Force officers George Lindsey and his father Charles Lindsey in the 1950s and early 1960s.

The Mills brothers' concept is brilliant and elegantly simple: at each state in a baseball game, each team has a certain probability of winning. Then a play occurs, and there is a new state with a new associated probability of winning for each team. By comparing the BEFORE and AFTER probabilities, assigning the credit for the CHANGE to the batters and pitchers, and doing this for a whole season, one can determine by how much each batter and pitcher has helped his team win (and lose!) baseball games.

For example, in 1951, in the third and final game of the National League playoff between the Brooklyn Dodgers and New York Giants, it was the bottom of the 9th and the Giants were trailing 4-2, with men on second and third and one out.
Thus, as Bobby Thomson stepped into the batter's box, the chances were as follows (based on 1951 Major League Composite Play). To determine these probabilities, one must use the technique of computer simulation, having the computer play literally hundreds of millions of innings for all possible situations using the major league statistics for the given year. Then, to rate the players, one needs to connect those situation probabilities with a digitized play-by-play of all the games of a given season.

_________________________________________________________________________________________
9 bottom    HOME team MARGIN= -2    1 out(s)    2nd and 3rd    balls= 0  strikes= 0
HOME WIN CHANCE= 0.27759    AWAY WIN CHANCE= 0.72241    HOME FIELD
GAME STATUS= -445 = 2000*HOME WIN CHANCE - 1000    HOME FIELD
_________________________________________________________________________________________

And after Bobby Thomson hit his fabled home run ... THE GIANTS WIN THE PENNANT! THE GIANTS WIN THE PENNANT! ... the situation was as follows:

_________________________________________________________________________________________
9 bottom    HOME team MARGIN= 1    1 out(s)    none on    balls= 0  strikes= 0
HOME WIN CHANCE= 1.00000    AWAY WIN CHANCE= 0.00000    HOME FIELD
GAME STATUS= 1000 = 2000*HOME WIN CHANCE - 1000    HOME FIELD
_________________________________________________________________________________________

BEFORE:  GAME STATUS=  -445
AFTER:   GAME STATUS= +1000
CHANGE:  1000 - (-445) = +1445 for Bobby Thomson, -1445 for Ralph Branca

Thus the Giants' chances changed from .27759 to 1.00000, a jump of .72241 = 72.241%. The Mills brothers liked to use a -1000 to +1000 scale, so that is shown as a GAME STATUS change from -445 to +1000, a gain of 1445 points, which is precisely 2000 * .72241 rounded to the nearest integer. So in this case, Bobby Thomson would be credited with 1445 WIN points and Ralph Branca (the pitcher) would be credited with 1445 LOSS points.
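A minimal Python sketch of the GAME STATUS arithmetic above follows. It assumes the win chances are already known (the simulation that produces them is not shown), gives all of the credit to the batter and pitcher (runners on steals and similar plays are ignored), and uses the hypothetical names `game_status`, `credit_play`, and `PlayCredit`, which are not taken from the program described here.

```python
from typing import NamedTuple


class PlayCredit(NamedTuple):
    """WIN/LOSS points handed out for one play (hypothetical structure)."""
    batter_win: int
    batter_loss: int
    pitcher_win: int
    pitcher_loss: int


def game_status(home_win_chance: float, scale: int = 2000) -> int:
    """Map a home win chance in [0, 1] onto the -1000..+1000 GAME STATUS scale."""
    # Python's round() sends exact halves to the nearest even integer.
    return round(scale * home_win_chance - scale / 2)


def credit_play(before_home_chance: float, after_home_chance: float,
                batting_team_is_home: bool) -> PlayCredit:
    """Credit the change in GAME STATUS to the batter and charge it to the pitcher."""
    change = game_status(after_home_chance) - game_status(before_home_chance)
    if not batting_team_is_home:
        # GAME STATUS is measured from the home side; flip it for a visiting batter.
        change = -change
    if change >= 0:
        # The batter helped his team: WIN points to him, equal LOSS points to the pitcher.
        return PlayCredit(change, 0, 0, change)
    # The batter hurt his team: LOSS points to him, equal WIN points to the pitcher.
    return PlayCredit(0, -change, -change, 0)


# Bobby Thomson's home run, using the 1951 figures quoted above.
credit = credit_play(0.27759, 1.00000, batting_team_is_home=True)
print(game_status(0.27759), game_status(1.00000))  # -445 1000
print(credit)  # PlayCredit(batter_win=1445, batter_loss=0, pitcher_win=0, pitcher_loss=1445)
```

Because both statuses are rounded before subtracting, a play's change can occasionally differ by one point from rounding 2000 times the probability change directly; for the Thomson play both routes give 1445.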
Over the course of a season, each batter and pitcher accumulates WIN and LOSS points from every play he is involved in. This also applies to a runner STEALING a base or being CAUGHT STEALING, for example: you simply compare the BEFORE and AFTER game statuses to determine the WIN and LOSS points to give out. The Mills PWA (Player Win Average) is then simply the player's final WIN Points/(WIN Points + LOSS Points). The scale factor of 2000 is totally arbitrary and has no effect on the PWA, because it appears equally in the numerator and denominator and cancels out. If G and L are the player's total gains and losses in win probability over the season, then

$$\text{WIN Points} = \text{SCALE} \times G, \qquad \text{LOSS Points} = \text{SCALE} \times L$$

$$\text{PWA} = \frac{\text{SCALE} \times G}{\text{SCALE} \times G + \text{SCALE} \times L} = \frac{G}{G + L}$$

and so you can see that SCALE cancels out.

By the way, in their book, Harlan and Eldon Mills calculate a change of 1472 points for this play. Had I used the 1969 Major League Composite data (the database they used), I would have gotten 1474 points. The difference between my 1445 and the 1472 and 1474 figures is due to scoring being at a higher rate in 1951 than in 1969: 4.57 runs per 9 innings per team in 1951 versus 4.09 in 1969. In a higher-scoring year, a team trailing by two runs with men on second and third and one out in the bottom of the 9th already had a better chance of coming back, so the home run added less to its chances. Thus it was slightly less impressive for Thomson to have hit that home run in 1951 than it would have been in 1969, given the situation he was batting in.
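The season-long bookkeeping and the cancellation of SCALE can be sketched in Python as follows. The function name `player_win_averages` and its input format (pairs of player and change in his team's win probability) are assumptions for illustration; unlike the per-play integer points above, this sketch keeps fractional points, and the plays listed are made-up numbers.

```python
from collections import defaultdict


def player_win_averages(plays, scale=2000):
    """Return {player: PWA} from (player, change in his team's win chance) pairs."""
    win_points = defaultdict(float)
    loss_points = defaultdict(float)
    for player, delta in plays:
        if delta >= 0:
            win_points[player] += scale * delta
        else:
            loss_points[player] += scale * -delta
    pwa = {}
    for player in set(win_points) | set(loss_points):
        total = win_points[player] + loss_points[player]
        if total > 0:  # skip players whose plays never moved the win chance
            pwa[player] = win_points[player] / total
    return pwa


# Made-up plays for illustration only: (player, change in his team's win chance).
plays = [
    ("Batter A", 0.72241),
    ("Batter A", -0.05),
    ("Pitcher B", -0.72241),
    ("Pitcher B", 0.05),
]
print(player_win_averages(plays))
print(player_win_averages(plays, scale=1))  # same averages, up to floating-point rounding
```

Changing `scale` multiplies every player's WIN and LOSS totals by the same factor, so each PWA is unchanged, which is the cancellation shown in the formula above.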
To quote from pages 25-26 of the Mills brothers' book regarding the PWA: "Here's something to keep in mind, and it also explains why we think this measurement system is equitable for the players. The players are not measured against any arbitrary standard. They are measured against their own teammates and opponents on how they performed this year. Over the year, using our new scorecard, we tabulate every play of every game. We know what actually happened - how many times each situation moved to each next situation. This gives us an average of what will happen on each next play, as actually performed by the players. So when we score each player against that average, we are really scoring him against his fellow players and opponents. The player who conforms to the average will have exactly the same number of Win and Loss Points [Net Points = 0, js note], for a .500 Player Win Average. Those who are better than average will be above .500, and those who are less than average will be below .500, no matter what their batting average or earned run average may be. To illustrate, if it were a common, every-day occurrence for a player to hit a game-winning home run in the ninth, then those who did not would be below average. Since this is not the case, those who do not are not necessarily below average. Also, in a year when the hitters are big, and ten runs a game are commonplace, a player had better be up there getting his share, or he'll be below average. On the other hand, in a year like 1968, an average hitter needn't have done so much, since low scoring games were the rule. In other words, we do not measure players from one era against players from another. We measure them against their own teammates and opponents. But the statistic itself - Player Win Average - can be used to compare players of any era. That's because, in any era, whether the ball be dead or rabbit-like, a .500 ball player will be average, and a .570 player will be much better than average."
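A toy illustration of why the average player nets zero points, in Python with exact fractions. It assumes, as a simplification, that a situation's win chance equals the frequency-weighted average of the win chances of the situations it leads to in that season's data; the outcome shares and chances below are made up, and the function names are hypothetical. Under that assumption, a player who produces each outcome at the average rate gains exactly as much as he loses, whatever the SCALE.

```python
from fractions import Fraction

# Made-up outcomes from a single game situation, as if tallied over a season:
# (share of times the situation led to this outcome, home win chance afterward)
outcomes = [
    (Fraction(1, 10), Fraction(1)),      # e.g. a game-ending hit
    (Fraction(6, 10), Fraction(1, 5)),   # e.g. an out with the runners holding
    (Fraction(3, 10), Fraction(1, 2)),   # e.g. a walk
]


def win_chance_before(outcomes):
    """Win chance of the situation: the frequency-weighted average of what it leads to."""
    return sum(share * after for share, after in outcomes)


def average_net_change(outcomes):
    """Average change in win chance for a player producing each outcome at the average rate."""
    before = win_chance_before(outcomes)
    return sum(share * (after - before) for share, after in outcomes)


print(win_chance_before(outcomes))   # 37/100
print(average_net_change(outcomes))  # 0
```

A better-than-average player produces the favorable outcomes more often than these shares, so his net change is positive and his PWA rises above .500.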
For the seasons 1957-1973, some games are missing, but for 1974-2006 the data should be complete. Just scroll to the end of a given file to see the total number of games in the database. You'll find my nomenclature in the files below to be quite similar to that of the Mills book. That is my way of honoring Harlan Mills and Eldon Mills for their evolutionary and revolutionary book. They were way ahead of their time, and it's about time they got their due. They were my inspiration to pursue this topic. - Jeff Sagarin