Dave Laidig is back with the second part of his series looking at the use of statistics and numbers in soccer. For Part I, click here.
In Part 1, we reviewed Opta’s Castrol Index ratings of MLS players and
an adjusted index minimizing statistical anomalies. In this part, we
use the objective performance data – i.e., the Castrol Index and the
adjusted ratings – to analyze the relative contribution of the
different positions, and as the basis for determining the value of
each position.
The beginning of Part 2 relies heavily on the work of Benjamin
Leinwand and Chris Anderson, available at
http://www.soccerbythenumbers.com. In short, I attempted to replicate
their work, and then extend it to new areas. I am indebted to their
efforts, and appreciate their willingness to share their results with
the public – and to start a conversation that consumes large chunks of
my free time.
Using both the Castrol Index and my adjusted scores, I set about
replicating the Leinwand-Anderson analysis. Their analysis created a
regression equation using a team’s average Castrol Index rating for
each position and the team’s league points (i.e., a mathematical model
roughly summarized as: a Constant + Fwd Rating + Mid Rating + Def
Rating + GK Rating = Expected League Points).
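For readers who want to try this kind of model themselves, here is a minimal sketch of a multiple regression of league points on the four position ratings. It is not the Leinwand-Anderson code or my own spreadsheet work: the function name fit_position_model, the 18-team league size, and every number in the synthetic data are assumptions made up for illustration, and numpy is assumed to be installed.

```python
import numpy as np


def fit_position_model(ratings, points):
    """Least-squares fit of: points ~ constant + Fwd + Mid + Def + GK.

    ratings: one row per team, columns = average Fwd, Mid, Def, GK rating.
    points:  league points for each team.
    Returns (coefficients, r_squared); coefficients[0] is the constant.
    """
    ratings = np.asarray(ratings, dtype=float)
    points = np.asarray(points, dtype=float)
    # Prepend a column of ones so the model includes a constant term.
    design = np.column_stack([np.ones(len(ratings)), ratings])
    coefs, *_ = np.linalg.lstsq(design, points, rcond=None)
    fitted = design @ coefs
    ss_res = np.sum((points - fitted) ** 2)
    ss_tot = np.sum((points - points.mean()) ** 2)
    return coefs, 1.0 - ss_res / ss_tot


# Synthetic placeholder data (NOT real MLS numbers): 18 hypothetical teams
# whose position ratings are scattered around the averages quoted in the text.
rng = np.random.default_rng(0)
ratings = rng.normal(loc=[7.07, 7.09, 7.59, 6.53], scale=0.3, size=(18, 4))
# Made-up relationship in which only the defender rating drives points.
points = 40 + 10 * (ratings[:, 2] - 7.59) + rng.normal(scale=4, size=18)

coefs, r_squared = fit_position_model(ratings, points)
print("constant, Fwd, Mid, Def, GK:", np.round(coefs, 2))
print("R-squared:", round(float(r_squared), 2))
```

The R-squared value returned here is the same quantity quoted throughout this article: the share of the variation in league points that the fitted equation accounts for. Significance tests for each coefficient would need a statistics package and are left out of this sketch.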
The Leinwand-Anderson analysis reported that defenders have higher
average Castrol Index scores than the other positions (the team
averages for forwards, midfielders, defenders, and goalkeepers are
7.07, 7.09, 7.59, and 6.53 respectively). The defender position was
also the only one significantly related to league points in a multiple
regression equation including all four position ratings. Considering
forwards are paid more than defenders, this suggests that investing in
defenders may be a more productive use of resources. Using publicly
available data, I was able to recreate their result, although some of
my interim calculations were a touch different. Replicating their
process as best I could, I obtained an R-squared of .62, meaning this
model explains about 62% of the variation in a team’s league points.
Perhaps more importantly, and as originally reported, only one
position group showed a significant relationship with the league
table.
As stated in Part 1, I believe the Castrol Index can be improved,
especially among players with lesser playing time. Accordingly, I ran
my adjusted index scores through the same process as the
Leinwand-Anderson analysis. Notably, defenders still retained a higher
average rating than midfielders and forwards (7.89 versus 7.58 and
7.67 respectively), but the position averages became closer when using
the adjusted index. Using the adjusted ratings, the R-squared value
was .72, an improvement of 10 percentage points over the model using
the original Castrol Index ratings. Consequently, the adjusted ratings
were a better predictor of team performance.
Next, I attempted to address a soccer-specific limitation of the
regression equation. The Leinwand-Anderson equation treats forwards,
midfielders, and defenders as equal units. But we know, depending on
the formation and game situation, there are uneven numbers at each
position. Thus, I modified the equation to account for the relative
time contribution of each position. To do this, I used the average
position score from the first analysis, and weighted it by that
position’s share of the overall team minutes. As an illustration,
suppose the LA Galaxy forwards have an adjusted index average of 7.17
and contribute 20% of the team’s minutes. I would then report
.2 * 7.17, or 1.43, as that position’s point contribution to the
team.
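The weighting step is simple enough to write out. The sketch below only reproduces the arithmetic of the Galaxy illustration; the function name weighted_contribution is my own label for this example, not something taken from the original analysis.

```python
def weighted_contribution(avg_rating, minutes_share):
    """Weight a position's average rating by its share of team minutes.

    avg_rating:    the position group's average index rating.
    minutes_share: fraction of the team's total minutes played by that
                   position group (0.20 means 20%).
    """
    return avg_rating * minutes_share


# The illustrative figures from the text: a 7.17 average rating for the
# forwards, who play 20% of the team's minutes.
galaxy_forwards = weighted_contribution(7.17, 0.20)
print(round(galaxy_forwards, 2))  # 1.43
```

Repeating this for each position on each team gives the four weighted values that replace the plain position averages in the regression.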
By weighting each position by its playing time, the multiple
regression equation for the positions’ adjusted point contributions
was significant for forwards, midfielders, and defenders, with a
non-significant p-value of .14 for goalkeepers. Thus, this model
permits analysis of all field positions, not just defenders. In this
model, defenders have a larger regression coefficient than
midfielders, who in turn have a larger coefficient than forwards.
This would support the notion that defenders contribute more to wins
than other positions. Further, the R-squared value was .78; by
explaining 78% of the variance in league points, this is the model
with the greatest explanatory power (compared to .62 for the
Leinwand-Anderson model and .72 for the Leinwand-Anderson model using
adjusted index ratings).
Although this mathematical model shows each position’s contribution to
league points, the model is clumsy for team use. Contracts are
determined player by player, and not position group by position group.
Thus, in order to make the equation useful, the regression equation
coefficients need to reflect the value of one player. Consequently, I
considered league data and determined that forwards accounted for
19.0% of all league minutes, midfielders 39.5%, defenders 32.2%, and
goalkeepers 9.1%. With eleven players on the field, I calculated how
many “players” were assigned to each position group. Forwards, for
example, account for about 2.1 players’ worth of league minutes
(19.0% of 11), and this figure was applied to the position’s
regression coefficient.
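The conversion from minutes share to “players” can be sketched as follows. The shares are the league figures quoted above (they sum to 99.8% because of rounding); the names LEAGUE_MINUTES_SHARE and player_equivalents are labels I made up for this example. How the resulting figure is then combined with each regression coefficient is not spelled out here, so the sketch stops at the player counts.

```python
# League-wide share of minutes by position, as quoted in the text.
LEAGUE_MINUTES_SHARE = {
    "forwards": 0.190,
    "midfielders": 0.395,
    "defenders": 0.322,
    "goalkeepers": 0.091,
}

PLAYERS_ON_FIELD = 11


def player_equivalents(shares, players_on_field=PLAYERS_ON_FIELD):
    """Convert each position's share of minutes into a number of players."""
    return {pos: share * players_on_field for pos, share in shares.items()}


equivalents = player_equivalents(LEAGUE_MINUTES_SHARE)
for position, count in equivalents.items():
    print(f"{position}: {count:.2f} players")
```

For forwards this gives 0.190 * 11 = 2.09, which rounds to the 2.1 players mentioned above.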
As a result, the estimated impact of inserting a field player with a
higher adjusted index score on the team’s league points can be
calculated. Recalling our model, the defender position group had the
highest contribution to league points. But when comparing a single
player to a single player, the value of a forward was greater than
that of a defender. This result is due in large part to the defender
group consisting of more players than the forward group; the
defender’s contribution is diluted by the extra players incorporated
into its position group.
To illustrate, I went back to my adjusted index scores to provide some
examples for each position.
Upgrading from a median forward to an 80th percentile forward
(represented by Zusi) would be expected to yield an additional 7.98
league points. Upgrading from a median defender to an 80th percentile
defender (AJ DeLaGarza) would be expected to yield an additional 4.68
points. And upgrading from a median midfielder to an 80th percentile
midfielder (Zach Lloyd) would be expected to yield an additional 3.03
points.
This model can also be adapted to a team’s own system of player
valuation (instead of the adjusted index or Castrol Index), on the
assumption that the rating distributions are similar. And while a
team’s player rating distributions may not exactly replicate the
adjusted index, this method provides structure for roster management
decisions.
Upon review, one is justified in saying defenders contribute more to
wins. With a closer inspection, we see less variability among
defenders compared to other positions. And on a player-by-player
basis, one can justify paying more for a forward upgrade because of
greater expected results. In sum, forwards may be more valuable for
their combination of rating and scarcity.
However, this analysis alone does not justify current expenditures.
In Part 3, we will consider how salary affects team success and
whether salary is associated with player performance.