Dave Laidig is back with the second part of his series looking at the use of statistics and numbers in soccer. For Part I, click here.

In Part 1, we reviewed Opta’s Castrol Index ratings of MLS players and an adjusted index that minimizes statistical anomalies. In this part, we use the objective performance data (i.e., the Castrol Index and the adjusted ratings) to analyze the relative contribution of the different positions, and to serve as the basis for determining the value of each position.

The beginning of Part 2 relies heavily on the work of Benjamin Leinwand and Chris Anderson, available at http://www.soccerbythenumbers.com. In short, I attempted to replicate their work and then extend it into new areas. I am indebted to their efforts, and appreciate their willingness to share their results with the public, and to start a conversation that consumes large chunks of my free time.

Using both the Castrol Index and my adjusted scores, I set about replicating the Leinwand-Anderson analysis. Their analysis created a regression equation relating a team’s average Castrol Index rating for each position to the team’s league points (i.e., a mathematical model roughly summarized as: a Constant + Fwd Rating + Mid Rating + Def Rating + GK Rating = Expected League Points).
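
Written out more formally, the rough summary above corresponds to a multiple regression in which each position's average rating gets its own estimated coefficient. The symbols below are my own notation, not taken from the Leinwand-Anderson write-up: the betas are the fitted coefficients and each R-bar is a team's average rating at that position.

```latex
\text{Expected League Points}
  = \beta_0
  + \beta_F \,\bar{R}_{F}
  + \beta_M \,\bar{R}_{M}
  + \beta_D \,\bar{R}_{D}
  + \beta_G \,\bar{R}_{G}
```

A position is "significantly related" to league points when its coefficient is statistically distinguishable from zero.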

The Leinwand-Anderson analysis reported that Defenders have higher average Castrol Index scores than the other positions. The Defender position was also the only one significantly related to league points in a multiple regression equation that included a team’s average Forward, Midfielder, Defender, and GK ratings (which are 7.07, 7.09, 7.59, and 6.53 respectively). Considering forwards are paid more than defenders, this suggests that investing in defenders may be a more productive use of resources. Using publicly available data, I was able to recreate their result, although some of my interim calculations were a touch different. Replicating their process as best I could, I obtained an R-squared of .62, meaning this model explains about 62% of the variation in a team’s league points. Perhaps more importantly, and as originally reported, only one position group had a significant relationship with the league table.
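
For readers less familiar with R-squared, here is a minimal sketch of how it is computed from observed and predicted values. The function name and the numbers are hypothetical and chosen only so the arithmetic is easy to follow; they are not MLS results.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Variation left unexplained by the model
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation around the mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up league points and model predictions, for illustration only
observed = [60.0, 50.0, 40.0, 30.0]
predicted = [55.0, 50.0, 45.0, 30.0]
print(round(r_squared(observed, predicted), 2))  # 0.9
```

An R-squared of .62 means the unexplained variation is 38% of the total variation in league points.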

As stated in Part 1, I believe the Castrol Index can be improved, especially among players with less playing time. Accordingly, I ran my adjusted index scores through the same process as the Leinwand-Anderson analysis. Notably, defenders still retained a higher average rating than midfielders and forwards (7.89 versus 7.58 and 7.67 respectively), but the position averages became closer when using the adjusted index. Using the adjusted ratings, the R-squared value was .72, which is 10 percentage points higher than the model using the original Castrol Index ratings. Consequently, the adjusted ratings were a better predictor of team performance.

Next, I attempted to address a feature of the regression equation that is specific to soccer. The Leinwand-Anderson equation treats Forwards, Midfielders and Defenders as equal units. But we know that, depending on the formation and game situation, there are uneven numbers at each position. Thus, I modified the equation to account for the relative time contribution of each position. To do this, I used the average position score from the first analysis and weighted it by that position’s share of the overall team minutes. As a math illustration, the LA Galaxy forwards may have an adjusted index average of 7.17 and contribute 20% of the team’s minutes. I would then report .2 * 7.17, or 1.43, as the value of that position’s point contribution to the team.
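
The weighting step can be sketched in a few lines. The function name is hypothetical; the inputs are the illustrative LA Galaxy figures from the paragraph above.

```python
def weighted_contribution(avg_rating, minutes_share):
    """Weight a position group's average rating by its share of team minutes."""
    return avg_rating * minutes_share


# Illustrative figures from the text: 7.17 average rating, 20% of team minutes
galaxy_forwards = weighted_contribution(7.17, 0.20)
print(round(galaxy_forwards, 2))  # 1.43
```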

After weighting each position by playing time, the multiple regression equation for the positions’ adjusted point contributions was significant for Forwards, Midfielders, and Defenders, with a non-significant p-value of .14 for Goalkeepers. Thus, this model permits analysis of all field positions, not just defenders. In this model, defenders have a larger regression coefficient than midfielders, who in turn have a larger coefficient than forwards. This would support the notion that defenders contribute more to wins than other positions. Further, the R-squared value was .78; by explaining 78% of the variance in league points, this is the model with the greatest explanatory power (compared to .62 for the Leinwand-Anderson model and .72 for the Leinwand-Anderson model using adjusted index ratings).
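
In equation form, the playing-time model can be sketched as follows. The notation is my own shorthand rather than anything from the original analysis: s is a position group's share of team minutes, R-bar is its average adjusted rating, and the betas are the fitted coefficients.

```latex
\text{Expected League Points}
  = \beta_0
  + \beta_F \,(s_F \bar{R}_F)
  + \beta_M \,(s_M \bar{R}_M)
  + \beta_D \,(s_D \bar{R}_D)
  + \beta_G \,(s_G \bar{R}_G),
\qquad \beta_D > \beta_M > \beta_F
```

The ordering of the coefficients on the right is the result reported above; no numeric coefficient values are implied here.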

Although this mathematical model shows each position’s contribution to league points, the model is clumsy for team use. Contracts are determined player by player, not position group by position group. Thus, to make the equation useful, the regression coefficients need to reflect the value of one player. Consequently, I considered league data and determined that forwards accounted for 19.0% of all league minutes, midfielders 39.5%, defenders 32.2%, and goalkeepers 9.1%. With eleven players on the field, I calculated how many “players” were assigned to each position group. Using forwards as an example, this position had 2.1 players’ worth of league minutes applied to its regression coefficient.
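
The conversion from minutes share to "players" is simple arithmetic, sketched below with the league shares quoted above. The variable names are hypothetical, and this covers only the players-per-group step, not the full per-player coefficient adjustment.

```python
PLAYERS_ON_FIELD = 11

# Share of all league minutes by position group, as quoted in the text
minutes_share = {
    "forwards": 0.190,
    "midfielders": 0.395,
    "defenders": 0.322,
    "goalkeepers": 0.091,
}

# Equivalent number of "players" in each position group
players_per_group = {
    position: PLAYERS_ON_FIELD * share
    for position, share in minutes_share.items()
}

print(round(players_per_group["forwards"], 1))  # 2.1
```

The quoted shares sum to 99.8% rather than 100% because of rounding, so the group sizes add up to slightly less than eleven.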

As a result, the estimated impact on the team’s league points of inserting a field player with a higher adjusted index score can be calculated. Recalling our model, the defender position group had the highest contribution to league points. But when comparing a single player to a single player, the value of a forward was greater than that of a defender. This result is due in large part to the defender group consisting of more players than the forward group; the defenders’ contribution is diluted by the extra players incorporated into its effect.

To illustrate, I went back to my adjusted index scores to provide some examples for each position.

Upgrading from a median forward to an 80th percentile forward (represented by Zusi) would be expected to yield an additional 7.98 league points. Upgrading from a median defender to an 80th percentile defender (AJ DeLaGarza) would be expected to yield an additional 4.68 points. And upgrading from a median midfielder to an 80th percentile midfielder (Zach Lloyd) would be expected to yield an additional 3.03 points.

This model can also be adapted to a team’s own system of player valuation (instead of using the adjusted index or Castrol Index) by assuming that the player rating distributions are similar. And while a team’s rating distributions may not exactly replicate those of the adjusted index, this method provides structure for roster management decisions.

Upon review, one is justified in saying defenders contribute more to wins. On closer inspection, we see less variability among defenders compared to other positions. And on a player-by-player basis, one can justify paying more for a forward upgrade because of the greater expected results. In sum, forwards may be more valuable for their combination of rating and scarcity.

However, this analysis alone does not justify current expenditures. In Part 3, we will consider how salary affects team success and whether salary is associated with player performance.
