Beginning an Objective Analysis of MLS Player Performance

In Part I of a new series using numbers and detailed analysis to evaluate players in MLS, Footiebusiness Contributor Dave Laidig has re-imagined and re-configured the data to provide an exceptionally accurate mechanism for comparing talent.  For those interested in player evaluation, soccer statistics and more, the below is a great and necessary read.

We play and watch soccer because 90 minutes of live competition is
unpredictable and infused with drama.  The business of soccer involves
assembling the best players to take on this inherent unpredictability
within limited resources.  Video games, and bloggers, easily identify
which players are better.  But real life is not that clear cut.  It is
not unusual for a player to be deemed unnecessary, released, and then
become successful with another team (e.g., Jay DeMerit or Herculez
Gomez).  Even with ample time to review a player in training, it can
be difficult to know what you have.  Teams devote considerable
resources to obtaining the information that video games readily
offer: which player is better.  An analysis of freely available
information may help with this goal.

For example, MLS recently released a quantitative summary of
performance, the Castrol Index, for over 450 MLS players with league
minutes in 2011.  The Castrol Index Score is an Opta statistic
representing a summation of a player’s overall contribution to wins.
This is reported on a scale of 3.89 to (presumably) 10, although the
2012 weekly ratings use a different point system.  I assume the Castrol
Index rating is the result of a structural equation model,
incorporating many types of game events and relating them to winning
games.  While the exact formula for creating the statistic is
proprietary information of Opta, the important thing to know is that
the number is intended to represent overall performance (and not just
one aspect of the game like goals scored, or tackles, etc.) and should
allow for comparisons between different players.   And while access to
the formula would help, we can still determine whether using the Index
can help a team manage its scarce resources.

A review of the Castrol Index scores for 2011 shows us that Chris
Wondolowski earned the highest rating for his work last season, with
an average rating of 9.31 for 2,672 minutes.  I think few of us
would argue with giving Wondo the top spot.  We might say this
statistic looks right, or, a few grand in school loans later, a
statistician would say it has “face validity.”  Rounding
out the top ten we have Alvaro Fernandez (9.26 pts.), Thierry Henry
(9.21 pts.), Omar Gonzalez (9.17 pts.), Aurélien Collin (9.12 pts.),
Chance Myers (9.09 pts.), Drew Moor (9.05 pts.), George John (9.02
pts.), Bobby Boswell (8.98 pts.), and Jamison Olave (8.95 pts.).

Under the Castrol Index:
The top Forward is T. Henry (9.21 pts.)
The top Midfielder is Wondolowski (9.31 pts.)
The top Defender is Omar Gonzalez (9.17 pts.)
The top Goalkeeper is Dan Kennedy (7.62 pts.)

The distribution of the Index scores, however, reveals room for
improvement.  A player’s minutes in 2011 are highly correlated with
his score (a correlation of .90 – close to 1.0, the highest
correlation possible).  Further, looking at the 150 or so players with
more than 1800 minutes played (the “1800+” group), there is no
significant correlation between minutes and score (a correlation of
.05, close to zero, the value for two variables with no linear
relationship to each other).
For the group below 1800 minutes (the “<1800” group), the relationship
remains incredibly strong (correlation of .94).  Thus, it looks like
there is a threshold of playing time required before one can get an
unbiased evaluation of performance.
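The split described above is easy to reproduce on any table of (minutes, score) pairs. The sketch below is a minimal Python version, assuming the data is a plain list of tuples; the function names and the toy numbers are hypothetical, not the author's data or code. It computes the Pearson correlation between minutes and score for everyone, then separately for players at or above and below the 1800-minute cutoff.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_correlations(players, threshold=1800):
    """Correlation of minutes with score: (all players, 1800+ group, <1800 group).

    players is a list of (minutes, score) tuples. Counting exactly 1800
    minutes as part of the 1800+ group is an assumption.
    """
    def corr(rows):
        return pearson([m for m, _ in rows], [s for _, s in rows])

    above = [p for p in players if p[0] >= threshold]
    below = [p for p in players if p[0] < threshold]
    return corr(players), corr(above), corr(below)


# Hypothetical toy data: scores climb with minutes below the cutoff, flat above it.
toy = [(100, 4.0), (500, 5.0), (900, 6.0), (2000, 8.0), (2500, 7.0), (3000, 8.0)]
overall, regulars, part_timers = split_correlations(toy)
```

On the real 2011 table, the article reports roughly .90, .05 and .94 for these three figures.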

Consequently, I set about adjusting the <1800 group so that I could
compare these players with the unbiased group of 1800+ minutes.  To
achieve this goal, I created an expected value for the <1800 group
(with a single regression equation).  Basically, I wanted to know what
score would be expected for a given amount of playing time, and then I
could compare that player’s actual Castrol Index score with what would
be expected for their actual minutes.
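The expected-value step can be sketched as an ordinary least-squares line fitted to the <1800 group. This assumes a straight-line relationship between minutes and score, which "a single regression equation" suggests but does not confirm; the function names and numbers are hypothetical.

```python
from statistics import fmean


def fit_line(minutes, scores):
    """Least-squares fit of score = intercept + slope * minutes."""
    mx, my = fmean(minutes), fmean(scores)
    slope = sum((x - mx) * (y - my) for x, y in zip(minutes, scores)) / sum(
        (x - mx) ** 2 for x in minutes
    )
    return my - slope * mx, slope


def expected_score(intercept, slope, minutes):
    """Score the fitted line predicts for a given amount of playing time."""
    return intercept + slope * minutes


# Hypothetical <1800 players as (minutes, index score).
part_timers = [(400, 5.0), (800, 6.0), (1200, 7.0)]
a, b = fit_line([m for m, _ in part_timers], [s for _, s in part_timers])

# Over- or under-performance is actual score minus expected score.
surplus = 6.5 - expected_score(a, b, 800)
```

Here a hypothetical player with 800 minutes and a 6.5 score sits half a point above expectation.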

Once I could calculate whether a player exceeded expectations, or not,
I then wanted to know by how much.  To evaluate this, I divided the
entire MLS population into ten roughly equal groups and examined how
the Index scores were spread around the average for each group.  Each
of the 1800+ groups showed greater variance than the <1800
groups.  Thus, I increased the variance for the <1800 groups to mirror
the spread of the unbiased 1800+ group.  Finally, I took the
player’s relative performance in relation to their playing time and
put them in the spot of a player with 1800 minutes – the cutoff for
unbiased evaluation by the Castrol Index.  In short, the goal of my
adjustment process was to allow direct comparisons between the <1800
group and the 1800+ group.
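A simplified version of the whole adjustment is sketched below. It is an approximation under stated assumptions, not the author's exact procedure: where the text compares ten groups, this sketch uses a single stretch factor (the spread of 1800+ scores divided by the spread of the <1800 residuals), and it assumes a straight-line fit for expected scores. All names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_line(minutes, scores):
    """Least-squares fit of score = intercept + slope * minutes."""
    mx, my = fmean(minutes), fmean(scores)
    slope = sum((x - mx) * (y - my) for x, y in zip(minutes, scores)) / sum(
        (x - mx) ** 2 for x in minutes
    )
    return my - slope * mx, slope


def adjust_scores(players, threshold=1800):
    """Return adjusted scores, in input order, for (minutes, score) tuples.

    1800+ players keep their raw score. Each <1800 player's residual from the
    fitted line is stretched by one factor (an assumption standing in for the
    ten-group comparison) and re-centred on the score expected at `threshold`.
    """
    regulars = [s for m, s in players if m >= threshold]
    part_timers = [(m, s) for m, s in players if m < threshold]
    a, b = fit_line([m for m, _ in part_timers], [s for _, s in part_timers])

    residuals = [s - (a + b * m) for m, s in part_timers]
    spread = pstdev(residuals)
    # Stretch factor: make the <1800 spread match the 1800+ spread.
    stretch = pstdev(regulars) / spread if spread > 0 else 1.0
    anchor = a + b * threshold  # expected score at exactly `threshold` minutes

    return [
        s if m >= threshold else anchor + stretch * (s - (a + b * m))
        for m, s in players
    ]


# Hypothetical data: four part-timers followed by two regulars.
toy = [(300, 5.0), (600, 5.5), (900, 6.5), (1200, 7.0), (2000, 7.0), (2500, 9.0)]
adjusted = adjust_scores(toy)
```

With this construction the adjusted <1800 scores are centred on the expected 1800-minute score and spread as widely as the 1800+ scores, which is the sense in which the two groups become directly comparable.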

All in all, the adjusted scores have a correlation with minutes of
.24, which compares favorably to the correlation of .90 between minutes
and the Castrol Index ratings.  The <1800 group has its correlation
with minutes reduced from .95 to about .35.  This remaining
correlation is not zero; it may mean there is room to improve the
adjustment process, or it may simply reflect that better players
earned more playing time.  Further, the range of players’ scores is
reduced: the high scores remain the same, but the floor is raised
from 3.89 to 6.35.  While this adjustment does condense the player
ratings, it also reduces the noise attributable to playing time
alone.  And the adjustment process reshuffles some of the player
rankings.

Under the adjusted index:
The top Forward is Danny Koevermans (9.24 pts.)
The top Midfielder is Wondolowski (9.31 pts.)
The top Defender is Omar Gonzalez (9.17 pts.)
The top Goalkeeper is Zac MacMath (8.18 pts.)

And when one turns to the individual players, the adjusted scores seem
to better reflect conventional opinion as well.  For example, Robbie
Keane (a designated player for the Galaxy) had a Castrol Index score
of 5.34 over his 275 league minutes.  But his adjusted score is 8.51,
and his overall rank improves from 336th to 35th.  Seattle forward
Steve Zakuani’s Index score of 5.83 becomes an adjusted score
of 8.88, enough to move him from 284th to 14th overall and make him the
fourth-ranked forward behind Danny Koevermans, Thierry Henry and Mike

And with some players moving up due to the adjustment process, other
players see their rankings fall in comparison.  David Beckham
slides from 69th overall to 119th (21st to 36th among Mids).  Shalrie
Joseph (the Revs midfielder) drops from 94th overall to 193rd,
highlighting what many perceive to have been an off-year for him.
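Rank movements like these come from re-sorting the league on the adjusted score, both overall and within each listed position. A minimal sketch, assuming tied scores share a rank (standard competition ranking, which the article does not specify) and each player is counted once at a single primary position; the names and scores are hypothetical.

```python
def competition_ranks(items):
    """Map name -> rank for (name, score) pairs; 1 is the highest score.

    Tied scores share the better rank (e.g. 1, 2, 2, 4).
    """
    ordered = sorted(items, key=lambda t: t[1], reverse=True)
    ranks, prev_score, prev_rank = {}, None, 0
    for i, (name, score) in enumerate(ordered, start=1):
        if score != prev_score:
            prev_rank, prev_score = i, score
        ranks[name] = prev_rank
    return ranks


def overall_and_position_ranks(players):
    """players: (name, position, adjusted_score); each name listed once."""
    overall = competition_ranks([(n, s) for n, _, s in players])
    by_position = {}
    for n, pos, s in players:
        by_position.setdefault(pos, []).append((n, s))
    positional = {}
    for rows in by_position.values():
        positional.update(competition_ranks(rows))
    return {n: (overall[n], positional[n]) for n, _, _ in players}


toy = [("A", "M", 8.0), ("B", "F", 9.0), ("C", "M", 8.0), ("D", "D", 7.0), ("E", "M", 7.5)]
ranks = overall_and_position_ranks(toy)
```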

And one should keep in mind that a good score on the index, or the
adjusted index, should be treated like having a good year in
general (or a good 275 minutes).  One good year does not guarantee a
standout performance the next.  And a low rating may have an
explanation that soccer fans can readily identify, such as an
injury, playing out of position, or trouble acclimating to an East
Coast club after captaining a Gold Cup winner, all the while
blaming one’s teammates.  But caveats aside, I would rather
focus on players who have achieved something, even for a short period
of time, than bank on players who are only rated in terms of
potential.  Or, at least, I’d follow this strategy until the market
catches up and properly prices actual performance.

Using the adjusted formula, one can then locate players that have put
in an objectively superior performance, but may not be as recognized
due to a relative lack of playing time – a market arbitrage
opportunity.  With an objective measure of performance, and with the
influence of playing time reduced, one can begin to test the
contribution of various positions to the goal of winning games
(remember that the index and adjusted index scores should allow
meaningful comparisons between players of different positions) and
begin to make decisions about which positions offer the highest return
on investment.
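A first pass at both ideas can be sketched as a filter and a group-by. The cutoff used for "objectively superior" here, beating the median adjusted score of the 1800+ regulars, is an assumption for illustration, as are the dictionary keys and the toy roster.

```python
from statistics import fmean, median


def arbitrage_candidates(players, threshold=1800):
    """Names of <threshold players whose adjusted score beats the 1800+ median.

    players: list of dicts with the hypothetical keys
    'name', 'position', 'minutes' and 'adjusted'.
    """
    bar = median([p["adjusted"] for p in players if p["minutes"] >= threshold])
    return sorted(
        p["name"]
        for p in players
        if p["minutes"] < threshold and p["adjusted"] > bar
    )


def position_means(players):
    """Average adjusted score per listed position."""
    groups = {}
    for p in players:
        groups.setdefault(p["position"], []).append(p["adjusted"])
    return {pos: fmean(scores) for pos, scores in groups.items()}


roster = [
    {"name": "Player 1", "position": "F", "minutes": 2400, "adjusted": 8.2},
    {"name": "Player 2", "position": "M", "minutes": 2100, "adjusted": 7.4},
    {"name": "Player 3", "position": "D", "minutes": 2700, "adjusted": 7.9},
    {"name": "Player 4", "position": "F", "minutes": 600, "adjusted": 8.6},
    {"name": "Player 5", "position": "M", "minutes": 900, "adjusted": 7.1},
]
```

In this toy roster only the 600-minute forward clears the bar, which is the kind of under-recognized player the paragraph describes.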

We will continue with Part 2 of this series where we consider the
relationship between the objective performance ratings and league points,
which positions contribute most to wins, and whether the Castrol
Index, the adjusted index or player salaries best predicts a team’s

13 Responses

  1. These are some detailed numbers. I guess wondo is the best player no matter the measure

  2. […] They tell me statistical analysis is all the rage, so I pass along this link because I want to be cool, not because I understand any of it. Objectively analyzing MLS player performance. Variance, correlation, I just passed out. // Footiebusiness […]

  3. QWK KCKS: That’s how I feel when I get my daughters’ Christmas lists.

  4. when did “Roy” Keane sign with LA? or was that robbie?

  5. Erik, you are correct. Although another game like that against the Revs and he will change it to Roy.

  6. Quick questions: Why do you list Wondo as a midfielder? Clearly plays striker all the time and has for several seasons now.

  7. I used the positions as they are listed by the MLS Players Union in the 2011 salary table. I was not in a position to impose my judgment on the positions (and wouldn’t unless I could address every player equally). And Wondo was listed as a Mid by the union. That may change for next year. I could easily imagine a player (or an agent) wanting more money to play forward instead of a Mid. Not saying that makes sense, just people might make that argument.

    For positions with dual listings, I used the first position identified for each player (thus, “M-F” is a midfielder and “F-M” is a forward). When a player is allowed to count at more than one position, the position averages go up slightly (e.g., 7.49 v. 7.48 for Mids), probably indicating there are a handful of better players who can handle more than one spot on the field.

    For the analyses used in Part 2, I forced players into their primary-listed positions. Otherwise, I would have gotten thrown off on the allocation of playing time among the positions. Considering the averages among the positions (and between the averages and medians) have the same relationship for single positions and dual-listed positions, I believe the framework still has some utility.

  8. […] Ben Berger: Beginning an Objective Analysis of MLS Player Performance […]

  9. […] Posts Monday AfterLooking at MLS AttendanceBeginning an Objective Analysis of MLS Player PerformanceAbout the […]

  10. […] Posts How is NBC Doing?Beginning an Objective Analysis of MLS Player PerformanceLooking at Major League Soccer Television RatingsMLS Official […]

  11. […] part of his series looking at the use of statistics and numbers in soccer.  For Part I, click here. For Part II, click here.  So far, this series has focused on analyzing objective measures of […]

  12. […] Laidig: Money & Performance in MLS – Part 1 – Part 2 – Part […]

  13. […] Dave Laidig is back with the second part of his series looking at the use of statistics and numbers in soccer.  For Part I, click here. […]
