Footiebusiness Contributor Dave Laidig weighs in with his latest manipulation of Castrol data. Dave’s statistical work is groundbreaking stuff and represents the cutting edge of soccer analytics and stats-based crunching. Read closely and drop Dave a line. Also, check Dave out at Par Stat, his innovative website looking at Points Above Replacement in soccer.
Castrol Index scores are presented as performance measures by Opta, the publisher of the data. Past analyses have shown that Castrol Index scores, positions, and playing time reflect league results (see Footiebusiness and A Beautiful Numbers Game). And while the scores match up with league results pretty well over time (R squared in excess of .73 for each of the three seasons analyzed), the year-to-year consistency of the data has not been established. Consequently, this analysis examines the year-to-year relationships between adjusted Castrol scores (Castrol scores adjusted to remove the “punishment” for less playing time).
The first step in comparing the 2011 and 2012 MLS Castrol Index is to convert the scores to a standardized scoring system. The Castrol Index changed scoring systems between 2011 and 2012: a 3.4 – 10 point scale became a 0 – 1000+ point scale. Similarly, although the adjustment narrowed the ranges, the adjusted Castrol scores (the key component of this analysis) also differed between seasons. Thus, the 2011 and 2012 scores are not directly comparable. Consequently, this analysis converts the reported scores to Z-scores (by position), which allows for year-to-year comparisons.
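A minimal Python sketch of the per-position standardization described above. It assumes a hypothetical list of (player, position, score) tuples and uses the population standard deviation; the article does not say which standard deviation formula was used, so treat that choice as an assumption. Z-scores are unchanged by any linear rescaling of a season’s scores, so each season ends up expressed in the same units (standard deviations from the positional mean) regardless of its original scale.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores_by_position(players):
    """Standardize scores within each position group.

    players: list of (name, position, score) tuples (hypothetical layout).
    Returns a dict mapping each player's name to his Z-score.
    Each position needs at least two distinct scores, otherwise the
    standard deviation is zero and the division fails.
    """
    # Collect raw scores by position so players are compared with their peers.
    by_position = defaultdict(list)
    for _, position, score in players:
        by_position[position].append(score)

    # Mean and population standard deviation per position (assumed formula).
    stats = {pos: (mean(scores), pstdev(scores)) for pos, scores in by_position.items()}

    return {
        name: (score - stats[position][0]) / stats[position][1]
        for name, position, score in players
    }


# Hypothetical 2011-style scores on the 3.4 - 10 scale.
season_2011 = [
    ("A", "MF", 6.0), ("B", "MF", 7.0), ("C", "MF", 8.0),
    ("D", "FW", 5.0), ("E", "FW", 9.0),
]
z_2011 = zscores_by_position(season_2011)

# The same players on a scale 100 times larger get the same Z-scores.
z_rescaled = zscores_by_position([(n, p, s * 100) for n, p, s in season_2011])
```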
Next, several year-over-year relationships were examined, starting with 2012 playing time. In a mild surprise, the biggest predictor of 2012 minutes was not 2012 salary (r = 0.129), or even 2011 adjusted Castrol scores (r = 0.16), but 2011 minutes (r = 0.625). This may reflect that a manager’s comfort level plays a greater role in lineup decisions than otherwise expected. One might hope that teams play their best performers, but this is not borne out by the data. To give managers a break, 2011 performance may not be related to 2012 game time for a variety of competitive or health reasons. Alternatively, a cynic may assume that a manager will put his most expensive players on the field. But again, the data does not support this idea.
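The comparison above ranks candidate predictors by their Pearson correlation (r) with 2012 minutes. Below is a minimal, dependency-free sketch of that ranking; the function names and the four-player data set are hypothetical, and the article’s r values come from the full MLS data, not from this code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def rank_predictors(target, predictors):
    """Return (name, r) pairs sorted from strongest to weakest |r| with the target."""
    results = [(name, pearson_r(values, target)) for name, values in predictors.items()]
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data for four players.
minutes_2012 = [1000, 2000, 3000, 4000]
candidates = {
    "minutes_2011": [900, 2100, 2800, 4100],
    "salary_2012": [4000, 1000, 3000, 2000],
}
ranking = rank_predictors(minutes_2012, candidates)
```

Sorting on |r| rather than r keeps a strong negative relationship from being ranked as weak; in the article all the reported correlations are positive, so the two orderings coincide there.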
Turning to 2012 performance, the best predictor (although a mild one) is 2011 performance (r = 0.343). 2011 playing time has no relationship with 2012 performance (r = 0.01), and 2012 salary is not meaningfully related either (r = 0.18). We know that with the Designated Player (DP) system in MLS, salaries often reflect business value more than on-the-field performance, so a small salary-performance correlation is not much of a shock. Here, the key concern is the disappointing relationship between the 2011 performance Z-score and the 2012 Z-score. A quick rule of thumb (squaring r: 0.343² ≈ 0.12) suggests that 2011 performance explains only about 10 to 12% of the variation in 2012 performance. It’s not zero, but it is only a small benefit, and one that would likely be captured by other performance evaluation standards.
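The rule of thumb mentioned above is the squared correlation, r², which gives the share of variance in one measure that a straight-line fit on another can account for. Squaring the r values reported in this article is simple arithmetic; the dictionary labels below are just descriptive strings, not names from any data set.

```python
# Correlations (r) reported in the article.
reported_r = {
    "2011 minutes vs 2012 minutes": 0.625,
    "2011 adj. Castrol vs 2012 minutes": 0.16,
    "2012 salary vs 2012 minutes": 0.129,
    "2011 adj. Castrol vs 2012 adj. Castrol": 0.343,
    "2012 salary vs 2012 adj. Castrol": 0.18,
    "2011 minutes vs 2012 adj. Castrol": 0.01,
}


def variance_explained(r):
    """Share of variance linearly explained by a single predictor (r squared)."""
    return r * r


for label, r in reported_r.items():
    print(f"{label}: r = {r:.3f}, r^2 = {variance_explained(r):.1%}")
```

Squaring 0.343 gives about 0.118, roughly 12% of the variance in 2012 performance, while 2011 minutes account for roughly 39% of the variance in 2012 minutes (0.625² = 0.390625).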
But one can look at the situation from a larger perspective. Instead of using the 2011 scores to predict a 2012 score, one may use the 2011 score to classify players into a couple of groups, and then determine whether this classification helps identify which players will turn in desirable or undesirable performances. For the statistically minded, this converts a continuous variable into a discrete variable. And taking Nate Silver’s admonition to think probabilistically to heart, one can determine whether the 2011 performance data improve the chances of predicting a good 2012 performance.
Here, of the 2012 MLS field players (i.e., excluding goalkeepers), 46.6% had above-average adjusted Castrol scores, and 53.3% had below-average scores. In other words, if one had no other information and picked players at random, one would select a desirable performance (i.e., an above-average adjusted Castrol score) about 46.6% of the time. We can call this the hit rate. Any data that improve this hit rate may be of value in personnel evaluation and selection.
Using the adjusted Castrol scores, we can pick out two groups of players based on their 2011 performance. Roughly speaking, we will call them the good 2011 performers and the poor 2011 performers. Here, the good 2011 performers are defined as those players with Z-scores at or above +0.5, and the poor performers as those with Z-scores at or below -0.5. Under a normal distribution, these cutoffs correspond to roughly the top and bottom 30% of scores. However, because of turnover in the MLS player pool, they correspond to the top 28% and the bottom 33% of the players with both 2011 and 2012 scores.
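A minimal sketch of the grouping step, assuming each player has a (hypothetical) pair of position-adjusted Z-scores for 2011 and 2012, and that “above average” means a 2012 Z-score greater than zero. The cutoffs of +0.5 and -0.5 follow the article’s good and poor performer definitions; the last line checks the normal-distribution claim using the standard library’s `statistics.NormalDist`.

```python
from statistics import NormalDist

GOOD_CUTOFF = 0.5   # 2011 Z-score at or above this: good 2011 performer
POOR_CUTOFF = -0.5  # 2011 Z-score at or below this: poor 2011 performer


def classify(z_2011):
    """Assign a player to a 2011 performance group from his Z-score."""
    if z_2011 >= GOOD_CUTOFF:
        return "good"
    if z_2011 <= POOR_CUTOFF:
        return "poor"
    return "middle"


def group_hit_rates(pairs):
    """pairs: (z_2011, z_2012) per player.

    Returns {group: share of that group with an above-average (Z > 0) 2012 score}.
    """
    counts = {}
    for z_2011, z_2012 in pairs:
        group = classify(z_2011)
        hits, total = counts.get(group, (0, 0))
        # A boolean adds as 0 or 1, so this counts above-average 2012 scores.
        counts[group] = (hits + (z_2012 > 0), total + 1)
    return {group: hits / total for group, (hits, total) in counts.items()}


# Share of a normal distribution above Z = +0.5 (and, by symmetry, below -0.5).
tail_share = 1 - NormalDist().cdf(GOOD_CUTOFF)  # about 0.31
```

Usage with made-up players: `group_hit_rates([(1.0, 0.3), (0.8, 0.2), (0.6, -0.1), (-1.0, -0.5), (-0.7, 0.1), (0.0, 0.4)])` gives a good-group hit rate of 2/3, a poor-group hit rate of 0.5, and a middle-group hit rate of 1.0.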
2011 performance | 2012 performance | All field players | Difference (percentage points)
Top 28% (Z ≥ +0.5) | 60.8% above avg | 46.6% above avg | 14.2*
Bottom 33% (Z ≤ -0.5) | 68.8% below avg | 53.3% below avg | 15.5*
*Statistically significant at α = .05, two-tailed
Top 2011 performers had a good 2012 performance over 60% of the time, significantly better than the population as a whole (46.6%). Looking in the other direction, bottom 2011 performers had a poor 2012 performance over 68% of the time, significantly worse than the population as a whole (53.3%). Thus, the 2011 rating is associated with 2012 performance. While a good or poor 2011 score does not match up with 2012 performance 100% of the time, nor does it help differentiate between players within the same 2011 performance group, knowing the previous performance level increases the chances of getting the decision right. And with limited roster space and limited financial resources, relatively small edges can add up to meaningful advantages.
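The article does not say which significance test or sample sizes were used. One common choice for comparing a group’s hit rate with a baseline rate is a two-tailed, one-sample z-test for a proportion; the sketch below implements that test with the standard library only. The counts in the usage line are hypothetical and are not the article’s data.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(hits, n, p0):
    """Two-tailed one-sample z-test of an observed hit rate against baseline p0.

    Returns (z, p_value). Uses the normal approximation, which is reasonable
    when n * p0 and n * (1 - p0) are both comfortably large (say, 10 or more).
    """
    p_hat = hits / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical: 61 of 100 top-group players above average vs. a 46.6% baseline.
z, p = proportion_z_test(61, 100, 0.466)
significant = p <= 0.05
```

An exact binomial test, or a two-proportion test comparing the group with the remaining players, would be reasonable alternatives; which approach the author used is not stated.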
Another way to look at the data is to line up the three Z-score groups with the probability of an above-average performance the following year. Players with a 2011 score at the 72nd percentile and above, between the 33rd and 72nd percentiles, and at the 33rd percentile and below had a probability of a good 2012 performance of 60.8%, 46.6%, and 31.2%, respectively.
2011 performance | Probability of good 2012 performance
Z ≥ +0.5 (72nd percentile and above) | 60.8%
-0.5 < Z < +0.5 (between 33rd and 72nd percentile) | 46.6% (default %)
Z ≤ -0.5 (33rd percentile and below) | 31.2%
By broadly categorizing the 2011 performance level, one can determine the probability of a good performance the next year.
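The table above can serve as a simple lookup. The sketch below encodes the reported probabilities with the cutoffs from the text and, as a hypothetical use, sums them to get the expected number of good 2012 performances from a shortlist of players. The function names and the shortlist are illustrative assumptions, not part of the original analysis.

```python
def prob_good_next_season(z_2011):
    """Reported probability of an above-average 2012 score, given a 2011 Z-score."""
    if z_2011 >= 0.5:
        return 0.608  # 72nd percentile and above
    if z_2011 <= -0.5:
        return 0.312  # 33rd percentile and below
    return 0.466      # middle group: the default (population) rate


def expected_good_performances(shortlist_z):
    """Expected count of good 2012 performances among the shortlisted players."""
    return sum(prob_good_next_season(z) for z in shortlist_z)


# Hypothetical shortlist of three players' 2011 Z-scores.
expected = expected_good_performances([1.1, 0.0, -0.8])
```

Summing the per-player probabilities is valid for an expected count whether or not the players’ outcomes are independent, so this is a quick way to compare two candidate shortlists.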
For one last comparison, we can compare the predictive value of 2011 performance against 2012 player salaries (which are negotiated and signed before players take the field for the 2012 season). It is commonly assumed that spending on players dictates success on the field. However, 2012 salary does not predict 2012 performance as well as the previous season’s adjusted Castrol Index scores. A top salary is associated with a good 2012 performance 55.7% of the time, which is 9.1 percentage points above chance. And a bottom salary is associated with a poor 2012 performance 55.8% of the time, only 2.5 percentage points different from chance. Neither difference is statistically significant.
2012 salary | 2012 performance | All field players | Difference (percentage points)
Top 28% | 55.7% above avg | 46.6% above avg | 9.1**
Bottom 33% | 55.8% below avg | 53.3% below avg | 2.5**
**Not statistically significant at α = .05, two-tailed
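The Difference columns in the two tables are percentage-point gaps between a group’s rate and the field-wide rate. A small sketch that reproduces them from the reported figures and puts the two predictors side by side (the dictionary keys are descriptive labels chosen for this example):

```python
# Field-wide rates reported in the article (percent).
BASE_ABOVE_AVG = 46.6
BASE_BELOW_AVG = 53.3

# Reported rates: top group above average, bottom group below average (percent).
reported = {
    "2011 adjusted Castrol": {"top_above": 60.8, "bottom_below": 68.8},
    "2012 salary": {"top_above": 55.7, "bottom_below": 55.8},
}


def lift(rates):
    """Percentage-point gains over the field-wide rates, rounded to one decimal."""
    return (
        round(rates["top_above"] - BASE_ABOVE_AVG, 1),
        round(rates["bottom_below"] - BASE_BELOW_AVG, 1),
    )


for predictor, rates in reported.items():
    top_gain, bottom_gain = lift(rates)
    print(f"{predictor}: top group +{top_gain} pts, bottom group +{bottom_gain} pts")
```

Rounding to one decimal avoids floating-point noise such as 14.199999999999996 when subtracting the reported percentages.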
All in all, the MLS data suggest that knowledge of the previous year’s adjusted Castrol scores can significantly increase the probability of a hit (i.e., selecting a field player who will turn in a good performance) as well as significantly reduce the chances of a miss. Further, the adjusted Castrol scores are better predictors than player salaries. It remains possible that additional data – and larger sample sizes – will allow for refined probabilities. In the meantime, one can layer an objective measure into player performance predictions.
This is great. I wonder how many teams actually pay attention to this type of data. Sports people tend to trust their eye and gut more than the numbers. Do we know if any teams rely heavily on these numbers?
If I recall correctly, Bill James initially distributed his work to just a few diehards. Dave, you should become the soccer version of Bill James.
There are a growing number of teams that pay attention to stuff like this. However, the form it takes varies widely depending on the organization.
The “traditional” role (if anything quantitative really qualifies as traditional) is to apply quantitative analyses to tactical questions – using a quant as an extra coach. In this role, the analysts are expected to break down video, assess players, and perhaps develop gameplans for opponents. As you can imagine, a coaching background comes in handy for this stuff.
The “Moneyball” role applies an economist’s approach, using quantitative methods to determine market value (which is an indirect measure of winning). This role is more likely to be attached to the front office.
There is some overlap between the two, but when you go to a team preaching empirical analysis, their expectations may vary greatly. And communication is very important. Teams will consider it a waste of money if you tell them something they already believe is true. But if you propose an “unconventional” action, there is some friction as well. However, I believe that in a field surrounded by athletes (current or former), the competitive nature will ultimately lead them to seek an “edge” wherever they can find it. Quantitative analysis won’t replace technical skill, or good coaching and scouting; but it can provide an edge.
In MLS, I am aware of 4 or 5 teams that have a dedicated analyst. And I hope to see many more in the future. In the EPL, there are more teams known for having quant shops. And I believe this is due to the larger salaries. With more money at stake, protecting the investment with an extra employee or two begins to make sense. As MLS salaries rise, I expect to see more of this kind of analysis.