Analysis

Analyzing Consistency of Athletes

Update (April 16th): Since originally posting, I have added an age-weighted component to the calculation, so that newer results have a larger influence than older ones. As the changed figures show, this adds another interesting dimension to the analysis in this post.

I love getting feedback on my analysis and predictions – very often, they trigger some new, interesting way of looking at the data. For example, Linsey Corbin made the following remark to me:

I wish there was a way that your predictions could show consistency. One thing I pride myself on is being fairly consistent across the board.

Thanks for the suggestion, Linsey (and great to see you back to racing)! I have been looking at different ways of attacking this question, and here is what I was able to come up with. I will continue to monitor these numbers for upcoming races, and maybe I’ll include them in future predictions.

Deviation

In statistics, there are a number of ways to measure how “consistent” a set of data is. The most common way to express variability in a data set is the “standard deviation”. It basically measures the distance of data points from the average value – the more “outliers” there are and the further off they are, the higher the standard deviation.
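To make the effect of a single outlier concrete, here is a minimal sketch using Python's standard library. The finish times are hypothetical numbers chosen for illustration, not data from my database.

```python
from statistics import mean, pstdev

# Hypothetical finish times in minutes, for illustration only.
steady = [540, 545, 550, 555, 560]
one_bad_day = [540, 545, 550, 555, 692]  # same athlete, one race with a long walk


def spread(times):
    """Return (average, standard deviation) of a list of finish times."""
    return mean(times), pstdev(times)


print(spread(steady))
print(spread(one_bad_day))
```

A single slow race is enough to inflate the deviation several times over, even though the other four results are identical in both lists.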

This was my first attempt at analyzing consistency. The data analysis part is pretty simple, as the function is built into all kinds of programs. However, the results were not very helpful: In essence it helped identify athletes that had one or more sub-standard results, e.g. because of walking large parts of the marathon in a race. For example, Lucy Gossage showed up as an inconsistent athlete with a large deviation, but that was almost exclusively a result of her marathon walk resulting in an 11:32 finish in Kona 2014. It also didn’t value “good” results: The difference between a good result and an average one – maybe 30 minutes or so – is much smaller than that of a bad result – walking easily adds an hour to the overall time.

Identifying Non-standard Results and Quantifying Consistency

Even though looking at the deviation of each athlete’s results did not lead to a good measure, it formed the basis for another way of looking at the data. In the familiar “bell shape” curve of the normal distribution, 68% of results fall within one standard deviation around the average. When looking at the difference between an athlete’s “expected time” and their actual finishing time, roughly 68% of the results are within 20 minutes of the expected time. Based on this I classify results within 20 minutes of the expected finishing time as “normal”, any quicker result as “better”, and anything slower (as well as DNFs) as “sub-par”.
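The classification rule can be sketched in a few lines of Python. The function name, the use of `None` for a DNF and the treatment of a difference of exactly 20 minutes as “normal” are assumptions made for this illustration.

```python
THRESHOLD_MIN = 20  # from the text: roughly 68% of results are within 20 minutes


def classify(expected_min, actual_min):
    """Classify one result; actual_min is None for a DNF (an assumed convention)."""
    if actual_min is None:
        return "sub-par"
    diff = actual_min - expected_min  # positive = slower than expected
    if diff < -THRESHOLD_MIN:
        return "better"
    if diff > THRESHOLD_MIN:
        return "sub-par"
    return "normal"


print(classify(540, 515))   # 25 minutes quicker than expected
print(classify(540, 550))   # 10 minutes slower than expected
print(classify(540, None))  # DNF
```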

I can then aggregate all the results of an athlete into a figure like this:

Linsey Corbin: 83% +17% -0% (18)

Older results carry less meaning than newer ones, so adding an aging component gives the following numbers:

Linsey Corbin: 79% +21% -0% (18)

Each part has the following meaning:

  • Linsey Corbin: Name of the athlete
  • 79%: Fraction of normal race results
  • +21%: Fraction of “better than expected” race results
  • -0%: Fraction of “sub-par” race results (including DNFs)
    (Note: Technically, Linsey has at least one DNF in her Ironman races – she didn’t finish IM Texas in 2011. This is a limitation in my data – I have only been including DNFs since 2014.)
  • (18): Total number of Ironman-distance results (including DNFs)
Average numbers are about 68% of normal results and roughly 15-20% each of better and sub-par results, but these numbers vary wildly between athletes.
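As a rough sketch of how such a line can be built from classified results, the following Python function turns a list of labels into the format shown above. The post does not spell out the exact age weighting, so the half-life decay in `age_weight` is purely an assumption for illustration, as are the function names and the example counts.

```python
def age_weight(age_years, half_life_years=2.0):
    """Hypothetical decay: a result loses half its weight every half_life_years."""
    return 0.5 ** (age_years / half_life_years)


def consistency_line(name, labels, weights=None):
    """Format 'Name: normal% +better% -sub-par% (count)' from classified results."""
    if weights is None:
        weights = [1.0] * len(labels)  # unweighted: every result counts the same
    total = sum(weights)
    share = {"normal": 0.0, "better": 0.0, "sub-par": 0.0}
    for label, weight in zip(labels, weights):
        share[label] += weight / total
    return "{}: {:.0%} +{:.0%} -{:.0%} ({})".format(
        name, share["normal"], share["better"], share["sub-par"], len(labels)
    )


# Hypothetical athlete with 15 normal and 3 better results, unweighted.
print(consistency_line("Athlete A", ["normal"] * 15 + ["better"] * 3))
```

Passing a list of weights (for example one `age_weight` value per result) shifts the percentages towards the more recent races, while the count in parentheses stays the plain number of results.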

Examples

Here are some more numbers from well known athletes – put into different groups. (As I have updated my algorithm a bit since posting for the first time, I am also including the originally posted numbers in [square brackets].)

Stable Athletes

  • Andy Potts: 100% +0% -0% (13) [originally posted: 100% +0% -0% (13)]
  • Yvonne Van Vlerken: 84% +0% -16% (23) [originally posted: 91% +0% -9% (23)]
  • Lucy Gossage: 92% +0% -8% (12) [originally posted: 91% +0% -9% (11)]
  • Sebastian Kienle: 85% +12% -3% (11) [originally posted: 82% +9% -9% (11)]
These are athletes where predictions are a very good indicator of how they’ll perform on race day – they usually perform on a very similar level from race to race.

Normal Stability

  • Jodie Swallow: 55% +0% -45% (10) [originally posted: 78% +0% -22% (9) – she has since DNF’d in South Africa]
  • Caroline Steffen: 92% +8% -0% (20) [originally posted: 75% +25% -0% (20)]
  • Meredith Kessler: 65% +14% -20% (23) [originally posted: 70% +17% -13% (23)]
  • Andreas Raelert: 48% +0% -52% (19) [originally posted: 63% +0% -37% (19)]
  • Luke McKenzie: 51% +30% -19% (26) [originally posted: 62% +23% -15% (26)]
For these athletes predictions give a good indication, but it is also interesting whether there is a higher potential for an “up-side”, better-than-expected result (larger percentage of faster results, e.g. Caroline Steffen) or for a “down-side” result (larger percentage of sub-par results, e.g. Jodie Swallow or Andreas Raelert). For other athletes, the day could go either way (e.g. Meredith Kessler or Luke McKenzie).

Lower Stability

  • Sarah Piampiano: 41% +47% -12% (14) [originally posted: 50% +43% -7% (14)]
  • Luke Bell: 23% +5% -72% (26) [originally posted: 38% +12% -50% (26)]
  • Dede Griesbauer: 41% +18% -40% (26) [originally posted: 32% +32% -36% (25)]
  • Tim O’Donnell: 14% +63% -23% (11) [originally posted: 27% +45% -27% (11)]
  • Pete Jacobs: 5% +16% -79% (26) [originally posted: 15% +42% -42% (26)]

Then there are athletes that have a lower fraction of “normal” results. Here it’s also interesting to look at the upside (e.g. Sarah Piampiano, Tim O’Donnell) or downside potential (e.g. Luke Bell). Some athletes’ results are very hard to predict from previous numbers – for example Dede Griesbauer and Pete Jacobs have had a good fraction of great results but also slower, disappointing results.

Continental and National Fastest Times

After the fast times at the end of 2015 there has been some discussion about continental and national “records” over the Ironman-distance. Because of doubts about the accuracy of courses, comparing times from different courses is always a bit tricky, but here is an overview of the data I was able to compile.

Please let me know if I missed some older results that are better than the continental and national records in this list!

Continental Records

Female Athletes

Continent Athlete Nation Time Date Race
Africa McEwan, Dianne ZAF 09:37:45 14.04.13 IM South Africa
Asia/Pacific Shiono, Emi JPN 09:23:26 01.03.08 IM New Zealand
Australia Carfrae, Mirinda AUS 08:38:53 20.07.14 Challenge Roth
Europe Wellington, Chrissie GBR 08:18:13 10.07.11 Challenge Roth
North America Corbin, Linsey USA 08:42:42 29.06.14 IM Austria
South America Monticeli, Ariane BRA 08:59:08 31.05.15 IM Brasil

Male Athletes

Continent Athlete Nation Time Date Race
Africa Cunnama, James ZAF 07:59:59 08.07.12 Challenge Roth
Asia/Pacific Vernay, Patrick NCL 08:03:46 12.07.09 Challenge Roth
Australia McCormack, Chris AUS 07:54:23 24.06.07 Challenge Roth
Europe Raelert, Andreas GER 07:41:33 10.07.11 Challenge Roth
North America Starykowicz, Andrew USA 07:55:22 02.11.13 IM Florida
South America Amorelli, Igor BRA 07:59:36 31.05.15 IM Brasil

National Records

Sometimes, the nation of an athlete is not clear – often athletes are listed with their country of residence (e.g. foreign athletes staying in Boulder), and some athletes have dual citizenships. Please let me know if I have mis-attributed a fast result by an athlete to the wrong country!

Female Athletes

Nation Athlete Time Date Race
AUS Carfrae, Mirinda 08:38:53 20.07.14 Challenge Roth
AUT Wutti, Eva 08:37:36 18.08.13 IM Copenhagen
BEL Goos, Sofie 08:57:08 29.06.14 IM Austria
BRA Monticeli, Ariane 08:59:08 31.05.15 IM Brasil
CAN Naeth, Angela 08:54:55 28.09.14 IM Chattanooga
CZE Reed, Lucie 08:57:34 06.10.13 Challenge Barcelona
DEN Pedersen, Camilla 08:56:01 07.07.13 IM Germany
FIN Lehtonen, Kaisa 08:48:40 04.10.15 IM Barcelona
FRA Collonge, Jeanne 09:20:51 23.06.13 IM France
GBR Wellington, Chrissie 08:18:13 10.07.11 Challenge Roth
GER Wallenhorst, Sandra 08:47:26 13.07.08 IM Austria
HUN Csomor, Erika 08:47:05 13.07.08 Challenge Roth
IRL Mullan, Eimear 08:56:51 04.10.15 IM Barcelona
ITA Niederfriniger, Edith 08:59:45 13.07.08 IM Austria
NED Van Vlerken, Yvonne 08:43:07 02.11.13 IM Florida
NZL Martin, Britta 08:56:34 07.12.14 IM Western Australia
SUI Steffen, Caroline 08:34:51 24.03.12 IM Melbourne
SWE Lundstroem, Asa 09:02:49 22.03.15 IM Melbourne
UKR Kozulina, Tamara 09:06:42 13.07.08 IM Austria
USA Corbin, Linsey 08:42:42 29.06.14 IM Austria
ZAF McEwan, Dianne 09:37:45 14.04.13 IM South Africa

Male Athletes

Nation Athlete Time Date Race
AUS McCormack, Chris 07:54:23 24.06.07 Challenge Roth
AUT Weiss, Michael 07:57:39 03.07.11 IM Austria
BEL Vanhoenacker, Marino 07:45:58 03.07.11 IM Austria
BMU Butterfield, Tyler 08:05:22 31.05.15 IM Brasil
BRA Amorelli, Igor 07:59:36 31.05.15 IM Brasil
CAN McMahon, Brent 07:55:48 16.11.14 IM Arizona
CZE Ospaly, Filip 07:58:44 02.11.13 IM Florida
DEN Henning, Rasmus 07:52:36 18.07.10 Challenge Roth
ESP Rana, Ivan 07:48:43 29.06.14 IM Austria
EST Albert, Marko 08:08:17 03.07.11 IM Austria
FRA Chevrot, Denis 08:05:58 07.12.14 IM Western Australia
GBR Amey, Paul 08:01:29 19.11.11 IM Arizona
GER Raelert, Andreas 07:41:33 10.07.11 Challenge Roth
LUX Bockel, Dirk 07:52:01 14.07.13 Challenge Roth
NCL Vernay, Patrick 08:03:46 12.07.09 Challenge Roth
NED Van der Marel, Jan 07:57:46 04.09.99 Almere Triathlon
NED Diederen, Bas 08:05:36 05.07.15 IM Germany
NZL Brown, Cameron 08:00:12 24.03.12 IM Melbourne
POR Marques, Sergio 08:05:21 06.10.13 Challenge Barcelona
SLO Plese, David 08:02:20 04.10.15 IM Barcelona
SUI Schildknecht, Ronnie 07:59:42 05.11.11 IM Florida
SWE Nilsson, Patrik 08:08:05 15.08.15 IM Sweden
USA Starykowicz, Andrew 07:55:22 02.11.13 IM Florida
ZAF Cunnama, James 07:59:59 08.07.12 Challenge Roth

Notes

There are a few records that need some explanations.

Female African & South African Record

I have listed Dianne McEwan (now Dianne Emery, who became a mom in January) as the African record holder, but she herself considers Annah Watkinson’s 9:31 from IM Austria 2015 as the record. Annah was racing as an age-grouper then, and because the race dynamics differ from the Pro race I’m not counting her result. But Annah has turned Pro this season, so there’s a good chance we will see a new African record this year, maybe as early as IM South Africa!

Male Canadian Record

Lionel Sanders sent me the following tweet after I posted the fastest times:

[Image: tweet from Lionel Sanders]

Lionel is right that Peter finished in 7:51:56 in 1999. However, it is accepted that the marathon in Klagenfurt was short (Peter ran a 2:35:21!) – probably by more than 1k. With Peter being a great athlete and fast runner, one could speculate if he could have finished faster than Brent’s 7:55:48, but I’ve decided to not accept his time as a record.

Male Dutch Record

Jefry Visier, the Operational Director of Challenge Almere, was going through older Almere results and found four times (three by Jan Van der Marel and one by Frank Heldoorn) that were quicker than the one I had from Bas Diederen.

2015 Money Lists

This is an excerpt from my free “2015 TriRating Report”. If you’re interested in more information about the 2015 long-distance Triathlon season, you should definitely check it out!

Overall Money List

First, here is an overview of the races I have included in my money list:

Type Description Total Prize Money # of Athletes
Kona Ironman World Championship (Kona) $ 650.000 20
Ironman Full-distance WTC races (not including Kona) $ 2.271.000 318
70.3 Champs 70.3 World Championship (Zell am See) $ 250.000 20
70.3s 70.3 races (not including Champs) $ 2.177.500 400
Challenge Full-distance Challenge races (including Roth) $ 360.750 83
Sum All included races $ 5.709.250 582

This does not include the $1 million prize that Daniela Ryf collected for winning the “Triple Crown”.

The next table shows the Top 20 athletes – both from the men and women – that have earned the most prize money in the 2015 calendar year from all the races listed above:

# Name Sex Total Money
1 Ryf, Daniela F $223.000
2 Frodeno, Jan M $213.000
3 Kessler, Meredith F $86.000
4 Blatchford, Liz F $79.750
5 Raelert, Andreas M $77.750
6 Potts, Andy M $75.500
7 Joyce, Rachel F $73.250
8 O’Donnell, Timothy M $67.500
9 Sanders, Lionel M $66.500
10 Wurtele, Heather F $64.500
11 Van Vlerken, Yvonne F $63.250
12 Steffen, Caroline F $60.750
13 Don, Tim M $58.000
14 Jackson, Heather F $57.750
15 Swallow, Jodie F $57.500
16 Naeth, Angela F $55.000
17 Kienle, Sebastian M $52.500
18 Pedersen, Camilla F $51.750
19 Piampiano, Sarah F $49.250
20 Gossage, Lucy F $47.000

Meredith Kessler has made it into third spot without any money from the “big money races” in Kona or Zell Am See.

Ironman Money List

Here are the Top 15 money earners from Ironman races (excluding Kona):

# Name Sex Ironman Total Overall Rank
1 Vanhoenacker, Marino M $44.000 $44.000 24
2 Van Lierde, Frederik M $35.000 $40.250 26
3 Kessler, Meredith F $34.000 $86.000 3
4 Van Vlerken, Yvonne F $33.250 $63.250 11
5 Hanson, Matt M $31.000 $32.000 40
6 Ryf, Daniela F $30.000 $223.000 1
6 Frodeno, Jan M $30.000 $213.000 2
6 Blatchford, Liz F $30.000 $79.750 4
6 Swallow, Jodie F $30.000 $57.500 15
6 Naeth, Angela F $30.000 $55.000 16
6 McKenzie, Luke M $30.000 $36.750 33
6 Monticeli, Ariane F $30.000 $35.750 35
6 Symonds, Jeff M $30.000 $31.250 43
6 Hauschildt, Melissa F $30.000 $30.000 44
15 Sanders, Lionel M $29.500 $66.500 9

70.3 Money List

Here are the Top 15 money earners from 70.3 races (including the 70.3 Champs):

# Name Sex 70.3 Total Zell Am See Other 70.3s Overall Rank
1 Ryf, Daniela F $73.000 $45.000 $28.000 1
2 Frodeno, Jan M $63.000 $45.000 $18.000 2
3 Wurtele, Heather F $58.000 $20.000 $38.000 10
3 Don, Tim M $58.000 $- $58.000 13
5 Kessler, Meredith F $52.000 $- $52.000 3
6 Sanders, Lionel M $37.000 $- $37.000 9
6 Tisseyre, Magali F $37.000 $10.000 $27.000 32
8 Steffen, Caroline F $33.750 $- $33.750 12
9 Aernouts, Bart M $33.250 $10.000 $23.250 25
10 Potts, Andy M $33.000 $- $33.000 6
11 Kaye, Alicia F $32.500 $7.500 $25.000 39
12 Goss, Lauren F $31.500 $- $31.500 41
13 Swallow, Jodie F $27.500 $- $27.500 15
13 Reed, Tim M $27.500 $- $27.500 51
15 Boecherer, Andi M $27.250 $6.500 $20.750 36

It’s interesting to note that Tim Don and Meredith Kessler have almost made it to the top of the list without any money from the 70.3 Championships.

Validating 2015 Predictions

I have been publishing Race Predictions for a few years now, so it’s about time to have a look at how “good” my predictions are. When I started in 2011, there were quite a few changes in the algorithm and the parameters to deal with a number of edge cases. During 2015 there have not been any changes, so this is a good data pool for validation.

Data Used

In 2015 I published predictions for 36 Professional Ironman-distance races: 31 Ironman-branded races by WTC and 5 Challenge races. There were a total of 1098 finishes, 688 by male athletes (62.7%) and 410 by females (37.3%). These were posted by 600 different athletes, 382 males (63.7%) and 218 females (36.3%). In addition there were 349 DNFs, 244 by males (70%) and 105 by females (30%).

Using my algorithm and the available start lists, I have seeded the participants in each of the races, and predicted 930 finishing times (84.7% of all finishers). There are some cases when I didn’t predict the finishing times, for example when an athlete didn’t have any prior IM-distance finishes or when there was a late entry (and the athlete was therefore not included in the start list).

Predicting the Winners

Here’s a look at the places the eventual race winners have been seeded based on previous results and the start lists:

[Figure: seeding positions of the eventual race winners]

With 36 IM-distance races, there are 72 winners (one each for the male and female race). My algorithm correctly predicted the winner in 26 cases (36%), and another 26 winners were seeded #2 or #3, so 72% of the winners were seeded on the podium. Only three winners were seeded lower than 8th: Kirill Kotshegarov was seeded 10th at IM Chattanooga, Mel Hauschildt was seeded 11th at IM Melbourne, and Matt Hanson was seeded 12th at IM Texas. There was also one unrated (and therefore unseeded) winner in 2015: Jesse Thomas won IM Wales in his debut Ironman.

The numbers improve further when athletes who did not start or did not finish are left out of the seedings. Counting only athletes who actually started raises the frequency of picking the right winner to 39% (and of one of the podium picks winning the race to 80%); also discarding athletes who did not finish would have yielded 42% and 83%.

Time Predictions

In my pre-race posts, the finish times are predicted for each athlete that has raced an Ironman race before. The algorithm considers the previous finishing times of an athlete and the course that the race is going to be held on.

The following graph compares the actual finish times to the predicted finish times (each data point is one dot on the graph). Dots towards the upper left are results where the actual finish was faster than predicted, dots towards the lower right are results that are slower than predicted.

The graph shows actual and predicted times between 8 and 12 hours (only 11 faster results/predictions and 15 slower ones are missing).

[Figure: actual vs. predicted finish times]

I have added a “trend line” that shows the best fit of all the data points, highlighting the fact that most of the data points are pretty close to the “diagonal” (where actual = predicted). Between 8 hours and 10 hours the algorithmic predictions are pretty good on average (maybe predictions are a bit too fast around 8 hours). Towards 10 hours finishing time and especially over 10 hours the predictions are too fast: This is caused by “explosions” that lead to very slow times even for athletes that have been predicted to be relatively fast. To put it another way: Finishing times over 10 hours are most often bad races that are pretty much unpredictable using only data.

Here is another way of looking at how far off the time predictions have been from the actual results:

[Figure: distribution of the difference between predicted and actual finishing times]

The graph shows the number of results in one-minute bins of the difference between predicted and actual finishing times. Data points towards the right are faster than predicted, those towards the left are slower than predicted. Again, a trend line smooths out the statistical “noise”.
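The binning behind such a graph can be sketched as follows. The sign convention (predicted minus actual, so positive values are faster than predicted) follows the description above; the function name and the sample differences are made up for illustration.

```python
import math
from collections import Counter


def one_minute_bins(diffs_min):
    """Count differences (predicted - actual, in minutes) per one-minute bin."""
    return Counter(math.floor(d) for d in diffs_min)


# Hypothetical differences in minutes.
bins = one_minute_bins([0.2, 0.9, -0.1, 3.5])
for minute in sorted(bins):
    print(minute, bins[minute])
```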

A few observations:

  • In a range roughly between -40 minutes and +40 minutes the graph is pretty symmetric and very close to a normal distribution.
  • As noted above, there is a relatively large number of “explosions” with large negative differences, resulting in a non-symmetrical distribution on the edges of the graph. (There are 49 results that are more than 60 minutes slower than predicted, but only 10 that are more than 60 minutes faster.)

On average, the predictions are off by -4.7 minutes (i.e. the actual finish is close to five minutes slower than predicted). The closer this average is to 0, the less the predictions are biased towards being too fast or too slow. The standard deviation is 31.8 minutes; for a normal distribution this means that about 68% of the time differences lie between -36.5 and 27.1 minutes (-4.7 +/- 31.8 minutes). Usually, a smaller deviation corresponds to a “better” prediction.
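Here is a small sketch of how these summary numbers can be computed from a list of differences, and how the one-deviation band follows from the average and the deviation quoted above. The helper names are assumptions for this example; only the -4.7 and 31.8 come from my data.

```python
from statistics import mean, pstdev


def summarize(diffs_min):
    """Return average, standard deviation and the one-deviation band."""
    avg = mean(diffs_min)
    sd = pstdev(diffs_min)
    return avg, sd, (avg - sd, avg + sd)


def share_within_one_sd(diffs_min):
    """Fraction of differences inside the one-deviation band."""
    _, _, (lo, hi) = summarize(diffs_min)
    return sum(lo <= d <= hi for d in diffs_min) / len(diffs_min)


# Band for the published average and deviation (in minutes).
low, high = round(-4.7 - 31.8, 1), round(-4.7 + 31.8, 1)
print(low, high)
```

Comparing `share_within_one_sd` against 68% is a quick way to check how close a set of differences is to the normal distribution.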

The standard “statistical” way of measuring the dependence between two data sets is correlation. Correlation is +1 in the case of a perfect linear relationship, −1 in the case of a perfect inverse relationship, and some value between −1 and 1 in all other cases, indicating the degree of linear dependence between the variables. A value around zero indicates that there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables. The predictions and actual finishing times have a correlation of 0.77, indicating a pretty strong dependence between the two data sets.
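For readers who want to reproduce this kind of number on their own data, here is a minimal sketch of the usual Pearson correlation coefficient written out by hand. The function name and the sample values are illustrative, and this is not necessarily the exact tool I used to get the 0.77.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical predicted and actual finish times in minutes.
predicted = [500, 520, 540, 560, 580]
actual = [505, 515, 560, 555, 600]
print(pearson(predicted, actual))
```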

Comparing Other Prediction Strategies

When trying to predict a completely random event (the classical example is a “perfect dice”), the correlation between the actual events and the predictions won’t be very high. When only working off previous results, anything that happens before a race – for example a good block of training, a slight injury – will influence how fast an athlete is able to go on race day, as will random events during the race (e.g. getting punched during the swim, technical issues on the bike). Therefore a “perfect prediction” (resulting in a correlation of 1) is impossible, and in order to determine whether a correlation of 0.77 indicates a “good predictor” or not, one has to compare the results of my algorithm to other predictions.

I am not aware of anyone other than me publishing time predictions for Ironman races on a regular basis. (Please let me know if there is one!) Therefore, I am comparing my predictions to a few much simpler strategies:

  1. Last Finish: “You are only as good as your last race” (prediction = last IM-distance finish)
  2. Best Finish: “My best time is a sub-x” (prediction = fastest IM-distance finish)
  3. Average Finish: “I usually finish around y” (prediction = average IM-distance finish)
  4. Average Last Year Finish: “This season is going great” (prediction = average IM-distance finish in the last twelve months)
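The four simple strategies are easy to write down in code. This sketch assumes an athlete's history is a list of (race date, finish time in minutes) pairs sorted by date; the function names and the sample history are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def last_finish(history):
    return history[-1][1]


def best_finish(history):
    return min(t for _, t in history)


def average_finish(history):
    return mean(t for _, t in history)


def average_last_year(history, race_day):
    """Average of finishes in the 365 days before race_day, or None if there are none."""
    cutoff = race_day - timedelta(days=365)
    recent = [t for d, t in history if d >= cutoff]
    return mean(recent) if recent else None


# Hypothetical history: (race date, finish time in minutes).
history = [(date(2013, 7, 1), 560), (date(2014, 10, 11), 540), (date(2015, 3, 22), 550)]
race_day = date(2015, 7, 5)
print(last_finish(history), best_finish(history),
      average_finish(history), average_last_year(history, race_day))
```

The `None` case in `average_last_year` reflects that this strategy cannot make a prediction for athletes without a finish in the previous twelve months.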

Here’s a comparison of the correlation of these different methods and my comments:

Strategy | Number of Data Points | Average Difference (min) | Standard Deviation (min) | Correlation to Actual Finish | Comments
Last Finish | 943 | -0.75 | 41.14 | 0.649 | Good on average, but wide deviation and lower correlation
Best Finish | 943 | -21.48 | 39.59 | 0.659 | Slightly better deviation and correlation, but large average difference
Average Finish | 943 | -0.84 | 35.77 | 0.706 | Good on average, but wider deviation and lower correlation than TTR Predictions (still better than last/best finish)
Average Last Year Finish | 877 | -1.89 | 35.08 | 0.706 | Almost the same as the Average Finish, but applicable for fewer cases
TTR Predictions | 930 | -4.70 | 31.81 | 0.770 | Lowest deviation, highest correlation

Summary

The Prediction Algorithm I use to calculate the expected times in my pre-race posts provides better predictions than simpler prediction strategies. My model certainly has limitations, but the large number of “successful” winner predictions and the high correlation show that the time predictions and the conclusions drawn from them are pretty much valid. I think my analysis is quite good at telling the “data part of the story”.

While the “data part” is an important (and impartial) part of the story, it is still only a part of the story. A coach or teammate that has been able to observe an athlete getting ready for a race has additional (and more current) information available – even if that is not always fully objective.

The tension between past performances, the uncertainty of a future performance, the challenges athletes face in their training and the hard work they put in to be better in their next race: that’s why I still love following the races!
