The central question that sports scientists are grappling with these days is this: What the heck are we going to do with all this data? In endurance sports, we've progressed from heart rate monitors and GPS watches to newer tools like continuous glucose measurements, all displayed on your wrist and then automatically downloaded to your computer. Team sports have undergone a similar tech revolution. The resulting data is fascinating and abundant, but is it actually useful?
A recent paper tackles this question and presents an interesting framework for thinking about it, borrowed from the world of business analytics. The paper comes from Kobe Houtmeyers and Arne Jaspers of KU Leuven in Belgium, along with Pedro Figueiredo of the Portuguese Football Federation's Portugal Football School.
Here's their four-stage framework for data analytics, presented in order of both increasing complexity and increasing value to the athlete or coach:
- Descriptive: What happened?
- Diagnostic: Why did it happen?
- Predictive: What will happen?
- Prescriptive: How do we make it happen?
Each stage builds on the previous one, which means that the descriptive layer is the foundation for everything else. Is the data good enough? I'm pretty confident that a modern GPS watch can accurately describe how far and how fast I've run in training, which allows me to move to the next stage and try to diagnose whether a good or bad race resulted from training too much, too little, too hard, too easy, and so on. In contrast, the heart rate data I get from wrist sensors on sports watches is utter garbage (as verified by comparing it to data from chest straps). It took me a while to realize that, and any insights I drew from that flawed data would obviously have been meaningless and possibly damaging to my training.
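To make that concrete, here's a toy sketch in Python of the kind of sanity check I mean; every number in it is made up for illustration. The point is simply that you can quantify how far a wrist sensor strays from a reference device like a chest strap before you trust it for diagnosis.

```python
# Hypothetical heart rate readings (bpm) from two devices worn during
# the same run; these values are invented for illustration only.
wrist_hr = [142, 150, 151, 149, 160, 158, 163]   # wrist optical sensor
chest_hr = [155, 156, 158, 160, 161, 162, 164]   # chest-strap reference

# Mean absolute error between the two streams, in beats per minute.
mae = sum(abs(w - c) for w, c in zip(wrist_hr, chest_hr)) / len(wrist_hr)
print(f"Mean absolute error: {mae:.1f} bpm")
# A large error means the descriptive layer is shaky, so any diagnosis
# built on top of it is shaky too.
```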
Making predictions is harder (especially, as the saying goes, about the future). Scientists in a variety of sports have tried to use machine learning to comb through big sets of training data to predict who's at high risk of getting injured. For example, a study by researchers at the University of Groningen in the Netherlands plugged seven years of training and injury data from 74 competitive runners into an algorithm that parsed risk based on either the previous seven days of running (with ten parameters for each day, like the total distance in different training zones, perceived exertion, and duration of cross-training) or the previous three weeks (with 22 parameters per week). The resulting model, like similar ones in other sports, was significantly better than a coin toss at predicting injuries, but not yet good enough to base training decisions on.
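For the curious, here's a minimal sketch of what that kind of modeling setup looks like. It is emphatically not the Groningen group's actual pipeline: the features and injury labels below are randomly generated, so this model should land right around a coin toss, which is exactly the baseline the real studies are trying to beat.

```python
# Toy injury-prediction setup: each example is one athlete-day, described
# by the previous seven days of training (10 parameters per day = 70
# features) and labeled by whether an injury followed. All data here is
# synthetic noise, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 7 * 10))   # 2000 athlete-days, 70 features
y = rng.random(2000) < 0.05           # ~5% injury rate: injuries are rare

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC AUC: 0.5 is a coin toss, 1.0 is perfect. The published models beat
# chance, but not by enough to hang training decisions on.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.2f}")
```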
Prescriptive analytics, the holy grail for sports scientists, is even more elusive. A simple example that doesn't require any heavy computation is heart-rate variability (HRV), a proxy measure of stress and recovery status that (as I discussed in a 2018 article) has been proposed as a daily guide for deciding whether to train hard or easy. Even though the physiology makes sense, I've been skeptical of delegating crucial training decisions to an algorithm. That's a false choice, though, according to Houtmeyers and his colleagues. Prescriptive analytics provides decision support systems: the algorithm isn't replacing the coach, but is providing him or her with another perspective that's not weighed down by the inevitable cognitive biases that afflict human decision-making.
Interestingly, Marco Altini, one of the leaders in developing approaches to HRV-guided training, posted a piece a few weeks ago in which he reflected on what has changed in the field since my 2018 article. Among the insights: the measuring technology has improved, as has knowledge about how and when to use it to get the most reliable data. That's key for descriptive usage. But even good data doesn't guarantee good prescriptive advice. According to Altini, studies of HRV-guided training have moved away from tweaking workout plans based on the vagaries of that morning's reading, relying instead on longer-term trends like running seven-day averages. Even with those caveats, I'd still view HRV as a source of decision support rather than as a decision-maker.
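Here's a rough illustration of that trend-based logic, with hypothetical readings and an arbitrary threshold, just to show how a seven-day average damps out the noise of any single morning:

```python
# Hypothetical morning HRV (rMSSD, ms) readings; values and the 5%
# threshold below are invented for illustration, not validated cutoffs.
hrv = [62, 58, 61, 55, 52, 50, 49, 48, 51, 47]

baseline = sum(hrv[:7]) / 7    # longer-term reference (first week here)
rolling7 = sum(hrv[-7:]) / 7   # most recent seven-day average

# Compare the trend, not today's single reading, to the baseline.
if rolling7 < 0.95 * baseline:
    print("Trend below baseline: consider an easier day "
          "(decision support, not a verdict).")
else:
    print("Trend near baseline: proceed with the planned session.")
```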
One of the reasons Houtmeyers's paper appealed to me is that I spent a bunch of time thinking about these issues during my recent experiment with continuous glucose monitoring. The four-stage framework helps clarify my thinking. It's clear that CGMs offer great descriptive data, and with some effort, I think you can also get some good diagnostic insights. But the marketing, as you'd expect, is explicitly focused on predictive and prescriptive promises: guiding you on what and when to eat in order to maximize performance and recovery. Maybe that's possible, but I'm not yet convinced.
In fact, if there's one simple message I take away from this paper, it's that description and diagnosis are not the same thing as prediction and prescription. The latter doesn't follow automatically from the former. As the data sets keep getting bigger and higher-quality, it seems inevitable that we'll eventually reach the point where machine-learning algorithms can pick up patterns and interactions that even highly experienced coaches might miss. But that's a big leap, and data on its own, even big data, won't get us there.