Algorithm Able to Predict Initial 5-year Rate of Parkinson’s Progression
Machine learning formula drives models aiming to improve care, clinical trials
A novel machine learning algorithm was able to accurately classify people with Parkinson’s disease based on their predicted rate of disease progression — slow, moderate, or fast — over the five years after diagnosis, a study using patient registry data reported.
It also identified Parkinson’s motor symptoms as a key driver of these varying progression rates.
Researchers expect that their algorithm, by “defining subcategories” of Parkinson’s and showing an “ability to predict even a proportion of the disease course,” will allow for more individualized care and more effective clinical trials of new treatments, especially for people in early disease stages, “when therapeutic interventions are likely to be most effective.”
The study, “Identification and prediction of Parkinson’s disease subtypes and progression using machine learning in two cohorts,” was published in npj Parkinson’s Disease.
Taking on challenges due to varying rates of Parkinson’s progression
Parkinson’s symptoms tend to get worse as time goes on. However, there is a great deal of variability in the rate of disease progression from person to person, which can make it difficult to accurately predict the disease’s likely course in clinical care.
This variability also causes complications in clinical trials, particularly with often “expensive and failure-prone” Phase 3 studies, since it can be difficult to detect the effect of a potential therapy amid the “noise” of person-to-person variations.
As such, “there is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course,” the researchers wrote.
A team led by scientists at the National Institute on Aging and National Institute of Neurological Disorders and Stroke in the U.S. used machine learning to determine and predict the rate of Parkinson’s progression.
In machine learning, a computer is given data along with mathematical algorithms that it uses to “learn” patterns from that data.
First, the researchers used clinical data from 294 patients in the Parkinson’s Progression Markers Initiative (PPMI) to “train” the machine learning algorithm. Then, the resulting models were validated using data from 263 patients in another large and well-characterized group, the Parkinson’s Disease Biomarkers Program (PDBP).
To account for the wide variation in the motor and non-motor symptoms of Parkinson’s, the data used to train and test the algorithm included a comprehensive assessment of all symptoms over the five years following an initial disease diagnosis.
In both datasets, the machine learning algorithm was able to sort patients into three “disease subtypes with highly predictable progression rates, corresponding to slow, moderate, and fast disease progression,” the team wrote.
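To give a feel for what sorting patients into three subtypes can look like computationally, here is a minimal sketch. It is not the researchers' method: the article does not name their clustering technique, so plain k-means is swapped in as a stand-in, and the data are synthetic one-dimensional "worsening per year" scores invented for illustration. The function name `kmeans_1d` and all numbers are assumptions.

```python
import random

def kmeans_1d(values, k=3, iters=50):
    """Plain k-means on one-dimensional data; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Synthetic progression scores for 90 hypothetical patients (not study data):
# 30 drawn around each of three made-up "true" rates.
rng = random.Random(0)
true_rates = [1.0, 4.0, 8.0]  # slow, moderate, fast (arbitrary units per year)
scores = [rng.gauss(rate, 0.3) for rate in true_rates for _ in range(30)]

centroids, labels = kmeans_1d(scores, k=3)
for name, center in zip(["slow", "moderate", "fast"], centroids):
    print(f"{name}: centroid {center:.2f}, {labels.count(centroids.index(center))} patients")
```

Because the synthetic groups are well separated, the three recovered clusters line up with the slow, moderate, and fast groups that generated the data. Real clinical data are far higher-dimensional and noisier, which is part of why the study relied on comprehensive five-year assessments.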
In the PPMI group, about 45% of patients were classified as slow progressors, 39% as moderate progressors, and 16% as fast progressors. Among the PDBP group, there were 46% slow, 23% moderate, and 31% fast progressors.
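The percentages above can be turned into rough head counts. The sketch below assumes the reported shares apply to the full 294 PPMI and 263 PDBP patients mentioned earlier; since the percentages are rounded, the counts are estimates rather than figures from the study. The names `cohorts` and `approx_counts` are illustrative.

```python
# Cohort sizes and subtype shares as reported in the article.
cohorts = {
    "PPMI": {"n": 294, "shares": {"slow": 0.45, "moderate": 0.39, "fast": 0.16}},
    "PDBP": {"n": 263, "shares": {"slow": 0.46, "moderate": 0.23, "fast": 0.31}},
}

def approx_counts(n, shares):
    """Estimate patients per subtype from rounded percentage shares."""
    return {name: round(n * share) for name, share in shares.items()}

estimates = {
    cohort: approx_counts(info["n"], info["shares"])
    for cohort, info in cohorts.items()
}

for cohort, counts in estimates.items():
    print(cohort, counts)
```

Under that assumption, the estimates come to roughly 132 slow, 115 moderate, and 47 fast progressors in PPMI, and roughly 121 slow, 60 moderate, and 82 fast progressors in PDBP.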
Further analyses showed that motor symptoms were the major driver of disease progression variations, though non-motor symptoms also played a substantial role.
“The projected motor dimension significantly contributes towards PD [Parkinson’s disease] progression; however, sleep and cognition are essential, accounting for 37% variation,” the team wrote.
Fast-progressing patients also tended to have higher blood levels of neurofilament light chain (Nfl), a marker of nerve damage.
However, there were no clear differences between these three patient subgroups in terms of genetic variations previously linked to the risk of developing Parkinson’s, suggesting that “genetic variants relating to risk do not necessarily affect progression,” the researchers wrote.
Having demonstrated that they could cluster patients based on three different rates of progression, the researchers next used machine learning strategies to predict the rate of progression over the following five years based on different factors.
These included clinical data available at the time of diagnosis alone, clinical data at diagnosis and over the disease’s first year, and biological and genetic data.
To test the accuracy of these predictive models, the researchers used a statistical measure called the area under the receiver operating characteristic curve, or AUC, which tests how well a tool can distinguish between two groups (here, for example, patients in a given progression subtype versus everyone else). AUC values range from 0 to 1, with 0.5 corresponding to chance-level discrimination and higher values reflecting better results.
Results showed that the model using data at diagnosis alone achieved an overall AUC of 0.92 in the PPMI group, with values of 0.94 for slow-progressing patients, 0.86 for moderate-progressing patients, and 0.95 for fast-progressing patients.
When this model was tested in the PDBP group, the AUC was slightly lower, at 0.84. This was mainly due to reduced accuracy for the moderate-progressing group (AUC of 0.73), which contained fewer patients, the team noted.
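For readers curious about what the AUC actually measures, it can be read as the probability that a randomly chosen member of one group receives a higher model score than a randomly chosen member of the other, with ties counting as half. The sketch below computes it that way. The scores are hypothetical, and treating "fast progressor versus everyone else" as the two groups is an assumption made for illustration, not a description of the study's exact setup.

```python
def auc(scores, is_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for "this patient is a fast progressor".
fast_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
is_fast = [True, True, True, False, False, False]

fast_auc = auc(fast_scores, is_fast)
print(round(fast_auc, 3))  # 8 of 9 pairs ranked correctly
```

In this toy example, one fast progressor (score 0.35) is ranked below one non-fast patient (score 0.6), so 8 of the 9 possible pairs are ordered correctly and the AUC is about 0.889. A model that scored every patient identically would get 0.5.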
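The article does not say how the overall PPMI figure relates to the three per-subtype values. One common convention, assumed here purely as a consistency check, is an unweighted (macro) average across subtypes; the study may have used a different scheme.

```python
# Per-subtype AUCs reported for the PPMI group (model using data at diagnosis).
per_subtype_auc = {"slow": 0.94, "moderate": 0.86, "fast": 0.95}

# Assumption: the overall figure is an unweighted mean of the three values.
macro_mean = sum(per_subtype_auc.values()) / len(per_subtype_auc)
print(round(macro_mean, 2))
```

The unweighted mean works out to about 0.917, which rounds to the reported 0.92, so the numbers are at least compatible with that reading.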
“Despite the smaller sample size of the PDBP [group], the results strongly validate our previous observations of distinct, computationally discernible subtypes within the [Parkinson’s disease] population,” the researchers wrote.
‘Deep, wide, well-curated data’ essential for models’ predictive accuracy
The model’s predictive accuracy improved further when clinical data from the first year after diagnosis were also included.
“The increased accuracy trend is due to the availability of more information about a subject,” the team wrote. “This approach is also practical in a clinical setting, as physicians will provide a better prognosis for patients after a one-year follow-up.”
In contrast, predictions were poorer when the model accounted only for biological and genetic data.
“Our work highlights the utility of machine learning as an ancillary diagnostic tool to identify disease subtypes and project individualized progression rates,” the researchers wrote.
This study “is a step forward toward designing sophisticated machine-learning [models] to facilitate the early diagnosis of PD progression and longitudinal biomarker discovery such as our finding of elevated Nfl in fast progressors,” they added.
“We anticipate that machine learning models will improve patient counseling, clinical trial design, and ultimately individualized patient care,” the team wrote, adding that “much more needs to be done.”
In particular, the team noted that this type of analysis requires large datasets with comprehensive, standardized data.
“Collecting such data is a challenge in PD, with relatively few [patient groups] available with deep, wide, well-curated data,” the researchers wrote. “Thus, a critical need is the expansion or replication of efforts such as PPMI or PDBP, importantly with a model that allows unfettered access to the associated data.”