Novel machine learning tool IDs early Parkinson’s biomarkers
Algorithm shows high accuracy in predicting who will develop disorder
A novel machine learning tool called CRANK-MS identified, with high accuracy, people who would go on to develop Parkinson’s disease, based on an analysis of molecules in their blood.
The algorithm identified several molecules that may serve as early biomarkers of Parkinson’s.
These findings show the potential of artificial intelligence (AI) to improve healthcare, according to researchers from the University of New South Wales (UNSW), in Australia, who are developing the machine learning tool with colleagues from Boston University, in the U.S.
“The application of CRANK-MS to detect Parkinson’s disease is just one example of how AI can improve the way we diagnose and monitor diseases,” Diana Zhang, a study co-author from UNSW, said in a press release.
The study, “Interpretable Machine Learning on Metabolomics Data Reveals Biomarkers for Parkinson’s Disease,” was published in ACS Central Science.
CRANK-MS machine learning tool allows analysis of more data
Parkinson’s disease is currently diagnosed based on the symptoms a person is experiencing; there isn’t a biological test that can definitively identify the disease. Many researchers are working to identify biomarkers of Parkinson’s, which could be measured to help identify the neurodegenerative disorder or predict the risk of developing it.
Here, the international team of researchers used machine learning to analyze metabolomic data — that is, large-scale analyses of levels of thousands of different molecules detected in patients’ blood — to identify Parkinson’s biomarkers.
The analysis used blood samples collected in the Spanish arm of the European Prospective Investigation into Cancer and Nutrition (EPIC). There were 39 samples from people who went on to develop Parkinson’s over up to 15 years of follow-up, and another 39 samples from people who did not develop the disorder during follow-up. The metabolomic makeup of the samples was assessed with a chemical analysis technique called mass spectrometry.
In the simplest terms, machine learning involves feeding a computer a large amount of data, alongside a set of goals and mathematical rules called algorithms. Based on those rules, the computer determines, or learns, how to make sense of the data.
This study specifically used a form of machine learning algorithm called a neural network. As the name implies, the algorithm is structured with a similar logical flow to how data is processed by nerve cells in the brain.
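To make the idea of a neural network more concrete, here is a minimal sketch in Python of how a very small network turns a set of measurements into a single score between 0 and 1. It is not the CRANK-MS architecture: the layer sizes, the random weights, and the five input values are all hypothetical, chosen only to show the flow of data through weighted sums and simple nonlinear steps.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """One pass through a network with a single hidden layer."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Each hidden unit takes a weighted sum of all inputs...
        total = sum(w * x for w, x in zip(weights, features)) + bias
        # ...and passes it on only if it is positive (a "ReLU" step).
        hidden.append(max(0.0, total))
    # The output unit combines the hidden units into one score.
    score = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(score)


# Hypothetical setup: 5 input measurements, 3 hidden units, random weights.
rng = random.Random(0)
n_features, n_hidden = 5, 3
hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(n_hidden)]
hidden_biases = [0.0] * n_hidden
output_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]

sample = [0.2, 1.5, 0.0, 0.7, 0.3]  # made-up measurements for one person
probability = forward(sample, hidden_weights, hidden_biases, output_weights, 0.0)
print(f"predicted probability: {probability:.3f}")
```

In a real model the weights are not random; they are adjusted during training so that the scores match known outcomes as closely as possible.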
Machine learning has been used to analyze metabolomic data before. However, previous studies have generally not used wide-scale metabolomic data — instead, scientists selected specific markers of interest to include, while not including data for other markers.
Such limits were used because wide-scale metabolomic data typically covers thousands of different molecules, and there’s a lot of variation — so-called noise — in the data. Prior machine learning algorithms have generally had poor results when using such noisy data, because it’s hard for the computer to detect meaningful patterns amidst all the random variation.
The researchers’ new algorithm, CRANK-MS — short for Classification and Ranking Analysis using Neural network generates Knowledge from Mass Spectrometry — has a better ability to sort through the noise, and was able to provide high-accuracy results using full metabolomic data.
“Typically, researchers using machine learning to examine correlations between metabolites and disease reduce the number of chemical features first, before they feed it into the algorithm,” said W. Alexander Donald, PhD, a study co-author from UNSW, in Sydney.
“But here,” Donald said, “we feed all the information into CRANK-MS without any data reduction right at the start. And from that, we can get the model prediction and identify which metabolites are driving the prediction the most, all in one step.”
Including all molecules available in the dataset “means that if there are metabolites [molecules] which may potentially have been missed using conventional approaches, we can now pick those up,” Donald said.
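The general idea Donald describes, training on every feature and then asking which ones drive the prediction, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CRANK-MS method: it swaps the neural network for a plain logistic regression, ranks features by the absolute size of their learned weights as a crude stand-in for a proper attribution step, and runs on synthetic data in which only one of 20 made-up features (index 0) carries any signal. The 39-versus-39 split mirrors the study's sample counts; everything else is hypothetical.

```python
import math
import random


def train_logistic(rows, labels, epochs=300, lr=0.5):
    """Fit a logistic regression on ALL features with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(rows, labels):
            z = sum(w * x for w, x in zip(weights, row)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - label  # prediction minus truth
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Synthetic data: 39 "cases" and 39 "controls", 20 features each.
rng = random.Random(42)
n_features = 20
labels = [1] * 39 + [0] * 39
rows = []
for label in labels:
    row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]  # pure noise
    # Only feature 0 differs between the two groups (assumption for the demo).
    row[0] = (1.0 if label == 1 else -1.0) + rng.gauss(0.0, 0.3)
    rows.append(row)

# No feature selection up front: the model sees every column.
weights, bias = train_logistic(rows, labels)

# Rank features by how strongly the fitted model relies on them.
ranking = sorted(range(n_features), key=lambda j: abs(weights[j]), reverse=True)
print("top 3 features by |weight|:", ranking[:3])
```

Because nothing is filtered out before training, a feature nobody thought to pre-select can still rise to the top of the ranking, which is the advantage the researchers point to.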
The researchers stressed that further validation is needed to test the algorithm. But in their preliminary tests, CRANK-MS was able to differentiate between Parkinson’s and non-Parkinson’s individuals with an accuracy of up to about 96%.
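For readers unfamiliar with the metric, accuracy is simply the share of individuals the model classifies correctly. The short Python sketch below shows the arithmetic on ten made-up labels; the numbers are hypothetical and are not taken from the study.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical example: 1 = later developed Parkinson's, 0 = did not.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # one case is missed

score = accuracy(predicted, actual)
print(f"accuracy = {score:.0%}")  # 9 of 10 correct
```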
Noteworthy findings highlight diet and chemical exposure
In further analyses, the researchers determined which molecules were picked up by the algorithm as the most important for identifying Parkinson’s.
There were several noteworthy findings: For example, patients who went on to develop Parkinson’s tended to have lower levels of a triterpenoid chemical known to have nerve-protecting properties. That substance is found at high levels in foods like apples, olives, and tomatoes.
These patients also often had high levels of polyfluorinated alkyl substances (PFAS), which may be a marker of exposure to industrial chemicals.
“These data indicate that these metabolites are potential early indicators for PD [Parkinson’s disease] that predate clinical PD diagnosis and are consistent with specific food diets (such as the Mediterranean diet) for PD prevention and that exposure to [PFASs] may contribute to the development of PD,” the researchers wrote. The team noted a need for further research into these potential biomarkers.
The scientists have made the CRANK-MS algorithm publicly available for other researchers to use. The team says this algorithm likely has applications far beyond Parkinson’s.
“We’ve built the model in such a way that it’s fit for purpose,” Zhang said. “What’s exciting is that CRANK-MS can be readily applied to other diseases to identify new biomarkers of interest. The tool is user-friendly where on average, results can be generated in less than 10 minutes on a conventional laptop.”