Researchers from the Centre for Healthy Brain Ageing (CHeBA) and the School of Computer Science and Engineering at UNSW Sydney have undertaken the largest comparison to date of survival analysis methods that use machine learning to predict the onset of dementia. The comparison, published in the Nature Portfolio journal Scientific Reports, is the first work to apply these methods to CHeBA’s Sydney Memory and Ageing Study and examines the most diverse variety of data in a study on dementia to date. There is currently no cure for dementia and no treatment available that can successfully change the course of the disease. “Machine learning models that can predict the time until a person develops dementia are critical tools in helping our understanding of dementia risks,” said lead author and computer scientist Annette Spooner, who is also a PhD student.
Previous research analysing data to predict the onset of dementia often used models that were designed for small data sets and did not apply well to high-dimensional data. Data are defined as high-dimensional when the number of features or variables exceeds the number of observations. Clinical studies, such as dementia studies, usually focus on the prevention, detection, treatment or management of various diseases and medical conditions. The data they collect are typically not only high-dimensional but also arise from a variety of different sources with varying statistical properties, and are often missing information. These properties present significant challenges to traditional statistical analysis.
Another issue is that this type of data is ‘censored’, meaning that an apparently healthy person may still go on to develop dementia beyond the study’s time frame and so cannot be considered free of the disease. “A technique known as ‘survival analysis’, which predicts the time to an event, such as the diagnosis of a disease, is required to analyse these data,” explains Ms Spooner, “and we have used machine learning techniques that have been adapted to handle censored data, rather than the more traditional statistical techniques. There is clearly a need for methods to overcome the various challenges to model these complex data.”
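To make the idea of censoring concrete, the sketch below implements the Kaplan-Meier estimator, a classical survival analysis method. It is an illustration only and is not one of the machine learning methods compared in the study. The function name `kaplan_meier` and the toy follow-up data are assumptions invented for this example.

```python
def kaplan_meier(times, events):
    """Estimate the probability of remaining dementia-free over time.

    times  -- follow-up time for each participant (e.g. years in the study)
    events -- 1 if dementia was diagnosed at that time, 0 if the participant
              was censored (left the study or reached its end undiagnosed)
    Returns a list of (time, survival_probability) pairs, one for each time
    at which at least one diagnosis was observed.
    """
    at_risk = len(times)  # participants still being followed
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        diagnosed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # diagnosed or censored at t
        if diagnosed:
            # Only those still at risk at time t enter the denominator.
            survival *= 1 - diagnosed / at_risk
            curve.append((t, survival))
        # Censored participants leave the risk set without counting as events.
        at_risk -= leaving
    return curve


# Hypothetical toy data: six participants, three of them censored.
toy_times = [2, 3, 3, 5, 6, 6]
toy_events = [1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"year {t}: estimated dementia-free probability {s:.2f}")
```

In this toy data half of the participants have no recorded diagnosis, yet the estimated dementia-free probability at year 6 is about 0.33 rather than 0.50, because censored participants count only for the period in which they were actually observed.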
Recent research has shown that different sources of clinical data can provide complementary information about dementia. Integration of multiple sources of data leads to better prediction of cognitive decline. “Machine learning can give more accurate results than traditional statistical methods when modelling high-dimensional, heterogeneous, clinical data,” said Ms Spooner, whose research was supervised by Professor Arcot Sowmya and assisted by honours student Emily Chen.
The research compared the performance and stability of ten machine learning algorithms, combined with eight feature selection methods, that are capable of making predictions from this specific type of clinical data. Co-author and Co-Director of CHeBA, Professor Perminder Sachdev, said the models they developed predicted survival to dementia using data from the Alzheimer’s Disease Neuroimaging Initiative as well as the Sydney Memory and Ageing Study.
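The article does not say which measure was used to compare performance. One common choice for survival models is the concordance index, which checks how often a model ranks the person who developed the disease sooner as being at higher risk. The sketch below is a minimal version of that measure under that assumption. The function name `concordance_index` and the toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs that a model ranks in the right order.

    A pair (i, j) is comparable when participant i was diagnosed (event = 1)
    strictly before participant j's last follow-up time. The pair counts as
    concordant when the model gave i the higher risk score, and ties in the
    risk score count as half. 1.0 is a perfect ranking; 0.5 is chance level.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored participant cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four participants, two censored.
toy_times = [2, 4, 5, 7]
toy_events = [1, 1, 0, 0]
toy_risks = [0.9, 0.3, 0.5, 0.1]  # made-up model outputs
print(concordance_index(toy_times, toy_events, toy_risks))
```

Here four of the five comparable pairs are ordered correctly, giving a score of 0.8. Censored participants still contribute, because they are known to have been dementia-free at least until their last follow-up.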
CHeBA’s Sydney Memory and Ageing Study is a longitudinal study of 1037 participants, aged 70-90 years, that aims to determine the effects of ageing on cognition. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a US longitudinal study aimed at identifying biomarkers for the early detection and tracking of Alzheimer's disease.