Assoc/Prof Jean Yang
With the advancement of high-throughput biotechnologies, many researchers have sought to utilise multiple high-throughput data sources, together with clinical data, to improve the prediction of disease outcome. The field of statistical learning has generated a large number of methods and approaches over the last decade to address this problem. However, despite advances in classification methods, moderate error rates in patient classification persist. In this talk I will discuss two biologically guided approaches that hold the promise of improving the accuracy of prognostic biomarkers. The first approach is a two-step classifier that enables the identification of clinico-pathologic variables associated with sample heterogeneity. We evaluate this ‘two-step’ framework using three independent cohorts: melanoma, breast, and colorectal cancers, where we observe significant reductions in error rates compared with the next-best classifiers. The second approach introduces the concept of differential distribution (DD) to identify candidate biomarkers. This approach combines information from differential expression and differential variability in a unified metric. For each selected feature, the method fits a density estimate to the expression distribution of each class, then uses a voting scheme to aggregate the class predictions of the individual features. Evaluation on a set of melanoma gene expression data demonstrates that DD classification is accurate, while also providing a complementary set of selected features, which may lead to new biomarkers.
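
The per-feature density fitting and voting described for DD classification can be illustrated with a minimal sketch. This is not the speaker's implementation: the abstract does not specify the density estimator, the DD metric used for feature selection, or how ties are broken. The sketch assumes a Gaussian kernel density estimate with a rule-of-thumb bandwidth, takes the selected features as given, and uses synthetic data. The names `kde` and `dd_predict` are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import stdev


def kde(sample, x):
    """Gaussian kernel density estimate of `sample`, evaluated at x.

    The bandwidth uses Silverman's rule of thumb, an assumed choice
    that is not taken from the abstract.
    """
    n = len(sample)
    bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def dd_predict(train, labels, new_sample, features):
    """Predict a class by majority vote over the selected features.

    Each feature votes for the class whose estimated density is
    highest at the new sample's value for that feature.
    """
    classes = sorted(set(labels))
    votes = Counter()
    for j in features:
        densities = {
            c: kde([row[j] for row, y in zip(train, labels) if y == c], new_sample[j])
            for c in classes
        }
        votes[max(densities, key=densities.get)] += 1
    return votes.most_common(1)[0][0]


# Synthetic data (purely illustrative): features 0 and 2 differ in mean
# between classes, feature 1 differs only in variability.
random.seed(0)
train, labels = [], []
for _ in range(30):
    train.append([random.gauss(0, 1), random.gauss(0, 0.5), random.gauss(0, 1)])
    labels.append("A")
    train.append([random.gauss(3, 1), random.gauss(0, 3), random.gauss(3, 1)])
    labels.append("B")

print(dd_predict(train, labels, [3.0, 4.0, 3.1], features=[0, 1, 2]))
print(dd_predict(train, labels, [0.0, 0.1, 0.0], features=[0, 1, 2]))
```

Feature 1 in the synthetic data has the same mean in both classes but a different spread, so it would be invisible to a test of differential expression alone, yet its density estimates still separate the classes at extreme values. This is the kind of signal that a combined expression-and-variability metric is meant to capture.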