Big data for epidemiology and biostatistics

At the Centre for Big Data Research in Health (CBDRH), we define our research questions alongside clinicians and policy makers. We design epidemiologically rigorous studies using linked administrative data and we secure the proper ethical approvals to link datasets.

Our researchers manage data by developing governance and documentation for complex research data assets. We manage data access, technologies and statistical disclosure control for sensitive health and personal data. We also clean and wrangle data to prepare research-ready datasets from multiple linked data sources.

Our statistical analyses use modern approaches suited to different research aims, including descriptive analysis, prediction and estimation of causal effects. We apply methods for analysing data from different study designs (observational studies, cohort studies and clinical trials) and data with complex structure (longitudinal, clustered and time series data). We use quasi-experimental designs to estimate the effects of programs and interventions from observational data, and methods that account for common biases in research data, including measurement error, selection bias and missing data bias. We analyse health data using software and packages including R, Python, SAS and Stata.

We disseminate our results by developing data infographics, visualisations and interactive dashboards to bring our results to life. We then communicate these results in a range of formats and to diverse audiences, from publishing in academic journals to presenting to clinicians and policymakers and writing for lay audiences.

We also provide regular, expert commentary to media outlets on health, data and research issues.

Project: Hospital Trajectories for 15 million Australians - NHMRC Ideas Grant (2022–2024)

Patient segmentation, which divides a patient population into distinct groups with specific needs and characteristics, underpins patient-centric approaches to tailoring care delivery.

We propose new methods to create longitudinal, patient-centric and data-driven representations of patient experience. We will use chronologically ordered hospital records for the population of New South Wales (~15 million individuals, ~90 million hospital events).

Using deep learning, we will compute patient ‘representations’ for trajectories of hospital use events and use these to identify clusters of individuals who share patterns of hospital use (patient segments). We will characterise and visualise the segments and explore how they change over time and the impacts of policy changes and population shocks. We will make our methods available via open-source code and an end-user app.

Our novel approach to patient segmentation uses sequences of health service use events, rather than diagnosis codes, as the primary way to represent patients. To do this, we employ a Transformer deep learning architecture, which is well suited to making full use of longitudinal patient sequence data.
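The general idea of turning event sequences into vectors and then clustering them can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's method: the Transformer encoder is replaced by a simple placeholder that records the proportion of each event type (and so ignores event order and timing), the clustering step is assumed to be plain k-means, and the event names and patient trajectories are made up.

```python
import random
from collections import Counter

# Hypothetical event vocabulary; real hospital records are far richer.
EVENT_TYPES = ["ED_VISIT", "PLANNED_ADMISSION", "EMERGENCY_ADMISSION", "DAY_PROCEDURE"]


def embed(sequence):
    """Map a chronologically ordered event sequence to a fixed-length vector.

    Placeholder for a learned Transformer encoder: returns the proportion
    of each event type, so event order and timing are ignored.
    """
    if not sequence:
        return [0.0] * len(EVENT_TYPES)
    counts = Counter(sequence)
    return [counts[e] / len(sequence) for e in EVENT_TYPES]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy, made-up trajectories (not real data).
patients = {
    "p1": ["ED_VISIT", "EMERGENCY_ADMISSION", "ED_VISIT", "EMERGENCY_ADMISSION"],
    "p2": ["ED_VISIT", "ED_VISIT", "EMERGENCY_ADMISSION"],
    "p3": ["PLANNED_ADMISSION", "DAY_PROCEDURE", "DAY_PROCEDURE"],
    "p4": ["DAY_PROCEDURE", "PLANNED_ADMISSION", "DAY_PROCEDURE", "DAY_PROCEDURE"],
}

vectors = [embed(seq) for seq in patients.values()]
labels = kmeans(vectors, k=2)
segments = dict(zip(patients, labels))  # patient id -> segment label
```

In this toy example the two patients with mostly unplanned hospital use end up in one segment and the two with mostly planned use in the other. A learned sequence encoder would take the place of `embed` and could also capture the order and timing of events, which this placeholder cannot.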

Our approach is ‘patient-journey-centric’, and so aligned with the key preoccupations of policymakers. It is also ‘diagnosis-agnostic’, and so flexibly supports exploration of hospital trajectories in patient subgroups defined by any diagnosis, combination of diagnoses or demographic features such as geography and ethnicity.

Our methods will inform health service planning, delivery and evaluation by allowing identification of patient subgroups who are most likely to be high users of hospital services. They will form the foundation for deep learning methods that can be applied to large multimodal datasets that integrate primary and secondary care data, such as My Health Record.

Our team includes senior, mid- and early-career researchers with outstanding expertise in data analytics, machine learning, biostatistics, health economics and research translation.

Project: Translating the evidence on dementia risk reduction to generate assessments, advice and training for health professionals, policy makers, patients and public - NHMRC Boosting Dementia

Leveraging Evidence into Action for Dementia! (LEAD!) is a project funded by an NHMRC Boosting Dementia Research Grant. Led by Professor Kaarin Anstey at Neurosciences Research Australia, it involves an international collaboration between leading academics, clinicians, consumers, and community members. Organisations involved include the Department of Health, WHO, Dementia Australia, Alzheimer’s Disease International, Diabetes Australia, and the Heart Foundation.

The project aims to translate dementia research and implement evidence-based strategies for dementia risk reduction for individuals, communities, and healthcare centres. It seeks to develop a new risk tool for predicting dementia and other non-communicable diseases including stroke, diabetes, and myocardial infarction. Researchers at the Centre for Big Data Research in Health, Professor Louisa Jorm and Dr Heidi Welberry, are supporting the Evaluation and Adoption stream by (i) developing tools to assess national risk factor profile trajectories and (ii) investigating the feasibility of incorporating existing administrative and/or clinical data directly into the risk tools.