The rapid accumulation and availability of medical big data as a by-product of the care process, together with the development of new deep learning and causal inference algorithms, provide us with the opportunity to create safer, more efficient and more equitable models of care. To achieve this goal, however, we need to establish the critical data infrastructure required for big data analytics in healthcare.
We are conducting several projects that transform multiple-sourced complex electronic medical record (EMR) data into fully de-identified, enriched and standardized information, ready to be shared with the wider medical and research community. We focus on the following tasks:
- Data de-identification: EMR data contains protected health information that is easy to remove from structured fields but much harder to remove from clinical narrative text. At CBDRH we have developed a solution that uses natural language processing algorithms to de-identify clinical text.
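As a simplified illustration of the idea (not the CBDRH system itself, which uses trained language models rather than rules alone), a rule-based pass over a note might mask common PHI patterns. The tag names and regular expressions below are hypothetical:

```python
import re

# Illustrative sketch only: mask common PHI patterns (dates, phone numbers,
# medical record numbers) in free text. Production de-identification combines
# rules like these with trained named-entity recognition models.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{4}\s?\d{3}\s?\d{3}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

def deidentify(text: str) -> str:
    """Replace each matched PHI span with its category placeholder."""
    for tag, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

note = "Seen on 03/04/2023, MRN: 12345678, contact 0412 345 678."
print(deidentify(note))  # Seen on [DATE], [MRN], contact [PHONE].
```

Rules alone miss context-dependent identifiers such as patient names, which is why statistical NLP models are needed on top of pattern matching.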
- Data Standardisation: Common data models provide (1) efficiency: analytics tools and software can be reused; (2) transparency and reproducibility: findings can be compared across data sources; and (3) scalability: studies can leverage data from multiple locations and settings. We have developed an ETL process for converting EMR data to the Observational Medical Outcomes Partnership (OMOP) common data model, the leading international model for leveraging clinical and administrative health data.
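To make the ETL idea concrete, here is a minimal sketch of mapping one hypothetical source EMR diagnosis row into an OMOP `condition_occurrence` record. The source field names and the local-code lookup table are invented for illustration; the OMOP column names follow the CDM specification, and concept_id 0 is the CDM's convention for an unmapped code:

```python
# Hypothetical lookup from a local diagnosis code to an OMOP standard
# concept_id (assumed mapping shown for illustration only).
SOURCE_TO_CONCEPT = {
    "E11": 201826,  # assumed concept_id for type 2 diabetes mellitus
}

def to_condition_occurrence(row: dict) -> dict:
    """Map one source EMR diagnosis row to an OMOP condition_occurrence record."""
    return {
        "person_id": row["patient_id"],
        "condition_concept_id": SOURCE_TO_CONCEPT.get(row["dx_code"], 0),
        "condition_start_date": row["dx_date"],
        # OMOP keeps the original source code for traceability
        "condition_source_value": row["dx_code"],
    }

emr_row = {"patient_id": 42, "dx_code": "E11", "dx_date": "2023-05-01"}
print(to_condition_occurrence(emr_row))
```

A real ETL pipeline applies transformations like this at scale across all CDM tables, with vocabulary mapping handled through the OMOP standardised vocabularies.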
- Extraction of concepts from clinical text: A significant proportion of clinically relevant information in the medical record is embedded in free-text clinical notes. We are developing tools that use programmatic labelling and machine learning algorithms to automate the extraction of relevant clinical concepts from clinical text.
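The core of programmatic labelling can be sketched in a few lines: hand-written labelling functions each vote on a note, and their votes are combined into a weak label that can then train a machine-learning extractor. The function names, rules, and the smoking-status example below are illustrative only:

```python
# Illustrative programmatic-labelling sketch: labelling functions vote on
# whether a note indicates a concept of interest (here, smoking status),
# and a simple majority vote combines them into a weak training label.
SMOKER, NON_SMOKER, ABSTAIN = 1, 0, -1

def lf_current_smoker(text: str) -> int:
    return SMOKER if "current smoker" in text.lower() else ABSTAIN

def lf_denies_smoking(text: str) -> int:
    return NON_SMOKER if "denies smoking" in text.lower() else ABSTAIN

def majority_label(text: str, lfs) -> int:
    """Combine labelling-function votes; abstain if no function fires."""
    votes = [v for lf in lfs if (v := lf(text)) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

note = "Patient is a current smoker, 10 pack-years."
print(majority_label(note, [lf_current_smoker, lf_denies_smoking]))  # 1
```

In practice, frameworks for weak supervision replace the majority vote with a learned model of labelling-function accuracy, but the principle of generating training labels from cheap heuristics is the same.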