OMOP Common Data Model

Routinely collected health data can support observational research, evaluation, and service improvement. However, it is rarely analysis-ready and seldom supports reproducibility. Data structures differ across systems, coding schemes vary, and governance constraints shape what can be accessed and where.

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) community, is a widely used standard for organising and harmonising observational health data so that it can be analysed consistently and compared across settings.

We are a UNSW-based, multidisciplinary data services and consultancy team combining clinical, informatics, and data engineering expertise. We support health services, government agencies, and research/industry partners with OMOP conversion and OHDSI enablement in governed environments.

What we deliver

OMOP conversion is more than “just Extract, Transform, Load (ETL)”. It’s a cross-functional delivery problem: data engineering, clinical interpretation, terminology decisions, and governance working together, with a clear need for traceability and validation.

Services

  • Source assessment, profiling, and conversion planning: We assess source data structure, coding systems, clinical context, and access constraints to support feasibility assessment, profiling and OMOP conversion planning using WhiteRabbit, Rabbit in a Hat, or equivalent approaches.
  • OMOP mapping and specification development: We develop a traceable, version-controlled source-to-OMOP mapping specification with clear lineage, transformation logic, and review workflows to support clinical and stakeholder sign-off.
  • Vocabulary and ontology support: We provide source-to-standard concept mapping, local terminology translation, and vocabulary governance support including planning for vocabulary updates and ongoing maintenance.
  • ETL design, build, validation, and handover: We design and implement reproducible, versioned ETL workflows, support validation and deployment, and provide guidance for operational maintenance, refresh processes, and change control.
  • Data quality assessment and remediation support: We assess data quality, identify limitations, and work with teams to prioritise remediation so the OMOP dataset is fit for its intended use.
  • OHDSI tooling, cohort workflows, and enablement: We support OHDSI tooling setup and use, including the Data Quality Dashboard, WebAPI/ATLAS connectivity, cohort and phenotype workflows, training, and practical enablement so teams can work effectively with the converted data.
  • Research, protocol, and downstream analytics support: Following OMOP implementation, we can support study design, protocol and analysis plan development, cohort definition, analysis execution, and reproducible reporting.
  • Advisory, training, and operating model support: We provide workshops and advisory support across architecture, governance, and operating model design to help organisations sustain and use their OMOP environment effectively.
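
As a rough illustration of the source-to-standard concept mapping mentioned under vocabulary support, the sketch below looks up local codes in a small mapping table and falls back to concept ID 0, the OMOP convention for values with no standard concept. The vocabulary name, codes, concept IDs, and function names are hypothetical placeholders, not real vocabulary content or our delivery tooling.

```python
from collections import Counter

# Hypothetical local-code -> standard concept_id lookup.
# The IDs are placeholders, not real OMOP vocabulary content.
SOURCE_TO_CONCEPT = {
    ("LOCAL_LAB", "TROP_T"): 900001,
    ("LOCAL_LAB", "HBA1C"): 900002,
}

UNMAPPED_CONCEPT_ID = 0  # OMOP convention for values with no standard concept


def map_source_code(vocabulary: str, code: str) -> int:
    """Return the standard concept_id for a source code, or 0 if unmapped."""
    key = (vocabulary, code.strip().upper())
    return SOURCE_TO_CONCEPT.get(key, UNMAPPED_CONCEPT_ID)


def mapping_coverage(records: list[tuple[str, str]]) -> dict:
    """Summarise how many records mapped and which codes most often did not."""
    unmapped = Counter()
    mapped = 0
    for vocabulary, code in records:
        if map_source_code(vocabulary, code) == UNMAPPED_CONCEPT_ID:
            unmapped[(vocabulary, code)] += 1
        else:
            mapped += 1
    total = len(records)
    return {
        "total": total,
        "mapped": mapped,
        "coverage": mapped / total if total else 0.0,
        "top_unmapped": unmapped.most_common(5),
    }
```

A coverage summary of this kind is one way to prioritise which unmapped local codes go to clinical review first.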

Experience and leadership

Our work sits at the intersection of health data engineering and applied clinical analytics, grounded in the OHDSI/OMOP ecosystem and shaped by the governed Australian health data environment.

We contribute to national and community efforts, including serving on the Steering Committee and leading the NSW node of the Australian Health Data Evidence Network (AHDEN), and engaging with OHDSI Australia.

CardiacAI is a not-for-profit research group founded by clinicians and researchers from South Eastern Sydney Local Health District and UNSW, focused on building data resources, analytics, and tools to address cardiovascular disease.

The CardiacAI project includes supporting standardised, analysis-ready health data and research workflows in a governed environment.

Cardiac Analytics & Innovation has delivered a comprehensive OMOP-aligned data transformation project converting Cerner Millennium EHR data into a standardised cardiac analytics platform. The project implements a medallion architecture that transforms raw EMR data into a research-ready CardiacAI data model, and it incorporates a four-dimensional data quality framework (Completeness, Conformance, Plausibility, Timeliness) compatible with OMOP/OHDSI standards. Built on Snowflake with dbt, the solution features automated testing, privacy-preserving de-identification, comprehensive business rules governance, and interactive DQ dashboards. Together, these demonstrate end-to-end capability in transforming complex healthcare data into validated, research-ready datasets while maintaining clinical validity and regulatory compliance.
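
To make the four data quality dimensions concrete, here is a minimal Python sketch of one check per dimension. It is an illustration only: the project itself runs on Snowflake with dbt, and the field names (`year_of_birth`, `extract_date`), the 1900 lower bound, and the 30-day freshness window are assumptions chosen for the example.

```python
from datetime import date


def fraction(flags):
    """Share of True values; 1.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 1.0


def assess_quality(rows, as_of, max_lag_days=30):
    """Score a batch of person-like rows on four data quality dimensions."""
    births = [r.get("year_of_birth") for r in rows]
    present = [b for b in births if b is not None]
    latest = max((r["extract_date"] for r in rows), default=None)
    fresh = latest is not None and (as_of - latest).days <= max_lag_days
    return {
        # Completeness: the required field is populated.
        "completeness": fraction(b is not None for b in births),
        # Conformance: populated values have the expected type.
        "conformance": fraction(isinstance(b, int) for b in present),
        # Plausibility: values fall in a believable range (assumed 1900..now).
        "plausibility": fraction(
            isinstance(b, int) and 1900 <= b <= as_of.year for b in present
        ),
        # Timeliness: the newest extract is recent enough (assumed window).
        "timeliness": 1.0 if fresh else 0.0,
    }
```

Each score is a pass rate between 0 and 1, which makes it straightforward to set thresholds per dimension and track them across data refreshes.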

The DREAM project was a pilot study that mapped Sydney Local Health District (SLHD) Cerner data into the OMOP Common Data Model (OMOP v5.4) using the Cerner2Omop framework, and included de-identification of discharge summaries using DEFT (De-identify Free Text) developed at UNSW, under SLHD Human Research Ethics Committee approval. Working from selected extracts provided as files, the work covered conversion across core clinical domains and included documented mapping notes/issues and summary statistics, alongside structured evaluation using the OHDSI Data Quality Dashboard (DQD) to assess conformance, plausibility and completeness and to guide remediation priorities.

The mapped OMOP dataset and de-identified discharge summaries were made available via UNSW ERICA, with secure transfer back to the LHD supported where needed. The report also sets out practical next steps typical of real-world OMOP programs: improve completeness/coverage through direct data-lake extraction, perform mapping and de-identification within the LHD’s governed cloud environment, explore LLM-assisted ontology mapping with clinical oversight, and extend to OMOP NLP tables (NOTE_NLP) where appropriate. Methods are described in the associated publications (Cerner2Omop paper, governed access patterns paper, and DEFT paper).

Selected publications and related outputs (each reflects a delivered capability relevant to OMOP conversion and OHDSI enablement):

  • ETL framework for conversion to OMOP (Cerner2Omop / OMOP ETL): Designed and tested a metadata-driven ETL approach (YAML-defined mapping logic compiled to SQL) to make mapping rules readable, reviewable, and maintainable across complex source systems. (Cerner2Omop paper)
  • Governed access patterns for OMOP-converted EMR data: Summarised practical governance and operating-model considerations (pseudonymisation, quality assessment, and secure cross-site analysis patterns) for using OMOP in real-world settings. (Governed access patterns paper, PMC mirror)
  • De-identification of discharge summaries (DEFT): Developed an end-to-end approach for de-identifying Australian hospital discharge summaries using an ensemble of deep learning models, enabling privacy-preserving use of clinical text alongside structured OMOP data. (DEFT paper)
  • NLP for turning clinical text into structured fields: Reviewed NLP methods for extracting information from clinical text to populate clinical registries, informing how to incorporate unstructured sources into reproducible data pipelines. (Clinical NLP review)
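
The metadata-driven idea in the first item above (mapping logic declared as data, then compiled to SQL) can be sketched as follows. This is not the Cerner2Omop format or implementation: the spec layout, the source table `src_diagnosis`, and its columns are hypothetical, and the spec is written as a Python dict so the example needs no YAML parser. The target table and columns are real OMOP names, but the sketch omits other required fields such as `condition_concept_id`.

```python
# Hypothetical mapping spec. A YAML file with the same keys, loaded by a
# YAML parser, would produce an equivalent dict.
MAPPING = {
    "target": "condition_occurrence",
    "source": "src_diagnosis",  # assumed source table name
    "columns": {
        # target column: SQL expression over the source (assumed columns)
        "person_id": "patient_key",
        "condition_start_date": "CAST(diag_dt AS DATE)",
        "condition_source_value": "diag_code",
    },
    "where": "diag_code IS NOT NULL",
}


def compile_mapping(spec: dict) -> str:
    """Render a mapping spec as a single INSERT ... SELECT statement."""
    targets = list(spec["columns"])
    select_lines = [
        f"    {expr} AS {col}" for col, expr in spec["columns"].items()
    ]
    sql = (
        f"INSERT INTO {spec['target']} ({', '.join(targets)})\n"
        "SELECT\n" + ",\n".join(select_lines) + f"\nFROM {spec['source']}"
    )
    if spec.get("where"):
        sql += f"\nWHERE {spec['where']}"
    return sql + ";"


if __name__ == "__main__":
    print(compile_mapping(MAPPING))
```

Keeping the rules in a declarative spec means reviewers can read and sign off on the mapping without reading the generated SQL, and the spec can be version-controlled alongside the ETL code.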

Team

A UNSW-based multidisciplinary team spanning clinical expertise, data engineering, informatics, research, and analytics.

Blanca Gallego Luxán

LinkedIn |  UNSW profile

Blanca is a Professor of Clinical Artificial Intelligence at UNSW and founder of the Clinical AI Honours Program at UNSW Medicine & Health. Trained as a physicist, she completed a PhD in climate modelling at UCLA. Her research focuses on the application of AI to improve healthcare decision-making and patient outcomes. She is the technical lead and co-founder of CardiacAI, a research initiative focused on using artificial intelligence and large-scale health data to improve cardiovascular care. This initiative has built a live cardiovascular database using de-identified EMR data from NSW health districts linked with long-term hospitalisation and mortality outcomes. Blanca leads the NSW node of a national initiative to make Australian hospital EMR data research-ready by harmonising it into the OMOP Common Data Model, enabling secure, federated, large-scale health research while keeping patient data private at source. She is also a member of the Australian Council of Senior Academic Leaders in Digital Health and the TGA advisory panel on medical devices.

Sam Arvan (Contact)

LinkedIn |  UNSW profile |  Email

Dr Sam Arvan is a Senior Data Professional at the Centre for Big Data Research in Health (CBDRH), UNSW Sydney, specialising in governed health data platforms and pipelines using Snowflake, dbt, Python, and SQL to transform raw EMR and administrative data into analysis-ready assets. His work spans data mapping, ETL validation, automated data quality testing and monitoring, as well as privacy-preserving workflows for clinical text, including the evaluation and secure deployment of free-text de-identification models. His expertise is directly aligned with OMOP conversion and OHDSI enablement.

Team expertise

  • OMOP conversion specialists: end-to-end delivery across profiling, mapping, ETL, and quality
  • Multidisciplinary delivery: clinical + informatics + engineering + research + data science
  • Data modelling & data engineering: governed pipelines and reproducible releases
  • NLP expertise: clinical NLP methods for extracting structured fields from free text, such as registry population (Clinical NLP review)
  • AI in healthcare: development, evaluation, and implementation of clinical AI/ML in real-world settings
  • Business Intelligence: translating raw data into actionable insights and decision-ready dashboarding and reporting

Get in touch

Contact person: Sam Arvan

What helps us speed up scoping:

  • Source system(s) and data model (EMR/EDW/registries)
  • Approximate scale (tables, years, sites)
  • Governance constraints (TRE, on-prem, approvals)
  • Target use cases (research, surveillance, network participation)

FAQs

  • Data access: Direct access to source data is not always required. We can work with different access models depending on governance constraints (e.g., secure environments, restricted extracts, or privacy-preserving workflows). The engagement will specify the minimum required access and controls.

  • Mapping quality: We use explicit mapping specifications, stakeholder review (including clinical input where needed), and structured data quality checks. Findings and limitations are documented in the handover pack.

  • Tooling: We work with standard OMOP/OHDSI ecosystem tooling (e.g., profiling and mapping specification tools, vocabulary management, and data quality assessment). Specific tool choices depend on your environment and constraints.

  • Engagement process: A typical engagement moves through discovery → profiling → mapping spec → ETL build → data quality remediation → enablement and handover. We can also deliver smaller advisory modules.

  • Handover and IP: Deliverables, documentation, and operational runbooks are designed for handover so your organisation can maintain and evolve the conversion. Specific IP and reuse terms can be agreed up front.