OMOP Common Data Model
Routinely collected health data can support observational research, evaluation, and service improvement, but it is rarely analysis-ready and seldom supports reproducible analysis. Data structures differ across systems, coding schemes vary, and governance constraints shape what can be accessed and where.
The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) community, is a widely used standard for organising and harmonising observational health data so it can be analysed consistently and compared across settings.
We are a UNSW-based, multidisciplinary data services and consultancy team combining clinical, informatics, and data engineering expertise. We support health services, government agencies, and research/industry partners with OMOP conversion and OHDSI enablement in governed environments.
What we deliver
OMOP conversion is more than “just Extract, Transform, Load (ETL)”. It is a cross-functional delivery problem in which data engineering, clinical interpretation, terminology decisions, and governance must work together, with traceability and validation throughout.
Services
- Source assessment, profiling, and conversion planning: We assess source data structure, coding systems, clinical context, and access constraints to support feasibility assessment, profiling and OMOP conversion planning using WhiteRabbit, Rabbit in a Hat, or equivalent approaches.
- OMOP mapping and specification development: We develop a traceable, version-controlled source-to-OMOP mapping specification with clear lineage, transformation logic, and review workflows to support clinical and stakeholder sign-off.
- Vocabulary and ontology support: We provide source-to-standard concept mapping, local terminology translation, and vocabulary governance support including planning for vocabulary updates and ongoing maintenance.
- ETL design, build, validation, and handover: We design and implement reproducible, versioned ETL workflows, support validation and deployment, and provide guidance for operational maintenance, refresh processes, and change control.
- Data quality assessment and remediation support: We assess data quality, identify limitations, and work with teams to prioritise remediation so the OMOP dataset is fit for its intended use.
- OHDSI tooling, cohort workflows, and enablement: We support OHDSI tooling setup and use, including the Data Quality Dashboard, WebAPI/ATLAS connectivity, cohort and phenotype workflows, training, and practical enablement so teams can work effectively with the converted data.
- Research, protocol, and downstream analytics support: Following OMOP implementation, we can support study design, protocol and analysis plan development, cohort definition, analysis execution, and reproducible reporting.
- Advisory, training, and operating model support: We provide workshops and advisory support across architecture, governance, and operating model design to help organisations sustain and use their OMOP environment effectively.
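As an illustration of the vocabulary mapping and data quality services above, the following is a minimal Python sketch of source-to-standard concept mapping with lineage and an unmapped-code report. It is not our production tooling. The `LOCAL_DX` vocabulary, the local codes, the target concept IDs, and the function names are all hypothetical placeholders; the only OMOP convention relied on is that concept ID 0 denotes "no matching concept".

```python
from dataclasses import dataclass

# OMOP convention: concept_id 0 means "no matching concept".
UNMAPPED_CONCEPT_ID = 0


@dataclass(frozen=True)
class MappedRecord:
    """One source code with its mapped concept and the lineage needed for review."""
    source_code: str        # original value, kept verbatim for traceability
    source_vocabulary: str
    concept_id: int
    mapping_version: str    # which version of the mapping spec produced this row


# Hypothetical mapping table: (vocabulary, normalised code) -> target concept ID.
# The IDs are placeholders, not real OMOP concept IDs.
SOURCE_TO_CONCEPT = {
    ("LOCAL_DX", "DM2"): 1000001,
    ("LOCAL_DX", "HTN"): 1000002,
}


def map_codes(records, mapping, mapping_version):
    """Map (vocabulary, code) pairs and count codes that have no mapping."""
    mapped = []
    unmapped_counts = {}
    for vocabulary, code in records:
        # Normalise only for lookup; the stored source_code stays untouched.
        key = (vocabulary, code.strip().upper())
        concept_id = mapping.get(key, UNMAPPED_CONCEPT_ID)
        if concept_id == UNMAPPED_CONCEPT_ID:
            unmapped_counts[key] = unmapped_counts.get(key, 0) + 1
        mapped.append(MappedRecord(code, vocabulary, concept_id, mapping_version))
    return mapped, unmapped_counts


def mapping_coverage(mapped):
    """Fraction of records mapped to a non-zero concept ID."""
    if not mapped:
        return 0.0
    hits = sum(1 for record in mapped if record.concept_id != UNMAPPED_CONCEPT_ID)
    return hits / len(mapped)


if __name__ == "__main__":
    sample = [("LOCAL_DX", "dm2 "), ("LOCAL_DX", "HTN"), ("LOCAL_DX", "XYZ")]
    rows, unmapped = map_codes(sample, SOURCE_TO_CONCEPT, "v0.1")
    print(f"coverage: {mapping_coverage(rows):.2f}")
    print("unmapped codes for review:", unmapped)
```

In practice the mapping table would come from a reviewed, version-controlled specification rather than an in-code dictionary, and the unmapped-code counts would feed the clinical review and remediation steps.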
Experience and leadership
Our work sits at the intersection of health data engineering and applied clinical analytics, grounded in the OHDSI/OMOP ecosystem and shaped by experience with governed Australian health data.
We contribute to national and community efforts, including serving on the Steering Committee and leading the NSW node of the Australian Health Data Evidence Network (AHDEN), and engaging with OHDSI Australia.
Team
A UNSW-based multidisciplinary team spanning clinical expertise, data engineering, informatics, research, and analytics.
Blanca Gallego Luxán
Blanca is a Professor of Clinical Artificial Intelligence at UNSW and founder of the Clinical AI Honours Program at UNSW Medicine & Health. Trained as a physicist, she completed a PhD in climate modelling at UCLA. Her research focuses on the application of AI to improve healthcare decision-making and patient outcomes. She is the technical lead and co-founder of CardiacAI, a research initiative focused on using artificial intelligence and large-scale health data to improve cardiovascular care. This initiative has built a live cardiovascular database using de-identified EMR data from NSW health districts linked with long-term hospitalisation and mortality outcomes. Blanca leads the NSW node of a national initiative to make Australian hospital EMR data research-ready by harmonising it into the OMOP Common Data Model, enabling secure, federated, large-scale health research while keeping patient data private at source. She is also a member of the Australian Council of Senior Academic Leaders in Digital Health and the TGA advisory panel on medical devices.
Sam Arvan (Contact)
Dr Sam Arvan is a Senior Data Professional at the Centre for Big Data Research in Health (CBDRH), UNSW Sydney, specialising in governed health data platforms and pipelines using Snowflake, dbt, Python, and SQL to transform raw EMR and administrative data into analysis-ready assets. His work spans data mapping, ETL validation, automated data quality testing and monitoring, as well as privacy-preserving workflows for clinical text, including the evaluation and secure deployment of free-text de-identification models. His expertise is directly aligned with OMOP conversion and OHDSI enablement.
Team expertise
- OMOP conversion specialists: end-to-end delivery across profiling, mapping, ETL, and quality
- Multidisciplinary delivery: clinical + informatics + engineering + research + data science
- Data modelling & data engineering: governed pipelines and reproducible releases
- NLP expertise: clinical NLP methods for extracting structured fields from free text, for example to populate registries (see Clinical NLP review)
- AI in healthcare: development, evaluation, and implementation of clinical AI/ML in real-world settings
- Business Intelligence: translating raw data into actionable insights through decision-ready dashboards and reports
Get in touch
Contact person: Sam Arvan
- Email: s.arvan@unsw.edu.au
What helps us speed up scoping:
- Source system(s) and data model (EMR/EDW/registries)
- Approximate scale (tables, years, sites)
- Governance constraints (TRE, on-prem, approvals)
- Target use cases (research, surveillance, network participation)
FAQs
- Not always. We can work with different access models depending on governance constraints (e.g., secure environments, restricted extracts, or privacy-preserving workflows). The engagement will specify the minimum required access and controls.
- We use explicit mapping specifications, stakeholder review (including clinical input where needed), and structured data quality checks. Findings and limitations are documented in the handover pack.
- We work with standard OMOP/OHDSI ecosystem tooling (e.g., profiling and mapping specification tools, vocabulary management, and data quality assessment). Specific tool choices depend on your environment and constraints.
- A typical engagement moves through discovery → profiling → mapping spec → ETL build → data quality remediation → enablement and handover. We can also deliver smaller advisory modules.
- Deliverables, documentation, and operational runbooks are designed for handover so your organisation can maintain and evolve the conversion. Specific IP and reuse terms can be agreed up front.