Record linkage

Record linkage in the NSW-CDS
An established state population cohort of 93,118 children is being followed from birth into early adulthood via successive waves of record linkage (see Figure below) to determine risk and protective factors for health, educational, social, child protection and criminology outcomes in adolescence and adulthood.
Wave 1
The first record linkage (wave 1) for the NSW-CDS was intended to provide information about the early childhood years (from birth to 5 years) for children who were assessed using the Australian Early Development Census as they started school in 2009 (aged 5-6 years). The record linkage brought together information about physical and mental health, education, child protection and criminal justice system contacts for the children and their parents, in a way that protects the anonymity of all people involved. This linkage was conducted during 2013-2014, and statistical analysis and dissemination activities are currently being finalised.
Wave 2
The second record linkage (wave 2), conducted in 2016, provides information about the same cohort of children from birth to 12-13 years of age, as well as about children who completed the Middle Childhood Survey in 2015 (aged 11-12 years). This second linkage adds further data collections, including the Middle Childhood Survey, alongside other information about physical health, mental health, education, child protection and criminal justice system contacts for the children and their parents. Statistical analyses and dissemination activities using wave 2 data are currently underway.
Wave 3
The next record linkage (wave 3) is proposed for completion in late 2020 and will provide information about the same cohort of children from birth to 15-16 years of age. In addition to extending the longitudinal data by three years, this linkage will include the same record sets brought together in previous record linkages, plus Commonwealth data sets (e.g., Medicare records for GP visits). This will increase the level of information obtained about physical and mental health contacts, because many common ailments seen by GPs would not have been captured in previous record linkages based on hospital contact data, which generally ascertain more severe health conditions.
-
Record linkage allows researchers and policy makers to study trends and patterns in whole population groups by bringing together routinely collected information from different sources. When linkage is conducted, an individual’s records from different agencies are brought together by a third party and provided to the researchers in a way that protects anonymity (that is, without personally identifying any individual, so that privacy and confidentiality are protected). Linkage provides a safe and secure way to bring together relevant information, enabling researchers to examine relationships between events at a population (not an individual) level.
Because of the statistical power provided by large population samples, record linkage is often used to answer research questions that cannot be studied accurately in smaller samples (for example, where outcomes of interest are rare, so that large numbers of individuals are needed for enough cases to be observed). Equally important, administrative records can be useful when self-reported information (i.e., via interviews or questionnaires) may raise problems of feasibility or accuracy. For example, people can’t always accurately remember events that happened a long time ago; records made at the time of the event (e.g., at a child’s birth) are likely to be more accurate than detailed information recalled years later.
Using record linkage, for example, health services information for an entire population can be combined with information from other departments or agencies, such as education, to study questions that could not be answered accurately using any other method. Researchers could examine whether birth weight influences children’s readiness to learn at school, or whether better social and emotional functioning at school entry is related to later scholastic achievement.
Record linkage can also bring together a child’s records with those of his/her parents to provide information about important influences in the child’s life, for example to find out how significant events in the parents’ lives (such as hospitalisation for serious illness or a court appearance) might affect children’s health and wellbeing. Information about these possible influences on children’s lives could not be gathered accurately for a population cohort in any other way. We are fortunate that the NSW Government has provided the infrastructure to enable this research to be undertaken in a way that protects the privacy of the people involved.
Below is a useful animated video from SA-NT DataLink explaining how data (record) linkage works, and how it can help to improve health policy.
-
Record Linkage is completed in two separate stages.
The first stage is conducted by the Centre for Health Record Linkage (CHeReL). The CHeReL matches the identities (names, addresses, dates of birth, etc.) of children who appear in two or more databases, without having access to any of the information (data) that is held in those databases about the children (such as Middle Childhood Survey data or Australian Early Development Census data). After matching identities (but no other information), the CHeReL assigns a matching-code number to each child and sends these codes to the organisations that hold the information (data). The matching-code number then replaces identifying information in the data files, so that anonymous information from different data files can be matched for each individual.
In the second stage, the data files labelled with a matching-code number (but with no identifying information) are sent to the research team. The team brings together (links) the data files from the different organisations on the basis of the matching-code number. In this way, the research team can combine the Middle Childhood Survey data with a wide range of other information (such as data files from health and education) without ever knowing the identities of the children. These processes are governed by strict privacy and confidentiality protection laws, so that each individual’s identity and data are protected. You can see this description illustrated in the picture below.
-
Before any record linkage projects can be undertaken, multiple approvals are required to ensure that all Commonwealth and State privacy and security regulations and laws are upheld, that the research is of significant scientific merit (that is, it asks important and relevant questions), and that there is no risk that an individual may be personally identified at any stage of the research.
The following approvals must be obtained before a record linkage project can be undertaken:
- An authorised Human Research Ethics Committee (e.g., the NSW Population and Health Services Research Ethics Committee) must approve the linkage project, and will set an expiry date by which the record linkage project must be completed;
- Data Custodians within the organisation responsible for each dataset must approve the use of their data for the purposes of the linkage project;
- The Record Linkage Integrating Authority (e.g., in NSW this is the Centre for Health Record Linkage) must approve the feasibility of the linkage project;
- All government-owned data must be approved for linkage by the relevant Department’s own ethics committee.
-
In Australia, there are several third-party record linkage providers for data owned by State/Territory or Commonwealth Agencies. One of these linkage providers is the Centre for Health Record Linkage (CHeReL). This is the agency in NSW that provides linkage services for the NSW Child Development Study (NSW-CDS). It is important to note that the anonymous linkage of data requires cooperation from the CHeReL in direct liaison with Data Custodians within the organisations that are providing data for linkage, as follows.
For each research linkage study, the CHeReL receives a set of personal identifiers (information such as Name, Date of Birth, Postcode) from each Data Custodian for all the individual records to be linked. Using these sets of personal identifiers from different datasets, the CHeReL computer finds personal identifiers that ‘match’ (that is, belong to the same person), and generates a new, Project-specific Linkage ID code (‘Linkage IDs’) for these ‘matches’. The CHeReL pairs Linkage IDs with the unique ‘Record ID’ in each Data Custodian’s original file, and then sends these paired Linkage IDs and Record IDs (for all matched cases) back to the Data Custodians of each organisation. This is the end of the process for the CHeReL.
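The matching step above can be illustrated with a minimal Python sketch. It assumes hypothetical identifier extracts of (Record ID, name, date of birth, postcode) and uses simple exact matching on normalised identifiers. This is a simplification, not the CHeReL’s actual method: linkage units commonly use probabilistic matching that tolerates spelling variations and missing values. The Linkage IDs are random, so they carry no information about anyone’s identity.

```python
import uuid

# Hypothetical identifier extracts: (record_id, name, date_of_birth, postcode).
# The linkage unit receives only identifiers like these, never research data.
health_ids = [
    ("H001", "Jane Citizen", "2003-04-12", "2000"),
    ("H002", "Sam Smith", "2003-09-30", "2150"),
]
education_ids = [
    ("E101", "JANE CITIZEN ", "2003-04-12", "2000"),
    ("E102", "Alex Lee", "2004-01-05", "2170"),
]


def match_key(name, dob, postcode):
    """Normalise identifiers into an exact-match key (a simplifying assumption)."""
    return (" ".join(name.lower().split()), dob, postcode.strip())


def link(datasets):
    """Assign one project-specific Linkage ID per matched person.

    `datasets` maps a custodian name to its identifier records. Returns, per
    custodian, a list of (linkage_id, record_id) pairs for people found in
    two or more datasets.
    """
    people = {}  # match key -> {custodian: record_id}
    for custodian, records in datasets.items():
        for record_id, name, dob, postcode in records:
            people.setdefault(match_key(name, dob, postcode), {})[custodian] = record_id

    pairs = {custodian: [] for custodian in datasets}
    for appearances in people.values():
        if len(appearances) < 2:
            continue  # only people appearing in two or more datasets are "matches"
        linkage_id = uuid.uuid4().hex  # random ID, reveals nothing about identity
        for custodian, record_id in appearances.items():
            pairs[custodian].append((linkage_id, record_id))
    return pairs


# Each custodian gets back only its own (Linkage ID, Record ID) pairs.
pairs = link({"health": health_ids, "education": education_ids})
print(pairs)
```

Here "Jane Citizen" is matched across both extracts despite differences in case and spacing, so H001 and E101 receive the same Linkage ID; the unmatched records receive none.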
From here, each Data Custodian extracts the research data (the information needed by the researchers) from their database, attaches the corresponding Linkage ID to each record, and removes the Record IDs, leaving only the research data and the Linkage IDs for provision to researchers. Each Data Custodian then sends the researchers the Linkage IDs and their corresponding record content (data) without any personal identifying information. This is the end of the process for the Data Custodian.
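The Data Custodian’s step can be sketched as a simple substitution. This minimal Python example assumes a hypothetical research dataset keyed by Record ID and a hypothetical list of (Linkage ID, Record ID) pairs returned by the linkage unit; the variable names are illustrative only.

```python
# Hypothetical research data held by one Data Custodian, keyed by Record ID.
research_data = {
    "H001": {"birth_weight_g": 3350, "hospital_admissions": 2},
    "H002": {"birth_weight_g": 2900, "hospital_admissions": 0},
}
# Hypothetical (Linkage ID, Record ID) pairs returned by the linkage unit.
linkage_pairs = [("a1b2c3", "H001")]


def prepare_extract(research_data, linkage_pairs):
    """Swap each Record ID for its Linkage ID, keeping only matched records.

    The returned rows contain only the Linkage ID and research variables,
    with no Record ID or personal identifier.
    """
    extract = []
    for linkage_id, record_id in linkage_pairs:
        row = {"linkage_id": linkage_id}
        row.update(research_data[record_id])  # research variables only
        extract.append(row)
    return extract


print(prepare_extract(research_data, linkage_pairs))
# [{'linkage_id': 'a1b2c3', 'birth_weight_g': 3350, 'hospital_admissions': 2}]
```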
Next, the Research Team receives the research data and Project-specific Linkage IDs from each Data Custodian, and links together data from different sources (e.g., Education, Health) that relate to the same person, without being able to identify anyone personally. Further information about how the CHeReL operates record linkage is available from the CHeReL.
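The research team’s step amounts to joining de-identified extracts on the shared Linkage ID. The following minimal Python sketch assumes two hypothetical extracts with illustrative variable names; it performs an outer join, so a person missing from one extract keeps only the fields that are available.

```python
# Hypothetical de-identified extracts from two custodians, keyed only by Linkage ID.
health_extract = [
    {"linkage_id": "a1b2c3", "birth_weight_g": 3350},
    {"linkage_id": "d4e5f6", "birth_weight_g": 2900},
]
education_extract = [
    {"linkage_id": "a1b2c3", "aedc_vulnerable": False},
    {"linkage_id": "d4e5f6", "aedc_vulnerable": True},
]


def join_on_linkage_id(*extracts):
    """Merge rows from several extracts that share a Linkage ID (outer join).

    No identifying information is used at any point: the Linkage ID is the
    only key.
    """
    combined = {}
    for extract in extracts:
        for row in extract:
            combined.setdefault(row["linkage_id"], {}).update(row)
    return list(combined.values())


linked = join_on_linkage_id(health_extract, education_extract)
for row in linked:
    print(row)
# {'linkage_id': 'a1b2c3', 'birth_weight_g': 3350, 'aedc_vulnerable': False}
# {'linkage_id': 'd4e5f6', 'birth_weight_g': 2900, 'aedc_vulnerable': True}
```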
This process of linkage ensures that the research databases for the NSW Child Development Study do not contain any information that could identify a child, parent/caregiver, or school. Only non-identifiable data is provided to the NSW-CDS Research Team.
-
The NSW-CDS team worked with a third party during the collection of the Middle Childhood Survey (MCS) questionnaire to ensure that the research team receives only the child’s responses (research data), while the personal identifying information used in linkage processes was stored separately (held by separate organisations), so that no individual who participated in the MCS can ever be identified. This was done by coding the MCS data with a unique MCS-ID number.
Only the researchers named on the appropriate ethical approvals have access to the data collected during the MCS, and the University of New South Wales remains the Data Custodian for any future linkage of MCS data. On behalf of the University of New South Wales, the CHeReL holds only the MCS-ID and personal identifiers (without having any access to the MCS data).
As with all record linkage projects, the processes conducted by the third-party linkage provider (the CHeReL) ensure that researchers cannot re-identify study participants, because the researchers do not hold the personal identifiers for the research participants. They access only the research data coded by the unique MCS-ID number (linked to other record sets via the Project-specific Linkage ID generated by the CHeReL, as described in the section above: How is Record Linkage done to ensure that privacy is protected). Staff at the CHeReL access only the personal identifiers associated with each MCS-ID number and have no access to research data.
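The separation principle behind the MCS-ID can be shown with a minimal Python sketch. It assumes hypothetical survey submissions that arrive with identifiers and responses together; the actual MCS collection process and field names are not described here. Each submission is split into an identifier record (of the kind held by the CHeReL) and a response record (of the kind held by the research team), connected only by a random MCS-ID.

```python
import secrets

# Hypothetical raw submissions that mix identifiers with survey responses.
submissions = [
    {"name": "Jane Citizen", "dob": "2003-04-12", "postcode": "2000", "q1": 4, "q2": 2},
    {"name": "Sam Smith", "dob": "2003-09-30", "postcode": "2150", "q1": 3, "q2": 5},
]
IDENTIFIER_FIELDS = ("name", "dob", "postcode")  # assumed identifier fields


def split_submissions(submissions):
    """Split each submission into an identifier record and a response record.

    Both records carry the same random MCS-ID. They are intended to be stored
    by different organisations, so neither file alone connects a person's
    identity to their responses.
    """
    identifiers, responses = [], []
    for sub in submissions:
        mcs_id = secrets.token_hex(8)  # random, not derived from identifiers
        identifiers.append({"mcs_id": mcs_id, **{f: sub[f] for f in IDENTIFIER_FIELDS}})
        responses.append(
            {"mcs_id": mcs_id, **{k: v for k, v in sub.items() if k not in IDENTIFIER_FIELDS}}
        )
    return identifiers, responses


identifier_file, response_file = split_submissions(submissions)
print(response_file)  # responses with MCS-IDs only, no names or dates of birth
```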