core schema🔗
Available on backends: TPP, EMIS
This schema defines the core tables and columns which should be available in any backend providing primary care data, allowing dataset definitions written using this schema to run across multiple backends.
from ehrql.tables.core import (
clinical_events,
medications,
ons_deaths,
patients,
practice_registrations,
)
many rows per patient
clinical_events🔗
Each record corresponds to a single clinical or consultation event for a patient.
Note that event codes do not change in this table. If an event code in the coding system becomes inactive, the event will still be coded to the inactive code. As such, codelists should include all relevant inactive codes.
Example ehrQL usage of clinical_events
many rows per patient
medications🔗
The medications table provides data about prescribed medications in primary care.
Prescribing data, including the contents of the medications table are standardised across clinical information systems such as SystmOne (TPP). This is a requirement for data transfer through the Electronic Prescription Service in which data passes from the prescriber to the pharmacy for dispensing.
Medications are coded using dm+d codes. The medications table is structured similarly to the clinical_events table, and each row in the table is made up of a patient identifier, an event (dm+d) code, and an event date. For this table, the event refers to the issue of a medication (coded as a dm+d code), and the event date, the date the prescription was issued.
Factors to consider when using medications data🔗
Depending on the specific area of research, you may wish to exclude medications in particular periods. For example, in order to ensure medication data is stable following a change of practice, you may want to exclude patients for a period after the start of their practice registration . You may also want to exclude medications for patients for a period prior to their leaving a practice. Alternatively, for research looking at a specific period of interest, you may simply want to ensure that all included patients were registered at a single practice for a minimum time prior to the study period, and were registered at the same practice for the duration of the study period.
Examples of using ehrQL to calculation such periods can be found in the documentation on how to use ehrQL to answer specific questions using the medications table
one row per patient
ons_deaths🔗
Registered deaths
Date and cause of death based on information recorded when deaths are certified and registered in England and Wales from February 2019 onwards. The data provider is the Office for National Statistics (ONS). This table is updated approximately weekly in OpenSAFELY.
This table includes the underlying cause of death and up to 15 medical conditions
mentioned on the death certificate. These codes (cause_of_death_01
to
cause_of_death_15
) are not ordered meaningfully.
More information about this table can be found in following documents provided by the ONS:
- Information collected at death registration
- User guide to mortality statistics
- How death registrations are recorded and stored by ONS
In the associated database table ONS_Deaths,
a small number of patients have multiple registered deaths.
This table contains the earliest registered death.
The ehrql.tables.raw.core.ons_deaths
table contains all registered deaths.
Warning
There is also a lag in ONS death recording caused amongst other things by things like autopsies and inquests delaying reporting on cause of death. This is evident in the OpenSAFELY historical database coverage report
Example ehrQL usage of ons_deaths
-
date
🔗
date
-
Patient's date of death.
-
underlying_cause_of_death
🔗
ICD-10 code
-
Patient's underlying cause of death.
-
cause_of_death_01
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_02
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_03
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_04
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_05
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_06
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_07
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_08
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_09
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_10
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_11
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_12
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_13
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_14
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_15
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_is_in(codelist)
🔗
-
Match
codelist
against theunderlying_cause_of_death
field and all 15 separatecause_of_death
fields.This method evaluates as
True
if any code in the codelist matches any of these fields.View method definition
columns = [ "underlying_cause_of_death", *[f"cause_of_death_{i:02d}" for i in range(1, 16)], ] conditions = [getattr(ons_deaths, column).is_in(codelist) for column in columns] return functools.reduce(operator.or_, conditions)
one row per patient
patients🔗
Patients in primary care.
Representativeness🔗
You can find out more about the representativeness of these data in the OpenSAFELY-TPP backend in:
The OpenSAFELY Collaborative, Colm D. Andrews, Anna Schultze, Helen J. Curtis, William J. Hulme, John Tazare, Stephen J. W. Evans, et al. 2022. "OpenSAFELY: Representativeness of Electronic Health Record Platform OpenSAFELY-TPP Data Compared to the Population of England." Wellcome Open Res 2022, 7:191. https://doi.org/10.12688/wellcomeopenres.18010.1
Orphan records🔗
If a practice becomes aware that a patient has moved house, then the practice deducts, or removes, the patient's records from their register. If the patient doesn't register with a new practice within a given amount of time (normally from four to eight weeks), then the patient's records are permanently deducted and are orphan records. There are roughly 1.6 million orphan records.
Recording of death in primary care🔗
In England, it is the statutory duty of the doctor who had attended in the last illness to complete a medical certificate of cause of death (MCCD). ONS death data are considered the gold standard for identifying patient deaths because they are based on these MCCDs.
There is generally a lag between the death being recorded in ONS data and it
appearing in the primary care record, but the coverage or recorded death is almost
complete and the date of death is usually reliable when it appears. There is
also a lag in ONS death recording (see ons_deaths
below
for more detail). You can find out more about the accuracy of date of death
recording in primary care in:
Gallagher, A. M., Dedman, D., Padmanabhan, S., Leufkens, H. G. M. & de Vries, F 2019. The accuracy of date of death recording in the Clinical Practice Research Datalink GOLD database in England compared with the Office for National Statistics death registrations. Pharmacoepidemiol. Drug Saf. 28, 563–569. https://doi.org/10.1002/pds.4747
By contrast, cause of death is often not accurate in the primary care record so we don't make it available to query here.
Example ehrQL usage of patients
-
age_on(date)
🔗
-
Patient's age as an integer, in whole elapsed calendar years, as it would be on the given date.
This method takes no account of whether the patient is alive on the given date. In particular, it may return negative values if the given date is before the patient's date of birth.
View method definition
return (date - patients.date_of_birth).years
-
is_alive_on(date)
🔗
-
Whether a patient is alive on the given date, based on the date of death recorded in their primary care record. NB this is only based on the primary care record. Please see the section above about the accuracy of death data.
If the date provided is before a person was born, then this helper function will actually return True, despite the person not being alive yet. For most research this is likely the expected behaviour.
View method definition
return patients.date_of_death.is_after(date) | patients.date_of_death.is_null()
-
is_dead_on(date)
🔗
-
Whether a patient has a date of death in their primary care record before the given date.
A person is classed as dead if the date provided is after their death date.
View method definition
return patients.date_of_death.is_not_null() & patients.date_of_death.is_before(date)
many rows per patient
practice_registrations🔗
Each record corresponds to a patient's registration with a practice.
Example ehrQL usage of practice_registrations
-
for_patient_on(date)
🔗
-
Return each patient's practice registration as it was on the supplied date.
Where a patient is registered with multiple practices we prefer the most recent registration and then, if there are multiple of these, the one with the longest duration. If there's still an exact tie we choose arbitrarily based on the practice ID.
View method definition
spanning_regs = practice_registrations.where(practice_registrations.start_date <= date).except_where( practice_registrations.end_date < date ) ordered_regs = spanning_regs.sort_by( practice_registrations.start_date, practice_registrations.end_date, practice_registrations.practice_pseudo_id, ) return ordered_regs.last_for_patient()
-
exists_for_patient_on(date)
🔗
-
Returns whether a person was registered with a practice on the supplied date.
NB. The implementation currently uses
spanning()
. It would also have been valid to implement aspractice_registrations.for_patient_on(date).exists_for_patient()
, but for internal reasons that is less efficient.View method definition
return practice_registrations.spanning(date, date).exists_for_patient()
-
spanning(start_date, end_date)
🔗
-
Filter registrations to just those spanning the entire period between
start_date
andend_date
.View method definition
return practice_registrations.where( practice_registrations.start_date.is_on_or_before(start_date) & (practice_registrations.end_date.is_after(end_date) | practice_registrations.end_date.is_null()) )