OBJECTIVE: To describe a novel England-wide electronic health record (EHR) resource enabling whole population research on covid-19 and cardiovascular disease while ensuring data security and privacy and maintaining public trust.

DESIGN: Data resource comprising linked person level records from national healthcare settings for the English population, accessible within NHS Digital's new trusted research environment.

SETTING: EHRs from primary care, hospital episodes, death registry, covid-19 laboratory test results, and community dispensing data, with further enrichment planned from specialist intensive care, cardiovascular, and covid-19 vaccination data.

PARTICIPANTS: 54.4 million people alive on 1 January 2020 and registered with an NHS general practitioner in England.

MAIN MEASURES OF INTEREST: Confirmed and suspected covid-19 diagnoses, exemplar cardiovascular conditions (incident stroke or transient ischaemic attack and incident myocardial infarction) and all cause mortality between 1 January and 31 October 2020.

RESULTS: The linked cohort includes more than 96% of the English population. By combining person level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population. Among 53.3 million people with no previous diagnosis of stroke or transient ischaemic attack, 98 721 had a first ever incident stroke or transient ischaemic attack between 1 January and 31 October 2020, of which 30% were recorded only in primary care and 4% only in death registry records. Among 53.2 million people with no previous diagnosis of myocardial infarction, 62 966 had an incident myocardial infarction during follow-up, of which 8% were recorded only in primary care and 12% only in death registry records. A total of 959 470 people had a confirmed or suspected covid-19 diagnosis (714 162 in primary care data, 126 349 in hospital admission records, 776 503 in covid-19 laboratory test data, and 50 504 in death registry records). Although 58% of these were recorded in both primary care and covid-19 laboratory test data, 15% and 18%, respectively, were recorded in only one.

CONCLUSIONS: This population-wide resource shows the importance of linking person level data across health settings to maximise completeness of key characteristics and to ascertain cardiovascular events and covid-19 diagnoses. Although this resource was initially established to support research on covid-19 and cardiovascular disease to benefit clinical care and public health and to inform healthcare policy, it can broaden further to enable a wide range of research.


This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.

Cite as

Sudlow, C., Wood, A., Denholm, R., Hollings, S., Cooper, J., Ip, S., Walker, V., Denaxas, S., Akbari, A., Banerjee, A., Whiteley, W., Lai, A. & Sterne, J. 2021, 'Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource', The BMJ, 373(8289), article no: n826. https://doi.org/10.1136/bmj.n826

Downloadable citations

Download HTML citationHTML Download BIB citationBIB Download RIS citationRIS
Last updated: 26 October 2023
Was this page helpful?