BACKGROUND: Obtaining accurate estimates of the risk of COVID-19-related death in the general population is challenging in the context of changing levels of circulating infection.

METHODS: We propose a modelling approach to predict 28-day COVID-19-related death that explicitly accounts for COVID-19 infection prevalence. The approach uses a series of sub-studies, each starting from a new landmark time and incorporating time-updating proxy measures of infection prevalence. This was compared with an approach ignoring infection prevalence. The target population was adults registered at a general practice in England in March 2020. The outcome was 28-day COVID-19-related death. Predictors included demographic characteristics and comorbidities. Three proxies of local infection prevalence were used: model-based estimates, rate of COVID-19-related attendances in emergency care, and rate of suspected COVID-19 cases in primary care. We used data within the TPP SystmOne electronic health record system linked to Office for National Statistics mortality data, using the OpenSAFELY platform, working on behalf of NHS England. Prediction models were developed in case-cohort samples with a 100-day follow-up. Validation was undertaken in 28-day cohorts from the target population. We considered predictive performance (discrimination and calibration) in geographical and temporal subsets of data not used in developing the risk prediction models. Simple models were contrasted with models including a full range of predictors.
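The landmark idea described above can be illustrated with a minimal sketch. It is not the authors' code: the `Person` record, the `landmark_rows` function, the day-based time scale and the `prevalence` lookup keyed by region and landmark day are all hypothetical names chosen for illustration. For simplicity it builds a binary 28-day outcome at each landmark, whereas the abstract states that development used case-cohort samples with 100-day follow-up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    """Hypothetical record: one individual in the target population."""
    pid: int
    region: str
    death_day: Optional[int]  # day of COVID-19-related death, None if not observed


def landmark_rows(
    people: List[Person],
    prevalence: Dict[Tuple[str, int], float],  # assumed proxy, keyed by (region, landmark day)
    landmark_days: List[int],
    horizon: int = 28,
) -> List[dict]:
    """Stack one row per person per landmark time at which they are still at risk."""
    rows = []
    for t0 in landmark_days:
        for p in people:
            # Individuals who died on or before the landmark are no longer at risk.
            if p.death_day is not None and p.death_day <= t0:
                continue
            # Outcome: death within `horizon` days after the landmark.
            event = p.death_day is not None and p.death_day <= t0 + horizon
            rows.append(
                {
                    "pid": p.pid,
                    "landmark": t0,
                    # The proxy is re-read at each landmark, so it updates over time.
                    "prevalence_proxy": prevalence[(p.region, t0)],
                    "event": int(event),
                }
            )
    return rows
```

Each stacked row pairs an individual's outcome with the prevalence proxy measured at that landmark, which is what would let a model fitted to such rows adjust absolute risk as circulating infection changes.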

RESULTS: Prediction models were developed on 11,972,947 individuals, of whom 7999 experienced COVID-19-related death. All models discriminated well between individuals who did and did not experience the outcome, including simple models adjusting only for basic demographics and number of comorbidities: C-statistics 0.92-0.94. However, absolute risk estimates were substantially miscalibrated when infection prevalence was not explicitly modelled.
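The contrast between good discrimination and poor calibration can be made concrete with a small sketch. It assumes a binary 28-day outcome, for which the C-statistic is the proportion of case/non-case pairs in which the case has the higher predicted risk; the observed-to-expected ratio is used here as one simple summary of calibration. The function names are illustrative, and the study's own performance measures may have been computed differently.

```python
from typing import Sequence


def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Proportion of (case, non-case) pairs ranked correctly; ties count one half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def observed_expected_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Observed events divided by the sum of predicted risks (1.0 = calibrated on average)."""
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("sum of predicted risks must be positive")
    return sum(outcomes) / expected
```

Because the C-statistic depends only on the ranking of predicted risks, halving every predicted risk leaves it unchanged while doubling the observed-to-expected ratio. A model can therefore rank individuals well and still give misleading absolute risks.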

CONCLUSIONS: Our proposed models allow absolute risk estimation in the context of changing infection prevalence, but predictive performance is sensitive to the choice of proxy for infection prevalence. Simple models can provide excellent discrimination and may simplify implementation of risk prediction tools.


This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Cite as

The OpenSAFELY Collaborative, Williamson, E., Tazare, J., Bhaskaran, K., McDonald, H., Walker, A., Tomlinson, L., Wing, K., Bacon, S., Bates, C., Curtis, H., Forbes, H., Minassian, C., Morton, C., Nightingale, E., Mehrkar, A., Evans, D., Nicholson, B., Leon, D., Inglesby, P., Mackenna, B., Davies, N., DeVito, N., Drysdale, H., Cockburn, J., Hulme, W., Morley, J., Douglas, I., Rentsch, C., Mathur, R., Wong, A., Schultze, A., Croker, R., Parry, J., Hester, F., Harper, S., Grieve, R., Harrison, D., Steyerberg, E., Eggo, R., Diaz-Ordaz, K., Keogh, R., Evans, S., Smeeth, L. & Goldacre, B. 2022, 'Comparison of methods for predicting COVID-19-related death in the general population using the OpenSAFELY platform', Diagnostic and Prognostic Research, 6, article no: 6. https://doi.org/10.1186/s41512-022-00120-2

Last updated: 21 June 2023