Introduction: COVID-19 is commonly experienced as an acute illness, yet some people continue to have symptoms that persist for weeks or months (commonly referred to as ‘long-COVID’). It remains unclear which patients are at highest risk of developing long-COVID. In this protocol, we describe plans to develop a prediction model to identify individuals at risk of developing long-COVID.

Methods and analysis: We will use the national Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) platform, a population-level linked dataset of routine electronic healthcare data from 5.4 million individuals in Scotland. We will identify potential indicators of long-COVID by examining patterns in primary care data linked to information from out-of-hours general practitioner encounters, accident and emergency visits, hospital admissions, outpatient visits, medication prescribing/dispensing and mortality. We will investigate these potential indicators by performing a matched analysis between those with a positive reverse transcriptase PCR (RT-PCR) test for SARS-CoV-2 infection and two control groups: (1) individuals with at least one negative RT-PCR test who never tested positive; (2) the general population of Scotland (everyone who did not test positive). Cluster analysis will then be used to determine the final definition of the outcome measure for long-COVID. We will then derive, and internally and externally validate, a prediction model to identify the epidemiological risk factors associated with long-COVID.

Ethics and dissemination: The EAVE II study has obtained approvals from the Research Ethics Committee (reference: 12/SS/0201) and the Public Benefit and Privacy Panel for Health and Social Care (reference: 1920-0279). Study findings will be published in peer-reviewed journals and presented at conferences. Understanding the predictors of long-COVID and identifying the patient groups at greatest risk of persisting symptoms will inform future treatments and preventative strategies for long-COVID.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Cite as

Daines, L., Mulholland, R., Vasileiou, E., Hammersley, V., Weatherill, D., Katikireddi, S., Kerr, S., Moore, E., Pesenti, E., Quint, J., Shah, S., Shi, T., Simpson, C., Robertson, C. & Sheikh, A. 2022, 'Deriving and validating a risk prediction model for long COVID-19: protocol for an observational cohort study using linked Scottish data', BMJ Open, 12(7), article no: e059385. http://dx.doi.org/10.1136/bmjopen-2021-059385

Last updated: 05 November 2022