Classifying scientific literature into an abstract set of topics requires drawing on multiple sources within the publication as well as external knowledge. In the BioCreative VII LitCovid track on COVID-19 literature multi-label topic annotation, we applied state-of-the-art deep learning based document classification models (BERT, variations of HAN, CNN, LSTM), each with a different combination of metadata (title, abstract, keywords, and journal), knowledge sources, pre-trained embeddings, and data augmentation techniques. Several ensemble techniques were then used to combine the outputs of the individual models. We showed that a class-specific average ensembling of the pre-trained and task-specific models achieved the best micro-F1 score on the validation (90.31%) and testing (89.32%) sets in the experiments, exceeding the median (89.25%) and mean (87.78%) of all 80 valid submissions. We summarize lessons learned from our work on this task.
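
To make the idea of class-specific average ensembling concrete, the following is a minimal sketch and not the authors' implementation. It assumes that, for each topic label, the models with the highest validation F1 on that label are selected and their predicted probabilities averaged before thresholding; the selection rule, the `top_k` parameter, the 0.5 threshold, and all function names are hypothetical choices made for illustration. The abstract does not specify these details.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = documents, columns = topic labels


def label_f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score for a single label (binary case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_models_per_label(val_probs: Dict[str, Matrix],
                            val_true: List[List[int]],
                            top_k: int = 2,
                            threshold: float = 0.5) -> List[List[str]]:
    """Assumed selection rule: keep the top_k models per label by validation F1."""
    chosen = []
    for j in range(len(val_true[0])):
        truth = [row[j] for row in val_true]
        scores = {}
        for name, probs in val_probs.items():
            pred = [1 if row[j] >= threshold else 0 for row in probs]
            scores[name] = label_f1(truth, pred)
        # Sort by descending F1, breaking ties by model name for determinism.
        ranked = sorted(scores, key=lambda n: (-scores[n], n))
        chosen.append(ranked[:top_k])
    return chosen


def class_specific_average(probs: Dict[str, Matrix],
                           chosen: List[List[str]],
                           threshold: float = 0.5) -> List[List[int]]:
    """Average the selected models' probabilities per label, then threshold."""
    n_docs = len(next(iter(probs.values())))
    preds = []
    for i in range(n_docs):
        row = []
        for j, names in enumerate(chosen):
            avg = sum(probs[n][i][j] for n in names) / len(names)
            row.append(1 if avg >= threshold else 0)
        preds.append(row)
    return preds


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In this sketch, `select_models_per_label` would be run once on validation predictions, and the resulting per-label model lists reused in `class_specific_average` on test predictions. Micro-F1 pools counts across all labels, so frequent topics weigh more heavily than rare ones.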


This content is not covered by the Open Government Licence. Please see source record or item for information on rights and permissions.

Cite as

Dong, H., Wang, M., Zhang, H., Casey, A. & Wu, H. 2021, 'KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification', Proceedings of the BioCreative VII Challenge Evaluation Workshop, pp. 310-313. https://www.research.ed.ac.uk/en/publications/e9e6cc10-0b5a-44d4-b4cd-faca0e9a7f1e

Last updated: 16 June 2022