- Published
- 08 November 2021
- Conference item
KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification
- Authors
- Source
- Proceedings of the BioCreative VII Challenge Evaluation Workshop
Abstract
Classifying scientific literature into an abstract set of topics requires leveraging various sources from the publication and external knowledge. In the BioCreative VII LitCovid track on COVID-19 literature multi-label topic annotation, we applied state-of-the-art deep learning-based document classification models (BERT, variations of HAN, CNN, LSTM), each with a different combination of metadata (title, abstract, keywords, and journal), knowledge sources, pre-trained embeddings, and data augmentation techniques. Several ensemble techniques were then used to combine the individual model outputs into synergized predictions. We showed that a class-specific average ensembling of the pre-trained and task-specific models achieved the best micro-F1 score on the validation (90.31%) and testing (89.32%) sets in our experiments, above the median (89.25%) and mean (87.78%) of all 80 valid submissions. We summarize lessons learned from our work on this task.
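The abstract does not spell out how the class-specific average ensembling works, so the following is only a minimal sketch of one plausible reading: for each topic label, keep the models with the best validation F1 on that label and average their predicted probabilities. The function names, the `top_k` parameter, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

Probs = List[List[float]]   # probs[doc][label], each value in [0, 1]
Labels = List[List[int]]    # labels[doc][label], each value 0 or 1


def f1_for_label(y_true: Labels, probs: Probs, label: int,
                 threshold: float = 0.5) -> float:
    """F1 of one model on a single label after thresholding."""
    tp = fp = fn = 0
    for truth, p in zip(y_true, probs):
        pred = p[label] >= threshold
        if pred and truth[label]:
            tp += 1
        elif pred:
            fp += 1
        elif truth[label]:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_models_per_label(y_val: Labels, val_probs: Dict[str, Probs],
                            top_k: int = 2) -> List[List[str]]:
    """For each label, pick the top_k models by validation F1 (assumed rule)."""
    chosen = []
    for label in range(len(y_val[0])):
        ranked = sorted(
            val_probs,
            key=lambda name: f1_for_label(y_val, val_probs[name], label),
            reverse=True,
        )
        chosen.append(ranked[:top_k])
    return chosen


def class_specific_average(test_probs: Dict[str, Probs],
                           chosen: List[List[str]]) -> Probs:
    """Average, per label, only the probabilities of that label's models."""
    n_docs = len(next(iter(test_probs.values())))
    return [
        [sum(test_probs[name][doc][label] for name in names) / len(names)
         for label, names in enumerate(chosen)]
        for doc in range(n_docs)
    ]


def micro_f1(y_true: Labels, probs: Probs, threshold: float = 0.5) -> float:
    """Micro-F1: pool true/false positives and negatives over all labels."""
    tp = fp = fn = 0
    for truth, p in zip(y_true, probs):
        for t, value in zip(truth, p):
            pred = value >= threshold
            if pred and t:
                tp += 1
            elif pred:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In this sketch the model selection is done on validation data and then reused unchanged on the test data, so a model that is weak on one topic can still contribute to the topics where it is strong.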
Rights
This content is not covered by the Open Government Licence. Please see source record or item for information on rights and permissions.
Cite as
Dong, H., Wang, M., Zhang, H., Casey, A. & Wu, H. 2021, 'KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification', Proceedings of the BioCreative VII Challenge Evaluation Workshop, pp. 310-313. https://www.research.ed.ac.uk/en/publications/e9e6cc10-0b5a-44d4-b4cd-faca0e9a7f1e