TY  - CPAPER
AU  - Dong, Hang
AU  - Wang, Minhong
AU  - Zhang, Huayu
AU  - Casey, Arlene
AU  - Wu, Honghan
PY  - 2021
DA  - November
TI  - KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification
EP  - 313
AB  - Classifying scientific literature into an abstract set of topics requires leveraging various sources from the publication and external knowledge. In the BioCreative VII LitCovid track on COVID-19 literature multi-label topic annotation, we applied state-of-the-art deep learning based document classification models (BERT, variations of HAN, CNN, LSTM), each with a different combination of metadata (title, abstract, keywords, and journal), knowledge sources, pre-trained embeddings, and data augmentation techniques. Several ensemble techniques were then used to combine individual model outputs for synergized predictions. We showed that a class-specific average ensembling of the pre-trained and task-specific models achieved the best micro-F1 score in validation (90.31%) and testing (89.32%) sets in the experiments, above the median (89.25%) and mean (87.78%) of all 80 valid submissions. We summarize lessons learned from our work on this task.
PB  - BioCreative
UR  - https://www.research.ed.ac.uk/en/publications/e9e6cc10-0b5a-44d4-b4cd-faca0e9a7f1e
KW  - Coronavirus (COVID-19)
KW  - Digital health and technology
ER  - 