Abstract

Social media platforms, like Twitter, are increasingly used by billions of people internationally to share information. As such, these platforms contain vast volumes of real-time multimedia content about the world, which could be invaluable for a range of tasks such as incident tracking, damage estimation during disasters, insurance risk estimation, and more. By mining this real-time data, there are substantial economic benefits, as well as opportunities to save lives. Currently, the COVID-19 pandemic is attacking societies at an unprecedented speed and scale, forming an important use-case for social media analysis. However, the amount of information during such crisis events is vast and information normally exists in unstructured and multiple formats, making manual analysis very time consuming. Hence, in this paper, we examine how to extract valuable information from tweets related to COVID-19 automatically. For 12 geographical locations, we experiment with supervised approaches for labelling tweets into 7 crisis categories, as well as investigated automatic priority estimation, using both classical and deep learned approaches. Through evaluation using the TREC-IS 2020 COVID-19 datasets, we demonstrated that effective automatic labelling for this task is possible with an average of 61% F1 performance across crisis categories, while also analysing key factors that affect model performance and model generalizability across locations.

Cite as

Long, Z. & McCreadie, R. 2021, 'Automated Crisis Content Categorization for COVID-19 Tweet Streams', 18th International Conference on Information Systems for Crisis Response and Management, 23rd -26th May 2021, pp. 667-678. http://eprints.gla.ac.uk/252696/

Downloadable citations

Download HTML citationHTML Download BIB citationBIB Download RIS citationRIS
Last updated: 16 June 2022
Was this page helpful?