Abstract

Motivation: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Because the properties of a dataset can influence the performance of an algorithm, a ranking aggregation method should be evaluated on the specific type of data before it is used, to support the reliability of its results. Such evaluation on gene lists is usually based on simulated data because of the lack of a known truth for real data. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.

Results: In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data was used to explore the performance of the aggregation methods under conditions emulating common scenarios of real genomic data, with varying heterogeneity of quality, noise level, and mix of unranked and ranked data, using 20,000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (NSCLC), and bacteria (macrophage apoptosis) was performed. We summarise the results of our evaluation in a simple flowchart to select a ranking aggregation method for genomic data, and in an automated implementation using the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input data sets.
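
As a rough illustration of the kind of experiment described above, the following Python sketch simulates a few gene lists of differing quality (one of them unranked) and combines them with a simple Borda-style score. It is not the authors' code, which is linked under Availability, and it is not MAIC or any specific method evaluated in the paper. The function names, the Gaussian noise model, and all parameter values are assumptions chosen for illustration, and the entity count is kept far below 20,000 so that the example runs instantly.

```python
import random
from collections import defaultdict


def simulate_lists(n_entities=200, n_true=20, noise_levels=(0.5, 1.0, 3.0),
                   unranked=(False, False, True), top_k=50, seed=0):
    """Simulate one top-k gene list per study (hypothetical noise model).

    Entities 0..n_true-1 are treated as truly relevant. Each study sees a
    signal of 1.0 for relevant entities and 0.0 otherwise, plus Gaussian
    noise whose standard deviation models that study's quality.
    Returns (true_set, lists), where each list is (genes, is_ranked).
    """
    rng = random.Random(seed)
    true_set = set(range(n_true))
    lists = []
    for sigma, is_unranked in zip(noise_levels, unranked):
        scores = {g: (1.0 if g in true_set else 0.0) + rng.gauss(0.0, sigma)
                  for g in range(n_entities)}
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        if is_unranked:
            rng.shuffle(top)  # discard order: the list is only a set of hits
        lists.append((top, not is_unranked))
    return true_set, lists


def borda_aggregate(lists, n_entities):
    """Borda-style aggregation over ranked and unranked lists.

    A gene at 0-based position p in a ranked list earns n_entities - p
    points. Every gene in an unranked list of length k earns the points of
    the average position, n_entities - (k - 1) / 2. Genes absent from a
    list earn nothing from it. Ties are broken by gene identifier.
    """
    totals = defaultdict(float)
    for genes, is_ranked in lists:
        k = len(genes)
        for pos, gene in enumerate(genes):
            if is_ranked:
                totals[gene] += n_entities - pos
            else:
                totals[gene] += n_entities - (k - 1) / 2
    return sorted(totals, key=lambda g: (-totals[g], g))


# Usage: how many truly relevant entities reach the consensus top 20?
true_set, lists = simulate_lists()
consensus = borda_aggregate(lists, n_entities=200)
recovered = len(true_set & set(consensus[:20]))
print(f"{recovered} of {len(true_set)} true entities in the consensus top 20")
```

Because this scheme weights every list equally, a very noisy list can dilute the consensus, which is the kind of quality heterogeneity the evaluation above is designed to probe.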

Availability: The code for simulated data generation and for running the edited versions of the algorithms is available at https://github.com/baillielab/comparison_of_RA_methods

Rights

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite as

Wang, B., Law, A., Regan, T., Parkinson, N., Cole, J., Russell, C., Dockrell, D., Gutmann, M. & Baillie, J. 2022, 'Systematic comparison of ranking aggregation methods for gene lists in experimental results', Bioinformatics, article no: btac621. https://doi.org/10.1093/bioinformatics/btac621

Last updated: 22 September 2022