Guidelines published in Nature Methods to ensure quality and reproducibility of predictive methods

An international group of scientists, including the ELIXIR Machine Learning Focus Group, has developed a set of reporting guidelines for AI methods that classify biomedical data. Examples of such methods are machine learning predictors that try to identify, based on genetic and other data, whether someone suffers from a particular rare disease, or predictive methods that aim to identify the drug to which a cancer patient would respond best. The recommendations were published in the renowned journal Nature Methods.

Professor Tom Lenaerts, member of the ELIXIR Machine Learning Focus Group and current director of IB², the Interuniversity Institute for Bioinformatics of the Université libre de Bruxelles and the Vrije Universiteit Brussel: “The popularity of machine learning and deep learning nowadays gives the impression that novel AI tools can be quickly designed without much thought about the data and the actual objectives. Such is not the case. Inaccuracy is easily achieved when one does not have full understanding of the nature of the data and features used in a predictive method. In the medical and biological fields, more than half of the time should be spent on designing a high-quality data set and finding the right set of features to train the method.”  

The article published in Nature Methods provides a checklist and recommendations for anyone aiming to build or publish a supervised classification method for the biological and medical sciences. The guidelines, named DOME (Data, Optimization, Model, Evaluation), specify what should be reported in scientific papers so that the form and quality of a novel method can be fully assessed and its reproducibility verified. Top journals reporting on novel AI predictive methods are encouraged to incorporate the DOME guidelines to ensure that advancements in the biomedical AI field are held to the highest standards, just as is the case for any other biomedical device.

Lenaerts: “Notwithstanding the benefits that novel predictive AI methods may bring to molecular or disease understanding and potentially patient care, they often suffer from reproducibility and clarity issues, and in worst case design and bias issues associated with the data and methods used in the predictor.  Inadequate explanations on the main parts of these methods will not only lead to distrust but will also block the transfer of the suggested approaches to clinics and thus patient care.  By adding the information requested in this paper to your own manuscript or in an online document that anyone can consult, it becomes possible to separate the wheat from the chaff and raise the standards for machine learning products in these domains to a higher level.  Following the DOME recommendations, people will find the AI solutions relevant and useful, and it will avoid another winter for all the highly exciting developments of ML and AI in the decades to come.”

Tom Lenaerts is director of IB², the Interuniversity Institute for Bioinformatics of the Université libre de Bruxelles and the Vrije Universiteit Brussel, co-head of the ULB Machine Learning Group (MLG) and partially affiliated with the Artificial Intelligence lab of the Vrije Universiteit Brussel. Both MLG and the AI lab are members of the European AI network CLAIRE and the new AI institute for Common Good, FARI, in Brussels. Tom Lenaerts is part of the ELIXIR Machine Learning Focus Group and is named as such on the article. His affiliations with ULB, VUB and IB² are reported in the article.

ELIXIR

ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe. These resources include databases, software tools, training materials, cloud storage and supercomputers. The goal of ELIXIR is to coordinate these resources so that they form a single infrastructure. The Belgian node of ELIXIR, ELIXIR Belgium, is based in Ghent. ULB, VUB and IB² collaborate with the node by sharing access to resources and, where possible, taking part in ELIXIR-organised projects and events.

Article

Ian Walsh, Dmytro Fishman, Dario Garcia-Gasulla, Tiina Titma, Gianluca Pollastri, ELIXIR Machine Learning Focus Group, Jennifer Harrow, Fotis E. Psomopoulos & Silvio C. E. Tosatto. DOME: recommendations for supervised machine learning validation in biology. Nat Methods (2021). https://doi.org/10.1038/s41592-021-01205-4