Is there no problem artificial intelligence can’t tackle? Methods such as deep learning are finding uses in everything from algorithms that recommend what you should buy next to ones that predict someone’s voting habits. As a result, AI has developed a somewhat mystical reputation as a tool that can digest many different types of data and accurately predict many different outcomes, an ability that could be particularly useful for solving previously impenetrable problems in healthcare.
However, AI is no panacea. Too often, it is reached for hastily and impulsively, producing claims that it works when it doesn’t. This has become increasingly apparent during the covid-19 pandemic, as many AI researchers have tried their hand at healthcare, without much success.
It is no wonder that many people see healthcare as a promising area for AI: hospitals generate vast amounts of data, and deep learning depends on data. The partnership has already borne fruit, with AI systems helping to identify cancer earlier and to better predict which treatments people will respond to.
In the early stages of the pandemic, there was a deluge of publications attempting to do the same for covid-19. In particular, there were hundreds of papers claiming that machine-learning techniques could use chest scans to quickly diagnose covid-19 and to accurately predict how patients would fare.
My colleagues and I looked at every such paper published between 1 January 2020 and 3 October 2020 and found that none of them produced tools good enough to use in a clinical setting (Nature Machine Intelligence, doi.org/gjkjvw). Something has gone seriously wrong when more than 300 papers with no practical benefit are published.
Our review found problems at every stage of the development of the tools described in the literature. To begin with, many of the papers didn’t include enough detail for others to reproduce their results.
Another issue was that many of the papers introduced significant biases in the data collection method, the development of the machine-learning system or the analysis of the results. For example, a significant proportion of systems designed to diagnose covid-19 from chest X-rays were trained on images of adults with covid-19 and children without it, so their algorithms were more likely to be detecting whether an X-ray came from an adult or a child than whether that person had covid-19.
Though the authors may have been motivated by a desire to develop models that could help people, in their haste many of them didn’t consider how, or whether, these models could meet the regulatory requirements for use in practice.
The papers also suffer from publication bias towards positive results. Imagine, for example, a hypothetical research group that carefully develops a machine-learning model to detect covid-19 from a chest X-ray, only to find that it doesn’t outperform standard tests for the illness. That finding isn’t of interest to many journals and is hard to communicate. It is far easier to develop a less rigorous model that appears to perform excellently, and to publish that.
Machine learning holds great promise for solving many healthcare problems, but it must be developed just as carefully as any other tool in healthcare.
If we take as much care in developing machine-learning models as we do with clinical trials, there is no reason why these algorithms won’t become part of routine clinical use and help us all push towards the ideal of more personalised treatment pathways. But there is no rushing that.