When the team tested various AI models on this data, they found that the models' performance was overestimated by about 20 per cent on average.
"We propose that this overestimation is due to data acquisition bias (DAB), a natural occurrence when data for these studies is retrospectively collected from regular medical care," says Dr. Chris McIntosh, a scientist at TGHRI and senior author of the study.
"Generally speaking, AI might focus on irrelevant patterns in the data instead of what really matters for the task," adds Dr. McIntosh, who is also an assistant professor in the Department of Medical Biophysics at the University of Toronto (U of T).
"Different hospital departments may use different equipment or settings and have different patient acquisition conditions," says Dr. McIntosh, who also holds the Chair in Artificial Intelligence and Medical Imaging at the Joint Department of Medical Imaging at UHN and the Department of Medical Imaging at U of T. "These variations, which might be imperceptible to researchers and clinicians, can be detected by AI algorithms.
"When models are trained on this data, they might rely on these subtle differences — like how a medical image was taken — rather than the actual medical content, to make predictions."
An example of this bias is how patients suspected of having interstitial lung disease are often directed towards specific imaging techniques meant to confirm the diagnosis, while those without suspicion get more general scans.
The algorithm will appear highly accurate at the hospital whose data it was trained on, but when deployed for clinical care at another hospital with different scanners, its accuracy will drop, potentially putting patients at risk.
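This failure mode can be illustrated with a toy simulation (a hypothetical sketch, not the study's actual models or data). At the training hospital, suspected cases are routed to a special scan protocol, so an acquisition artifact tracks the diagnosis almost perfectly; a model that latches onto that artifact looks accurate internally but falls to chance at a hospital where the artifact is unrelated to disease:

```python
import random

random.seed(0)

def make_cohort(n, artifact_follows_label):
    """Simulate patients as (artifact, disease) pairs.

    At the training hospital, suspected-disease patients get a special
    scan protocol, so the acquisition artifact tracks the label ~95%
    of the time. At an external hospital, the artifact is unrelated.
    """
    cohort = []
    for _ in range(n):
        disease = random.random() < 0.5
        if artifact_follows_label:
            artifact = disease if random.random() < 0.95 else not disease
        else:
            artifact = random.random() < 0.5
        cohort.append((artifact, disease))
    return cohort

def predict(artifact):
    # A "model" that learned the acquisition artifact, not the anatomy.
    return artifact

def accuracy(cohort):
    return sum(predict(a) == d for a, d in cohort) / len(cohort)

internal = make_cohort(10_000, artifact_follows_label=True)
external = make_cohort(10_000, artifact_follows_label=False)

print(f"internal accuracy: {accuracy(internal):.2f}")  # high (~0.95)
print(f"external accuracy: {accuracy(external):.2f}")  # near chance (~0.50)
```

The gap between the two printed accuracies is exactly the kind of overestimation the study measured: internal validation rewards the shortcut, and only external data exposes it.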
To address this issue, the researchers developed and proposed an open-source accuracy-estimation method called PEst, which corrects for bias and provides more accurate estimates of a model's external performance.
"Our method, which corrects for hidden patterns and biases in the data, predicts a model's performance on new datasets within four per cent of the actual results," says Balagopal Unnikrishnan, a doctoral student at TGHRI and co-first author of the study.
Given how crucial the accuracy of AI models is in health care, where recommendations can significantly impact patient outcomes, these findings will help enable safer and more widespread use of AI and support the development of new medical AI technology.
This study was a truly multidisciplinary effort across UHN to measure the impact of these biases in a diverse array of modalities and diseases.
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), The Princess Margaret Cancer Foundation, and UHN Foundation. Data for this study was supported by foundation investments in the Digital Cardiovascular Health Platform including UHN's Peter Munk Cardiac Centre and Ted Rogers Centre for Heart Research and MIRA through Cancer Digital Intelligence.