So far, AI and machine learning have delivered superior results in specialized applications (e.g., chess, Go, poker, video games). Medical diagnostics is, in fact, well suited to machine learning and computer vision.
The authors are essentially accusing AI researchers of cheating by using held-out test data to train their models! That is quite an accusation coming from two associate professors at Arizona State University. Even if researchers did cheat in this way, it could be countered easily: create new datasets to test the models again, or have third parties re-evaluate the models without any access to the original test data, as in the sketch below.
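To be concrete, here is a minimal sketch of what such an independent re-validation looks like (assuming scikit-learn; names like load_published_model and fetch_new_cohort are hypothetical placeholders, not a real API): score the frozen, published model on a fresh cohort, with no retraining and no tuning.

```python
# Hypothetical sketch of third-party re-validation. The model is frozen:
# no retraining, no tuning, so it cannot adapt to the new test data.
from sklearn.metrics import accuracy_score

def revalidate(frozen_model, X_new, y_new):
    """Score a frozen, published model on data its developers never saw."""
    return accuracy_score(y_new, frozen_model.predict(X_new))

# model = load_published_model()       # placeholder: the published model
# X_new, y_new = fetch_new_cohort()    # placeholder: freshly collected data
# print(revalidate(model, X_new, y_new))
```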
Then the authors resort to the specious, and usually suspicious, "no silver bullet" argument. Of course there are no silver bullets in life! What is the point?
"However, mistakes by AI models that support doctors’ clinical decisions can mean life or death. Therefore, it’s critical that we understand how well these models work before deploying them. Published reports of this technology currently paint a too-optimistic picture of its accuracy, which at times translates to sensationalized stories in the press. Media are rife with discussions of algorithms that can diagnose early Alzheimer’s disease with up to 74 percent accuracy or that are more accurate than clinicians. The scientific papers detailing such advances may become foundations for new companies, new investments and lines of research, and large-scale implementations in hospital systems. In most cases, the technology is not ready for deployment.
Here’s why: As researchers feed data into AI models, the models are expected to become more accurate, or at least not get worse. However, our work and the work of others has identified the opposite, where the reported accuracy in published models decreases with increasing data set size. ...
And why does the research say that reported accuracy decreases with increasing data set size? Ideally, the held-out data are never seen by the scientists until the model is completed and fixed. However, scientists may peek at the data, sometimes unintentionally, and modify the model until it yields a high accuracy, a phenomenon known as data leakage. By using the held-out data to modify their model and then to test it, the researchers are virtually guaranteeing the system will correctly predict the held-out data, leading to inflated estimates of the model’s true accuracy. ...
Unfortunately, there is no silver bullet for reliably validating clinical AI models. Every tool and every clinical population are different. ..."
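The data leakage the quoted passage describes is easy to reproduce. Here is a minimal sketch (assuming scikit-learn): on pure-noise data, where true accuracy should be about 50%, selecting features with the test set included inflates held-out accuracy well above chance, while a clean protocol that touches only the training split stays near it.

```python
# Minimal demonstration of data leakage via feature selection on pure noise.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))      # pure noise: no real signal
y = rng.integers(0, 2, size=200)      # random binary labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Leaky protocol: pick "informative" features using ALL data (train + test),
# so the held-out set has leaked into model selection.
leaky = SelectKBest(f_classif, k=20).fit(X, y).get_support()
leaky_acc = LogisticRegression(max_iter=1000).fit(
    X_tr[:, leaky], y_tr).score(X_te[:, leaky], y_te)

# Clean protocol: pick features using the training split only.
clean = SelectKBest(f_classif, k=20).fit(X_tr, y_tr).get_support()
clean_acc = LogisticRegression(max_iter=1000).fit(
    X_tr[:, clean], y_tr).score(X_te[:, clean], y_te)

print(f"leaky accuracy: {leaky_acc:.2f}")   # typically well above 0.5
print(f"clean accuracy: {clean_acc:.2f}")   # hovers around chance (0.5)
```

Run it with a few different seeds and the gap persists; this is exactly the inflated-accuracy pattern the authors complain about, and exactly what a clean held-out protocol prevents.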
Credits: Last Week in AI