ChatGPT, a large language model, shows promise in aiding clinical decision-making for breast cancer screening and breast pain imaging tests, according to researchers from Mass General Brigham. In a study comparing ChatGPT 3.5 and 4, both versions demonstrated high performance in selecting appropriate imaging tests based on patient scenarios. While ChatGPT should be viewed as an assistive tool, it could help optimize workflow, reduce administrative time, and decrease patient wait times. Collaboration with public health organizations is suggested to promote the availability of resources alongside AI-generated advice.
A recent study conducted by researchers from Mass General Brigham highlights the potential of ChatGPT, a large language model (LLM), in aiding clinical decision-making for breast cancer screening and breast pain imaging tests. The study, which was published in the Journal of the American College of Radiology, sought to assess ChatGPT 3.5 and 4’s ability to provide radiologic clinical decision support in a breast imaging pilot.
To address the current lack of studies on LLMs’ role in clinical decision-making, the researchers asked ChatGPT to select appropriate imaging tests for 21 fictional patient scenarios involving breast pain and breast cancer screening. ChatGPT’s recommendations were compared against the American College of Radiology (ACR) Appropriateness Criteria, the guidelines radiologists commonly use to determine appropriate tests based on a patient’s symptoms and medical history.
ChatGPT’s performance was assessed by measuring how closely its answers adhered to the ACR guidelines in response to both open-ended and ‘select all that apply’ (SATA) prompts. Both ChatGPT 3.5 and 4 performed well, with ChatGPT 4 outperforming 3.5 overall.
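The article does not reproduce the study’s prompts, but the general setup — posing each scenario to both model versions once as an open-ended question and once as a SATA question — can be sketched roughly as follows. The scenario text, prompt wording, and answer options below are illustrative assumptions, not the study’s actual materials.

```python
# Rough sketch of a prompting pilot of this kind; not the study's actual code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical patient scenario (not one of the study's 21 cases)
scenario = (
    "A 45-year-old woman at average risk presents for routine breast cancer "
    "screening. No symptoms, no family history."
)

OPEN_ENDED = f"{scenario}\n\nWhat is the single most appropriate imaging test, if any?"

SATA = (
    f"{scenario}\n\nSelect ALL imaging tests that would be appropriate:\n"
    "A. Digital mammography\nB. Digital breast tomosynthesis\n"
    "C. Breast MRI\nD. Breast ultrasound\nE. No imaging indicated"
)

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the chosen model and return its text reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for model in ("gpt-3.5-turbo", "gpt-4"):
    print(model, "| open-ended:", ask(model, OPEN_ENDED))
    print(model, "| SATA:", ask(model, SATA))
```

Each reply would then be graded against the ACR guidance before the scores are aggregated.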
For breast cancer screening, both versions of ChatGPT achieved an average score of 1.830 out of two on open-ended questions. In SATA prompts, ChatGPT 3.5 accurately suggested the appropriate imaging tests in 88.9 percent of cases, while ChatGPT 4 scored an impressive 98.4 percent.
Regarding breast pain, ChatGPT 3.5 achieved an average score of 1.125 on open-ended prompts, compared to ChatGPT 4’s score of 1.666. Additionally, ChatGPT 3.5 achieved a SATA score of 58.3 percent, while ChatGPT 4 scored 77.7 percent.
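The article does not spell out how these figures are aggregated; as a rough illustration only, an open-ended average out of two and a SATA percentage could be computed along these lines (the grading rubric here is assumed, not the study’s published method).

```python
# Illustrative aggregation only: the per-scenario scoring scheme is an assumption.
def average_open_ended(scores: list[float]) -> float:
    """Mean of per-scenario open-ended scores, each graded 0-2 against the ACR guidance."""
    return sum(scores) / len(scores)

def sata_percentage(correct_selections: int, total_options: int) -> float:
    """Share of SATA options judged correctly across all scenarios, as a percentage."""
    return 100 * correct_selections / total_options

# Hypothetical numbers, not the study's data
print(average_open_ended([2, 2, 1.5, 2, 1.5]))   # 1.8
print(round(sata_percentage(56, 63), 1))         # 88.9
```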
The study findings indicate that LLMs have the potential to assist primary care providers and referring clinicians in selecting the most suitable imaging tests for their patients. Dr. Marc D. Succi, the corresponding author of the study, emphasized ChatGPT’s role as a bridge between healthcare professionals and radiologists, functioning as a trained consultant to promptly recommend the appropriate imaging test. This could lead to reduced administrative time, optimized workflow, decreased burnout, and minimized patient confusion and wait times.
However, the research team emphasized that LLMs are assistive tools and should not be considered replacements for radiologists or other healthcare professionals. The study compared ChatGPT’s answers against the ACR guidelines rather than against radiologists’ performance. Deploying ChatGPT in clinical settings would also require thorough evaluation of privacy and bias concerns, and the model would need to be fine-tuned on data from hospitals and research institutions to tailor it to specific patient populations.
This study represents the latest exploration of how ChatGPT and other LLMs can revolutionize healthcare. Previous research demonstrated that ChatGPT consistently provides evidence-based answers to public health inquiries. While the tool often offers advice instead of referring users to resources, it outperformed other AI assistants like Amazon Alexa and Google Assistant in multiple domains. The researchers suggested collaboration between AI companies and public health organizations to promote the availability of relevant resources alongside AI-generated advice.