Researchers at West Virginia University have found that artificial intelligence tools, including various ChatGPT models, can assist emergency room physicians in diagnosing diseases—but primarily when patients present with classic symptoms. The study, led by Gangqing Hu, assistant professor in the Department of Microbiology, Immunology and Cell Biology at the WVU School of Medicine, examined the performance of four ChatGPT models in evaluating physician exam notes and offering potential diagnoses.
Published in Scientific Reports, the research involved analyzing 30 public emergency department cases. The findings revealed that AI models demonstrated higher accuracy when the patients exhibited typical disease symptoms. In contrast, the tools showed limited effectiveness in more complex cases where symptoms deviated from the norm, such as pneumonia without fever. In these instances, none of the AI models tested provided accurate diagnoses.
The study evaluated the performance of GPT-3.5, GPT-4, GPT-4o, and the o1 series. When considering the top three diagnoses suggested by each model, the newer versions showed no substantial improvement over the older ones. However, for the top-ranked diagnosis alone, accuracy improved by 15% to 20% in the newer versions compared to earlier ones.
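To illustrate how top-ranked versus top-three accuracy can be tallied in this kind of evaluation, here is a minimal Python sketch; the case data, diagnoses, and function name are hypothetical examples, not the study's actual code or results.

```python
# Hypothetical sketch of tallying top-1 vs. top-3 diagnostic accuracy.
# The cases and ranked diagnoses below are illustrative only.

def top_k_accuracy(cases, k):
    """Fraction of cases whose true diagnosis appears in the model's top-k list."""
    hits = sum(1 for true_dx, ranked in cases if true_dx in ranked[:k])
    return hits / len(cases)

# Each entry: (true diagnosis, model's ranked differential diagnoses)
example_cases = [
    ("appendicitis", ["appendicitis", "gastroenteritis", "ovarian cyst"]),
    ("pulmonary embolism", ["pneumonia", "pulmonary embolism", "asthma"]),
    ("pneumonia", ["bronchitis", "COPD exacerbation", "heart failure"]),
]

print("Top-1 accuracy:", top_k_accuracy(example_cases, 1))  # 1 of 3 cases
print("Top-3 accuracy:", top_k_accuracy(example_cases, 3))  # 2 of 3 cases
```

A model can therefore look stronger or weaker depending on whether only its first suggestion or its full shortlist is counted, which is why the study reports both measures.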
Hu emphasized that current AI diagnostic tools rely heavily on internet-sourced data, which may lack sufficient examples of atypical cases. He suggested that including additional data types, such as medical imaging and laboratory results, could enhance the models’ diagnostic performance in future applications. Furthermore, he noted the importance of transparency in AI-generated reasoning to foster trust in clinical settings.
The study underscores the necessity of human oversight in AI-assisted diagnosis, especially for complex cases. It also lays the groundwork for further research into integrating diverse data and employing conversational AI frameworks that simulate multi-disciplinary collaboration.
Hu conducted the study in collaboration with postdoctoral fellow Jinge Wang, lab volunteer Kenneth Shue, and Li Liu of Arizona State University. The research was funded by the National Institutes of Health and the National Science Foundation. Future studies may explore the role of AI in supporting triage and treatment decisions by enhancing its explanatory capabilities.
Photo credit: WVU Photo/Greg Ellis