East Florida Division GME Research Day 2025

Can Residents Cheat the Unknown Slide Sessions by Using the Large Language Models? Which Model Should They Use: Chat-Gpt, Claude, Or Gemini?

Gul Emek Wymer, HCA Healthcare
Susana Ferra, HCA Healthcare

Division

East Florida

Hospital

HCA Florida Westside Hospital

Specialty

Pathology

Document Type

Poster

Publication Date

2025

Keywords

artificial intelligence, AI, large language models, LLM, pathology, diagnosis

Disciplines

Diagnosis | Pathology

Abstract

Introduction: In recent years, artificial intelligence tools such as large language models (LLMs) have expanded the possibilities for diagnostic medicine, including histopathology. This study evaluates the diagnostic ability and utility of publicly available LLMs in predicting the correct diagnosis of unknown cases from mobile phone images of hematoxylin-eosin (H-E) stained slides, and compares their performance with that of residents.

Method: Twenty cases covering a variety of entities were collected from teaching sets of non-HCA patients and publicly available sources that are used for unknown slide sessions for residents. Three publicly available LLMs, Chat-GPT 4.0, Claude 3.5 Sonnet, and Gemini 1.5 Flash, were used to generate diagnoses from the H-E slide histology images, using a standard prompt. The same cases were evaluated blindly by four residents. The accuracy of the three LLMs was compared with each other and with the accuracy rate of the residents.

Results: The most accurate LLM was Claude, with an accuracy rate of 50%, followed by Gemini (40%) and Chat-GPT (35%). The highest accuracy rate among the LLMs (50%) was lower than the lowest accuracy rate among the residents (55%). The average accuracy rate was 41.66% for the LLMs versus 67.5% for the residents.
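
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the three LLM accuracy rates and the lowest resident rate stated in the Results, and it assumes each rate is a fraction of the same 20 cases, so the per-model counts of correct diagnoses are inferred rather than reported. The variable names are hypothetical. The unweighted mean works out to 41.666...%, which the abstract gives as 41.66% (41.67% when rounded).

```python
# Accuracy rates reported in the Results paragraph (percent of 20 cases).
N_CASES = 20
llm_accuracy = {"Claude": 50, "Gemini": 40, "Chat-GPT": 35}

# Correct-case counts implied by the rates (an inference, not reported in the abstract).
implied_correct = {name: rate * N_CASES // 100 for name, rate in llm_accuracy.items()}

# Unweighted mean of the three LLM accuracy rates.
llm_mean = sum(llm_accuracy.values()) / len(llm_accuracy)

# Gap in percentage points between the weakest resident (55%, as reported)
# and the best-performing LLM.
LOWEST_RESIDENT = 55
gap_to_lowest_resident = LOWEST_RESIDENT - max(llm_accuracy.values())

print(implied_correct)
print(round(llm_mean, 2))
print(gap_to_lowest_resident)
```

With 20 cases, a 5-point gap corresponds to a single case, so the comparison between the best LLM and the weakest resident rests on one diagnosis.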

Conclusions: Current LLMs are not sufficient for diagnostic use and need to be improved to achieve better diagnostic accuracy.

Original Publisher

HCA Healthcare Graduate Medical Education

Recommended Citation

Wymer, Gul Emek and Ferra, Susana, "Can Residents Cheat the Unknown Slide Sessions by Using the Large Language Models? Which Model Should They Use: Chat-Gpt, Claude, Or Gemini?" (2025). East Florida Division GME Research Day 2025. 4.
https://scholarlycommons.hcahealthcare.com/eastflorida2025/4


Included in

Diagnosis Commons, Pathology Commons
