General Surgery

Decoding the NCCN Guidelines With AI: A Comparative Evaluation of ChatGPT-4.0 and Llama 2 in the Management of Thyroid Carcinoma

Shivam Pandya, HCA Healthcare
Tamir E. Bresler, HCA Healthcare
Tyler Wilson, HCA Healthcare
Zin Htway, HCA Healthcare
Manabu Fujita, HCA Healthcare

Division

Far West

Hospital

Los Robles Hospital and Medical Center

Document Type

Manuscript

Publication Date

8-13-2024

Keywords

endocrine, resident education, surgical oncology, thyroid, AI

Disciplines

Endocrine System Diseases | Medicine and Health Sciences | Neoplasms | Surgery

Abstract

INTRODUCTION: Artificial intelligence (AI) has emerged as a promising tool in the delivery of health care. ChatGPT-4.0 (OpenAI, San Francisco, California) and Llama 2 (Meta, Menlo Park, California) have each gained attention for their use in various medical applications.

OBJECTIVE: This study aims to evaluate and compare the effectiveness of ChatGPT-4.0 and Llama 2 in assisting with complex clinical decision making in the diagnosis and treatment of thyroid carcinoma.

PARTICIPANTS: We reviewed the National Comprehensive Cancer Network® (NCCN) Clinical Practice Guidelines for the management of thyroid carcinoma and formulated up to 3 complex clinical questions for each decision-making page. ChatGPT-4.0 and Llama 2 were queried in a reproducible manner. The answers were scored on a 5-point Likert scale: 5) correct; 4) correct, with missing information requiring clarification; 3) correct, but unable to complete answer; 2) partially incorrect; 1) absolutely incorrect. Score frequencies were compared, and subgroup analysis was conducted on

RESULTS: In total, 58 pages of the NCCN Guidelines® were analyzed, generating 167 unique questions. There was no statistically significant difference between ChatGPT-4.0 and Llama 2 in terms of overall score (Mann-Whitney U-test; Mean Rank = 160.53 vs 174.47,

CONCLUSION: ChatGPT-4.0 and Llama 2 demonstrate a limited but substantial capacity to assist with complex clinical decision making relating to the management of thyroid carcinoma, with no significant difference in their effectiveness.
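The mean-rank comparison reported in the results can be illustrated with a minimal sketch. It assumes two lists of 1-to-5 Likert scores, pools and ranks them with tied scores sharing an average rank, and then derives each group's mean rank and the Mann-Whitney U statistic. The function names and the sample scores are hypothetical and are not the study's data or code, and the sketch stops short of computing a p-value.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_mean_ranks(a, b):
    """Pool both groups, rank them, and return mean ranks and the U statistic."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = rank_sum_b - len(b) * (len(b) + 1) / 2
    return {
        "mean_rank_a": rank_sum_a / len(a),
        "mean_rank_b": rank_sum_b / len(b),
        "u": min(u_a, u_b),
    }


# Hypothetical Likert scores (1-5) for two models; NOT the study's data.
model_a = [5, 4, 2, 5, 3, 1, 4, 5]
model_b = [5, 5, 3, 4, 4, 2, 5, 5]
result = mann_whitney_mean_ranks(model_a, model_b)
print(result)
```

When both groups are the same size, the two mean ranks always sum to N + 1, where N is the pooled sample size. The reported mean ranks (160.53 and 174.47) sum to 335, which is consistent with 167 scored answers per model (N = 334).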

Publisher or Conference

The American Surgeon

Recommended Citation

Pandya S, Bresler TE, Wilson T, Htway Z, Fujita M. Decoding the NCCN Guidelines With AI: A Comparative Evaluation of ChatGPT-4.0 and Llama 2 in the Management of Thyroid Carcinoma. Am Surg. Published online August 13, 2024. doi:10.1177/00031348241269430
