Temporal evolution of large language models (LLMs) in oncology

Publication details

Resource type:
WOS category:
PubMed category:

Indexed in: SCIE

Affiliations: [1]Nanjing Med Univ, Southern Med Univ, Zhujiang Hosp,Affiliated Kangda Coll, Donghai Cty Peoples Hosp,Dept Oncol, Lianyungang 222000, Peoples R China [2]Naval Med Univ, Mil Med Univ 2, Changhai Hosp, Dept Urol, Shanghai, Peoples R China [3]Univ Hong Kong, Hong Kong, Peoples R China [4]Jinan Univ, Zhuhai Hosp, Zhuhai Peoples Hosp, Dept Joint Surg & Sports Med, Guangzhou, Guangdong, Peoples R China [5]Southern Med Univ, Zhujiang Hosp, Dept Oncol, Guangzhou 510282, Guangdong, Peoples R China [6]Shanghai Jiao Tong Univ, Sch Med, Shanghai Gen Hosp, Dept Urol, Shanghai, Peoples R China [7]Southern Med Univ, Nanfang Hosp, Dept Oncol, Guangzhou 510515, Peoples R China [8]South China Univ Technol, Affiliated Hosp 6, Canc Ctr, Sch Med, Foshan 528000, Peoples R China [9]Wenzhou Med Univ, Quzhou Peoples Hosp, Quzhou Affiliated Hosp, Hepatobiliary Surg Dept, Quzhou, Peoples R China [10]Qingdao Univ, Affiliated Hosp, Dept Urol, Qingdao, Peoples R China [11]Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Dept Radiol, Chengdu 610072, Peoples R China [12]Univ Hong Kong, Li Ka Shing Fac Med, Hong Kong, Peoples R China [13]Monash Univ, Sch Publ Hlth & Prevent Med, Melbourne, Vic 3000, Australia [14]Suzhou Ind Pk Monash Res Inst Sci & Technol, Suzhou 215000, Peoples R China [15]Anhui Med Univ, Coll & Hosp Stomatol, Key Lab Oral Dis Res Anhui Prov, Hefei 230032, Peoples R China [16]Sun Yat Sen Univ, Guangdong Prov Clin Res Ctr Canc, Canc Ctr, State Key Lab Oncol Southern China,Dept Urol, Guangzhou 510060, Peoples R China [17]Cent South Univ, Xiangya Hosp, Dept Neurosurg, Changsha, Hunan, Peoples R China [18]Cent South Univ, Xiangya Hosp, Natl Clin Res Ctr Geriatr Disorders, Changsha, Peoples R China [19]Fudan Univ, Zhongshan Hosp, Dept Intervent Radiol, Shanghai 200032, Peoples R China [20]Harbin Med Univ, Affiliated Hosp 1, Dept Urol, Harbin 150001, Peoples R China [21]Univ Hong Kong, Li Ka Shing Fac Med, Sch Clin Med, Hong Kong 999077, Peoples R China
Source:
ISSN:

Keywords: Large language models; Oncology; Temporal analysis; Performance evaluation; Meta-analysis

Abstract:
Background: Large language models (LLMs) are increasingly being applied in healthcare; however, their performance in specialized fields, such as oncology, is subject to temporal factors, including knowledge decay and concept drift. The impact of these temporal dynamics on LLM question-answering accuracy in oncology remains inadequately evaluated. This study aims to systematically assess the temporal evolution of LLM accuracy in responding to oncology-related questions using real-world data.
Method: We systematically collected relevant literature through 2025 by searching LLM-related keywords in PubMed, Google Scholar, and Web of Science databases. The inclusion criteria were as follows: (1) cancer-related research; (2) clear and complete question descriptions; and (3) complete answers. The final sample (n = 23) contained 614 research questions, comprising subjective questions (n = 223) and multiple-choice questions (n = 391). Following randomization of responses generated by three LLMs (ChatGPT-3.5, ChatGPT-4, and Gemini), we evaluated their accuracy across different cancer categories using both original scoring criteria and Likert scale scoring methods. Data analysis was performed using R statistical software, employing random or fixed effects models to calculate pooled mean differences (MD) and relative risks (RR) with their 95% confidence intervals (CI).
Results: The findings demonstrated that in both subjective and objective oncology assessments, ChatGPT-3.5 (subjective questions MD = -3.30; objective questions RR = 0.92) and ChatGPT-4 (subjective questions MD = -7.17; objective questions RR = 0.93) showed declining performance trends over time, while Gemini exhibited significant improvements over time (subjective questions MD = 11.48; objective questions RR = 1.15). Notably, ChatGPT-3.5's performance on subjective questions revealed a significant turning point between March 14, 2023, and April 26, 2023, shifting from initially superior performance on newer questions to inferior performance compared with original questions, with the performance gap progressively widening.
Conclusions: Our meta-analysis reveals temporal performance degradation in ChatGPT-3.5 and ChatGPT-4, which contrasts with the consistent improvement observed in Gemini. These findings provide essential guidance for the evidence-based deployment of LLMs in oncology.
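
Illustrative note: the abstract reports pooled relative risks under fixed or random effects models but does not describe the estimator or the R packages used. The sketch below shows one common way such pooling can be done, assuming inverse-variance weighting on the log-RR scale and the DerSimonian-Laird estimate of between-study variance. The function names (`log_rr`, `pool`) and all counts in the usage example are hypothetical and are not taken from the study. The same `pool` function could be applied to mean differences given each study's MD and its variance.

```python
import math

def log_rr(events_new, total_new, events_old, total_old):
    """Log relative risk and its approximate variance for one study.

    Hypothetical helper: compares the proportion of correct answers in a
    later evaluation (new) against the originally reported one (old).
    """
    rr = (events_new / total_new) / (events_old / total_old)
    var = 1 / events_new - 1 / total_new + 1 / events_old - 1 / total_old
    return math.log(rr), var

def pool(effects, variances, random_effects=False, z=1.96):
    """Inverse-variance pooled estimate with a 95% CI.

    With random_effects=True, the between-study variance tau^2 is
    estimated by the DerSimonian-Laird method (an assumption here; the
    source does not name its estimator) and added to each variance.
    Returns (estimate, ci_low, ci_high, tau2).
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    tau2 = 0.0
    if random_effects:
        # Cochran's Q measures dispersion around the fixed-effect estimate.
        q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
        df = len(effects) - 1
        c = sum_w - sum(w * w for w in weights) / sum_w
        tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
        weights = [1 / (v + tau2) for v in variances]
        sum_w = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se = math.sqrt(1 / sum_w)
    return estimate, estimate - z * se, estimate + z * se, tau2

if __name__ == "__main__":
    # Made-up counts for illustration only: (correct_new, n_new, correct_old, n_old).
    studies = [(30, 40, 34, 40), (18, 25, 20, 25), (45, 60, 50, 60)]
    pairs = [log_rr(*s) for s in studies]
    est, lo, hi, tau2 = pool([p[0] for p in pairs], [p[1] for p in pairs],
                             random_effects=True)
    # Back-transform from the log scale to report RR and its CI.
    print(f"pooled RR = {math.exp(est):.2f} "
          f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f}), tau^2 = {tau2:.3f}")
```

Pooling on the log scale keeps the confidence interval symmetric around the log estimate; an RR below 1 in this setup would indicate lower accuracy in the later evaluation than in the original one.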

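Language:
WOS:
PubMed ID:
Chinese Academy of Sciences (CAS) journal ranking:
Publication-year edition [2025]:
Major category | Tier 2, Medicine
Subcategory | Tier 2, Medicine: Research & Experimental
Latest edition [2025]:
Major category | Tier 2, Medicine
Subcategory | Tier 2, Medicine: Research & Experimental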
JCR quartile:
Publication-year edition [2024]:
Q1 MEDICINE, RESEARCH & EXPERIMENTAL
Latest edition [2024]:
Q1 MEDICINE, RESEARCH & EXPERIMENTAL

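Impact factor: Latest [2024 edition] Latest five-year average Publication year [2024 edition] Publication-year five-year average Year before publication [2024 edition]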

First author:
First author affiliation: [1]Nanjing Med Univ, Southern Med Univ, Zhujiang Hosp,Affiliated Kangda Coll, Donghai Cty Peoples Hosp,Dept Oncol, Lianyungang 222000, Peoples R China
Co-first authors:
Corresponding author:
Corresponding author affiliations: [1]Nanjing Med Univ, Southern Med Univ, Zhujiang Hosp,Affiliated Kangda Coll, Donghai Cty Peoples Hosp,Dept Oncol, Lianyungang 222000, Peoples R China [21]Univ Hong Kong, Li Ka Shing Fac Med, Sch Clin Med, Hong Kong 999077, Peoples R China
Recommended citation (GB/T 7714):
APA:
MLA:
