Affiliations:
[1] School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
[2] Tencent Sustainable Social Value Inclusive Health Lab, Tencent, Beijing, China.
[3] Department of Gynecologic Oncology, Cancer Hospital of China Medical University, Liaoning Cancer Hospital & Institute, Shenyang, Liaoning Province, China.
[4] Department of Diagnosis and Treatment for Cervical Diseases, Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China.
[5] Department of Gynecology, Shenzhen Maternity and Child Healthcare Hospital, Southern Medical University, Shenzhen, Guangdong Province, China.
[6] Department of Gynecology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, China.
[7] Wuxi Maternity and Child Health Care Hospital, Wuxi School of Medicine, Jiangnan University, Wuxi, Jiangsu Province, China.
Abstract:
Purpose: To explore the feasibility of using large language models (LLMs) to generate multiple-choice questions (MCQs) for cervical cancer control education and to compare them with those created by clinicians.
Methods: GPT-4o and Baichuan4 each generated 40 MCQs using iteratively refined prompts, and clinicians created 40 MCQs for comparison. All 120 MCQs were evaluated by 12 experts across five dimensions (correctness, clarity and specificity, cognitive level, clinical relevance, and explainability) using a 5-point Likert scale. Difficulty and discriminatory power were tested with practitioners, and participants were asked to identify the source of each MCQ.
Results: LLM-generated MCQs were similar to clinician-generated ones in most dimensions. However, clinician-generated MCQs had a higher cognitive level (4.00±1.08) than those from GPT-4o (3.68±1.07) and Baichuan4 (3.70±1.13). Testing with 312 practitioners revealed no significant differences in difficulty or discriminatory power among MCQs from clinicians (59.51±24.50, 0.38±0.14), GPT-4o (61.89±25.36, 0.30±0.19), and Baichuan4 (59.79±26.25, 0.33±0.15). Recognition rates for LLM-generated MCQs ranged from 32% to 50%, with experts outperforming general practitioners in identifying the question setters.
Conclusions: With engineered prompts, LLMs can generate MCQs comparable to clinician-generated ones, though clinicians performed better on cognitive level. LLM-assisted MCQ generation could enhance efficiency but requires rigorous validation to ensure educational quality.
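The difficulty and discriminatory power values reported in the abstract are standard classical test theory item statistics: the percentage of examinees answering an item correctly, and the upper-lower group discrimination index. The sketch below is illustrative only, not taken from the paper; it assumes 0/1-scored responses, the customary 27% upper/lower grouping, and hypothetical function names.

```python
# Illustrative sketch (assumptions: 0/1 scoring, 27% upper/lower split;
# function names are hypothetical, not from the paper).
from typing import Sequence

def item_difficulty(scores: Sequence[int]) -> float:
    """Difficulty index: percentage of examinees answering the item
    correctly (the abstract reports values such as 59.51)."""
    return 100.0 * sum(scores) / len(scores)

def item_discrimination(scores: Sequence[int],
                        totals: Sequence[float],
                        group_frac: float = 0.27) -> float:
    """Discrimination index D = P_upper - P_lower: the difference in
    proportion correct between the top and bottom scoring groups
    (the abstract reports values such as 0.38)."""
    n = len(scores)
    k = max(1, int(round(group_frac * n)))
    # Rank examinees by their total test score, ascending.
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(scores[i] for i in upper) / k
    p_lower = sum(scores[i] for i in lower) / k
    return p_upper - p_lower
```

For example, item_difficulty([1, 0, 1, 1]) yields 75.0, i.e. 75% of examinees answered the item correctly; values near 60, as in the study, indicate moderate difficulty.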
Funding:
This study was supported by the CAMS Innovation Fund for Medical Sciences (CIFMS 2021-I2M-1-004); by the Tencent Sustainable Social Value Inclusive Health Lab, through the Chongqing Tencent Sustainable Development Foundation's "Comprehensive Prevention and Control Demonstration Project for Eliminating Cervical Cancer and Breast Cancer in Low Health Resource Areas of China" (Project Number: SD20240904145730); and by the Sanming Project of Medicine in Shenzhen (No. SZSM202211032).
Language:
Foreign language
PubmedID:
CAS (Chinese Academy of Sciences) journal ranking:
Edition of publication year [2025]:
Major category | Zone 3: Education
Subcategories | Zone 2: Health Care Sciences & Services; Zone 3: Education, Scientific Disciplines
Latest [2025] edition:
Major category | Zone 3: Education
Subcategories | Zone 2: Health Care Sciences & Services; Zone 3: Education, Scientific Disciplines
First author:
Chen Mingyang
First author's affiliation: [1] School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
Co-first authors:
Corresponding author:
Recommended citation (GB/T 7714):
Chen Mingyang, Ma Jiayi, Cui Xiaoli, et al. Advancing medical education in cervical cancer control with large language models for multiple-choice question generation[J]. Medical Teacher, 2025: 1-11. DOI: 10.1080/0142159X.2025.2513419.
APA:
Chen Mingyang, Ma Jiayi, Cui Xiaoli, Dai Qianling, Hu Haiyan, ... & Qiao Youlin. (2025). Advancing medical education in cervical cancer control with large language models for multiple-choice question generation. Medical Teacher, 1-11. https://doi.org/10.1080/0142159X.2025.2513419
MLA:
Chen Mingyang, et al. "Advancing medical education in cervical cancer control with large language models for multiple-choice question generation." Medical Teacher (2025): 1-11.