Improving large language models for miRNA information extraction via prompt engineering

Publication details

Affiliations: [1]Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, [2]Operation Management Department, The First Affiliated Hospital of Soochow University, Suzhou, China [3]Department of Neurosurgery, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, China [4]Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, China [5]Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China [6]Department of Computer Science and Information Technologies, Elviña Campus, University of A Coruña, A Coruña, Spain [7]West China Tianfu Hospital, Sichuan University, Chengdu, Sichuan, China

Keywords: MicroRNA; Cancer; Large language models; Information extraction; Datasets; Prompt engineering

Abstract:
Large language models (LLMs) demonstrate significant potential in biomedical knowledge discovery, yet their performance in extracting fine-grained biological information, such as miRNA, remains insufficiently explored. Accurate extraction of miRNA-related information is essential for understanding disease mechanisms and identifying biomarkers. This study aims to comprehensively evaluate the capabilities of LLMs in miRNA information extraction through diverse prompt learning strategies.

Three high-quality miRNA information extraction datasets were constructed to support the benchmarking and training of generative LLMs, specifically Re-Tex, Re-miR and miR-Cancer. These datasets encompass three types of entities: miRNAs, genes, and diseases, along with their relationships. The accuracy and reliability of three LLMs, including GPT-4o, Gemini, and Claude, were evaluated and compared with traditional models. Different prompt engineering strategies were implemented to enhance the LLMs' performance, including baseline prompts, 5-shot Chain of Thought prompts, and generated knowledge prompts.

The combination of optimized prompt strategies significantly improved overall entity extraction performance across both trained and untrained datasets. Generated knowledge prompting achieved the highest performance, with maximum F1 scores of 76.6% for entity extraction and 54.8% for relationship extraction. Comparative analysis indicated GPT-4o exhibited superior performance to Gemini, while Claude showed the lowest performance levels. Extraction accuracy varied considerably across entity types, with miRNA recognition achieving the highest performance and gene/protein identification demonstrating the lowest accuracy levels. Furthermore, binary relationship extraction accuracy was significantly lower than entity extraction performance. The three evaluated LLMs showed similarly limited capability in relationship extraction tasks, with no statistically significant differences observed between models. Finally, comparison with conventional computational methods revealed LLMs have not yet exceeded traditional methods in this specialized domain.

This study established high-quality miRNA datasets to support information extraction and knowledge discovery. The overall performance of LLMs in this study proved limited, and challenges remain in processing miRNA-related information extraction. However, optimized prompt combinations can substantially improve performance. Future work should focus on further refinement of LLMs to accelerate the discovery and application of potential diagnostic and therapeutic targets.

Copyright © 2025 The Authors. Published by Elsevier B.V. All rights reserved.
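
The abstract reports entity extraction results as F1 scores. As a rough illustration of how such a score can be computed, the following Python sketch compares a set of predicted entities against a gold-standard set. It is not the authors' evaluation code: the exact-match criterion, the case-insensitive normalization, the function names, and the example entities are all assumptions made for this sketch and are not taken from the Re-Tex, Re-miR, or miR-Cancer datasets.

```python
from typing import Iterable, Set, Tuple

# An entity is represented as (entity_type, surface_text).
# This representation is an assumption made for the sketch.
Entity = Tuple[str, str]


def normalize(entities: Iterable[Entity]) -> Set[Entity]:
    """Lower-case and strip entities so matching ignores case and padding."""
    return {(etype.strip().lower(), text.strip().lower()) for etype, text in entities}


def precision_recall_f1(gold: Iterable[Entity], predicted: Iterable[Entity]):
    """Return (precision, recall, F1) under exact match on normalized entities."""
    gold_set = normalize(gold)
    pred_set = normalize(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative entities only (hypothetical annotations, not dataset records).
gold = [("miRNA", "miR-21"), ("gene", "PTEN"), ("disease", "breast cancer")]
predicted = [
    ("miRNA", "miR-21"),
    ("gene", "PTEN"),
    ("gene", "AKT1"),
    ("disease", "lung cancer"),
]

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this toy case two of the four predictions match the three gold entities, so precision is 2/4, recall is 2/3, and F1 is 4/7. A relaxed matching rule (for example, partial span overlap) would give different numbers, and the abstract does not state which rule the study used.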

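Chinese Academy of Sciences (CAS) journal ranking:
Publication-year [2025] edition:
Major category | Tier 2, Medicine
Subcategories | Tier 2, Computer Science: Interdisciplinary Applications; Tier 2, Computer Science: Theory and Methods; Tier 2, Engineering: Biomedical; Tier 3, Medicine: Informatics
Latest [2025] edition:
Major category | Tier 2, Medicine
Subcategories | Tier 2, Computer Science: Interdisciplinary Applications; Tier 2, Computer Science: Theory and Methods; Tier 2, Engineering: Biomedical; Tier 3, Medicine: Informatics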
First-author affiliations: [1]Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, [2]Operation Management Department, The First Affiliated Hospital of Soochow University, Suzhou, China
Corresponding-author affiliation: [1]Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University