
Development of a robust corpus for automated evaluation of online health information in Chinese using the DISCERN scale

Document details

Resource type:
PubMed category:
Affiliations: [1]Bloomberg School of Public Health, Johns Hopkins University, MD, 21205, United States. [2]Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China. [3]Department of AI and IT, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, 310000, China. [4]School of Public Health, Hangzhou Medical College, Hangzhou, Zhejiang Province, 310053, China. [5]School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, Zhejiang Province, 310053, China. [6]School of Public Health, Southwest Medical University, Luzhou, Sichuan Province, 646000, China. [7]Clinical Research Center, Affiliated Hospital of Southwest Medical University, Liuzhou, 646000, China. [8]School of Medical Information and Engineering, Southwest Medical University, Luzhou, 646000, China. [9]Center for Medical Informatics, Advanced Institute of Clinical Medicine, Peking University, Beijing, 100191, China. [10]School of Medicine, Johns Hopkins University, Baltimore, MD, 21206, United States. [11]School of Nursing, Johns Hopkins University, Baltimore, MD, 21206, United States.
Source:
ISSN:

Keywords: Chinese online health information; DISCERN scale; annotated corpus; inter-rater reliability; machine learning; automated evaluation

Abstract:
To develop the first comprehensive, standardized annotated corpus of Chinese online health information (OHI) using the full 16-item DISCERN instrument and to establish a reliable annotation process that supports automated quality assessment.

We assembled 510 web-sourced articles on breast cancer, arthritis, and depression. All the articles were independently annotated by three trained raters using the DISCERN scale. Annotation followed a four-step workflow: data collection and preprocessing, rater training, iterative annotation, and quality control. Raters calibrated through consensus sessions and calibration articles. The Dawid-Skene model aggregated individual annotations into final consensus scores. Original five-point ratings were retained and binarized (scores 1-3 as low quality, 4-5 as high quality) to enable both fine-grained and coarse evaluation for machine learning.

Initial annotation of a 60-article pilot produced low agreement (mean Krippendorff's α ≈ 0.022) due to subjective variability. Successive calibration exercises improved agreement markedly, culminating in a corpus-wide Krippendorff's α of 0.834. Consensus scores correlated strongly with individual rater scores, confirming annotation robustness. The dual-scale design yielded a relatively balanced distribution of labels across topics, with roughly equal representation of low- and high-quality articles, and preserved granularity for detailed DISCERN analysis.

Our iterative calibration approach and consensus modeling effectively addressed the subjective ambiguity inherent in quality assessment. The binary and five-class labeling strategies facilitate flexible downstream applications, allowing automated systems to perform both broad filtering and nuanced quality differentiation. The high inter-rater reliability demonstrates that rigorous training and consensus methods can overcome domain-specific annotation challenges.

The resulting Chinese OHI corpus, annotated via a standardized DISCERN framework and refined through iterative calibration, provides a robust benchmark for training and evaluating machine learning models. This resource lays the foundation for scalable, reliable automated quality assessment of OHI in Chinese public health settings.

© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.
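The abstract names two computations that are easy to illustrate: binarizing five-point DISCERN scores (1-3 low, 4-5 high) and measuring agreement with Krippendorff's α. The Python sketch below is illustrative only and is not the authors' code. The function names `binarize` and `krippendorff_alpha` are hypothetical, and the abstract does not say which distance metric the authors used, so the sketch offers the nominal and interval variants as assumptions. It uses the standard definition α = 1 − D_o/D_e, where D_o is the observed within-article disagreement and D_e is the disagreement expected by chance.

```python
from itertools import permutations


def binarize(score):
    """Map a 1-5 DISCERN item score to 0 (low, 1-3) or 1 (high, 4-5)."""
    if not 1 <= score <= 5:
        raise ValueError("DISCERN item scores range from 1 to 5")
    return 1 if score >= 4 else 0


def krippendorff_alpha(units, metric="nominal"):
    """Krippendorff's alpha for a list of units (articles).

    Each unit is a list of the ratings it received; missing ratings are
    simply left out. The metric is an assumption of this sketch:
    "nominal" treats any mismatch as distance 1, "interval" uses the
    squared difference between scores.
    """
    if metric == "nominal":
        def delta(a, b):
            return float(a != b)
    elif metric == "interval":
        def delta(a, b):
            return float((a - b) ** 2)
    else:
        raise ValueError("metric must be 'nominal' or 'interval'")

    # Only units with at least two ratings can contribute pairs.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: ordered pairs within each unit,
    # weighted by 1 / (m_u - 1), averaged over all n ratings.
    d_o = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: ordered pairs over all pooled ratings.
    # This is O(n^2), which is fine for a corpus of a few thousand ratings.
    d_e = sum(delta(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_o / d_e


if __name__ == "__main__":
    # Made-up ratings for four articles by three raters (not from the corpus).
    item_scores = [[4, 4, 5], [2, 2, 2], [3, 4, 4], [1, 2, 1]]
    binary = [[binarize(s) for s in unit] for unit in item_scores]
    print("interval alpha on 1-5 scores:", krippendorff_alpha(item_scores, "interval"))
    print("nominal alpha on binary labels:", krippendorff_alpha(binary, "nominal"))
```

Computing α on both the original and the binarized scores mirrors the dual-scale design described in the abstract, since agreement on a coarse label can differ from agreement on the fine-grained score.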

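Funding:
Language:
PubMed ID:
Chinese Academy of Sciences (CAS) journal ranking:
Year of publication [2025] edition:
Major category | Tier 2 Medicine
Subcategories | Tier 2 Computer Science, Information Systems; Tier 2 Computer Science, Interdisciplinary Applications; Tier 2 Health Care Sciences & Services; Tier 2 Information Science & Library Science; Tier 2 Medical Informatics
Latest [2025] edition:
Major category | Tier 2 Medicine
Subcategories | Tier 2 Computer Science, Information Systems; Tier 2 Computer Science, Interdisciplinary Applications; Tier 2 Health Care Sciences & Services; Tier 2 Information Science & Library Science; Tier 2 Medical Informatics
First author:
First author affiliation: [1]Bloomberg School of Public Health, Johns Hopkins University, MD, 21205, United States.
Corresponding author:
Corresponding author affiliations: [7]Clinical Research Center, Affiliated Hospital of Southwest Medical University, Liuzhou, 646000, China. [8]School of Medical Information and Engineering, Southwest Medical University, Luzhou, 646000, China. [9]Center for Medical Informatics, Advanced Institute of Clinical Medicine, Peking University, Beijing, 100191, China.
Recommended citation (GB/T 7714):
APA:
MLA: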
