The electrocardiogram (ECG) serves as a valuable tool for monitoring heart-related medical conditions due to its convenience and non-invasive nature. When combined with patient-related information, it enables medical professionals to detect potential health risks. However, this process requires significant medical expertise and human resources. While deep learning (DL) methods have helped alleviate this problem, they struggle to integrate varied patient information with ECG data and fail to provide medical experts with adept clinical suggestions and the insights needed for accurate diagnosis. Although recent advances in multi-modal large language modeling have propelled these models beyond the natural language processing domain, their applicability to ECG processing remains largely unexplored, partly due to the scarcity of ECG-text pairs. To this end, we develop ECG-LM, the first multi-modal Large Language Model (LLM) able to process natural language and understand ECG signals, built by aligning text and ECG representations. Additionally, we address the scarcity of ECG-text pairs by generating additional data from authoritative medical guidelines. Results on various downstream tasks show that ECG-LM adeptly captures the intricate features of an ECG: its zero-shot cardiovascular disease detection performance surpasses existing few-shot solutions, and it also demonstrates excellent potential in ECG-related question answering. Finally, we fine-tune ECG-LM on public clinical conversation datasets together with an additional Supervised Fine-Tuning (SFT) dataset built from real hospital clinical data, aiming to provide a more comprehensive and customized user experience.
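The abstract does not specify how text and ECG representations are aligned; a common choice for this kind of multi-modal pairing is a CLIP-style symmetric contrastive (InfoNCE) objective over paired embeddings. The sketch below is an illustrative assumption, not the paper's actual method: `ecg_emb` and `text_emb` stand in for the outputs of hypothetical ECG and text encoders, and the loss pulls matched ECG-text pairs together while pushing mismatched pairs apart.

```python
import numpy as np

def info_nce_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired ECG/text embeddings.

    ecg_emb, text_emb: arrays of shape (batch, dim), where row i of each
    array comes from the same ECG-text pair. (Encoder details are assumed.)
    """
    # L2-normalize so the dot product is cosine similarity.
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = ecg @ txt.T / temperature  # (batch, batch) similarity matrix
    labels = np.arange(logits.shape[0])  # matched pairs lie on the diagonal

    def cross_entropy(l):
        # Numerically stable log-softmax over each row.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the ECG->text and text->ECG directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned pairs the diagonal similarities dominate and the loss approaches zero; with unrelated embeddings it sits near `log(batch_size)`, which is why the same scheme can be probed for zero-shot detection by scoring an ECG against candidate disease descriptions.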