Chinese Journal of Diagnostics (Electronic Edition), 2025, Vol. 13, Issue (04): 248-254. doi: 10.3877/cma.j.issn.2095-655X.2025.04.006

Intelligent Medicine

The application value of machine learning models based on CT habitat radiomics in the prediction of ALK gene fusion in lung adenocarcinoma

Jia Ding, Yanting Ji, Yijiang Hu

  1. Department of Imaging, the First People's Hospital of Zhangjiagang, Zhangjiagang 215600, China
  • Received: 2025-08-15 Published: 2025-11-26
  • Corresponding author: Yijiang Hu
Cite this article:

Jia Ding, Yanting Ji, Yijiang Hu. The application value of machine learning models based on CT habitat radiomics in the prediction of ALK gene fusion in lung adenocarcinoma[J/OL]. Chinese Journal of Diagnostics (Electronic Edition), 2025, 13(04): 248-254.

Objective

To evaluate machine learning models built on CT habitat radiomics for the non-invasive preoperative prediction of anaplastic lymphoma kinase (ALK) gene fusion in lung adenocarcinoma.

Methods

A total of 130 patients with lung adenocarcinoma who underwent ALK gene testing and preoperative chest CT in the Imaging Department of the First People's Hospital of Zhangjiagang between March 2015 and November 2023 were retrospectively included (45 ALK positive, 85 ALK negative). Patients were randomly divided into a training set (n=90) and a test set (n=40) at a ratio of 7:3. Each lesion was partitioned into two habitat subregions (Habitat 1 and Habitat 2) by K-means clustering, and 14 key habitat radiomics features were retained after feature extraction and selection. Models were then built with each of 6 algorithms: autoencoder (AE), genetic programming (GP), linear discriminant analysis (LDA), logistic regression (LR), Lasso logistic regression (LRLasso), and support vector machine (SVM). Model performance was evaluated with receiver operating characteristic (ROC) curves, and differences in the area under the curve (AUC) were compared with the DeLong test.
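The habitat step described in the Methods can be illustrated with a minimal sketch, assuming each voxel inside the region of interest is described by a single intensity value and that the two subregions are found by plain Lloyd-style K-means with k=2. The abstract does not state which voxel features or software were used for clustering, so the function name `kmeans_two_habitats`, the toy CT numbers, and the min/max initialisation are illustrative assumptions only.

```python
from statistics import mean


def kmeans_two_habitats(values, max_iter=100):
    """Split 1-D voxel values into two clusters with Lloyd's algorithm (k=2).

    Returns (labels, centers); label 0 is the lower-intensity cluster.
    """
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct voxel values")
    # Deterministic initialisation (an assumption): start from the extremes.
    centers = [min(values), max(values)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each voxel joins the nearest center.
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            centers[k] = mean(members)
    return labels, centers


# Toy ROI: hypothetical CT numbers (HU) for eight voxels in one lesion.
roi_hu = [-620, -580, -600, -610, 20, 35, 10, 40]
labels, centers = kmeans_two_habitats(roi_hu)
habitat_1 = [v for v, lab in zip(roi_hu, labels) if lab == 0]
habitat_2 = [v for v, lab in zip(roi_hu, labels) if lab == 1]
print(labels, centers)
```

In a real pipeline the clustering would run on multi-dimensional voxel features, and radiomics features would then be extracted separately from each habitat mask; the one-dimensional version here only shows the assign/update loop.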

Results

In the training set, the models built with the LR and LRLasso algorithms achieved AUCs of 0.862 (0.788-0.935) and 0.854 (0.778-0.930), respectively; in the test set, the corresponding AUCs were 0.830 (0.678-0.930) and 0.802 (0.646-0.911). The AUC of the LR model did not differ significantly from that of the LRLasso model or the AE model (P=0.182 and 0.104, respectively), but differed significantly from those of the remaining models (all P<0.05). In the test set, the sensitivity and specificity of the LR model were 71.4% and 96.2%, respectively, while those of the LRLasso model were 64.3% and 88.5%, respectively.
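As a plausibility check on the test-set figures, the reported percentages can be matched to whole-patient counts. The sketch below assumes that the 40 test patients split into some number of ALK-positive and ALK-negative cases (the abstract does not report this split) and searches for splits at which both percentages correspond to integer counts after rounding to one decimal place. The helper name `consistent_splits` is hypothetical.

```python
def consistent_splits(n_total, sens_pct, spec_pct):
    """Return (n_pos, n_neg, tp, tn) tuples that reproduce both percentages.

    A split is kept when some integer true-positive count tp and some integer
    true-negative count tn round to the reported sensitivity and specificity
    (one decimal place).
    """
    matches = []
    for n_pos in range(1, n_total):
        n_neg = n_total - n_pos
        tps = [tp for tp in range(n_pos + 1)
               if round(100 * tp / n_pos, 1) == sens_pct]
        tns = [tn for tn in range(n_neg + 1)
               if round(100 * tn / n_neg, 1) == spec_pct]
        for tp in tps:
            for tn in tns:
                matches.append((n_pos, n_neg, tp, tn))
    return matches


lr = consistent_splits(40, 71.4, 96.2)
lr_lasso = consistent_splits(40, 64.3, 88.5)
print("LR:", lr)
print("LRLasso:", lr_lasso)
```

Under these assumptions, the only split consistent with both models is 14 ALK-positive and 26 ALK-negative test cases, that is 10/14 and 25/26 for LR and 9/14 and 23/26 for LRLasso. This is an inference from the rounded percentages, not a figure stated in the abstract.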

Conclusions

CT-based habitat radiomics shows some predictive capability and potential clinical utility for identifying ALK gene fusion in lung adenocarcinoma. The LR-based machine learning model shows good generalization and potential clinical applicability, and may serve as a new non-invasive imaging tool for predicting ALK gene fusion in lung adenocarcinoma.

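Figure 1 Study flowchart. Note: Step 1, image collection; Step 2, image preprocessing and manual delineation of the region of interest; Step 3, subregion partitioning and feature extraction; Step 4, construction of the different machine learning models and validation in the test set. ROI, region of interest; LR, LRLasso, SVM, LDA, GP, and AE denote the 6 machine learning models.
Table 1 Comparison of general clinical data between the training and test sets of LUAD patients
Table 2 Comparison of the diagnostic performance of machine learning models built with 6 different algorithms for ALK gene fusion in LUAD patients
Figure 2 Receiver operating characteristic curves of the machine learning models for predicting ALK gene fusion in LUAD patients in the test set. Note: The closer a curve lies to the upper-left corner, the better the model performs; the area under the curve quantifies overall performance, and a larger area indicates stronger classification ability. LUAD, lung adenocarcinoma; ALK, anaplastic lymphoma kinase; LR, logistic regression; AE, autoencoder; GP, genetic programming; LDA, linear discriminant analysis; LRLasso, Lasso logistic regression; SVM, support vector machine. P values indicate the statistical difference between each model and the LR model.
Figure 3 Nomogram of the 6 algorithm models for predicting ALK gene fusion status in LUAD patients in the test set. Note: The standardized predicted probabilities of the 6 prediction models are listed on the left; the scores of the individual models are summed to give a total score, which then maps to the predicted value. LUAD, lung adenocarcinoma; ALK, anaplastic lymphoma kinase; LR, logistic regression; AE, autoencoder; GP, genetic programming; LDA, linear discriminant analysis; LRLasso, Lasso logistic regression; SVM, support vector machine.
Figure 4 Calibration curves of the 6 algorithm models for predicting ALK gene fusion status in LUAD patients in the test set. Note: The dashed line represents the ideal case in which the predicted probability always equals the observed probability. The Brier score measures the accuracy of a prediction model as the difference between predicted and actual outcomes; lower values indicate better calibration (0 is perfect calibration). LUAD, lung adenocarcinoma; ALK, anaplastic lymphoma kinase; LR, logistic regression; AE, autoencoder; GP, genetic programming; LDA, linear discriminant analysis; LRLasso, Lasso logistic regression; SVM, support vector machine.
Figure 5 Decision curve analysis of the 6 algorithm models for predicting ALK gene fusion status in LUAD patients in the test set. Note: "Treat all" denotes the scenario in which all samples are regarded as positive, and "Treat none" the scenario in which all samples are regarded as negative. If a model's net benefit curve lies above both reference lines (Treat none and Treat all), using the model has clinical value within that threshold range. The higher the curve, the greater the net benefit. LUAD, lung adenocarcinoma; ALK, anaplastic lymphoma kinase; LR, logistic regression; AE, autoencoder; GP, genetic programming; LDA, linear discriminant analysis; LRLasso, Lasso logistic regression; SVM, support vector machine.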
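The two quantities named in the calibration-curve and decision-curve captions can be written out in a few lines. The sketch below uses the standard definitions: the Brier score is the mean of (p - y)^2, and the net benefit at threshold probability pt is TP/n - FP/n × pt/(1 - pt). The toy labels and probabilities are made up for illustration and are not data from the study, and the function names are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated as positive."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical outcomes (1 = ALK positive) and model probabilities.
y = [1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.2, 0.1]

print(brier_score(y, p))              # lower is better; 0 is perfect
print(net_benefit(y, p, 0.5))         # model at a 50% threshold
print(net_benefit_treat_all(y, 0.5))  # "Treat all" reference at 50%
# "Treat none" has a net benefit of 0 at every threshold.
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and plotting it against the two reference strategies; the model is useful at thresholds where its value exceeds both.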