Development of a novel rapid and high-precision active learning algorithm: A case study of the prediction of the mechanical properties of MAX phase crystals
Abstract: In recent years, MAX phase crystals have become a prominent topic of global research because their unique nanolayered crystal structure provides advantages such as self-lubrication, high toughness, and electrical conductivity. M2AX phase crystals combine the properties of ceramics and metallic compounds, offering thermal shock resistance, high toughness, electrical conductivity, and thermal conductivity. However, single-phase samples of these materials are difficult to prepare experimentally, which has limited their development. Active learning is a machine learning method that achieves good prediction performance with a small number of labeled samples. This paper combines efficient global optimization (EGO) with residual active learning regression to propose an improved active learning selection strategy, RS-EGO. Based on a dataset of 169 M2AX phase crystals, the strategy is used to model, predict, and search for the optimal values of the bulk modulus, Young’s modulus, and shear modulus. Exploring material properties through computational simulation in this way reduces the number of ineffective validation experiments. The results show that RS-EGO finds the optimal value rapidly while retaining good prediction ability. Its overall performance is better than that of the two original selection strategies, and it is better suited to material property prediction problems with small sample sizes. The choice of combination parameters influences the optimization direction of the improved algorithm. RS-EGO was also applied to two publicly available datasets (one with 103 samples and the other with 1836 samples); on both it achieved smaller root mean square errors, smaller opportunity costs, and larger coefficients of determination, which demonstrates its effectiveness on both small and large datasets. A wider range of combination parameters than in the earlier experiments was then explored, with experiments designed to examine how different parameters steer the optimization direction of the model. The results show that larger parameter values make the algorithm behave more like EGO, with a better ability to find the optimal value; conversely, the closer the algorithm is to residual active learning regression, the better its prediction accuracy. The balance between the two capabilities can therefore be adjusted by choosing the combination parameters appropriately.
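For readers unfamiliar with EGO, its usual selection criterion is expected improvement (Jones et al., 1998). The following is a minimal sketch of that criterion only, assuming a surrogate model that returns a Gaussian predictive mean and standard deviation for each candidate. The function and argument names are illustrative, and the way RS-EGO mixes this criterion with residual-based selection is not specified in this abstract and is not shown here.

```python
import math


def expected_improvement(mu, sigma, best, maximize=True):
    """Expected improvement of one candidate under a Gaussian prediction.

    mu, sigma: predictive mean and standard deviation (assumed inputs).
    best: best response value observed so far.
    """
    # Signed gap between the prediction and the incumbent best value.
    gap = (mu - best) if maximize else (best - mu)
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(gap, 0.0)
    z = gap / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return gap * cdf + sigma * pdf


def select_by_ei(candidates, best, maximize=True):
    """Return the index of the candidate (mu, sigma) pair with the largest EI."""
    scores = [expected_improvement(m, s, best, maximize) for m, s in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal predicted means, the candidate with the larger predictive uncertainty receives the larger score, which is what drives the exploratory behaviour of EGO-style sampling.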
Figure 7. Active learning regression (ALR) sampling results on the Concrete-CS (a–c) and Indirect (d–f) datasets, with the minimum of the response variable as the target: (a, d) RMSE; (b, e) R²; (c, f) opportunity cost
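The three quantities plotted in Figure 7 can be sketched as follows. RMSE and R² follow their standard definitions. The opportunity cost shown here is an assumed definition (the gap between the true optimum in the candidate pool and the best value sampled so far); the exact form and any normalization used in the paper are not given in this text, and all function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def opportunity_cost(all_values, sampled_values, minimize=True):
    """Assumed definition: distance from the best sampled value to the pool optimum."""
    if minimize:
        return min(sampled_values) - min(all_values)
    return max(all_values) - max(sampled_values)
```

Under this definition the opportunity cost reaches zero as soon as the sampler has picked the true optimum, so a curve that drops quickly indicates fast optimum finding.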
Table 1. Descriptive statistics of the M2AX phase dataset

| Feature name | Feature description | Minimum | Maximum | Mean | Standard deviation |
|---|---|---|---|---|---|
| Ms | M-atom s-orbital radii | 1.360 | 1.593 | 1.492 | 0.075 |
| Mp | M-atom p-orbital radii | 0.416 | 0.617 | 0.541 | 0.074 |
| Md | M-atom d-orbital radii | 0.427 | 0.829 | 0.656 | 0.152 |
| As | A-atom s-orbital radii | 0.445 | 1.093 | 0.903 | 0.171 |
| Ap | A-atom p-orbital radii | 0.808 | 1.382 | 1.150 | 0.167 |
| Xs | X-atom s-orbital radii | 0.521 | 0.620 | 0.571 | 0.050 |
| Xp | X-atom p-orbital radii | 0.488 | 0.596 | 0.542 | 0.054 |
| TB_Den | Total bond order density | 0.019 | 0.045 | 0.031 | 0.007 |
| M_M_BO | M-M bond order | 0 | 3.643 | 1.541 | 0.699 |
| M_A_BO | M-A bond order | 2.902 | 8.129 | 4.960 | 0.932 |
| M_X_BO | M-X bond order | 5.382 | 9.454 | 7.337 | 1.135 |
| A_A_BO | A-A bond order | 0 | 1.769 | 0.611 | 0.564 |
| M_Q* | M-atom charge transfer | -1.007 | -0.410 | -0.731 | 0.134 |
| A_Q* | A-atom charge transfer | 0.099 | 0.958 | 0.579 | 0.175 |
| X_Q* | X-atom charge transfer | 0.678 | 1.176 | 0.883 | 0.106 |
| N_E(0) | Fermi-level | 0.668 | 10.506 | 4.132 | 1.951 |
| K | Bulk modulus | 79.171 | 263.124 | 165.026 | 40.604 |
| G | Shear modulus | 10.963 | 151.003 | 88.989 | 25.938 |
| E | Young’s modulus | 31.591 | 376.709 | 224.468 | 62.064 |
Table 2. Feature importance ranking (feature groups to be compared are in bold)

| Feature | K_Fscore | G_Fscore | E_Fscore |
|---|---|---|---|
| Ms | 10 | 10 | 9 |
| Mp | 1 | 1 | 1 |
| Md | 15 | 15 | 15 |
| As | 8 | 7 | 8 |
| Ap | 14 | 13 | 13 |
| Xs | 13 | 14 | 14 |
| Xp | 15 | 15 | 15 |
| TB_Den | 5 | 5 | 2 |
| M_M_BO | 2 | 3 | 5 |
| M_A_BO | 4 | 6 | 3 |
| M_X_BO | 3 | 2 | 5 |
| A_A_BO | 6 | 8 | 7 |
| M_Q* | 11 | 12 | 10 |
| A_Q* | 12 | 9 | 11 |
| X_Q* | 9 | 11 | 11 |
| N_E(0) | 7 | 4 | 4 |
Table 3. Average AUC results for model prediction, with the maximum and minimum in bold

| Evaluation indicator | EGO | RSAL | RS-EGO (2:1) | RS-EGO (1:1) | RS-EGO (1:2) |
|---|---|---|---|---|---|
| AUC-RMSE | **1056.3973** | **1038.6413** | 1041.9258 | 1053.9733 | 1053.3511 |
| AUC-R² | **45.8378** | **47.8767** | 47.0105 | 45.9895 | 45.9657 |
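The AUC indicators summarize a whole learning curve in a single number. A minimal sketch is given below, assuming that AUC here means the area under the metric-versus-iteration curve computed with the trapezoidal rule and a unit step between iterations; this reading is an assumption, and the function name is illustrative.

```python
def curve_auc(values, step=1.0):
    """Trapezoidal area under a learning curve sampled at equal intervals.

    values: metric value after each sampling iteration (assumed input).
    step: spacing between iterations on the x-axis.
    """
    if len(values) < 2:
        return 0.0
    return sum(0.5 * (a + b) * step for a, b in zip(values, values[1:]))
```

Under this reading, a strategy whose RMSE falls faster accumulates a smaller AUC-RMSE, and one whose R² rises faster accumulates a larger AUC-R², so the two indicators reward early accuracy rather than only the final value.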
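Table 4. Ranking of the average AUC values for different targets

| AUC value | Target | EGO | RSAL | RS-EGO (2:1) | RS-EGO (1:1) | RS-EGO (1:2) |
|---|---|---|---|---|---|---|
| AUC-RMSE | max_K | 5 | 3 | 2 | 1 | 4 |
| AUC-RMSE | min_K | 5 | 1 | 2 | 4 | 3 |
| AUC-RMSE | max_G | 5 | 1 | 2 | 3 | 4 |
| AUC-RMSE | min_G | 4 | 5 | 3 | 1 | 2 |
| AUC-RMSE | max_E | 4 | 5 | 1 | 2 | 3 |
| AUC-RMSE | min_E | 1 | 5 | 4 | 3 | 2 |
| AUC-RMSE | RSR | 0.800 | 0.667 | 0.467 | 0.467 | 0.600 |
| AUC-R² | max_K | 5 | 3 | 2 | 1 | 4 |
| AUC-R² | min_K | 5 | 1 | 2 | 3 | 4 |
| AUC-R² | max_G | 5 | 1 | 2 | 3 | 4 |
| AUC-R² | min_G | 4 | 5 | 3 | 1 | 2 |
| AUC-R² | max_E | 5 | 2 | 1 | 3 | 4 |
| AUC-R² | min_E | 1 | 5 | 4 | 3 | 2 |
| AUC-R² | RSR | 0.833 | 0.567 | 0.467 | 0.467 | 0.667 |
| AUC-Oppo | max_K | 2 | 5 | 4 | 3 | 1 |
| AUC-Oppo | min_K | 1 | 5 | 4 | 3 | 2 |
| AUC-Oppo | max_G | 5 | 2 | 1 | 3 | 4 |
| AUC-Oppo | min_G | 4 | 1 | 5 | 2 | 3 |
| AUC-Oppo | max_E | 5 | 1 | 2 | 3 | 4 |
| AUC-Oppo | min_E | 2 | 1 | 5 | 3 | 4 |
| AUC-Oppo | RSR | 0.633 | 0.5 | 0.7 | 0.567 | 0.6 |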
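The RSR rows of Table 4 are reproduced exactly by dividing each strategy's rank sum by the product of the number of targets and the number of strategies, which suggests that RSR denotes a rank-sum ratio. This formula is inferred from the tabulated numbers rather than stated in the text, so the sketch below should be read as a consistency check; the function name is illustrative.

```python
def rank_sum_ratio(ranks, n_methods):
    """Sum of ranks divided by (number of targets * number of methods)."""
    return sum(ranks) / (len(ranks) * n_methods)


# AUC-RMSE ranks from Table 4, in target order:
# max_K, min_K, max_G, min_G, max_E, min_E.
auc_rmse_ranks = {
    "EGO": [5, 5, 5, 4, 4, 1],
    "RSAL": [3, 1, 1, 5, 5, 5],
    "RS-EGO (2:1)": [2, 2, 2, 3, 1, 4],
    "RS-EGO (1:1)": [1, 4, 3, 1, 2, 3],
    "RS-EGO (1:2)": [4, 3, 4, 2, 3, 2],
}

rsr = {
    name: round(rank_sum_ratio(ranks, len(auc_rmse_ranks)), 3)
    for name, ranks in auc_rmse_ranks.items()
}
print(rsr)
```

Because rank 1 is the best result for a target, a smaller ratio indicates a strategy that ranks well across more targets.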
Table 5. Basic information about the datasets

| Dataset | Source | Sample size | Original number of features | Final number of features |
|---|---|---|---|---|
| Concrete-CS | UCI | 103 | 7 | 7 |
| Indirect | Journal | 1836 | 15 | 15 |
Table 6. Average AUC results for model prediction and optimization

| Dataset | Evaluation indicator | EGO | RSAL | RS-EGO (2:1) | RS-EGO (1:1) | RS-EGO (1:2) |
|---|---|---|---|---|---|---|
| Concrete-CS | AUC-RMSE | 231.1336 | 218.2656 | 220.8041 | 230.6000 | 231.8836 |
| Concrete-CS | AUC-R² | 54.1849 | 55.4580 | 55.2139 | 54.2406 | 54.1076 |
| Concrete-CS | AUC-OPPO | 0.03759 | 0.06281 | 0.05979 | 0.03805 | 0.03674 |
| Indirect | AUC-RMSE | 8.2622 | 8.2630 | 7.8614 | 8.0573 | 8.2748 |
| Indirect | AUC-R² | 58.5080 | 58.5443 | 59.2744 | 58.9135 | 58.4946 |
| Indirect | AUC-OPPO | 0.03721 | 0.06605 | 0.05804 | 0.03865 | 0.03846 |
Table 7. Average AUC results for model prediction and optimization

| Dataset | Evaluation indicator | RS-EGO (3:1) | RS-EGO (2:1) | RS-EGO (1:1) | RS-EGO (1:2) | RS-EGO (1:3) |
|---|---|---|---|---|---|---|
| Concrete-CS | AUC-RMSE | 218.9119 | 220.8041 | 230.6000 | 231.8836 | 231.8504 |
| Concrete-CS | AUC-R² | 55.4126 | 55.2139 | 54.2406 | 54.1076 | 54.1073 |
| Concrete-CS | AUC-OPPO | 0.05683 | 0.05979 | 0.03805 | 0.03674 | 0.03683 |
| Indirect | AUC-RMSE | 7.8482 | 7.8614 | 8.0573 | 8.2748 | 8.3295 |
| Indirect | AUC-R² | 59.2972 | 59.2744 | 58.9135 | 58.4946 | 58.3809 |
| Indirect | AUC-OPPO | 0.05884 | 0.05804 | 0.03865 | 0.03846 | 0.03799 |