A two-step method for cusp catastrophe model construction based on the selection of important variables
Abstract: Sudden transitions are widespread in engineering practice. When the state of a system changes abruptly, traditional calculus-based mathematical modeling methods have low accuracy. Although machine learning algorithms such as artificial neural networks can in theory approximate any nonlinear function, such black-box methods offer no reasonable explanation of the sudden transition. The cusp catastrophe model, based on catastrophe theory, can explain discontinuous changes in a system's state. However, traditional cusp catastrophe models usually rely on extensive prior knowledge to select the input variables. When prior knowledge is lacking and the input dimension is comparatively large, the resulting model is complex and inaccurate. To solve these problems, this paper proposes a two-step method for constructing a cusp catastrophe model based on variable selection. The first step applies multimodel ensemble important variable selection (MEIVS) to quantify the importance of the candidate variables and extract the important ones. The second step uses the extracted variables to construct a cusp catastrophe model within the framework of maximum likelihood estimation (MLE).
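The two steps can be sketched end to end in Python (NumPy and SciPy). This is a minimal illustration under stated assumptions, not the authors' implementation. The section does not say which models MEIVS combines or how it aggregates their scores, so Step 1 below is a hypothetical stand-in: it averages normalised permutation importances from a least-squares model and a k-nearest-neighbour model and keeps variables until their cumulative share reaches 0.9. Step 2 uses Cobb's stochastic cusp density, f(y) ∝ exp(αy + βy²/2 − y⁴/4), with the asymmetry factor α and the bifurcation factor β linear in the selected variables and the state y = w0 + w1·Y, the parameterisation used by the R `cusp` package. The normalising integral is approximated on a fixed grid, which assumes the fitted states stay well inside [−5, 5]. All function names, the threshold, and the synthetic data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# ---------- Step 1: hypothetical stand-in for MEIVS ----------
def fit_ols(X, y):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def fit_knn(X, y, k=10):
    """k-nearest-neighbour regression; returns a predict function."""
    def predict(Z):
        d = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
    return predict

def perm_importance(predict, X, y, rng, repeats=5):
    """Rise in held-out MSE when one column is shuffled, clipped at 0, rescaled to sum to 1."""
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            imp[j] += np.mean((predict(Xp) - y) ** 2) - base
    imp = np.clip(imp, 0.0, None)
    return imp / imp.sum() if imp.sum() > 0 else np.full(len(imp), 1.0 / len(imp))

def select_variables(X, Y, fitters=(fit_ols, fit_knn), threshold=0.9, seed=0):
    """Average the importance shares of several models, then keep the top
    variables until their cumulative share reaches `threshold`."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    test, train = idx[: len(X) // 3], idx[len(X) // 3:]
    imp = np.mean([perm_importance(fit(X[train], Y[train]), X[test], Y[test], rng)
                   for fit in fitters], axis=0)
    order = np.argsort(-imp)
    n_keep = int(np.searchsorted(np.cumsum(imp[order]), threshold)) + 1
    return order[:n_keep].tolist(), imp

# ---------- Step 2: Cobb-type stochastic cusp model fitted by MLE ----------
GRID = np.linspace(-5.0, 5.0, 301)  # assumed wide enough to hold the state density
G1, G2, G4 = GRID, GRID ** 2 / 2, GRID ** 4 / 4

def cusp_nll(theta, Y, X):
    """Negative log-likelihood of f(y) proportional to exp(alpha*y + beta*y**2/2 - y**4/4),
    with alpha = a0 + X @ a, beta = b0 + X @ b and state y = w0 + w1 * Y."""
    p = X.shape[1]
    a0, a = theta[0], theta[1:p + 1]
    b0, b = theta[p + 1], theta[p + 2:2 * p + 2]
    w0, w1 = theta[-2], theta[-1]
    alpha, beta = a0 + X @ a, b0 + X @ b
    y = w0 + w1 * Y
    # log of the normalising integral, approximated by a Riemann sum on GRID
    log_z = logsumexp(np.outer(alpha, G1) + np.outer(beta, G2) - G4, axis=1) \
        + np.log(GRID[1] - GRID[0])
    log_f = alpha * y + beta * y ** 2 / 2 - y ** 4 / 4 - log_z
    return -np.sum(log_f + np.log(abs(w1)))  # log|w1| is the Jacobian from Y to y

def initial_theta(Y, p):
    """Neutral start: alpha = beta = 0 and y = standardised Y."""
    theta = np.zeros(2 * p + 4)
    theta[-1] = 1.0 / np.std(Y)
    theta[-2] = -np.mean(Y) * theta[-1]
    return theta

def fit_cusp(Y, X):
    """Maximise the likelihood with Nelder-Mead (derivative-free)."""
    return minimize(cusp_nll, initial_theta(Y, X.shape[1]), args=(Y, X),
                    method="Nelder-Mead", options={"maxiter": 3000})

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))
    # synthetic bimodal response: x0 tilts the odds between two regimes, x1 shifts the level
    regime = np.where(rng.uniform(size=200) < 1 / (1 + np.exp(-3 * X[:, 0])), 1.5, -1.5)
    Y = regime + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200)
    selected, importance = select_variables(X, Y)
    fit = fit_cusp(Y, X[:, selected])
    print("selected columns:", selected)
    print("negative log-likelihood:", round(float(fit.fun), 2))
```

In this sketch the cusp model has 2p + 4 parameters for p selected variables (an intercept and p slopes for each of α and β, plus w0 and w1), so every variable dropped in Step 1 removes two parameters. Nelder-Mead is used only because it needs no gradients; an optimiser with analytic derivatives would be faster on real data.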
Results on the datasets with catastrophe characteristics (Tables 1 and 2) indicate that the cusp catastrophe model reduced with MEIVS is simpler in form than the unreduced cusp catastrophe model (12 versus 16 parameters in Table 1, 10 versus 20 in Table 2) and outperforms both the unreduced model and the cusp catastrophe models reduced with other dimensionality reduction algorithms on the evaluation indicators, achieving the highest R² and the lowest BIC on both datasets. This shows that the proposed algorithm improves the accuracy and reduces the complexity of the cusp catastrophe model. The cusp catastrophe model also achieves a higher R² than the linear and logistic models. It can therefore be used to explain the discontinuous changes of the research object and has practical engineering significance.
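The cusp model explains discontinuous change through its deterministic potential V(y) = y⁴/4 − βy²/2 − αy (the sign convention of Cobb's density exp(αy + βy²/2 − y⁴/4) = exp(−V)), whose equilibria solve y³ − βy − α = 0. Inside the bifurcation set 27α² < 4β³ there are two stable states, so a slow drift in the asymmetry factor α can make the state jump and show hysteresis. The small illustration below is not taken from the paper; the function names and the choice β = 3 are arbitrary.

```python
import numpy as np

def equilibria(alpha, beta):
    """Real roots of V'(y) = y**3 - beta*y - alpha for V(y) = y**4/4 - beta*y**2/2 - alpha*y."""
    roots = np.roots([1.0, 0.0, -beta, -alpha])
    return np.sort(roots[np.abs(roots.imag) < 1e-9].real)

def in_bifurcation_set(alpha, beta):
    """True where the cubic has three distinct real roots (two stable states)."""
    return 27 * alpha ** 2 < 4 * beta ** 3

def follow_branch(alphas, beta, y_start):
    """Move alpha step by step and keep the stable equilibrium closest to the current state."""
    path, y = [], y_start
    for a in alphas:
        stable = [r for r in equilibria(a, beta) if 3 * r ** 2 - beta > 0]  # V''(y) > 0
        y = min(stable, key=lambda r: abs(r - y))
        path.append(y)
    return np.array(path)

alphas = np.linspace(-3.0, 3.0, 61)
up = follow_branch(alphas, beta=3.0, y_start=-3.0)
down = follow_branch(alphas[::-1], beta=3.0, y_start=3.0)[::-1]
print("state at alpha = 0, sweeping up:  ", round(up[30], 3))
print("state at alpha = 0, sweeping down:", round(down[30], 3))
```

Sweeping α upward at β = 3 keeps the state on the lower sheet until α passes 2, where 27α² = 4β³, and sweeping back it stays on the upper sheet until α drops below −2. At α = 0 the two sweeps therefore sit on different states (about −1.73 and +1.73), which is the sudden-jump and hysteresis behaviour a smooth regression cannot reproduce.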
Key words:
- catastrophe theory
- catastrophe flag
- cusp catastrophe model
- variable selection
- model integration
Table 1. Evaluation of the modeling results of the European hotel accommodation price dataset

| Model | Number of parameters | R² | AIC | BIC |
|---|---|---|---|---|
| Linear | | 0.549 | 1306 | 1323 |
| Logistic | | 0.626 | 1294 | 1324 |
| Cusp (based on the two-step method) | 12 | 0.727 | 195 | 228 |
| Cusp (based on the traditional method) | 16 | 0.697 | 190 | 235 |
| Cusp (based on SCC) | 10 | 0.572 | 204 | 232 |
| Cusp (based on MIC) | 6 | 0.421 | 234 | 251 |
| Cusp (based on RFVIM) | 12 | 0.565 | 210 | 243 |
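In Table 1 the unreduced ("traditional") cusp model has the lowest AIC, while the two-step model has the lowest BIC. Assuming the table uses the standard definitions AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂ (this section does not state them), the split can be checked with a few lines of arithmetic on the reported, rounded values. The dictionary and function names are illustrative.

```python
# Cusp rows of Table 1: model -> (number of parameters k, AIC, BIC)
TABLE1 = {
    "two-step": (12, 195, 228),
    "traditional": (16, 190, 235),
    "SCC": (10, 204, 232),
    "MIC": (6, 234, 251),
    "RFVIM": (12, 210, 243),
}

def implied_deviance(k, aic):
    """-2 ln L recovered from AIC = 2k - 2 ln L (assumed standard definition)."""
    return aic - 2 * k

def best(rows, col):
    """Model with the smallest value in column `col` (1 = AIC, 2 = BIC)."""
    return min(rows, key=lambda name: rows[name][col])

for name, (k, aic, bic) in TABLE1.items():
    dev = implied_deviance(k, aic)
    # bic - dev is the BIC penalty k*ln(n); compare it with the AIC penalty 2k
    print(f"{name:<12} -2lnL={dev:>4}  AIC penalty={2 * k:>3}  BIC penalty={bic - dev:>3}")

print("lowest AIC:", best(TABLE1, 1))  # traditional
print("lowest BIC:", best(TABLE1, 2))  # two-step
```

On these numbers the traditional model fits slightly better (−2 ln L̂ of 158 versus 171), but its four extra parameters cost 77 − 57 = 20 BIC points against only 32 − 24 = 8 AIC points, which is why the two criteria rank the two models differently.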
Table 2. Evaluation of the modeling results of the Beijing atmospheric corrosion dataset

| Model | Number of parameters | R² | AIC | BIC |
|---|---|---|---|---|
| Linear | | 0.668 | 2180 | 2203 |
| Logistic | | 0.755 | 1970 | 2011 |
| Cusp (based on the two-step method) | 10 | 0.778 | 670 | 716 |
| Cusp (based on the traditional method) | 20 | 0.775 | 672 | 764 |
| Cusp (based on SCC) | 10 | 0.719 | 816 | 862 |
| Cusp (based on MIC) | 8 | 0.725 | 820 | 857 |
| Cusp (based on RFVIM) | 18 | 0.765 | 673 | 755 |
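On the Beijing atmospheric corrosion data the comparison is cleaner: among the cusp models, the two-step model is best on every criterion while using half the parameters of the unreduced model. The check below only re-reads the reported cusp rows of Table 2; the names are illustrative.

```python
# Cusp rows of Table 2: model -> (number of parameters, R2, AIC, BIC)
TABLE2 = {
    "two-step": (10, 0.778, 670, 716),
    "traditional": (20, 0.775, 672, 764),
    "SCC": (10, 0.719, 816, 862),
    "MIC": (8, 0.725, 820, 857),
    "RFVIM": (18, 0.765, 673, 755),
}

def winners(rows):
    """Best model per criterion: highest R2, lowest AIC, lowest BIC."""
    return {
        "R2": max(rows, key=lambda m: rows[m][1]),
        "AIC": min(rows, key=lambda m: rows[m][2]),
        "BIC": min(rows, key=lambda m: rows[m][3]),
    }

def parameter_saving(rows, model, reference="traditional"):
    """Fraction of parameters removed relative to the reference model."""
    return 1 - rows[model][0] / rows[reference][0]

print(winners(TABLE2))  # {'R2': 'two-step', 'AIC': 'two-step', 'BIC': 'two-step'}
print(f"{parameter_saving(TABLE2, 'two-step'):.0%} fewer parameters")  # 50% fewer parameters
```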