Comparison of the effects of various clustering algorithms on furnace profile management
Summary: The blast furnace operation profile is closely tied to furnace operation and to technical and economic indicators; a reasonable operation profile helps ensure high-quality, low-consumption, high-yield, and long-campaign production. Cluster analysis of stave temperatures can characterize changes in the operation profile effectively and reasonably, and therefore offers important guidance for blast furnace production. K-Means and TwoStep were each used to cluster the data set. Based on the principles of the two algorithms, and using the Davies-Bouldin index (DBI) and the Dunn index (DI) to evaluate the clustering results, the differences between the algorithms were analyzed. For the selected sample data and data features, K-Means gave the better clustering result. This study can serve as a reference for choosing clustering algorithms in big data analysis of blast furnace ironmaking.
Abstract: The blast furnace operation profile is closely related to the operation and to the technical and economic indicators of a blast furnace. A reasonable furnace operation profile ensures high-quality hot metal, low fuel consumption, high yield, and furnace longevity. To guide blast furnace ironmaking, cluster analysis of the stave temperature is used to characterize changes in the furnace operation profile. The K-Means, TwoStep, and hierarchical clustering algorithms are often used to monitor the blast furnace operation profile, and existing studies show that various clustering algorithms can help manage it. However, how the clustering results of these algorithms differ remains unclear. Building on previous research, this paper compares the clustering principles and research status of various algorithms and selects two of them, K-Means and TwoStep, as the most applicable and the best suited in principle. The K-Means algorithm is a typical partition-based clustering algorithm with low time complexity, high clustering efficiency, and good clustering quality. It has been widely used in cluster analysis of the blast furnace operation profile. In addition, researchers in China have proposed effective remedies for its shortcomings, namely its sensitivity to the initial centers and its requirements on the data distribution. The TwoStep algorithm is an improved BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm that reduces time complexity and can automatically determine the optimal number of clusters.
The authors of this article addressed the problem that the indicators for evaluating the furnace operation profile are numerous and largely overlapping. Principal component analysis was introduced on top of the TwoStep algorithm, and three new core indicators were derived from the traditional evaluation indicators for the clustering results of the furnace operation profile. These three core indicators also performed better in monitoring and managing the blast furnace operation profile. In this paper, K-Means and TwoStep were used to cluster the data set. Based on the principles of these algorithms, and using the Davies-Bouldin index and the Dunn index, the clustering results were analyzed to judge the difference between the two clustering algorithms. For the sample data and data characteristics selected in this article, the K-Means algorithm achieved better clustering results than TwoStep. This research can serve as a useful reference for selecting among clustering algorithms in blast furnace ironmaking big data analysis.
Key words:
- furnace profile management
- K-Means
- TwoStep
- cluster evaluation index
- big data
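The partition-based K-Means procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stave temperature readings are invented for the example, the function names `init_centers` and `kmeans` are hypothetical, and the deterministic farthest-point seeding is a simple stand-in for whatever initialization the study used (it is chosen here only because K-Means is sensitive to the initial centers).

```python
import math


def init_centers(points, k):
    """Farthest-point seeding (an assumption, not the paper's method):
    start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center update until
    the labels stop changing. Returns (centers, labels)."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical stave temperatures (deg C) at three thermocouples:
# four "cool" samples followed by four "hot" samples.
readings = [
    (61.0, 58.5, 60.2), (59.4, 60.1, 62.3), (62.2, 61.0, 59.1), (60.5, 59.8, 61.4),
    (118.0, 121.5, 119.2), (121.3, 119.0, 122.1), (119.6, 120.4, 118.7), (120.9, 122.2, 120.5),
]

centers, labels = kmeans(readings, 2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Note that the number of clusters `k` has to be supplied by the caller, which is the K-Means limitation the abstract contrasts with TwoStep's automatic choice of cluster number.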
Table 1. Classification and characteristics of clustering algorithms
| Clustering algorithms | Advantages | Disadvantages |
| --- | --- | --- |
| K-Means | Low time complexity; high computing efficiency | Number of clusters must be preset; not suitable for nonconvex data |
| Hierarchy-based | Suitable for arbitrary data sets; high scalability | High time complexity; number of clusters must be preset |
| SOM | Diverse and well-developed models that provide means to describe the data adequately | High time complexity; premise not completely correct; clustering result sensitive to the parameters of the selected model |
| TwoStep | Improved BIRCH algorithm; number of clusters determined automatically | Medium computational efficiency for large-scale data; cannot re-merge or split clusters to optimize the clustering result |
Table 2. Cluster evaluation index
| Name | Measure method or formula |
| --- | --- |
| Compactness (CP) | ${\overline{\mathrm{CP}}}_{i}=\dfrac{1}{\lvert{\varOmega}_{i}\rvert}\displaystyle\sum_{{x}_{i}\in{\varOmega}_{i}}\parallel{x}_{i}-{w}_{i}\parallel$, $\overline{\mathrm{CP}}=\dfrac{1}{K}\displaystyle\sum_{k=1}^{K}{\overline{\mathrm{CP}}}_{k}$ |
| Separation (SP) | $\overline{\mathrm{SP}}=\dfrac{2}{{k}^{2}-k}\displaystyle\sum_{i=1}^{k}\displaystyle\sum_{j=i+1}^{k}{\parallel{w}_{i}-{w}_{j}\parallel}_{2}$ |
| Davies-Bouldin indicator (DBI) | $\mathrm{DBI}=\dfrac{1}{k}\displaystyle\sum_{i=1}^{k}\underset{j\ne i}{\mathrm{max}}\left(\dfrac{{\overline{C}}_{i}+{\overline{C}}_{j}}{{\parallel{w}_{i}-{w}_{j}\parallel}_{2}}\right)$ |
| Dunn indicator (DI) | $\mathrm{DI}=\dfrac{\underset{0 < m\ne n\leqslant K}{\mathrm{min}}\;\underset{\forall{x}_{i}\in{\varOmega}_{m},\forall{x}_{j}\in{\varOmega}_{n}}{\mathrm{min}}\parallel{x}_{i}-{x}_{j}\parallel}{\underset{0 < m\leqslant K}{\mathrm{max}}\;\underset{\forall{x}_{i},{x}_{j}\in{\varOmega}_{m}}{\mathrm{max}}\parallel{x}_{i}-{x}_{j}\parallel}$ |
| Silhouette coefficient | Evaluates the clustering result from the average distance between a data point and the other data points in the same cluster and the average distance to the data points of other clusters; used when the clusters contain roughly the same number of samples. |
Notes: (1) ${\varOmega}_{i}$ stands for the set of data points belonging to cluster $i$; (2) $K$ stands for the total number of clusters; (3) $k$ also stands for the number of clusters; (4) ${x}_{i},{x}_{j}$ stand for different data points; (5) ${w}_{i},{w}_{j}$ stand for the centers of different clusters; (6) $\parallel{x}_{i}-{w}_{i}\parallel$ stands for the distance from a data point to the center of its cluster; (7) ${\parallel{w}_{i}-{w}_{j}\parallel}_{2}$ stands for the distance between the centers of two clusters; (8) ${\overline{C}}_{i},{\overline{C}}_{j}$ stand for the average distance from the data points of cluster $i$ (respectively $j$) to that cluster's center; (9) $m,n$ stand for different clusters; (10) $\parallel{x}_{i}-{x}_{j}\parallel$ stands for the distance between any two data points.
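The four formula-based indices in Table 2 can be computed directly from a labelled data set. The sketch below follows the table term by term; the function name `cluster_indices` and the toy data are hypothetical, Euclidean distance is assumed for every norm, and the code assumes at least two non-empty clusters with at least one cluster holding two or more points. A lower DBI and a higher DI both indicate more compact, better separated clusters.

```python
import math
from itertools import combinations


def cluster_indices(points, labels, centers):
    """Return CP, SP, DBI and DI as defined in Table 2 (Euclidean distance)."""
    k = len(centers)
    clusters = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]

    # CP_i: mean distance from the points of cluster i to its center.
    # The same quantity plays the role of C-bar_i in the DBI formula.
    cp_each = [
        sum(math.dist(p, centers[c]) for p in clusters[c]) / len(clusters[c])
        for c in range(k)
    ]
    cp = sum(cp_each) / k

    # SP: mean pairwise distance between cluster centers.
    sp = 2 / (k * k - k) * sum(
        math.dist(centers[i], centers[j]) for i, j in combinations(range(k), 2)
    )

    # DBI: for each cluster take the worst (largest) similarity ratio, then average.
    dbi = sum(
        max(
            (cp_each[i] + cp_each[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ) / k

    # DI: smallest between-cluster point distance over largest within-cluster diameter.
    min_between = min(
        math.dist(a, b)
        for m, n in combinations(range(k), 2)
        for a in clusters[m]
        for b in clusters[n]
    )
    max_within = max(
        math.dist(a, b) for cl in clusters for a, b in combinations(cl, 2)
    )
    return {"CP": cp, "SP": sp, "DBI": dbi, "DI": min_between / max_within}


# Toy example (not data from the study): two tight clusters 10 units apart.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_centers = [(0.0, 1.0), (10.0, 1.0)]

scores = cluster_indices(toy_points, toy_labels, toy_centers)
print(scores)
```

For the toy data each point lies 1 unit from its center, the centers are 10 units apart, the closest points of different clusters are 10 units apart, and each cluster has diameter 2, so CP = 1, SP = 10, DBI = 0.2, and DI = 5.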