Comparison of the effect of various clustering algorithms on the furnace profile management

Lu Jie (魯杰); Yan Bingji (閆炳基); Zhao Wei (趙偉); Li Peng (李鵬); Chen Dong (陳棟); Guo Hongwei (國宏偉)

doi:10.13374/j.issn2095-9389.2021.05.25.005


Shagang School of Iron and Steel, Soochow University, Suzhou 215137, China

Funding: National Natural Science Foundation of China (52074185, 51774209); Suzhou Science and Technology Plan Project (SYG202127)

Corresponding author: E-mail: bjyan@suda.edu.cn

CLC number: TF512
Publication history
- Received: 2021-05-25
- Published online: 2021-10-08
- Issue date: 2022-12-01


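Abstract

Abstract: The blast furnace operation profile is closely tied to furnace operation and to the technical and economic indicators of the furnace. A reasonable operation profile helps secure high-quality, low-consumption, high-output production and a long campaign life. Cluster analysis of the cooling stave temperatures can characterize changes in the operation profile effectively and reasonably, and it offers important guidance for blast furnace production. K-Means and TwoStep were each used to cluster the data set. Based on the principles of the two algorithms, the clustering results were evaluated with the Davies-Bouldin index (DBI) and the Dunn index (DI), and the differences between the algorithms were analyzed. For the selected sample data and data features, K-Means gave the better clustering result. The study can serve as a reference for choosing clustering algorithms in big data analysis of blast furnace ironmaking.
Keywords: blast furnace operation profile; K-means clustering algorithm; TwoStep clustering algorithm; cluster evaluation index; big data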
Abstract: The blast furnace operation profile is closely related to the operation and to the technical and economic indicators of a blast furnace. A reasonable operation profile helps ensure high-quality hot metal, low fuel consumption, high yield, and furnace longevity. Cluster analysis of the stave temperature can effectively characterize changes in the operation profile and thereby guide blast furnace ironmaking. The K-Means, TwoStep, and hierarchical clustering algorithms are often used to monitor the blast furnace operation profile, and existing studies show that various clustering algorithms can help manage it. However, the differences among the clustering results of these algorithms remain unclear. Building on previous research, this paper compares the clustering principles and research status of various algorithms and selects two of them, K-Means and TwoStep, which are more applicable and whose principles are better suited to this task. K-Means is a typical partition-based clustering algorithm with low time complexity, high clustering efficiency, and good clustering quality, and it has been widely used in cluster analysis of the blast furnace operation profile. In addition, domestic scholars have proposed effective remedies for its shortcomings, namely its sensitivity to the initial centers and its requirements on data distribution. TwoStep is an improved BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm that reduces time complexity and can automatically determine the optimal number of clusters. Because the indicators used to evaluate the furnace operation profile are numerous and largely overlapping, the authors introduced principal component analysis on top of the TwoStep algorithm and generated three new core indicators from the traditional evaluation indicators for the clustering results of the furnace operation profile. These three core indicators also performed better for monitoring and managing the blast furnace operation profile. In this paper, K-Means and TwoStep were used to cluster the data set. Based on the principles of the two algorithms and combined with the Davies-Bouldin index (DBI) and the Dunn index (DI), the clustering results were analyzed to judge the difference between the two algorithms. For the sample data and data characteristics selected in this article, K-Means achieved better clustering results than TwoStep. This research can serve as a reference for selecting among clustering algorithms in big data analysis of blast furnace ironmaking.
Keywords: furnace profile management; K-Means; TwoStep; cluster evaluation index; big data

Figure 1. Position of the cooling staves in each section of the blast furnace

Figure 2. Cluster evaluation indices for various numbers of clusters: (a) Davies-Bouldin index (DBI); (b) Dunn index (DI)

Figure 3. Stave temperature distribution in each section for the six furnace profiles obtained by the K-Means clustering algorithm

Figure 4. Stave temperature distribution in each section for the six furnace profiles obtained by the TwoStep clustering algorithm

Figure 5. Data distribution for six and seven clusters with the TwoStep clustering algorithm

Figure 6. Data distribution for six clusters with the K-Means clustering algorithm

Figure 7. K-Means and TwoStep clustering results (six clusters)

Table 1. Classification and characteristics of clustering algorithms

Clustering algorithm	Advantages	Disadvantages
K-Means	Low time complexity; high computing efficiency	Number of clusters must be preset; not suitable for nonconvex data
Hierarchy-based	Suitable for arbitrary data sets; high scalability	High time complexity; number of clusters must be preset
SOM	Diverse, well-developed models that can describe the data adequately	High time complexity; underlying premise not completely correct; clustering result sensitive to the parameters of the selected model
TwoStep	Improved BIRCH algorithm; determines the number of clusters automatically	Medium computational efficiency on large-scale data; cannot re-merge or split clusters to optimize the clustering result
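
Table 1 lists K-Means as cheap to run but dependent on a preset number of clusters. A minimal sketch of the standard Lloyd iteration makes both properties visible. The function name `kmeans`, the seeded random initialization, and the toy stave-temperature rows are illustrative assumptions and are not taken from the study.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd iteration; k has to be chosen before clustering starts."""
    rng = random.Random(seed)
    # Initial centers are k distinct rows drawn from the data (an assumption).
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Hypothetical stave temperatures (deg C), one row per sample, one column per section.
temps = [
    [45.0, 50.0, 48.0], [46.0, 52.0, 47.0], [44.0, 49.0, 50.0],
    [80.0, 95.0, 90.0], [82.0, 93.0, 88.0], [79.0, 96.0, 91.0],
]
labels, centers = kmeans(temps, k=2)
print(labels)
```

Each pass costs one distance evaluation per point and center, which is where the low time complexity comes from. The result still depends on the `k` passed in and on the initial centers, which matches the disadvantages listed in the table.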


Table 2. Cluster evaluation indices

Name	Measure method or formula
Compactness (CP)	$\overline{\mathrm{CP}}_{i}=\dfrac{1}{\left|\varOmega_{i}\right|}\displaystyle\sum_{x_{i}\in\varOmega_{i}}\lVert x_{i}-w_{i}\rVert$, $\overline{\mathrm{CP}}=\dfrac{1}{K}\displaystyle\sum_{k=1}^{K}\overline{\mathrm{CP}}_{k}$
Separation (SP)	$\overline{\mathrm{SP}}=\dfrac{2}{K^{2}-K}\displaystyle\sum_{i=1}^{K}\displaystyle\sum_{j=i+1}^{K}\lVert w_{i}-w_{j}\rVert_{2}$
Davies-Bouldin index (DBI)	$\mathrm{DBI}=\dfrac{1}{K}\displaystyle\sum_{i=1}^{K}\underset{j\ne i}{\max}\left(\dfrac{\overline{C}_{i}+\overline{C}_{j}}{\lVert w_{i}-w_{j}\rVert_{2}}\right)$
Dunn index (DI)	$\mathrm{DI}=\dfrac{\underset{0<m\ne n\leqslant K}{\min}\left\{\underset{\forall x_{i}\in\varOmega_{m},\forall x_{j}\in\varOmega_{n}}{\min}\lVert x_{i}-x_{j}\rVert\right\}}{\underset{0<m\leqslant K}{\max}\;\underset{\forall x_{i},x_{j}\in\varOmega_{m}}{\max}\lVert x_{i}-x_{j}\rVert}$
Silhouette coefficient	Evaluates the clustering result from the average distance between a data point and the other points in its own cluster and the average distance to the other clusters; suited to cases in which the clusters hold nearly the same number of samples.
Notes: (1) $\varOmega_{i}$ is the set of data points assigned to cluster $i$; (2) $K$ is the total number of clusters; (3) $x_{i},x_{j}$ are data points; (4) $w_{i},w_{j}$ are the centers of clusters $i$ and $j$; (5) $\lVert x_{i}-w_{i}\rVert$ is the distance from a data point to the center of its cluster; (6) $\lVert w_{i}-w_{j}\rVert_{2}$ is the distance between two cluster centers; (7) $\overline{C}_{i},\overline{C}_{j}$ are the average distances of the data points within clusters $i$ and $j$; (8) $m,n$ index different clusters; (9) $\lVert x_{i}-x_{j}\rVert$ is the distance between any two data points.
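
The four indices in Table 2 that are given as formulas can be computed directly from their definitions. The sketch below is a minimal, unoptimized version under stated assumptions: Euclidean distance throughout, cluster centers taken as arithmetic means, and $\overline{C}_{i}$ read as the average point-to-center distance of cluster $i$. The function names and the toy points are illustrative and do not come from the paper.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Arithmetic mean of the points in one cluster (assumed cluster center w_i)."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def mean_dist_to_center(cluster, center):
    """Average point-to-center distance of one cluster."""
    return sum(math.dist(x, center) for x in cluster) / len(cluster)


def compactness(clusters):
    """CP: mean over clusters of the average point-to-center distance."""
    return sum(mean_dist_to_center(c, centroid(c)) for c in clusters) / len(clusters)


def separation(clusters):
    """SP: average distance between all pairs of cluster centers."""
    centers = [centroid(c) for c in clusters]
    k = len(centers)
    total = sum(math.dist(a, b) for a, b in combinations(centers, 2))
    return 2.0 * total / (k * k - k)


def davies_bouldin(clusters):
    """DBI: per cluster, the worst (C_i + C_j) / ||w_i - w_j||, averaged over clusters."""
    centers = [centroid(c) for c in clusters]
    spread = [mean_dist_to_center(c, w) for c, w in zip(clusters, centers)]
    k = len(clusters)
    worst = [
        max((spread[i] + spread[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


def dunn(clusters):
    """DI: smallest between-cluster point distance over largest within-cluster diameter.

    Needs at least two clusters and at least one cluster with two or more points.
    """
    min_between = min(
        math.dist(x, y)
        for a, b in combinations(clusters, 2) for x in a for y in b
    )
    max_diameter = max(
        math.dist(x, y) for c in clusters for x, y in combinations(c, 2)
    )
    return min_between / max_diameter


# Toy data (hypothetical): two tight clusters, ten units apart.
toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(compactness(toy), separation(toy), davies_bouldin(toy), dunn(toy))
```

For the toy clusters this gives CP = 1.0, SP = 10.0, DBI = 0.2, and DI = 5.0. A lower DBI and a higher DI both indicate tighter, better separated clusters, which is how the two indices are read when comparing clustering results.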

