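  • Indexed in the Engineering Index (EI)
  • Chinese Core Journal
  • Source journal for Chinese Scientific and Technical Papers and Citations
  • Source journal of the Chinese Science Citation Database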


Resampling algorithm for imbalanced data based on their neighbor relationship

LI Rui-feng, LI Wen-hai, SUN Yan-li, WU Yang-yong

Citation: LI Rui-feng, LI Wen-hai, SUN Yan-li, WU Yang-yong. Resampling algorithm for imbalanced data based on their neighbor relationship[J]. Chinese Journal of Engineering, 2021, 43(6): 862-869. doi: 10.13374/j.issn2095-9389.2020.04.05.002


doi: 10.13374/j.issn2095-9389.2020.04.05.002
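Funding: military research project "Research on Key Testing Technologies for New-Generation Avionics Equipment" (4172122113R)
Details
    Corresponding author:

    E-mail: dongzhi1110@foxmail.com

  • CLC number: TP206.1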

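  • Abstract: To improve classification accuracy on imbalanced data sets, a resampling algorithm based on the spatial neighbor relationships of samples is proposed. The method first evaluates the safe level of each minority-class sample from its spatial neighbors and, guided by that safe level, oversamples with the synthetic minority oversampling technique (SMOTE). It then computes the local density of each majority-class sample from its spatial neighbors and undersamples the regions where majority-class samples are dense. Together, these two steps balance the data set and control its size to prevent overfitting, so that the two classes are classified in a balanced way. Training and test sets are generated by ten-fold cross-validation; after the training set is resampled, a kernel extreme learning machine is trained as the classifier and validated on the test set. Experimental results on imbalanced UCI data sets and on measured circuit fault diagnosis data show that the proposed method outperforms other resampling algorithms overall.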
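The two resampling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' RBNR implementation: the function name `resample_sketch`, the parameters `k` and `keep_ratio`, the definition of safe level as the count of minority samples among the k nearest neighbors, and the definition of local density as the inverse mean neighbor distance are all assumptions made for illustration.

```python
import numpy as np

def resample_sketch(X, y, k=5, keep_ratio=0.7, seed=0):
    """Illustrative only: y == 1 marks the minority class, y == 0 the majority."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]

    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)

    # Safe level (assumed definition): minority samples among the k neighbors.
    safe_level = (y[nbrs[minority]] == 1).sum(axis=1)
    seeds = minority[safe_level > 0]  # level 0 is treated as noise and skipped

    # Local density (assumed definition): inverse mean distance to the k neighbors.
    mean_dist = np.take_along_axis(dist[majority], nbrs[majority], axis=1).mean(axis=1)
    density = 1.0 / (mean_dist + 1e-12)

    # Undersample: drop the majority samples in the densest regions first.
    n_keep = min(len(majority), max(len(minority), round(keep_ratio * len(majority))))
    kept_majority = majority[np.argsort(density)[:n_keep]]

    # Oversample: interpolate between a safe seed and one of its minority neighbors.
    synthetic = []
    n_new = max(n_keep - len(minority), 0) if len(seeds) else 0
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = rng.choice(nbrs[i][y[nbrs[i]] == 1])
        synthetic.append(X[i] + rng.random() * (X[j] - X[i]))
    synthetic = np.array(synthetic).reshape(-1, X.shape[1])

    X_res = np.vstack([X[kept_majority], X[minority], synthetic])
    y_res = np.concatenate([np.zeros(n_keep, int),
                            np.ones(len(minority) + len(synthetic), int)])
    return X_res, y_res
```

In a cross-validation setting such as the one the abstract describes, a function like this would be applied to each training fold only, leaving the test fold untouched.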

     

  • Figure 1. Flowchart of the RBNR algorithm

    Figure 2. Series voltage regulator circuit

    Figure 3. Test environment

    Figure 4. Parameter analysis of BMS: (a) analysis of the RC; (b) analysis of the F-value; (c) analysis of the G-mean

    Figure 5. Bar graph of result comparison: (a) comparison of RC; (b) comparison of F-value; (c) comparison of G-mean

    Table 1. Confusion matrix

    | Category | Classified as minority | Classified as majority |
    |---|---|---|
    | Minority | TP | FN |
    | Majority | FP | TN |
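The confusion-matrix entries in Table 1 are the inputs to the three measures reported later (RC, F-value, G-mean). The sketch below uses the standard textbook definitions and assumes that RC denotes minority-class recall and that the F-value is the balanced F1 score; the page itself does not state the formulas, and the function name `imbalance_metrics` is hypothetical.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Return (recall, F-value, G-mean) with the minority class as positive."""
    recall = tp / (tp + fn) if tp + fn else 0.0        # TP / (TP + FN)
    precision = tp / (tp + fp) if tp + fp else 0.0     # TP / (TP + FP)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # TN / (TN + FP)
    denom = precision + recall
    # Assumed F1 form: harmonic mean of precision and recall.
    f_value = 2 * precision * recall / denom if denom else 0.0
    # G-mean: geometric mean of the accuracies on the two classes.
    g_mean = math.sqrt(recall * specificity)
    return recall, f_value, g_mean

if __name__ == "__main__":
    print(imbalance_metrics(tp=40, fn=10, fp=20, tn=130))
```

Unlike overall accuracy, G-mean drops to zero if either class is entirely misclassified, which is why it is commonly reported for imbalanced problems.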

    Table 2. UCI data sets

    | Data set | Dimension | Minority/majority | Imbalance ratio |
    |---|---|---|---|
    | CTG | 21 | 176/1655 | 1:9.403 |
    | Diabetes | 8 | 268/500 | 1:1.866 |
    | Glass | 9 | 42/172 | 1:4.095 |
    | Wine | 13 | 48/130 | 1:2.708 |
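The imbalance ratios in Table 2 follow directly from the class counts, as the short check below shows. The helper name `imbalance_ratio` is hypothetical; the counts are copied from the table.

```python
def imbalance_ratio(n_minority, n_majority):
    """Number of majority samples per minority sample."""
    return n_majority / n_minority

# (minority, majority) counts as listed in Table 2.
counts = {
    "CTG": (176, 1655),
    "Diabetes": (268, 500),
    "Glass": (42, 172),
    "Wine": (48, 130),
}

for name, (n_min, n_maj) in counts.items():
    print(f"{name}: 1:{imbalance_ratio(n_min, n_maj):.3f}")
```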

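    Table 3. Measured circuit data (partial)

    | ID | V1_max (V) | V1_min (V) | V2 (V) | V3 (V) | V4 (V) | V5 (V) | V6 (V) | V7 (V) | V8 (V) | Attribute |
    |---|---|---|---|---|---|---|---|---|---|---|
    | 1 | -7.730 | -6.360 | -6.923 | -6.928 | -6.281 | -2.811 | -2.981 | -5.579 | -0.140 | normal |
    | 2 | -7.794 | -6.337 | -6.953 | -6.955 | -6.297 | -2.781 | -2.969 | -5.603 | -0.134 | |
    | … | | | | | | | | | | |
    | 188 | -7.706 | -6.344 | -6.943 | -6.945 | -6.271 | -2.812 | -3.020 | -5.613 | -0.148 | |
    | 189 | -7.760 | -6.622 | -7.106 | -7.089 | -6.533 | -2.656 | -2.456 | -4.548 | -0.133 | faulty |
    | … | | | | | | | | | | |
    | 233 | -7.792 | -6.597 | -7.078 | -7.049 | -6.503 | -2.670 | -2.544 | -4.726 | -0.113 | |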

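    Table 4. Performance comparison in terms of F-value and G-mean

    | Data set | Algorithm | RC mean | RC std | F-value mean | F-value std | G-mean mean | G-mean std | C | σ |
    |---|---|---|---|---|---|---|---|---|---|
    | CTG | SMOTE | 1 | 0 | 0.9714 | 0.0782 | 0.9976 | 0.0045 | 0.1 | 4.9849 |
    | CTG | RU-SMOTE | 1 | 0 | 0.9849 | 0.0389 | 0.9984 | 0.0034 | 1 | 4.9056 |
    | CTG | BMS | 0.9983 | 0.0118 | 0.9825 | 0.0342 | 0.9972 | 0.0068 | 1 | 5.0038 |
    | CTG | RBNR | 1 | 0 | 0.9870 | 0.0382 | 0.9988 | 0.0030 | 1 | 5.0123 |
    | Diabetes | SMOTE | 0.6966 | 0.0852 | 0.6515 | 0.0694 | 0.7318 | 0.0486 | 1 | 2.7590 |
    | Diabetes | RU-SMOTE | 0.5775 | 0.1121 | 0.6330 | 0.0830 | 0.7079 | 0.0670 | 1 | 3.3938 |
    | Diabetes | BMS | 0.6656 | 0.1102 | 0.6595 | 0.0801 | 0.7357 | 0.0652 | 0.1 | 3.0312 |
    | Diabetes | RBNR | 0.7871 | 0.0895 | 0.6832 | 0.0624 | 0.7554 | 0.0497 | 0.1 | 3.0156 |
    | Glass | SMOTE | 0.8985 | 0.1529 | 0.8902 | 0.1125 | 0.9319 | 0.0865 | 10 | 1.2357 |
    | Glass | RU-SMOTE | 0.8523 | 0.1934 | 0.8608 | 0.1266 | 0.8915 | 0.1558 | 10 | 1.2156 |
    | Glass | BMS | 0.8656 | 0.2157 | 0.8909 | 0.1371 | 0.9062 | 0.1670 | 10 | 3.3978 |
    | Glass | RBNR | 0.9086 | 0.1295 | 0.9062 | 0.0996 | 0.9416 | 0.0693 | 1 | 1.4562 |
    | Wine | SMOTE | 1 | 0 | 0.9818 | 0.0513 | 0.9949 | 0.0152 | 10 | 3.9758 |
    | Wine | RU-SMOTE | 1 | 0 | 0.9770 | 0.0507 | 0.9914 | 0.0181 | 10 | 3.6135 |
    | Wine | BMS | 0.9971 | 0.0202 | 0.9600 | 0.0827 | 0.9874 | 0.0230 | 100 | 4.0360 |
    | Wine | RBNR | 1 | 0 | 0.9789 | 0.0454 | 0.9919 | 0.0146 | 10 | 3.7833 |
    | Regulator | SMOTE | 0.9272 | 0.1303 | 0.8496 | 0.1067 | 0.9314 | 0.0715 | 1000 | 1.5781 |
    | Regulator | RU-SMOTE | 0.9320 | 0.2114 | 0.8304 | 0.1118 | 0.8999 | 0.1931 | 10 | 4.7342 |
    | Regulator | BMS | 0.8685 | 0.1930 | 0.8731 | 0.1007 | 0.9025 | 0.1526 | 0.01 | 3.6821 |
    | Regulator | RBNR | 0.9075 | 0.1248 | 0.8947 | 0.1043 | 0.9361 | 0.0699 | 10 | 4.6943 |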
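The parameter columns C and σ in Table 4 presumably correspond to the regularization coefficient and the Gaussian kernel width of the kernel extreme learning machine used as the classifier. The sketch below follows the commonly cited closed-form solution, in which the output weights solve (K + I/C) β = T. The class name `KernelELM`, the ±1 target coding, and the kernel convention exp(−‖u − v‖² / (2σ²)) are assumptions; the paper may use a different scaling of σ.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix; the 2*sigma**2 scaling is an assumed convention."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

class KernelELM:
    """Minimal binary kernel ELM sketch: labels 1 (minority) and 0 (majority)."""

    def __init__(self, C=1.0, sigma=1.0):
        self.C = C
        self.sigma = sigma

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        targets = np.where(np.asarray(y) == 1, 1.0, -1.0)
        K = rbf_kernel(self.X_, self.X_, self.sigma)
        # Regularized least squares: (K + I / C) beta = targets.
        self.beta_ = np.linalg.solve(K + np.eye(len(K)) / self.C, targets)
        return self

    def decision_function(self, X):
        return rbf_kernel(np.asarray(X, dtype=float), self.X_, self.sigma) @ self.beta_

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    model = KernelELM(C=10.0, sigma=1.0).fit(X, y)
    print((model.predict(X) == y).mean())
```

A larger C weakens the regularization and fits the training data more closely, while σ controls how far the influence of each training sample extends, which is why both are typically tuned per data set.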
Publication history
  • Received: 2020-04-05
  • Published: 2021-06-25
