Differential privacy protection random forest algorithm and its application in steel materials

CHEN Xue-hui, FENG Yan, QIAN Quan

Citation: CHEN Xue-hui, FENG Yan, QIAN Quan. Differential privacy protection random forest algorithm and its application in steel materials[J]. Chinese Journal of Engineering, 2023, 45(7): 1194-1204. doi: 10.13374/j.issn2095-9389.2022.05.29.002

doi: 10.13374/j.issn2095-9389.2022.05.29.002
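Funding: National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3, 202002AB080001-2); Zhejiang Lab research project (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)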
Details
    Corresponding author:

    E-mail: qqian@shu.edu.cn

  • CLC number: TG391

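  • Abstract: Data-driven materials informatics is regarded as the fourth paradigm of materials research and development, and it can greatly reduce the cost and shorten the cycle of developing new materials. However, sharing and using materials data in data-driven methods raises the risk of leaking sensitive information, such as key processing parameters, during materials development. Privacy-preserving machine learning is therefore a key problem in materials informatics. This paper proposes a differentially private random forest algorithm for the random forest model, which is widely used in materials informatics. The algorithm divides the overall privacy budget among the trees and introduces the Laplace mechanism and the exponential mechanism of differential privacy into decision tree construction: when a node is split, the exponential mechanism randomly selects the splitting feature, and the Laplace mechanism adds noise to the node counts. Together these provide differential privacy protection for the random forest. Experiments on predicting the fatigue properties of steel verify the effectiveness of the algorithm under both centralized and distributed data storage. The results show that, with differential privacy protection added, the coefficient of determination $R^2$ of the prediction for every target property reaches 0.8 or higher, close to the result of an ordinary random forest. In addition, under distributed data storage, the predicted $R^2$ of every target property increases as the privacy budget increases. As the maximum tree depth increases, the overall prediction accuracy first rises and then falls, and it is best at a maximum tree depth of 5. Overall, the algorithm maintains high prediction accuracy while providing differential privacy protection for random forests, and in a distributed network where data are stored across nodes it can balance privacy protection strength against prediction accuracy through algorithm parameters such as the privacy budget, so it has broad application prospects.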
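
    For readers less familiar with the terms used in the abstract, the standard textbook definitions are restated here; they are not quoted from the paper, and the symbols $M$, $D$, $D'$, $S$ and $f$ are introduced only for this note. A randomized mechanism $M$ satisfies $\varepsilon$-differential privacy if, for any two datasets $D$ and $D'$ that differ in one record and any set of outputs $S$,

    $$\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S].$$

    The Laplace mechanism releases a numeric query $f$ with sensitivity $\Delta f$ as

    $$M(D) = f(D) + \mathrm{Lap}\left(\dfrac{\Delta f}{\varepsilon}\right).$$

    Under sequential composition, running mechanisms that are $\varepsilon_1, \dots, \varepsilon_T$-differentially private on the same data is $\left(\sum_{i=1}^{T} \varepsilon_i\right)$-differentially private. This is presumably the reason for dividing the overall budget among the trees: if each of $T$ trees spends $B/T$, the forest as a whole stays within $B$.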

     

  • Figure  1.  Framework of the DPRF algorithm

    Figure  2.  Scatter plots of the true and predicted values of each target property for the DPRF algorithm with ε=10.0, d=5: (a) fatigue; (b) tensile; (c) fracture; (d) hardness

    Figure  3.  Predicted results of each target property for the DPRF algorithm under (a) different privacy budgets and (b) different maximum tree depths

    Table  1.   Comparative analysis of differential privacy preserving tree model algorithms

    | Algorithm | Basic model | Realization mechanism | Task | Data storage |
    | --- | --- | --- | --- | --- |
    | SuLQ-based ID3 | Decision tree | Laplace | Classification | Centralized |
    | DiffP-ID3 | Decision tree | Laplace & Exponential | Classification | Centralized |
    | DiffP-C4.5 | Decision tree | Laplace & Exponential | Classification | Centralized |
    | DiffPRF | Random forest | Laplace & Exponential | Classification | Centralized |
    | DiffPRFs | Random forest | Laplace & Exponential | Classification | Centralized |
    | DPRF | Random forest | Laplace & Exponential | Regression | Centralized & distributed |
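    Algorithm 1  DPRF algorithm based on differential privacy protection
    Input: training dataset D, feature set F, privacy budget B, number of decision trees T, maximum decision tree depth d, number of random features per split m, number of nodes N when the data are distributed
    Output: a random forest satisfying ε-differential privacy;
    Stopping condition: the number of decision trees built reaches T, or the privacy budget is exhausted;
    Procedure DPRF_fit (D,F,B,T,d,m)
    1: Forest={};
    2: Split the overall privacy budget evenly among the trees; each decision tree receives the privacy budget $ \varepsilon ' = B/T $;
    3: for i=1 to T  // build the T trees in a loop
    4:  Sample with replacement from dataset D to obtain the data subset Dt, and randomly select m features from the feature set F;
    5:  Allocate the tree's privacy budget across its levels; each level receives $\varepsilon '' = \dfrac{\varepsilon '}{d + 1}$;
    6:   ε=ε''/2;
    7:  Treei=BuildTree(Dt,m,ε,d,0); // the tree-building process is described below
    8:   if the current node meets the stopping condition: set it as a leaf node whose value is the mean target value of all samples in the leaf, perturb its sample count as |NDt|=|NDt|+Laplace(1/ε), and return the leaf node;
    9:  else
    10:   for each_feature in m
    11:    Split the data into left and right subsets at each value of the current feature, and record the value with the smallest mean absolute error (MAE) as the split_value of the current feature;
    12:    Split the dataset at split_value for the current feature and compute the feature score $\exp\left(\dfrac{\varepsilon}{2\Delta q}\, q(D_{\mathrm{c}}, f)\right)$;
    13:   Sum the feature scores of the m features; the probability that a feature f is selected as the splitting feature of the current node is $\dfrac{\exp\left(\dfrac{\varepsilon}{2\Delta q}\, q(D_{\mathrm{c}}, f)\right)}{\sum_{j=1}^{m} \exp\left(\dfrac{\varepsilon}{2\Delta q}\, q(D_{\mathrm{c}}, f_j)\right)}$, where $ q(D_{\mathrm{c}}, f) $ is the utility function and $ \Delta q $ is its sensitivity;
    14:   Split the data into left and right subsets at the split_value of the selected feature f, and continue building the tree on each subset;
    15:  Forest=Forest∪Treei;
    16: end for
    17: return Forest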
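
    Steps 8 and 12 to 13 of DPRF_fit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the node count is treated as a query with sensitivity 1 (consistent with the Laplace(1/ε) term in step 8), and the utility values passed in are assumed to be something like the negative MAE of each feature's best split, since the utility function q is not defined in this excerpt.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism for a node's sample count: count + Laplace(sensitivity / epsilon)."""
    return count + laplace_noise(sensitivity / epsilon, rng)


def selection_probabilities(utilities, epsilon, sensitivity):
    """Exponential-mechanism probabilities, proportional to exp(epsilon * q / (2 * sensitivity))."""
    scaled = [epsilon * q / (2.0 * sensitivity) for q in utilities]
    shift = max(scaled)  # subtracting the maximum avoids overflow and leaves the ratios unchanged
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def choose_split_feature(utilities, epsilon, sensitivity, rng):
    """Sample the index of the splitting feature from the exponential mechanism."""
    probs = selection_probabilities(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


# Hypothetical utilities for m = 3 candidate features (assumed here: negative MAE).
rng = random.Random(0)
utilities = [-12.0, -3.5, -7.0]
probs = selection_probabilities(utilities, epsilon=1.0, sensitivity=1.0)
feature = choose_split_feature(utilities, epsilon=1.0, sensitivity=1.0, rng=rng)
leaf_size = noisy_count(40, epsilon=1.0, rng=rng)
```

    A larger ε concentrates the selection probability on the highest-utility feature and shrinks the noise on the count, while a very small ε makes the feature choice close to uniform.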
    Procedure predict (Forest, Dtest)
    1: Result={};
    2: for sample in Dtest
    3:  sum_predict=0;
    4:  for tree in Forest
    5:   Traverse the current tree for this sample until a leaf node is reached, and obtain the predicted value predict_value;
    6:   sum_predict+=predict_value;
    7:  res=sum_predict/length(Forest);
    8:  Result=Result∪res;
    9: return Result
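
    The predict procedure amounts to averaging the leaf values reached in each tree. The sketch below shows this with a hypothetical tree representation (nested dictionaries with `feature`, `split_value`, `left`, `right` and `value` keys) and the assumption that samples with a feature value less than or equal to the split value go left; neither detail comes from the paper.

```python
def predict_tree(node, sample):
    """Walk one tree from the root to a leaf and return the leaf value."""
    while "value" not in node:
        branch = "left" if sample[node["feature"]] <= node["split_value"] else "right"
        node = node[branch]
    return node["value"]


def predict_forest(forest, samples):
    """Average the per-tree predictions for every sample."""
    return [
        sum(predict_tree(tree, sample) for tree in forest) / len(forest)
        for sample in samples
    ]


# Two hand-built trees over a single feature (index 0); all values are made up.
forest = [
    {"feature": 0, "split_value": 0.5, "left": {"value": 10.0}, "right": {"value": 20.0}},
    {"value": 14.0},
]
predictions = predict_forest(forest, [[0.2], [0.9]])
print(predictions)  # [12.0, 17.0]
```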
    Procedure Distributed_fit (F,B,T,d,m)
    1: Forest_Distributed={};
    2: Split the overall privacy budget evenly among the nodes; each node receives the privacy budget E=B/N;
    3: for i=1 to N
    4:  Let Di be the dataset held by node i;
    5:  foresti=DPRF_fit (Di,F,E,T,d,m);
    6:  Forest_Distributed=Forest_Distributed∪foresti;
    7: return Forest_Distributed
    Procedure Distributed_Predict (D, Forest_Distributed)
    1: Result=0;
    2: for i=1 to N
    3:  r=predict(Forest_Distributedi,D);
    4:  Result+=r;
    5: Result=Result/N;
    6: return Result
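
    The budget arithmetic in Algorithm 1 chains four divisions, which can be easier to follow as code. The sketch below is only an illustration: the function name `budget_chain` and the example values of T and N are hypothetical, since this excerpt does not state the number of trees or nodes used in the experiments.

```python
import math


def budget_chain(B, T, d, N=1):
    """Return the privacy budget available at each stage of Algorithm 1.

    B: overall privacy budget, T: trees per forest, d: maximum tree depth,
    N: number of data-holding nodes (N=1 corresponds to centralized storage).
    """
    per_node = B / N                 # Distributed_fit step 2: E = B / N
    per_tree = per_node / T          # DPRF_fit step 2: eps' = E / T
    per_level = per_tree / (d + 1)   # DPRF_fit step 5: eps'' = eps' / (d + 1)
    per_query = per_level / 2        # DPRF_fit step 6: eps = eps'' / 2
    return {
        "per_node": per_node,
        "per_tree": per_tree,
        "per_level": per_level,
        "per_query": per_query,
    }


# Illustrative values only: B = 9.0 and d = 5 appear in the tables, T and N are assumed.
budgets = budget_chain(B=9.0, T=10, d=5, N=3)
for stage, value in budgets.items():
    print(stage, value)

# Multiplying back through the chain recovers the overall budget.
recovered = budgets["per_query"] * 2 * (5 + 1) * 10 * 3
print(math.isclose(recovered, 9.0))
```

    The value that reaches each Laplace or exponential draw is much smaller than B, which helps explain why accuracy is sensitive to both the privacy budget and the maximum tree depth.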

    Table  2.   Descriptor information of the NIMS steel fatigue dataset

    | Feature | Description | Minimum value | Maximum value | Mean value | Standard deviation |
    | --- | --- | --- | --- | --- | --- |
    | NT | Normalizing temperature | 825 | 900 | 865.6 | 17.37 |
    | QT | Hardening temperature | 825 | 865 | 846.2 | 9.86 |
    | TT | Tempering temperature | 550 | 680 | 605 | 42.4 |
    | C | Carbon content | 0.28 | 0.57 | 0.407 | 0.061 |
    | Si | Silicon content | 0.16 | 0.35 | 0.258 | 0.034 |
    | Mn | Manganese content | 0.37 | 1.3 | 0.849 | 0.294 |
    | P | Phosphorus content | 0.007 | 0.031 | 0.016 | 0.005 |
    | S | Sulfur content | 0.003 | 0.03 | 0.014 | 0.006 |
    | Ni | Nickel content | 0.01 | 2.78 | 0.548 | 0.899 |
    | Cr | Chromium content | 0.01 | 1.12 | 0.556 | 0.419 |
    | Cu | Copper content | 0.01 | 0.22 | 0.064 | 0.045 |
    | Mo | Molybdenum content | 0 | 0.24 | 0.066 | 0.089 |
    | RR | Reduction ratio | 420 | 5530 | 971.2 | 601.4 |
    | dA | Plastic inclusion | 0 | 0.13 | 0.047 | 0.032 |
    | dB | Discontinuous inclusions | 0 | 0.05 | 0.003 | 0.009 |
    | dC | Isolated inclusion | 0 | 0.04 | 0.008 | 0.01 |

    Table  3.   Predictive results of target properties with random forest and DPRF

    | Model and privacy budget | $R^2$ (Fatigue) | $R^2$ (Tensile) | $R^2$ (Fracture) | $R^2$ (Hardness) |
    | --- | --- | --- | --- | --- |
    | RF | 0.9059 | 0.9282 | 0.9252 | 0.9193 |
    | ε=0.1 DPRF | 0.6588 | 0.6469 | 0.7588 | 0.6565 |
    | ε=0.25 DPRF | 0.6930 | 0.6906 | 0.7721 | 0.7008 |
    | ε=0.5 DPRF | 0.7704 | 0.7605 | 0.7918 | 0.7593 |
    | ε=1.0 DPRF | 0.8035 | 0.8105 | 0.8219 | 0.8094 |
    | ε=3.0 DPRF | 0.8249 | 0.8270 | 0.8461 | 0.8399 |
    | ε=10.0 DPRF | 0.8527 | 0.8462 | 0.8852 | 0.8641 |
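
    The tables report the coefficient of determination $R^2$. As a reminder of what that number measures, the sketch below uses the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; it is assumed, not confirmed by this excerpt, that the paper uses the same definition. The function name `r_squared` and the sample values are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, y_true))                # perfect prediction: 1.0
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean: 0.0
```

    On this scale, a value of 1 means the predictions match the measurements exactly, and a value of 0 means the model does no better than predicting the mean of the measured values.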

    Table  4.   Predictive results of target properties under different privacy budgets

    | ε | $R^2$ (Fatigue) | $R^2$ (Tensile) | $R^2$ (Fracture) | $R^2$ (Hardness) |
    | --- | --- | --- | --- | --- |
    | 0.3 | 0.6153 | 0.6030 | 0.6979 | 0.6139 |
    | 0.75 | 0.6563 | 0.6748 | 0.7585 | 0.6502 |
    | 1.5 | 0.7038 | 0.7448 | 0.8082 | 0.7308 |
    | 2.25 | 0.7615 | 0.7773 | 0.8377 | 0.7618 |
    | 3.0 | 0.7981 | 0.8025 | 0.8491 | 0.8017 |
    | 9.0 | 0.8130 | 0.8380 | 0.8677 | 0.8429 |

    Table  5.   Predictive results of each target property under different tree depths

    | d | $R^2$ (Fatigue) | $R^2$ (Tensile) | $R^2$ (Fracture) | $R^2$ (Hardness) |
    | --- | --- | --- | --- | --- |
    | 3 | 0.6027 | 0.6113 | 0.6796 | 0.6387 |
    | 4 | 0.7088 | 0.7061 | 0.7951 | 0.7183 |
    | 5 | 0.7961 | 0.8025 | 0.8491 | 0.8017 |
    | 6 | 0.7560 | 0.7605 | 0.8568 | 0.7659 |
    | 7 | 0.6920 | 0.7427 | 0.8251 | 0.7303 |
Publication history
  • Received:  2022-05-29
  • Published online:  2022-07-27
  • Issue date:  2023-07-25
