Abstract: With the rapid arrival of the Internet of Everything era, massive data resources are generated at the edge, and traditional cloud-based distributed training therefore faces problems such as heavy network load, high energy consumption, and privacy and security risks. Edge computing sinks computing resources to the edge side, forming a collaborative computing system that integrates “cloud, edge, and end” and can meet the basic needs of real-time operation, intelligence, security, and privacy protection. Building on edge computing capabilities, edge intelligence effectively promotes the intelligent development of the edge side and has become a popular research topic. Our survey shows that edge collaborative intelligence is in a stage of rapid development: deep learning models are being combined with edge computing, and many edge collaborative intelligent processing solutions have emerged, such as distributed training in edge computing scenarios, federated learning, and distributed collaborative inference based on techniques such as model partitioning and early exit. Combining shallow broad learning systems with virtualization technology also allows edge intelligence to be deployed quickly, which considerably improves service quality and user experience and makes services more intelligent. As a key link of edge intelligence, edge intelligent collaborative training aims to assist or implement the distributed training of machine learning models on the edge side. However, in edge computing scenarios, distributed model training must coordinate numerous edge nodes, and many challenges remain. Therefore, by thoroughly surveying existing research on edge intelligent collaborative training, we focus on the challenges and solutions of collaborative training in edge scenarios characterized by device heterogeneity, limited device resources, and unstable network environments. This paper introduces and summarizes the overall architecture and the core modules of edge intelligent collaborative training. The overall architecture mainly concerns the interaction framework among edge devices; depending on whether a central server is present, it can be divided into two categories: the parameter-server centralized architecture and the fully decentralized parallel architecture. The core modules mainly concern how a large number of edge devices collaboratively update the parameters of neural network models during training; according to how parallel computing is applied to model training, this is divided into data parallelism and model parallelism. Finally, the remaining challenges and future prospects of edge collaborative training are analyzed and summarized.
Key words:
- cloud computing
- edge intelligence
- collaborative training
- edge computing
- machine learning
- distributed training
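The abstract groups edge collaborative training along two axes: the interaction architecture (parameter-server centralized vs. fully decentralized) and how training is parallelized (data vs. model parallelism). As a concrete reference point for the tables that follow, the sketch below shows one synchronous round of data-parallel training under a parameter server, with FedAvg-style weighted averaging in the spirit of [41]. It is a minimal NumPy illustration with hypothetical client data and a linear least-squares model, not the algorithm of any single cited paper.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on mean squared error."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    """Synchronous round: every client trains locally, then the server averages
    the returned models weighted by local sample counts (FedAvg-style)."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_sgd(w_global.copy(), X, y))
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    return sum(wgt * upd for wgt, upd in zip(weights, updates))

# Hypothetical demo: three edge clients holding different slices of a synthetic dataset.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (40, 60, 100):                       # unequal local data sizes
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(20):                           # 20 communication rounds
    w = fedavg_round(w, clients)
print(w)                                      # approaches true_w
```

Note that each round exchanges only model parameters, never raw local data, which is why this pattern underlies the federated learning systems surveyed in the tables below.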
Table 1. Related works on the parameter-server centralized architecture

| Communication mechanism | Optimization level | Research question | Optimization objective | Reference |
| --- | --- | --- | --- | --- |
| Synchronous | Equipment level | Limited resources | Improve local model quality | [41] |
| Synchronous | Communication level | Limited resources | Reduce traffic | [56] |
| Synchronous | Equipment level | Heterogeneous equipment | Shorten communication time | [58–59] |
| Synchronous | Equipment level | Comprehensive consideration | Architecture flexibility | [60] |
| Synchronous | Communication level | Unstable environment | Architecture robustness | [56] |
| Asynchronous | Equipment level | Stale gradients | Architecture flexibility | [62–64] |
| Asynchronous | Communication level | Comprehensive consideration | Trade-off optimization | [65] |
| Asynchronous | Equipment level | Dynamic clients | Time-consumption optimization | [66] |
| Asynchronous | Overall architecture | Heterogeneous equipment | Architecture robustness | [67] |
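Table 1 groups these works first by communication mechanism, and the difference can be stated in a few lines of code: a synchronous server waits at a barrier until every worker has reported before updating the model, whereas an asynchronous server applies each gradient as soon as it arrives and lets workers continue with possibly stale parameters. The class below is a toy, single-process sketch of this contrast; the class and method names are ours, not taken from any cited system, and real parameter servers add sharding, fault tolerance, and network transport.

```python
import numpy as np

class ParameterServer:
    """Toy parameter server holding a flat weight vector (illustrative only)."""

    def __init__(self, dim, lr=0.1, num_workers=4):
        self.w = np.zeros(dim)
        self.lr = lr
        self.num_workers = num_workers
        self.version = 0           # increases once per applied update
        self._buffer = []          # gradients waiting at the barrier (sync mode)

    # --- synchronous mechanism: barrier until every worker has reported ---
    def push_sync(self, grad):
        self._buffer.append(grad)
        if len(self._buffer) == self.num_workers:
            self.w -= self.lr * np.mean(self._buffer, axis=0)
            self._buffer.clear()
            self.version += 1

    # --- asynchronous mechanism: apply each gradient as soon as it arrives ---
    def push_async(self, grad):
        self.w -= self.lr * grad
        self.version += 1

    def pull(self):
        """Workers fetch the current model and its version before computing."""
        return self.w.copy(), self.version

ps = ParameterServer(dim=10, num_workers=2)
w, version = ps.pull()
ps.push_async(np.ones(10))         # an asynchronous worker's gradient is applied at once
```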
Table 2. Related works on decentralized parallel stochastic gradient descent (D-PSGD)
| Research question | Research protocol | Optimization objective | Reference |
| --- | --- | --- | --- |
| Neighbor interaction | Random selection | Reduce interaction complexity | [69–72] |
| Neighbor interaction | Cooperation by batch rotation | Improve model consistency | [73] |
| Neighbor interaction | Look for similar targets | Best communication partner | [74] |
| Neighbor interaction | Single trust set | Improve architecture robustness | [75] |
| Neighbor interaction | Weight comparison selection | Best communication partner | [76] |
| Communication consumption | Asymmetric interaction | Avoid redundant communication | [77] |
| Communication consumption | Overlapping communication and computation | Avoid redundant communication and computation | [78] |
| Communication consumption | Model compression | Make full use of link resources | [79] |
| Communication consumption | Model segmentation | Improve communication flexibility | [80] |
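In the decentralized D-PSGD setting of Table 2 there is no server: each node alternates a local SGD step with parameter averaging over a small set of neighbors. The sketch below illustrates the "random selection" style of neighbor interaction ([69–72]) with uniform mixing weights; it is our simplified illustration, and the cited works differ in how neighbors are chosen and weighted (for example, doubly stochastic mixing matrices over a fixed topology, or push-sum on directed graphs).

```python
import numpy as np

def dpsgd_step(models, grads, lr, rng, num_neighbors=2):
    """One decentralized round in the spirit of D-PSGD: every node takes a local
    SGD step, then averages its parameters with a few randomly chosen neighbors."""
    n = len(models)
    locals_ = [w - lr * g for w, g in zip(models, grads)]   # local SGD step
    mixed = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        neighbors = rng.choice(others, size=num_neighbors, replace=False)
        group = [locals_[i]] + [locals_[j] for j in neighbors]
        mixed.append(np.mean(group, axis=0))                # gossip averaging
    return mixed

# Hypothetical demo: five nodes, each with its own model copy and local gradient.
rng = np.random.default_rng(0)
models = [np.zeros(3) for _ in range(5)]
grads = [rng.normal(size=3) for _ in range(5)]
models = dpsgd_step(models, grads, lr=0.1, rng=rng)
```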
Table 3. Related works on data parallelism
| Parameter update method | Main problem | Solution | Reference |
| --- | --- | --- | --- |
| Synchronous update | Client delay | Client filtering | [58–59] |
| Synchronous update | Client delay | Client selection | [83] |
| Synchronous update | Client delay | Hybrid update | [84] |
| Synchronous update | Client delay | Partial model update | [85] |
| Asynchronous update | Staleness effect | Convergence analysis | [61,86] |
| Asynchronous update | Staleness effect | Penalize stale gradients | [90,92] |
| Asynchronous update | Staleness effect | Adjust learning rate | [62,91] |
| Asynchronous update | Staleness effect | Use momentum | [94] |
| Asynchronous update | Staleness effect | Adjust hyperparameters | [95] |
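For asynchronous data-parallel updates, the common thread in Table 3 is compensating for staleness: an update computed from an old model version is applied with reduced weight or a reduced learning rate. The sketch below shows one such staleness penalty; the 1/(staleness + 1) damping is a simple illustrative choice in the spirit of the staleness-aware and asynchronous-federated methods above ([62], [90]), and the exact rule varies by paper.

```python
import numpy as np

def apply_stale_update(w_server, update, worker_version, server_version, base_lr=0.1):
    """Staleness-aware application of an asynchronous update.

    `worker_version` is the model version the worker started from and
    `server_version` is the current one; their gap is the staleness, and the
    step size is damped accordingly."""
    staleness = server_version - worker_version
    effective_lr = base_lr / (staleness + 1)
    return w_server - effective_lr * update

# Hypothetical example: an update computed 3 versions ago is damped to 1/4 strength.
w = np.ones(4)
print(apply_stale_update(w, update=np.ones(4), worker_version=5, server_version=8))
```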
Table 4. Related works on non-independent and identically distributed (non-IID) data
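A recurring experimental device in the non-IID works collected in Table 4 is to synthesize label skew by drawing each client's class proportions from a Dirichlet distribution, as popularized by [98]. The sketch below is a hypothetical NumPy partitioner in that spirit; details such as balancing client sizes vary between papers.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, rng):
    """Split sample indices across clients with Dirichlet-distributed label skew:
    smaller alpha means more non-IID (each client dominated by few classes)."""
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)          # hypothetical 10-class labels
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5, rng=rng)
print([len(p) for p in parts])
```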
Table 5. Related works on model parallelism
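Model parallelism (Table 5) places different parts of one network on different devices, so a forward pass becomes a relay of activations between devices rather than an exchange of gradients. The toy sketch below splits a two-layer network across two simulated devices; it is our illustration only. Pipeline schemes such as GPipe [113] and PipeDream [111] additionally split each batch into micro-batches so that the stages stay busy concurrently, and the backward pass flows through the stages in reverse order.

```python
import numpy as np

class DeviceStage:
    """One model-parallel stage: holds only its own layer's weights, as if it
    lived on a separate device, and forwards activations to the next stage."""
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        return np.maximum(x @ self.w, 0.0)       # linear layer + ReLU

rng = np.random.default_rng(0)
# Hypothetical 8-16-4 network split layer-wise across two "devices".
stage0 = DeviceStage(rng.normal(size=(8, 16)))   # device 0 holds layer 1
stage1 = DeviceStage(rng.normal(size=(16, 4)))   # device 1 holds layer 2

x = rng.normal(size=(32, 8))                     # a batch entering device 0
activations = stage0.forward(x)                  # would be sent over the network
y = stage1.forward(activations)                  # device 1 finishes the forward pass
print(y.shape)                                   # (32, 4)
```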
Table 6. Related works on knowledge distillation
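The distillation works gathered in Table 6 exchange model outputs (soft labels) rather than parameters, which decouples the architectures of the participating models and typically reduces communication. At the core is a loss that matches the student's softened output distribution to the teacher's. The NumPy sketch below is a minimal version of that soft-label term; the temperature value and how it is combined with the ordinary label loss are per-paper choices.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """KL divergence between temperature-softened teacher and student distributions,
    scaled by T^2 to keep its gradient magnitude comparable across temperatures."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    return float(np.mean(kl) * temperature ** 2)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 10))               # hypothetical logits: 16 samples, 10 classes
student = teacher + 0.5 * rng.normal(size=(16, 10))
print(distillation_loss(student, teacher))
```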
References
[1] Zhang X Z, Wang Y F, Lu S D, et al. OpenEI: an open framework for edge intelligence // 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). Dallas, 2019: 1840
[2] Wang R, Qi J P, Chen L, et al. Survey of collaborative inference for edge intelligence. J Comput Res Dev, 2023, 60(2): 398 王睿, 齊建鵬, 陳亮, 等. 面向邊緣智能的協同推理綜述. 計算機研究與發展, 2023, 60(2): 398
[3] Zhou Z, Chen X, Li E, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proc IEEE, 2019, 107(8): 1738 doi: 10.1109/JPROC.2019.2918951
[4] Li K L, Liu C B. Edge intelligence: State-of-the-art and expectations. Big Data Res, 2019, 5(3): 69 李肯立, 劉楚波. 邊緣智能: 現狀和展望. 大數據, 2019, 5(3): 69
[5] Tan H S, Guo D, Ke Z C, et al. Development and challenges of cloud edge collaborative intelligent edge computing. CCCF, 2020(1): 16 談海生, 郭得科, 張弛, 等. 云邊端協同智能邊緣計算的發展與挑戰. 中國計算機協會通訊, 2020(1): 16
[6] Zhang X Z, Lu S D, Shi W S. Research on collaborative computing technology in edge intelligence. AI-View, 2019, 6(5): 55 張星洲, 魯思迪, 施巍松. 邊緣智能中的協同計算技術研究. 人工智能, 2019, 6(5): 55
[7] Wang X F. Intelligent edge computing: From internet of everything to internet of everything empowered. Frontiers, 2020(9): 6 王曉飛. 智慧邊緣計算: 萬物互聯到萬物賦能的橋梁. 人民論壇·學術前沿, 2020(9): 6
[8] Fang A D, Cui L, Zhang Z W, et al. A parallel computing framework for cloud services // 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). Dalian, 2020: 832
[9] Lanka S, Aung Win T, Eshan S. A review on Edge computing and 5G in IOT: Architecture & Applications // 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2021: 532
[10] Carrie M, David R, Michael S. The growth in connected IoT devices is expected to generate 79.4ZB of data in 2025, according to a new IDC forecast. (2019-06-18) [2022-09-26]. https://www.businesswire.com/news/home/20190618005012
[11] Zwolenski M, Weatherill L. The digital universe rich data and the increasing value of the internet of things. J Telecommun Digital Economy, 2014, 2(3): 47.1
[12] Jin H, Jia L, Zhou Z. Boosting edge intelligence with collaborative cross-edge analytics. IEEE Internet Things J, 2021, 8(4): 2444 doi: 10.1109/JIOT.2020.3034891
[13] Jiang X L, Shokri-Ghadikolaei H, Fodor G, et al. Low-latency networking: Where latency lurks and how to tame it. Proc IEEE, 2019, 107(2): 280 doi: 10.1109/JPROC.2018.2863960
[14] Xiao Y H, Jia Y Z, Liu C C, et al. Edge computing security: State of the art and challenges. Proc IEEE, 2019, 107(8): 1608 doi: 10.1109/JPROC.2019.2918437
[15] Huang T, Liu J, Wang S, et al. Survey of the future network technology and trend. J Commun, 2021, 42(1): 130 黃韜, 劉江, 汪碩, 等. 未來網絡技術與發展趨勢綜述. 通信學報, 2021, 42(1): 130
[16] Jennings A, Copenhagen van R, Rusmin T. Aspects of Network Edge Intelligence. Maluku Technical Report, 2001
[17] Song C H, Zeng P, Yu H B. Industrial Internet intelligent manufacturing edge computing: State-of-the-art and challenges. ZTE Technol J, 2019, 25(3): 50 宋純賀, 曾鵬, 于海斌. 工業互聯網智能制造邊緣計算: 現狀與挑戰. 中興通訊技術, 2019, 25(3): 50
[18] Risteska Stojkoska B L, Trivodaliev K V. A review of Internet of Things for smart home: Challenges and solutions. J Clean Prod, 2017, 140: 1454 doi: 10.1016/j.jclepro.2016.10.006
[19] Varghese B, Wang N, Barbhuiya S, et al. Challenges and opportunities in edge computing // 2016 IEEE International Conference on Smart Cloud (SmartCloud). New York, 2016: 20
[20] Shi W S, Zhang X Z, Wang Y F, et al. Edge computing: State-of-the-art and future directions. J Comput Res Dev, 2019, 56(1): 69 施巍松, 張星洲, 王一帆, 等. 邊緣計算: 現狀與展望. 計算機研究與發展, 2019, 56(1): 69
[21] Teerapittayanon S, McDanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices // 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). Atlanta, 2017: 328
[22] Wang X F, Han Y W, Wang C Y, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Netw, 2019, 33(5): 156 doi: 10.1109/MNET.2019.1800286
[23] Kang Y P, Hauswald J, Gao C, et al. Neurosurgeon. SIGOPS Oper Syst Rev, 2017, 51(2): 615 doi: 10.1145/3093315.3037698
[24] Li E, Zhou Z, Chen X. Edge intelligence: On-demand deep learning model co-inference with device-edge synergy // Proceedings of the 2018 Workshop on Mobile Edge Communications. Budapest, 2018: 31
[25] Li Y K, Zhang T, Chen J L. Broad Siamese network for edge computing applications. Acta Autom Sin, 2020, 46(10): 2060 李逸楷, 張通, 陳俊龍. 面向邊緣計算應用的寬度孿生網絡. 自動化學報, 2020, 46(10): 2060
[26] Al-Rakhami M, Alsahli M, Hassan M M, et al. Cost efficient edge intelligence framework using docker containers // 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). Athens, 2018: 800
[27] Al-Rakhami M, Gumaei A, Alsahli M, et al. A lightweight and cost effective edge intelligence architecture based on containerization technology. World Wide Web, 2020, 23(2): 1341 doi: 10.1007/s11280-019-00692-y
[28] Zaharia M, Xin R S, Wendell P, et al. Apache spark. Commun ACM, 2016, 59(11): 56 doi: 10.1145/2934664
[29] Abadi M, Barham P, Chen J M, et al. TensorFlow: A system for large-scale machine learning [J/OL]. ArXiv Preprint (2016-05-31) [2022-09-26]. https://arxiv.org/abs/1605.08695
[30] Chen T Q, Li M, Li Y T, et al. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems [J/OL]. ArXiv Preprint (2015-12-03) [2022-09-26]. https://arxiv.org/abs/1512.01274
[31] Jin A L, Xu W C, Guo S, et al. PS: A simple yet effective framework for fast training on parameter server. IEEE Trans Parallel Distributed Syst, 2022, 33(12): 4625 doi: 10.1109/TPDS.2022.3200518
[32] Padmanandam K, Lingutla L. Practice of applied edge analytics in intelligent learning framework // 2020 21st International Arab Conference on Information Technology (ACIT). Giza, 2021: 1
[33] Ross P, Luckow A. EdgeInsight: characterizing and modeling the performance of machine learning inference on the edge and cloud // 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, 2020: 1897
[34] Shi W S, Sun H, Cao J, et al. Edge computing–An emerging computing model for the Internet of everything era. J Comput Res Dev, 2017, 54(5): 907 施巍松, 孫輝, 曹杰, 等. 邊緣計算: 萬物互聯時代新型計算模型. 計算機研究與發展, 2017, 54(5): 907
[35] Srivastava A, Nguyen D, Aggarwal S, et al. Performance and memory trade-offs of deep learning object detection in fast streaming high-definition images // 2018 IEEE International Conference on Big Data (Big Data). Seattle, 2018: 3915
[36] Sindhu C, Vyas D V, Pradyoth K. Sentiment analysis based product rating using textual reviews // 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2017: 727
[37] Hosein P, Rahaman I, Nichols K, et al. Recommendations for long-term profit optimization // Proceedings of ImpactRS@ RecSys. Copenhagen, 2019
[38] Sharma R, Biookaghazadeh S, Li B X, et al. Are existing knowledge transfer techniques effective for deep learning with edge devices? // 2018 IEEE International Conference on Edge Computing (EDGE). San Francisco, 2018: 42
[39] Bonawitz K, Eichner H, Grieskamp W, et al. Towards federated learning at scale: System design // Proceedings of Machine Learning and Systems. Palo Alto, 2019, 1: 374
[40] Kairouz P, McMahan H B, Avent B, et al. Advances and open problems in federated learning. FNT Machine Learning, 2021, 14(1-2): 1
[41] McMahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data [J/OL]. ArXiv Preprint (2017-02-28) [2022-09-26]. https://arxiv.org/abs/1602.05629
[42] Zhu J M, Zhang Q N, Gao S, et al. Privacy preserving and trustworthy federated learning model based on blockchain. Chin J Comput, 2021, 44(12): 2464 朱建明, 張沁楠, 高勝, 等. 基于區塊鏈的隱私保護可信聯邦學習模型. 計算機學報, 2021, 44(12): 2464
[43] Wei S Y, Tong Y X, Zhou Z M, et al. Efficient and Fair Data Valuation for Horizontal Federated Learning. Berlin: Springer, 2020
[44] Khan A, Thij M, Wilbik A. Communication-efficient vertical federated learning. Algorithms, 2022, 15(8): 273 doi: 10.3390/a15080273
[45] Chen Y Q, Qin X, Wang J D, et al. FedHealth: A federated transfer learning framework for wearable healthcare. IEEE Intell Syst, 2020, 35(4): 83 doi: 10.1109/MIS.2020.2988604
[46] Yang J, Zheng J, Zhang Z, et al. Security of federated learning for cloud-edge intelligence collaborative computing. Int J Intell Syst, 2022, 37(11): 9290 doi: 10.1002/int.22992
[47] Zhang X J, Gu H L, Fan L X, et al. No free lunch theorem for security and utility in federated learning [J/OL]. ArXiv Preprint (2022-09-05) [2022-09-26]. https://arxiv.org/abs/2203.05816
[48] Deng S G, Zhao H L, Fang W J, et al. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J, 2020, 7(8): 7457 doi: 10.1109/JIOT.2020.2984887
[49] Feng C, Han P C, Zhang X, et al. Computation offloading in mobile edge computing networks: A survey. J Netw Comput Appl, 2022, 202: 103366 doi: 10.1016/j.jnca.2022.103366
[50] Qiao D W, Guo S T, He J, et al. Edge intelligence: Research progress and challenges. Radio Commun Technol, 2022, 48(1): 34 喬德文, 郭松濤, 何靜, 等. 邊緣智能: 研究進展及挑戰. 無線電通信技術, 2022, 48(1): 34
[51] Fortino G, Zhou M C, Hassan M M, et al. Pushing artificial intelligence to the edge: Emerging trends, issues and challenges. Eng Appl Artif Intell, 2021, 103: 104298 doi: 10.1016/j.engappai.2021.104298
[52] Qiu X C, Fernández-Marqués J, Gusmão P, et al. ZeroFL: Efficient on-device training for federated learning with local sparsity [J/OL]. ArXiv Preprint (2022-08-04) [2022-09-26]. https://arxiv.org/abs/2208.02507
[53] Long S Q, Long W F, Li Z T, et al. A game-based approach for cost-aware task assignment with QoS constraint in collaborative edge and cloud environments. IEEE Trans Parallel Distributed Syst, 2021, 32(7): 1629 doi: 10.1109/TPDS.2020.3041029
[54] Zhu H R, Yuan G J, Yao C J, et al. Survey on network of distributed deep learning training. J Comput Res Dev, 2021, 58(1): 98 doi: 10.7544/issn1000-1239.2021.20190881 朱泓睿, 元國軍, 姚成吉, 等. 分布式深度學習訓練網絡綜述. 計算機研究與發展, 2021, 58(1): 98 doi: 10.7544/issn1000-1239.2021.20190881
[55] Rafique Z, Khalid H M, Muyeen S M. Communication systems in distributed generation: A bibliographical review and frameworks. IEEE Access, 2020, 8: 207226 doi: 10.1109/ACCESS.2020.3037196
[56] Hsieh K, Harlap A, Vijaykumar N, et al. Gaia: Geo-distributed machine learning approaching LAN speeds // Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. New York, 2017: 629
[57] Konečný J, McMahan H B, Yu F X, et al. Federated learning: Strategies for improving communication efficiency [J/OL]. ArXiv Preprint (2017-10-30) [2022-09-26]. https://arxiv.org/abs/1610.05492
[58] Chen J M, Pan X H, Monga R, et al. Revisiting distributed synchronous SGD [J/OL]. ArXiv Preprint (2017-03-21) [2022-09-26]. https://arxiv.org/abs/1604.00981
[59] Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge // ICC 2019–2019 IEEE International Conference on Communications (ICC). Shanghai, 2019: 1
[60] Wang S Q, Tuor T, Salonidis T, et al. When edge meets learning: Adaptive control for resource-constrained distributed machine learning // IEEE INFOCOM 2018-IEEE Conference on Computer Communications. Honolulu, 2018: 63
[61] Lian X R, Huang Y J, Li Y C, et al. Asynchronous parallel stochastic gradient for nonconvex optimization // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 2737
[62] Zhang W, Gupta S, Lian X R, et al. Staleness-aware async-SGD for distributed deep learning [J/OL]. ArXiv Preprint (2014-04-05) [2022-09-26]. https://arxiv.org/abs/1511.05950
[63] Lu X F, Liao Y Y, Lio P, et al. Privacy-preserving asynchronous federated learning mechanism for edge network computing. IEEE Access, 2020, 8: 48970 doi: 10.1109/ACCESS.2020.2978082
[64] Chen Y J, Ning Y, Slawski M, et al. Asynchronous online federated learning for edge devices with non-IID data // 2020 IEEE International Conference on Big Data (Big Data). Atlanta, 2021: 15
[65] Dutta S, Wang J Y, Joshi G. Slow and stale gradients can win the race. IEEE J Sel Areas Inf Theory, 2021, 2(3): 1012 doi: 10.1109/JSAIT.2021.3103770
[66] Lu Y L, Huang X H, Zhang K, et al. Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles. IEEE Trans Veh Technol, 2020, 69(4): 4298 doi: 10.1109/TVT.2020.2973651
[67] Wu W T, He L G, Lin W W, et al. SAFA: A semi-asynchronous protocol for fast federated learning with low overhead. IEEE Trans Comput, 2021, 70(5): 655 doi: 10.1109/TC.2020.2994391
[68] Luehr N. Fast multi-GPU collectives with NCCL [J/OL]. NVIDIA Developer (2016-04-07) [2022-09-26]. https://developer.nvidia.com/blog/fast-multi-gpu-collectives-nccl
[69] Lian X R, Zhang W, Zhang C, et al. Asynchronous decentralized parallel stochastic gradient descent [J/OL]. ArXiv Preprint (2018-09-25) [2022-09-26]. https://arxiv.org/abs/1710.06952
[70] Lalitha A, Kilinc O C, Javidi T, et al. Peer-to-peer federated learning on graphs [J/OL]. ArXiv Preprint (2019-01-31) [2022-09-26]. https://arxiv.org/abs/1901.11173
[71] Blot M, Picard D, Cord M, et al. Gossip training for deep learning [J/OL]. ArXiv Preprint (2016-11-29) [2022-09-26]. https://arxiv.org/abs/1611.09726
[72] Jin P H, Yuan Q C, Iandola F, et al. How to scale distributed deep learning? [J/OL]. ArXiv Preprint (2016-11-14) [2022-09-26]. https://arxiv.org/abs/1611.04581
[73] Daily J, Vishnu A, Siegel C, et al. GossipGraD: Scalable Deep Learning using Gossip Communication based asynchronous gradient descent [J/OL]. ArXiv Preprint (2018-03-15) [2022-09-26]. https://arxiv.org/abs/1803.05880
[74] Vanhaesebrouck P, Bellet A, Tommasi M. Decentralized collaborative learning of personalized models over networks // Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. Florida, 2017: 509
[75] He C Y, Tan C H, Tang H L, et al. Central server free federated learning over single-sided trust social networks [J/OL]. ArXiv Preprint (2020-08-01) [2022-09-26]. https://arxiv.org/abs/1910.04956
[76] Colin I, Bellet A, Salmon J, et al. Gossip dual averaging for decentralized optimization of pairwise functions [J/OL]. ArXiv Preprint (2016-06-08) [2022-09-26]. https://arxiv.org/abs/1606.02421
[77] Nedić A, Olshevsky A. Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control, 2016, 61(12): 3936 doi: 10.1109/TAC.2016.2529285
[78] Assran M, Loizou N, Ballas N, et al. Stochastic gradient push for distributed deep learning // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 344
[79] Koloskova A, Stich S, Jaggi M. Decentralized stochastic optimization and gossip algorithms with compressed communication // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 3478
[80] Hu C H, Jiang J Y, Wang Z. Decentralized federated learning: A segmented gossip approach [J/OL]. ArXiv Preprint (2019-08-21) [2022-09-26]. https://arxiv.org/abs/1908.07782
[81] Ruder S. An overview of gradient descent optimization algorithms [J/OL]. ArXiv Preprint (2017-06-15) [2022-09-26]. https://arxiv.org/abs/1609.04747
[82] Chahal K S, Grover M S, Dey K, et al. A hitchhiker’s guide on distributed training of deep neural networks. J Parallel Distributed Comput, 2020, 137: 65 doi: 10.1016/j.jpdc.2019.10.004
[83] Chai Z, Ali A, Zawad S, et al. TiFL: A tier-based federated learning system // Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing. Stockholm, 2020: 125
[84] Li X Y, Qu Z, Tang B, et al. Stragglers are not disaster: A hybrid federated learning algorithm with delayed gradients [J/OL]. ArXiv Preprint (2021-02-12) [2022-09-26]. https://arxiv.org/abs/2102.06329
[85] Xu Z R, Yang Z, Xiong J J, et al. ELFISH: Resource-aware federated learning on heterogeneous edge devices [J/OL]. ArXiv Preprint (2021-03-01) [2022-09-26]. https://arxiv.org/abs/1912.01684
[86] Agarwal A, Duchi J C. Distributed delayed stochastic optimization // Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, 2011: 873
[87] Sahu A N, Dutta A, Tiwari A, et al. On the convergence analysis of asynchronous SGD for solving consistent linear systems [J/OL]. ArXiv Preprint (2020-04-05) [2022-09-26]. https://arxiv.org/abs/2004.02163
[88] Dean J, Corrado G S, Monga R, et al. Large scale distributed deep networks // Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, 2012: 1223
[89] Zhang S X, Choromanska A, LeCun Y. Deep learning with elastic averaging SGD // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 685
[90] Xie C, Koyejo S, Gupta I. Asynchronous federated optimization [J/OL]. ArXiv Preprint (2020-12-05) [2022-09-26]. https://arxiv.org/abs/1903.03934
[91] Odena A. Faster asynchronous SGD [J/OL]. ArXiv Preprint (2016-01-15) [2022-09-26]. https://arxiv.org/abs/1601.04033
[92] Chan W, Lane I. Distributed asynchronous optimization of convolutional neural networks // Proceedings of Fifteenth Annual Conference of the International Speech Communication Association. Singapore, 2014: 1073
[93] Sutskever I, Martens J, Dahl G, et al. On the importance of initialization and momentum in deep learning // Proceedings of the 30th International Conference on International Conference on Machine Learning. Atlanta, 2013: 1139
[94] Hakimi I, Barkai S, Gabel M, et al. Taming momentum in a distributed asynchronous environment [J/OL]. ArXiv Preprint (2020-10-14) [2022-09-26]. https://arxiv.org/abs/1907.11612
[95] Chen M, Mao B C, Ma T Y. FedSA: A staleness-aware asynchronous federated learning algorithm with non-IID data. Future Gener Comput Syst, 2021, 120: 1 doi: 10.1016/j.future.2021.02.012
[96] Li X, Huang K X, Yang W H, et al. On the convergence of FedAvg on non-IID data [J/OL]. ArXiv Preprint (2020-06-25) [2022-09-26]. https://arxiv.org/abs/1907.02189
[97] Khaled A, Mishchenko K, Richtárik P. First analysis of local GD on heterogeneous data [J/OL]. ArXiv Preprint (2020-03-18) [2022-09-26]. https://arxiv.org/abs/1909.04715
[98] Hsu T M H, Qi H, Brown M. Measuring the effects of non-identical data distribution for federated visual classification [J/OL]. ArXiv Preprint (2019-09-13) [2022-09-26]. https://arxiv.org/abs/1909.06335
[99] Karimireddy S P, Kale S, Mohri M, et al. SCAFFOLD: Stochastic controlled averaging for on-device federated learning [J/OL]. ArXiv Preprint (2021-04-09) [2022-09-26]. https://arxiv.org/abs/1910.06378
[100] Li T, Sahu A K, Zaheer M, et al. Federated optimization in heterogeneous networks [J/OL]. ArXiv Preprint (2020-04-21) [2022-09-26]. https://arxiv.org/abs/1812.06127
[101] Wang J Y, Liu Q H, Liang H, et al. Tackling the objective inconsistency problem in heterogeneous federated optimization [J/OL]. ArXiv Preprint (2020-07-15) [2022-09-26]. https://arxiv.org/abs/2007.07481
[102] Hsu T M H, Qi H, Brown M. Federated visual classification with real-world data distribution [J/OL]. ArXiv Preprint (2020-07-17) [2022-09-26]. https://arxiv.org/abs/2003.08082
[103] Zhao Y, Li M, Lai L Z, et al. Federated learning with non-IID data [J/OL]. ArXiv Preprint (2022-07-21) [2022-09-26]. https://arxiv.org/abs/1806.00582
[104] Yoshida N, Nishio T, Morikura M, et al. Hybrid-FL for wireless networks: Cooperative learning mechanism using non-IID data // ICC 2020–2020 IEEE International Conference on Communications (ICC). Dublin, 2020: 1
[105] Shoham N, Avidor T, Keren A, et al. Overcoming forgetting in federated learning on non-IID data [J/OL]. ArXiv Preprint (2019-10-17) [2022-09-26]. https://arxiv.org/abs/1910.07796
[106] Huang Y T, Chu L Y, Zhou Z R, et al. Personalized cross-silo federated learning on non-IID data. Proc AAAI Conf Artif Intell, 2021, 35(9): 7865
[107] Wu Q, He K W, Chen X. Personalized federated learning for intelligent IoT applications: A cloud-edge based framework. IEEE Open J Comput Soc, 2020, 1: 35 doi: 10.1109/OJCS.2020.2993259
[108] Günther S, Ruthotto L, Schroder J B, et al. Layer-parallel training of deep residual neural networks [J/OL]. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1812.04352
[109] Mayer R, Jacobsen H-A. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Comput Surv, 2020, 53(1): 1
[110] Jia Z H, Zaharia M, Aiken A. Beyond data and model parallelism for deep neural networks [J/OL]. ArXiv Preprint (2018-07-14) [2022-09-26]. https://arxiv.org/abs/1807.05358
[111] Harlap A, Narayanan D, Phanishayee A, et al. PipeDream: Fast and efficient pipeline parallel DNN training [J/OL]. ArXiv Preprint (2018-06-08) [2022-09-26]. https://arxiv.org/abs/1806.03377
[112] Chen C C, Yang C L, Cheng H Y. Efficient and robust parallel DNN training through model parallelism on multi-GPU platform [J/OL]. ArXiv Preprint (2019-10-28) [2022-09-26]. https://arxiv.org/abs/1809.02839
[113] Huang Y P, Cheng Y L, Bapna A, et al. GPipe: Efficient training of giant neural networks using pipeline parallelism [J/OL]. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1811.06965
[114] Mirhoseini A, Pham H, Le Q V, et al. Device placement optimization with reinforcement learning // Proceedings of the 34th International Conference on Machine Learning. Sydney, 2017: 2430
[115] Shoeybi M, Patwary M, Puri R, et al. Megatron-LM: Training multi-billion parameter language models using model parallelism [J/OL]. ArXiv Preprint (2020-03-13) [2022-09-26]. https://arxiv.org/abs/1909.08053
[116] Frankle J, Carbin M. The lottery ticket hypothesis: Finding sparse, trainable neural networks [J/OL]. ArXiv Preprint (2019-03-04) [2022-09-26]. https://arxiv.org/abs/1803.03635
[117] Wang Z D, Liu X X, Huang L, et al. QSFM: Model pruning based on quantified similarity between feature maps for AI on edge. IEEE Internet Things J, 2022, 9(23): 24506 doi: 10.1109/JIOT.2022.3190873
[118] Wang J, Zhang J G, Bao W D, et al. Not just privacy: Improving performance of private deep learning in mobile cloud // Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, 2018: 2407
[119] Zhang L F, Tan Z H, Song J B, et al. Scan: A scalable neural networks framework towards compact and efficient models // 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, 2019: 32
[120] Gou J P, Yu B S, Maybank S J, et al. Knowledge distillation: A survey [J/OL]. ArXiv Preprint (2021-03-20) [2022-09-26]. https://arxiv.org/abs/2006.05525
[121] Phuong M, Lampert C H. Towards understanding knowledge distillation [J/OL]. ArXiv Preprint (2021-03-27) [2022-09-26]. https://arxiv.org/abs/2105.13093
[122] Anil R, Pereyra G, Passos A, et al. Large scale distributed neural network training through online distillation [J/OL]. ArXiv Preprint (2020-08-20) [2022-09-26]. https://arxiv.org/abs/1804.03235
[123] Jeong E, Oh S, Kim H, et al. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data [J/OL]. ArXiv Preprint (2018-11-28) [2022-09-26]. https://arxiv.org/abs/1811.11479
[124] Shen T, Zhang J, Jia X K, et al. Federated mutual learning [J/OL]. ArXiv Preprint (2020-09-17) [2022-09-26]. https://arxiv.org/abs/2006.16765
[125] Sattler F, Marban A, Rischke R, et al. Communication-efficient federated distillation [J/OL]. ArXiv Preprint (2020-12-01) [2022-09-26]. https://arxiv.org/abs/2012.00632
[126] Ahn J H, Simeone O, Kang J. Wireless federated distillation for distributed edge learning with heterogeneous data // 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). Istanbul, 2019: 1