Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning

dc.contributor.author: Shi, Wenqi
dc.contributor.author: Hou, Yunzhong
dc.contributor.author: Zhou, Sheng
dc.contributor.author: Niu, Zhisheng
dc.contributor.author: Zhang, Yang
dc.contributor.author: Geng, Lu
dc.date.accessioned: 2025-06-17T01:33:01Z
dc.date.available: 2025-06-17T01:33:01Z
dc.date.issued: 2019
dc.description.abstract: Deep neural networks (DNNs) are state-of-the-art solutions for many machine learning applications and have been widely used on mobile devices. Running DNNs on resource-constrained mobile devices often requires help from edge servers via computation offloading. However, offloading through a bandwidth-limited wireless link is non-trivial due to the tight interplay between the computation resources on mobile devices and the wireless resources. Existing studies have focused on cooperative inference, where DNN models are partitioned at different neural network layers and the two parts are executed at the mobile device and the edge server, respectively. Since the output data size of a DNN layer can be larger than that of the raw input data, offloading intermediate data between layers can suffer from high transmission latency under limited wireless bandwidth. In this paper, we propose an efficient and flexible 2-step pruning framework for DNN partitioning between mobile devices and edge servers. In our framework, the DNN model only needs to be pruned once in the training phase, where unimportant convolutional filters are removed iteratively. By limiting the pruning region, our framework can greatly reduce either the wireless transmission workload of the device or the total computation workload. A series of pruned models is generated in the training phase, from which the framework can automatically select one to satisfy varying latency and accuracy requirements. Furthermore, coding for the intermediate data is added to provide extra transmission workload reduction. Our experiments show that the proposed framework can achieve up to a 25.6X reduction in transmission workload, a 6.01X acceleration in total computation, and a 4.81X reduction in end-to-end latency compared to partitioning the original DNN model without pruning.
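The cooperative-inference setup the abstract describes can be sketched as a partition-point search: run the first layers on the device, transmit the intermediate feature map over the wireless link, and run the remaining layers on the edge server. The sketch below is illustrative only; the layer FLOP counts, output sizes, and link rate are assumed numbers, not values from the paper, and the paper's 2-step pruning additionally shrinks both the FLOPs and the intermediate output sizes before this search is performed.

```python
# Hypothetical sketch of partition-point selection for device-edge
# cooperative inference. All numbers are illustrative assumptions.

def end_to_end_latency(layers, split, raw_input_bits,
                       device_flops, edge_flops, bandwidth_bps):
    """Latency when layers[:split] run on-device and layers[split:] on the edge.

    Each layer is a (flops, output_bits) pair; if split == 0, the raw input
    is offloaded instead of an intermediate feature map.
    """
    device_time = sum(f for f, _ in layers[:split]) / device_flops
    edge_time = sum(f for f, _ in layers[split:]) / edge_flops
    tx_bits = layers[split - 1][1] if split > 0 else raw_input_bits
    return device_time + tx_bits / bandwidth_bps + edge_time

def best_partition(layers, raw_input_bits, device_flops, edge_flops, bandwidth_bps):
    """Return the split index that minimizes end-to-end latency."""
    return min(range(len(layers) + 1),
               key=lambda s: end_to_end_latency(layers, s, raw_input_bits,
                                               device_flops, edge_flops,
                                               bandwidth_bps))

# Early convolutional layers often emit more data than the raw input,
# which is why pruning (shrinking both flops and output_bits) matters.
layers = [(1e9, 8e6), (2e9, 2e6), (4e9, 1e5)]   # (FLOPs, output bits)
split = best_partition(layers, raw_input_bits=10e6,
                       device_flops=1e9, edge_flops=100e9, bandwidth_bps=1e6)
print(split)  # → 2: partition after the second layer is best here
```

With these numbers, offloading raw input costs 10 s of transmission alone, while splitting after the second (small-output) layer costs about 5.04 s end to end, illustrating why the partition point depends jointly on compute and link bandwidth.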
dc.description.sponsorship: ACKNOWLEDGEMENT This work is sponsored in part by the National Natural Science Foundation of China (No. 61871254, No. 91638204, No. 61571265, No. 61861136003, No. 61621091), the National Key R&D Program of China (No. 2018YFB0105005), and Hitachi Ltd.
dc.description.status: Peer-reviewed
dc.identifier.isbn: 9781728118789
dc.identifier.other: Scopus:85087276813
dc.identifier.other: ORCID:/0000-0002-6916-4789/work/171156687
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85087276813&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733763894
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartofseries: 2019 INFOCOM IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2019
dc.relation.ispartofseries: INFOCOM 2019 - IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2019
dc.rights: Publisher Copyright: © 2019 IEEE.
dc.title: Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning
local.contributor.affiliation: Shi, Wenqi; Tsinghua University
local.contributor.affiliation: Hou, Yunzhong; Department of Electronic Materials Engineering, Research School of Physics, ANU College of Science and Medicine, The Australian National University
local.contributor.affiliation: Zhou, Sheng; Tsinghua University
local.contributor.affiliation: Niu, Zhisheng; Tsinghua University
local.contributor.affiliation: Zhang, Yang; Hitachi, Ltd.
local.contributor.affiliation: Geng, Lu; Hitachi, Ltd.
local.identifier.doi: 10.1109/INFOCOMWKSHPS47286.2019.9093772
local.identifier.pure: 197a8b89-05c7-49da-b8d0-f70d4b3a6eb9
local.type.status: Published
