Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning

dc.contributor.author: Shi, Wenqi
dc.contributor.author: Hou, Yunzhong
dc.contributor.author: Zhou, Sheng
dc.contributor.author: Niu, Zhisheng
dc.contributor.author: Zhang, Yang
dc.contributor.author: Geng, Lu
dc.date.accessioned: 2025-06-17T01:33:01Z
dc.date.available: 2025-06-17T01:33:01Z
dc.date.issued: 2019
dc.description.abstract: Deep neural networks (DNNs) are state-of-the-art solutions for many machine learning applications and have been widely used on mobile devices. Running DNNs on resource-constrained mobile devices often requires help from edge servers via computation offloading. However, offloading through a bandwidth-limited wireless link is non-trivial due to the tight interplay between the computation resources on mobile devices and the wireless resources. Existing studies have focused on cooperative inference, where DNN models are partitioned at different neural network layers and the two parts are executed at the mobile device and the edge server, respectively. Since the output data size of a DNN layer can be larger than that of the raw input data, offloading intermediate data between layers can suffer from high transmission latency under limited wireless bandwidth. In this paper, we propose an efficient and flexible 2-step pruning framework for DNN partitioning between mobile devices and edge servers. In our framework, the DNN model only needs to be pruned once in the training phase, where unimportant convolutional filters are removed iteratively. By limiting the pruning region, our framework can greatly reduce either the wireless transmission workload of the device or the total computation workload. A series of pruned models is generated in the training phase, from which the framework can automatically select one to satisfy varying latency and accuracy requirements. Furthermore, coding for the intermediate data is added to provide extra transmission workload reduction. Our experiments show that the proposed framework can achieve up to a 25.6X reduction in transmission workload, a 6.01X acceleration in total computation, and a 4.81X reduction in end-to-end latency compared to partitioning the original DNN model without pruning.
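The cooperative-inference setup the abstract describes can be sketched as a partition-point search: run the first layers on the device, transmit the intermediate feature map over the wireless link, and run the remaining layers on the edge server. The sketch below is illustrative only; the layer FLOP counts, output sizes, and link rate are assumed numbers, not values from the paper, and the paper's 2-step pruning additionally shrinks both the FLOPs and the intermediate output sizes before this search is performed.

```python
# Hypothetical sketch of partition-point selection for device-edge
# cooperative inference. All numbers are illustrative assumptions.

def end_to_end_latency(layers, split, raw_input_bits,
                       device_flops, edge_flops, bandwidth_bps):
    """Latency when layers[:split] run on-device and layers[split:] on the edge.

    Each layer is a (flops, output_bits) pair; if split == 0, the raw input
    is offloaded instead of an intermediate feature map.
    """
    device_time = sum(f for f, _ in layers[:split]) / device_flops
    edge_time = sum(f for f, _ in layers[split:]) / edge_flops
    tx_bits = layers[split - 1][1] if split > 0 else raw_input_bits
    return device_time + tx_bits / bandwidth_bps + edge_time

def best_partition(layers, raw_input_bits, device_flops, edge_flops, bandwidth_bps):
    """Return the split index that minimizes end-to-end latency."""
    return min(range(len(layers) + 1),
               key=lambda s: end_to_end_latency(layers, s, raw_input_bits,
                                               device_flops, edge_flops,
                                               bandwidth_bps))

# Early convolutional layers often emit more data than the raw input,
# which is why pruning (shrinking both flops and output_bits) matters.
layers = [(1e9, 8e6), (2e9, 2e6), (4e9, 1e5)]   # (FLOPs, output bits)
split = best_partition(layers, raw_input_bits=10e6,
                       device_flops=1e9, edge_flops=100e9, bandwidth_bps=1e6)
print(split)  # → 2: partition after the second layer is best here
```

With these numbers, offloading raw input costs 10 s of transmission alone, while splitting after the second (small-output) layer costs about 5.04 s end to end, illustrating why the partition point depends jointly on compute and link bandwidth.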
dc.description.sponsorship: ACKNOWLEDGEMENT This work is sponsored in part by the National Natural Science Foundation of China (No. 61871254, No. 91638204, No. 61571265, No. 61861136003, No. 61621091), the National Key R&D Program of China (No. 2018YFB0105005), and Hitachi Ltd.
dc.description.status: Peer-reviewed
dc.identifier.isbn: 9781728118789
dc.identifier.other: Scopus:85087276813
dc.identifier.other: ORCID:/0000-0002-6916-4789/work/171156687
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85087276813&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733763894
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartofseries: 2019 INFOCOM IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2019
dc.relation.ispartofseries: INFOCOM 2019 - IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2019
dc.rights: Publisher Copyright: © 2019 IEEE.
dc.title: Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning
local.contributor.affiliation: Shi, Wenqi; Tsinghua University
local.contributor.affiliation: Hou, Yunzhong; Department of Electronic Materials Engineering, Research School of Physics, ANU College of Science and Medicine, The Australian National University
local.contributor.affiliation: Zhou, Sheng; Tsinghua University
local.contributor.affiliation: Niu, Zhisheng; Tsinghua University
local.contributor.affiliation: Zhang, Yang; Hitachi, Ltd.
local.contributor.affiliation: Geng, Lu; Hitachi, Ltd.
local.identifier.doi: 10.1109/INFOCOMWKSHPS47286.2019.9093772
local.identifier.pure: 197a8b89-05c7-49da-b8d0-f70d4b3a6eb9
local.type.status: Published
