TAPP: DNN Training for Task Allocation through Pipeline Parallelism Based on Distributed Deep Reinforcement Learning
by: Yingchi Mao, Zijian Tu, Fagang Xi, Qingyong Wang, Shufang Xu
| Format: | Article |
|---|---|
| Published: | MDPI AG 2021-05-01 |
Description
The rapid development of artificial intelligence technology has led to the wide use of deep neural networks (DNNs) in many fields. DNNs have grown steadily larger in order to improve model accuracy and quality. Traditional data and model parallelism, however, are hard to scale because of communication bottlenecks and low hardware efficiency. Pipeline parallelism, by contrast, trains multiple batches concurrently, which reduces training overhead and achieves better acceleration. Because solving the pipeline-parallel task allocation problem on heterogeneous computing resources is complex, this paper proposes task allocation in pipeline parallelism (TAPP), a method based on deep reinforcement learning. In TAPP, a predictive network is trained by policy gradient until it obtains the optimal pipeline-parallel task allocation scheme, which speeds up model training. Experimental results show that, compared with data parallelism under bulk synchronous parallel (BSP), TAPP reduces the single-step training time by a factor of 1.37 on average and reduces the proportion of communication time by 48.92%.
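To make the idea of learning a task allocation by policy gradient concrete, the following is a minimal sketch and not the authors' method. It assumes a two-stage pipeline on two devices of different speed, approximates the steady-state step time by the slower stage, and replaces the paper's predictive network with a plain softmax policy over cut points trained by a REINFORCE-style update. The names `step_time`, `train_allocation`, `layer_costs`, and `device_speeds` are hypothetical, and communication cost is ignored.

```python
import math
import random


def step_time(layer_costs, device_speeds, cut):
    """Approximate per-batch time of a two-stage pipeline (assumption:
    the slower stage is the bottleneck; communication is ignored)."""
    stage0 = sum(layer_costs[:cut]) / device_speeds[0]
    stage1 = sum(layer_costs[cut:]) / device_speeds[1]
    return max(stage0, stage1)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_allocation(layer_costs, device_speeds, episodes=2000, lr=0.5, seed=0):
    """REINFORCE-style search for the layer index at which to split the
    model between device 0 and device 1 (hypothetical toy setup)."""
    rng = random.Random(seed)
    cuts = list(range(1, len(layer_costs)))  # candidate split points
    logits = [0.0] * len(cuts)               # policy parameters
    baseline = 0.0                           # running mean reward
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample one allocation from the current policy.
        i = rng.choices(range(len(cuts)), weights=probs)[0]
        # Reward is the negative step time: faster allocations score higher.
        reward = -step_time(layer_costs, device_speeds, cuts[i])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log softmax: one-hot(sampled action) minus probs.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    probs = softmax(logits)
    best = max(range(len(cuts)), key=lambda j: probs[j])
    return cuts[best], probs


# Four layers with relative costs; device 0 is twice as fast as device 1.
cut, probs = train_allocation([4, 2, 2, 4], [2.0, 1.0])
```

Here `cut` is the split point the policy currently favors and `probs` is its distribution over all candidate splits. A realistic allocator would also model communication time, more than two stages, and a learned network conditioned on the model and device description.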