Title | Adaptive-Precision Framework for SGD Using Deep Q-Learning |
Authors | Zhang, Wentai; Huang, Hanxian; Zhang, Jiaxi; Jiang, Ming; Luo, Guojie |
Affiliation | Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing 100871, Peoples R China. Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China. Peking Univ, Dept Informat Sci, Sch Math Sci, Beijing 100871, Peoples R China. |
Issue Date | 2018 |
Publisher | 2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS |
Citation | 2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS. 2018. |
Abstract | Stochastic gradient descent (SGD) is a widely used algorithm in many applications, especially in the training of deep learning models. Low-precision implementation of SGD has been studied as a major acceleration approach. However, if not used appropriately, a low-precision implementation can degrade SGD's convergence because of rounding error when gradients become small near a local optimum. In this work, to balance throughput and algorithmic accuracy, we apply the Q-learning technique to adjust the precision of SGD automatically by designing an appropriate decision function. The proposed decision function for Q-learning takes the error rate of the objective function, its gradients, and the current precision configuration as inputs. Q-learning then chooses a proper precision adaptively for hardware efficiency and algorithmic accuracy. We use reconfigurable devices such as FPGAs to evaluate the adaptive precision configurations generated by the proposed Q-learning method. We prototype the framework using the LeNet-5 model with the MNIST and CIFAR10 datasets and implement it on a Xilinx KCU1500 FPGA board. In the experiments, we analyze the throughput of different precision representations and the precision selection of our framework. The results show that the proposed framework with adaptive precision increases the throughput by up to 4.3x compared to the conventional 32-bit floating-point setting, and it achieves both the best hardware efficiency and algorithmic accuracy. |
URI | http://hdl.handle.net/20.500.11897/575675 |
ISSN | 1933-7760 |
DOI | 10.1145/3240765.3240774 |
Indexed | CPCI-S(ISTP) |
Appears in Collections: | School of Electronics Engineering and Computer Science; School of Mathematical Sciences |
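
The abstract's point that rounding error can stall low-precision SGD once gradients become small can be illustrated with a toy example. The sketch below is not the paper's framework: it assumes a one-dimensional quadratic objective, a simple fixed-point grid, and round-half-up quantization, and the names `quantize` and `sgd_minimize` are hypothetical helpers introduced only for illustration.

```python
import math


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (round half up)."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale


def sgd_minimize(frac_bits: int, steps: int = 200, lr: float = 0.1,
                 w0: float = 1.0) -> float:
    """Run gradient descent on f(w) = 0.5 * w**2 with w stored in fixed point.

    The gradient of f is simply w, so the exact update is w <- 0.9 * w.
    After each update the weight is snapped back onto the fixed-point grid.
    """
    w = quantize(w0, frac_bits)
    for _ in range(steps):
        grad = w                      # gradient of the toy objective
        w = quantize(w - lr * grad, frac_bits)
    return w


# Assumed toy settings: 4 vs. 16 fractional bits.
w_low = sgd_minimize(frac_bits=4)     # coarse grid, step size 1/16
w_high = sgd_minimize(frac_bits=16)   # fine grid, step size 1/65536

print(f"4 fractional bits : w = {w_low}")
print(f"16 fractional bits: w = {w_high}")
```

With the coarse grid, the update `lr * grad` eventually becomes smaller than half a grid step, so the quantized weight stops moving well short of the optimum at zero; the finer grid gets much closer before the same effect sets in. This is the kind of trade-off an adaptive scheme would need to manage, raising precision only when the low-precision updates stop making progress.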