Institutional Repository of Peking University: Adaptive-Precision Framework for SGD Using Deep Q-Learning

Title	Adaptive-Precision Framework for SGD Using Deep Q-Learning
Authors	Zhang, Wentai; Huang, Hanxian; Zhang, Jiaxi; Jiang, Ming; Luo, Guojie
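Affiliation	Peking University, Center for Energy-Efficient Computing and Applications, Beijing 100871, People's Republic of China; Peking University, School of Electronics Engineering and Computer Science, Beijing 100871, People's Republic of China; Peking University, Department of Information Science, School of Mathematical Sciences, Beijing 100871, People's Republic of China.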
Issue Date	2018
Publisher	2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS
Citation	2018 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) DIGEST OF TECHNICAL PAPERS. 2018.
Abstract	Stochastic gradient descent (SGD) is a widely used algorithm in many applications, especially in the training process of deep learning models. Low-precision implementation for SGD has been studied as a major acceleration approach. However, if not appropriately used, low-precision implementation can deteriorate its convergence because of the rounding error when gradients become small near a local optimum. In this work, to balance throughput and algorithmic accuracy, we apply the Q-learning technique to adjust the precision of SGD automatically by designing an appropriate decision function. The proposed decision function for Q-learning takes the error rate of the objective function, its gradients, and the current precision configuration as the inputs. Q-learning then chooses proper precision adaptively for hardware efficiency and algorithmic accuracy. We use reconfigurable devices such as FPGAs to evaluate the adaptive precision configurations generated by the proposed Q-learning method. We prototype the framework using the LeNet-5 model with the MNIST and CIFAR10 datasets and implement it on a Xilinx KCU1500 FPGA board. In the experiments, we analyze the throughput of different precision representations and the precision selection of our framework. The results show that the proposed framework with adaptive precision increases the throughput by up to 4.3x compared to the conventional 32-bit floating-point setting, and it achieves both the best hardware efficiency and algorithmic accuracy.
URI	http://hdl.handle.net/20.500.11897/575675
ISSN	1933-7760
DOI	10.1145/3240765.3240774
Indexed	CPCI-S(ISTP)
Appears in Collections:	School of Electronics Engineering and Computer Science; School of Mathematical Sciences
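
Illustrative sketch (not from the paper): the abstract describes two mechanisms, the loss of small gradient updates to rounding at low precision and a Q-learning agent that picks the precision from the error rate, the gradients, and the current precision configuration. The Python below is a minimal sketch of both under stated assumptions. It uses a tabular Q-table in place of the deep Q-network the title refers to, and the candidate bit widths, the state bucketing, the function names, and the hyperparameters are all hypothetical. The reward signal is left to the caller because the record does not specify it.

```python
import random

# Hypothetical candidate bit widths; the record does not list the actual options.
PRECISIONS = [8, 16, 32]

def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def sgd_step(w, grad, lr, frac_bits):
    """One SGD update whose result is stored with frac_bits fractional bits."""
    return quantize(w - lr * grad, frac_bits)

def discretize(error_rate, grad_norm, precision_idx):
    """Bucket the three inputs named in the abstract into a discrete state.

    The bucket boundaries are assumptions chosen only for illustration.
    """
    e = min(int(error_rate * 4), 3)
    g = 0 if grad_norm < 1e-3 else (1 if grad_norm < 1e-1 else 2)
    return (e, g, precision_idx)

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice of an index into PRECISIONS."""
    if rng.random() < epsilon:
        return rng.randrange(len(PRECISIONS))
    values = [q.get((state, a), 0.0) for a in range(len(PRECISIONS))]
    return values.index(max(values))

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(len(PRECISIONS)))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

if __name__ == "__main__":
    # A small update vanishes with 4 fractional bits but survives with 16.
    print(sgd_step(0.5, 0.01, 0.1, 4))
    print(sgd_step(0.5, 0.01, 0.1, 16))
    rng = random.Random(0)
    q = {}
    s = discretize(0.1, 1e-4, 0)
    a = choose_action(q, s, epsilon=0.1, rng=rng)
    q_update(q, s, a, reward=1.0, next_state=discretize(0.1, 1e-4, a))
```

In this sketch a weight of 0.5 with a step of 0.001 is unchanged after rounding to 4 fractional bits, which is the convergence hazard the abstract attributes to low precision near a local optimum. A reward that trades throughput against accuracy would be supplied by whatever training loop drives `q_update`.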

License: See PKU IR operational policies.