Stanford EE Computer Systems Colloquium

4:30 PM, Wednesday, January 6, 2016
NEC Auditorium, Gates Computer Science Building Room B3
Stanford University
http://ee380.stanford.edu

Deep Compression and EIE: Deep Neural Network Model Compression and Hardware Acceleration

Song Han
Stanford University
About the talk:

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we first introduce "deep compression," which reduces the storage requirement of neural networks without affecting their accuracy. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, and that required by VGG-16 by 49x, from 552MB to 11.3MB, both with no loss of accuracy. This compression makes it practical to ship complex neural networks in mobile applications, where application size and download bandwidth are constrained, and allows the model to fit in on-chip SRAM rather than off-chip DRAM.
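The two core stages behind this kind of compression are magnitude pruning and weight sharing (codebook quantization). The sketch below illustrates both on a toy layer; the threshold, cluster count, layer size, and the per-weight bit accounting are illustrative assumptions, not the settings reported in the talk.

```python
import numpy as np

# Illustrative sketch of pruning + weight sharing on a small random layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)

# Stage 1: pruning -- zero out weights with small magnitude.
# (The 0.5 threshold is an arbitrary choice for this toy example.)
threshold = 0.5
mask = np.abs(W) > threshold
nonzero = W[mask]

# Stage 2: weight sharing -- cluster surviving weights into a small
# codebook; each weight is then stored as a short index (4 bits for
# 16 clusters) plus one shared fp32 value per cluster.
n_clusters = 16
centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
for _ in range(10):  # simple k-means refinement
    assign = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(n_clusters):
        if np.any(assign == k):
            centroids[k] = nonzero[assign == k].mean()

# Rough storage estimate: 4-bit codebook index plus a ~4-bit relative
# position per surviving weight, plus the fp32 codebook itself.
dense_bits = W.size * 32
sparse_bits = mask.sum() * (4 + 4)
codebook_bits = n_clusters * 32
ratio = dense_bits / (sparse_bits + codebook_bits)
print(f"kept {mask.mean():.0%} of weights, ~{ratio:.1f}x smaller")
```

Pruning and quantization compound: sparsity shrinks the number of stored weights, and the codebook shrinks the bits per stored weight.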

Next we propose an energy-efficient inference engine (EIE) that performs inference directly on this compressed model, accelerating the resulting sparse matrix-vector multiplication with weight sharing. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster than CPU and GPU implementations of the same DNN without compression. With a processing power of 102 GOPS at only 600mW, EIE is also 24,000x and 3,000x more energy efficient than a CPU and a GPU, respectively.
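The computation EIE accelerates can be sketched as a sparse matrix-vector product in which each stored weight is a small index into a shared codebook rather than a full fp32 value, and zero activations are skipped dynamically. The data layout and sizes below are a simplified illustration, not EIE's exact on-chip encoding.

```python
import numpy as np

# Shared codebook of weight values (weight sharing): stored weights are
# 2-bit indices into this table in the toy example below.
codebook = np.array([-0.75, -0.25, 0.25, 0.75], dtype=np.float32)

# Compressed column-major weights: for each column j of W, the row
# indices of its nonzeros and their codebook indices.
rows = [np.array([0, 2]), np.array([1]), np.array([0, 1, 2])]
codes = [np.array([3, 0]), np.array([2]), np.array([1, 3, 2])]

def sparse_matvec(x, rows, codes, codebook, n_out):
    """y = W @ x, skipping both zero weights and zero activations."""
    y = np.zeros(n_out, dtype=np.float32)
    for j, xj in enumerate(x):
        if xj == 0.0:  # dynamic activation sparsity: skip this column
            continue
        y[rows[j]] += codebook[codes[j]] * xj
    return y

x = np.array([1.0, 0.0, 2.0], dtype=np.float32)
y = sparse_matvec(x, rows, codes, codebook, n_out=3)
print(y)  # columns with x == 0 contribute nothing
```

Because work is done only for nonzero weights meeting nonzero activations, both the static sparsity from pruning and the dynamic sparsity from ReLU activations translate directly into fewer operations.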

Slides:

Download the slides for this presentation in PDF format.

About the speaker:

Song Han is a fourth-year PhD student working with Prof. Bill Dally at Stanford University. His research interests are computer architecture and high-performance computing for deep learning; his current work improves the energy efficiency of neural networks for mobile and embedded systems. He has worked on model compression and on a hardware accelerator for compressed models that fits state-of-the-art DNNs fully on-chip, work that has been covered by TheNextPlatform. Before joining Stanford, Song Han graduated from the Institute of Microelectronics at Tsinghua University in 2012.

Contact information:

Song Han, Stanford University