Second Order Methods for Neural Network Analysis, Training, and Inference

Zhewei Yao, University of California, Berkeley
February 17, 2021, 4:10–5:00 PM, via https://berkeley.zoom.us/j/186935273

Second-order analysis and computation are widely used in scientific computing due to their fast convergence rates and the extra curvature information they provide compared to first-order methods. However, the application of second-order methods to neural networks has been limited, since naive Hessian-based computations are infeasible for large NN problems. In this talk, I will present fast and efficient ways to compute different metrics of second-order information, including eigenvalues, the trace, and the estimated spectral density. As an application, I will show how these metrics can be used to analyze the generalization ability of a neural network and the effectiveness of neural network design choices. Furthermore, I will introduce a novel adaptive second-order optimizer for machine learning and show its superior performance on computer vision, natural language processing, and recommendation systems compared to first-order methods. Finally, I will discuss how to leverage second-order information to systematically study neural network quantization, and present new state-of-the-art results on both computer vision and natural language processing.
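As background for the abstract's claim that the Hessian trace can be computed efficiently without forming the Hessian, here is a minimal sketch of Hutchinson's randomized trace estimator, tr(H) ≈ E[zᵀHz] for Rademacher vectors z, which needs only Hessian-vector products. The explicit matrix H below is a hypothetical stand-in; in a real neural network, the `hvp` function would instead apply Pearlmutter's trick (two rounds of backpropagation), and this sketch is not the speaker's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
H = A @ A.T  # symmetric PSD matrix standing in for a NN Hessian (assumption)

def hvp(v):
    # Hessian-vector product; for an actual NN this would be computed
    # matrix-free via automatic differentiation, never materializing H.
    return H @ v

# Hutchinson's estimator: average z^T H z over random Rademacher vectors z.
num_samples = 2000
estimates = []
for _ in range(num_samples):
    z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
    estimates.append(z @ hvp(z))
trace_est = float(np.mean(estimates))
```

The same matrix-free idea underlies the stochastic Lanczos methods used to estimate extreme eigenvalues and the full spectral density: every quantity is built from Hessian-vector products, whose cost is a small constant multiple of one gradient evaluation.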