Calendar of Events
Understanding Deep-Learning as a physicist: What would Einstein do?
Speaker: Yuhai Tu (IBM)
Abstract: Despite the great success of deep learning, it remains largely a black box. For example, the main search engine in deep neural networks is based on the Stochastic Gradient Descent (SGD) algorithm; however, little is known about how SGD finds “good” solutions (low generalization error) in the high-dimensional weight space. In this talk, we will first give a general overview of SGD, followed by a more detailed description of our recent work [1-3] on the SGD learning dynamics, the loss function landscape, and their relationship. Time permitting, we will discuss more recent work that tries to understand why flat solutions are more generalizable and whether there are other measures for better generalization, based on an exact duality relation we found between neuron activity and network weight [4].
[1] “The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima”, Y. Feng and Y. Tu, PNAS, 118 (9), 2021.
[2] “Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data”, Y. Feng and Y. Tu, Machine Learning: Science and Technology (MLST), July 19, 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf
[3] “Stochastic Gradient Descent Introduces an Effective Landscape-Dependent Regularization Favoring Flat Solutions”, Ning Yang, Chao Tang, and Y. Tu, Phys. Rev. Lett. (PRL), 130 (23), 237101, 2023.
[4] “Activity–weight duality in feed-forward neural networks reveals two co-determinants for generalization”, Yu Feng, Wei Zhang, D. Zhang, and Y. Tu, Nature Machine Intelligence, 2023.
Host: Anirvan Sengupta