Speaker
Description
As neural networks become wider, their accuracy improves and their behavior becomes easier to analyze theoretically. I will give an introduction to a growing body of work that examines the learning dynamics of, and the distribution over functions induced by, infinitely wide, randomly initialized neural networks. Core results I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel, both before and after training; that the predictions of a class of wide neural networks are linear in their parameters throughout training; and that the posterior distribution over parameters also takes a simple form in wide Bayesian networks. These results enable surprising capabilities: for instance, computing the test-set predictions an infinitely wide trained neural network would make, without ever instantiating a neural network, or rapidly training convolutional networks with 10,000+ layers. I will argue that this growing understanding of neural networks in the infinite-width limit is foundational for both the theoretical and practical understanding of deep learning.
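For reference, the linearity result mentioned above is usually stated as follows (my notation, not necessarily the speaker's): for a sufficiently wide network f(x, θ), the first-order expansion around the initialization θ₀ remains accurate throughout gradient-descent training,

```latex
f_{\mathrm{lin}}(x,\theta) = f(x,\theta_0) + \nabla_\theta f(x,\theta_0)\,(\theta - \theta_0),
\qquad
\Theta(x,x') = \nabla_\theta f(x,\theta_0)\,\nabla_\theta f(x',\theta_0)^{\top},
```

so that, under gradient descent on squared loss, the trained network's predictions coincide with kernel regression under the neural tangent kernel Θ.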
Neural Tangents:
https://github.com/google/neural-tangents
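To make the "without ever instantiating a neural network" point concrete, here is a minimal sketch of the kind of computation Neural Tangents supports, following its public API; the architecture, widths, data shapes, and hyperparameters below are illustrative assumptions, not taken from the talk.

```python
# Illustrative sketch only: architecture, widths, and toy data are assumptions.
import neural_tangents as nt
from neural_tangents import stax
from jax import random

# A wide fully-connected architecture; kernel_fn is the corresponding
# infinite-width NNGP / NTK kernel, computed in closed form.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Relu(),
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Relu(),
    stax.Dense(1, W_std=1.5, b_std=0.05),
)

# Toy regression data with placeholder shapes.
key = random.PRNGKey(0)
k1, k2, k3 = random.split(key, 3)
x_train = random.normal(k1, (20, 10))
y_train = random.normal(k2, (20, 1))
x_test = random.normal(k3, (5, 10))

# Closed-form test predictions of the infinitely wide network trained to
# convergence on MSE loss -- no finite network is ever constructed.
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_nngp = predict_fn(x_test=x_test, get='nngp')  # Bayesian (NNGP) posterior mean
y_ntk = predict_fn(x_test=x_test, get='ntk')    # mean of gradient-descent-trained ensemble
```

The kernel_fn returned by stax.serial evaluates the compositional NNGP and NTK covariances analytically, which is why the test predictions above require only kernel algebra rather than training any finite network.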