27–31 May 2024
ECT*
Europe/Rome timezone

Understanding infinite width neural networks from the perspective of statistical mechanics

29 May 2024, 15:45
45m
Aula Renzo Leonardi (ECT*)

Strada delle Tabarelle 286, I-38123 Villazzano (Trento)

Speaker

Dr Jascha Sohl-Dickstein (Anthropic)

Description

As neural networks become wider, their accuracy improves and their behavior becomes easier to analyze theoretically. I will give an introduction to a growing body of work that examines the learning dynamics and the distribution over functions induced by infinitely wide, randomly initialized neural networks. Core results I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel, both before and after training; that the predictions of a class of wide neural networks are linear in their parameters throughout training; and that the posterior distribution over parameters also takes a simple form in wide Bayesian networks. These results enable surprising capabilities: for instance, evaluating the test set predictions that an infinitely wide trained neural network would make, without ever instantiating a neural network, or rapidly training convolutional networks with more than 10,000 layers. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for both the theoretical and the practical understanding of deep learning.

Neural Tangents:
https://github.com/google/neural-tangents
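
To make the last capability concrete, below is a minimal sketch (not taken from the talk) of how test set predictions of an infinitely wide, fully trained network can be computed in closed form with the Neural Tangents library linked above. The fully connected ReLU architecture and the synthetic regression data are illustrative assumptions, not part of the original presentation.

import jax
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width limit of a 3-layer fully connected ReLU network.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

# Toy regression data (placeholders chosen purely for illustration).
key_train, key_test = jax.random.split(jax.random.PRNGKey(0))
x_train = jax.random.normal(key_train, (20, 3))
y_train = jnp.sin(x_train.sum(axis=1, keepdims=True))
x_test = jax.random.normal(key_test, (5, 3))

# The NNGP kernel describes the Gaussian process over functions computed by
# the randomly initialized wide network; the NTK governs its training dynamics.
k_train_train = kernel_fn(x_train, x_train, 'nngp')

# Closed-form test predictions of the infinitely wide network trained to
# convergence with gradient descent on MSE loss; no finite-width network
# is ever instantiated or trained.
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_test_nngp = predict_fn(x_test=x_test, get='nngp')  # Bayesian infinite-width (NNGP) prediction
y_test_ntk = predict_fn(x_test=x_test, get='ntk')    # gradient-descent-trained (NTK) prediction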
