September 27, 2021 to October 1, 2021
ECT* - Trento
Europe/Rome timezone

Why deep networks generalize

Oct 1, 2021, 10:50 AM
Aula Renzo Leonardi (ECT* - Trento)

Strada delle Tabarelle, 286 38123 - Villazzano (TN) Italy


Robert de Mello Koch (Huzhou University and University of the Witwatersrand)


Training a deep network means running an algorithm that fixes the network's parameters. The trained network is then evaluated on unseen test data. The difference between its performance on the training data and on unseen data defines the generalization error. Networks that perform as well on unseen data as on training data have a small generalization error.

We have definite expectations for the size of the generalization error, based essentially on common sense. If the training data set is much smaller than the number of parameters in the network, training can fit any data perfectly, so errors and noise in the data are captured during training. Typical deep network applications use networks with hundreds of millions of parameters, trained on data sets with tens of thousands of examples. Clearly, then, we are squarely in the regime where large generalization errors are expected. Remarkably, however, for typical deep learning applications the generalization error is small. This raises the question: why do deep nets generalize?
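
The common-sense expectation can be illustrated with a minimal numerical sketch. It is not taken from the talk: it uses a linear model with random features as a stand-in for a network, and all names and sizes (n_train, n_params, fit_min_norm, and so on) are illustrative assumptions. With far more parameters than training examples, the fit reproduces pure-noise labels essentially exactly, while the error on fresh data stays large.

```python
import numpy as np


def fit_min_norm(features, targets):
    """Least-squares fit; for an underdetermined system lstsq returns
    the minimum-norm solution that interpolates the training targets."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def mse(features, targets, weights):
    """Mean squared error of the linear model on the given data."""
    return float(np.mean((features @ weights - targets) ** 2))


rng = np.random.default_rng(0)

# Assumed toy sizes: ten times more parameters than training examples.
n_train, n_test, n_params = 20, 1000, 200

x_train = rng.normal(size=(n_train, n_params))
x_test = rng.normal(size=(n_test, n_params))

# Labels are pure noise, so there is nothing real to learn.
y_train = rng.normal(size=n_train)
y_test = rng.normal(size=n_test)

weights = fit_min_norm(x_train, y_train)

train_error = mse(x_train, y_train, weights)
test_error = mse(x_test, y_test, weights)

# Generalization error in the sense used above: test minus training performance.
generalization_gap = test_error - train_error

print(f"train error: {train_error:.2e}")
print(f"test error:  {test_error:.2e}")
print(f"gap:         {generalization_gap:.2e}")
```

In this toy setting the training error is at the level of floating-point round-off while the test error is of order one, which is the large-gap regime that the parameter counting suggests.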

In this talk we develop parallels between deep learning and the renormalization group to suggest why deep networks generalize.
