Theory of machine learning

Why do highly over-parametrized deep neural networks generalize so well? How sensitive are neural networks to the distribution of their inputs?

Machine learning and data science provide a rich source of questions that can be studied with tools from random matrix theory, high-dimensional probability, and statistics. Simplified or “toy” models (such as kernel methods, random features, and other tractable surrogates of neural networks) offer a way to isolate key mechanisms behind generalization, optimization, and representation learning. These perspectives have clarified both the strengths and limitations of kernel-like models, highlighted the central role of feature learning, and revealed many surprising (and not so surprising) high-dimensional phenomena. At the same time, much remains poorly understood, making this an active area for developing new mathematical ideas.

We predict the empirical test error of random feature ridge regression as a function of the hidden-layer size for different values of the ridge parameter δ.
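
As a rough illustration of the quantity being predicted, here is a minimal numerical sketch of random feature ridge regression: a fixed random first layer, a ridge fit of the second-layer weights, and the empirical test error measured while the hidden-layer width varies. The ReLU feature map, the synthetic linear teacher, and all dimensions below are assumptions for the sketch, not the exact setting of the prediction above.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rf_ridge_test_error(X_train, y_train, X_test, y_test, width, delta, rng):
    """Empirical test error of random feature ridge regression with
    `width` random ReLU features and ridge parameter `delta` (assumed setup)."""
    d = X_train.shape[1]
    W = rng.standard_normal((d, width)) / np.sqrt(d)   # fixed random first layer
    Phi_train = relu(X_train @ W)                       # n_train x width feature matrix
    Phi_test = relu(X_test @ W)
    # Ridge fit of the second layer: a = (Phi^T Phi + delta I)^{-1} Phi^T y
    A = Phi_train.T @ Phi_train + delta * np.eye(width)
    a = np.linalg.solve(A, Phi_train.T @ y_train)
    y_hat = Phi_test @ a
    return np.mean((y_hat - y_test) ** 2)

# Hypothetical synthetic task: noisy linear target in d = 50 dimensions.
rng = np.random.default_rng(0)
d, n_train, n_test = 50, 400, 2000
beta = rng.standard_normal(d) / np.sqrt(d)
X_train, X_test = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
y_train = X_train @ beta + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ beta + 0.1 * rng.standard_normal(n_test)

for delta in [1e-4, 1e-1]:
    errors = [rf_ridge_test_error(X_train, y_train, X_test, y_test, p, delta, rng)
              for p in [50, 100, 200, 400, 800, 1600]]
    print(f"delta = {delta}:", np.round(errors, 4))
```

Sweeping the width across the interpolation threshold (width comparable to the number of training samples) with a small δ is where one typically expects the non-monotone, double-descent-like behavior of the test error; larger ridge parameters smooth the curve out.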