This paper develops a theory of the matrix Dyson equation for correlated linearizations and uses it to derive an asymptotic deterministic equivalent for the test error in random features regression. The theory for the correlated Dyson equation covers existence and uniqueness, bounds on the spectral support, and stability properties; these results are new and provide the tools needed to construct deterministic equivalents for pseudo-resolvents of a class of correlated linear pencils. As an application, the theory yields a deterministic equivalent of the test error in random features ridge regression in a proportional scaling regime, conditioned on both the training and test datasets.
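For orientation, a minimal sketch of the matrix Dyson equation in its standard Hermitian, uncorrelated form is given below; the symbols H, A, S, and M are generic notation introduced here for illustration only and do not reflect the correlated setting treated in the paper.

% Sketch (standard Hermitian setting): for a random Hermitian matrix H with
% expectation A = E[H] and self-energy operator S[R] = E[(H - A) R (H - A)],
% the deterministic equivalent M(z) of the resolvent (H - z)^{-1} solves the
% fixed-point equation
\[
  -M(z)^{-1} \;=\; zI - A + \mathcal{S}[M(z)],
  \qquad \operatorname{Im} M(z) \succ 0 \ \text{ for } \ \operatorname{Im} z > 0 .
\]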
Thesis
The matrix Dyson equation for machine learning: Correlated linearizations and the test error in random features regression
Contemporary machine learning models, particularly deep learning models, are frequently trained on large datasets in high-dimensional feature spaces, which presents challenges for traditional analytical approaches. Notably, the effective generalization of highly overparameterized models contradicts conventional statistical wisdom. Furthermore, the presence of non-linear activations in artificial neural networks adds complexity to their analysis. To simplify theoretical analysis, it is often assumed that the training data are sampled from an unstructured distribution. While such analyses offer insights into certain aspects of machine learning, they fall short of elucidating how neural networks extract information from the structure of the data, which is crucial to their success in real-world applications. Fortunately, random matrix theory has emerged as a valuable tool for the theoretical understanding of certain machine learning procedures. Various techniques have been employed to study large random matrices through asymptotic deterministic equivalents. One such approach replaces the random resolvent associated with a large random matrix by the solution of a deterministic fixed-point equation known as the matrix Dyson equation. Another effective technique, known as the linearization trick, embeds a matrix expression into a larger random matrix, termed a linear matrix pencil, with a simplified correlation structure.

In this thesis, we extend the matrix Dyson equation framework to derive an anisotropic global law for a broad class of pseudo-resolvents with general correlation structures. This extension enables the analysis of the spectral properties of a wide range of random matrices through a simpler, deterministic object: the solution of the matrix Dyson equation. In developing this theory, we address existence and uniqueness, spectral support bounds, and stability properties, all of which are essential for constructing deterministic equivalents for pseudo-resolvents of a class of correlated linear pencils. Leveraging this theoretical framework, we provide an asymptotically exact deterministic expression for the empirical test error of random features ridge regression. The random features model, characterized by its non-linear activation function and potential for overparameterization, is a powerful model for studying phenomena observed in real-world machine learning models, such as multiple descent and implicit regularization. Our exact expression enables a precise characterization of the implicit regularization of the model and reveals connections between random features regression and closely related kernel methods. Since we make no particular assumptions about the distribution of the data or the response variable, our work represents a significant step towards understanding how neural networks exploit specific data structures.
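To make the linearization trick concrete, here is a standard textbook-style sketch (not the specific pencils constructed in the thesis): the resolvent of a product XX^* is recovered as a block of the inverse of a larger matrix that is linear in X.

% Sketch of the linearization trick: for a rectangular random matrix X, the
% 2x2 block pencil L(z) is linear in X, and a Schur complement computation
% shows that the resolvent of XX^* sits in the upper-left block of its inverse:
\[
  L(z) \;=\; \begin{pmatrix} -zI & X \\ X^{*} & -I \end{pmatrix},
  \qquad
  \bigl[L(z)^{-1}\bigr]_{11} \;=\; \bigl(XX^{*} - zI\bigr)^{-1}.
\]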
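For reference, one common parameterization of random features ridge regression is sketched below; the normalization and the symbols W, Z, sigma, and lambda are assumptions made here for illustration and may differ from the conventions adopted in the thesis.

% Sketch of random features ridge regression (one common convention): given
% training data (x_i, y_i), i = 1,...,n, random weights W in R^{m x d}, and an
% activation sigma applied entrywise, form the feature matrix Z with rows
% z(x_i)^T = sigma(W x_i / sqrt(d))^T. The ridge estimator and the test error,
% conditional on the training and test sets, are
\[
  \hat a \;=\; \Bigl(\tfrac{1}{n} Z^{\top} Z + \lambda I\Bigr)^{-1} \tfrac{1}{n} Z^{\top} y,
  \qquad
  E_{\mathrm{test}} \;=\; \frac{1}{n_{\mathrm{test}}} \sum_{j=1}^{n_{\mathrm{test}}}
  \bigl( y_j^{\mathrm{test}} - z(x_j^{\mathrm{test}})^{\top} \hat a \bigr)^{2},
\]
% where the remaining randomness is in the weights W, matching the viewpoint of
% conditioning on both the training and test datasets described above.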