Iordan Ganev

This page lists publications, preprints, and projects related to my work in machine learning. Comments and questions welcome.


Publications and Preprints

Symmetries, flat minima, and the conserved quantities of gradient flow.
With B. Zhao, R. Walters, R. Yu, and N. Dehmamy.
International Conference on Learning Representations (ICLR), 2023.

Abstract: We present a general framework for finding continuous symmetries in the parameter space of deep neural networks, which carve out low-loss valleys in the loss landscape. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. We also show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.
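
A toy illustration of such a conserved quantity (illustrative choices throughout; this is not code from the paper): for a two-layer linear network f(x) = W2 W1 x, the loss is unchanged by W1 -> g W1, W2 -> W2 g^{-1}, and the matrix W1 W1^T - W2^T W2 stays constant along gradient flow, so small-step gradient descent should preserve it approximately.

```python
import numpy as np

# Check that Q = W1 W1^T - W2^T W2 drifts only slightly under small-step
# gradient descent on a least-squares loss for f(x) = W2 @ W1 @ x.
# Dimensions, data, learning rate, and step count are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 64))                   # inputs, one column per sample
Y = rng.normal(size=(3, 64))                   # targets
W1 = 0.1 * rng.normal(size=(4, 5))
W2 = 0.1 * rng.normal(size=(3, 4))

def Q(W1, W2):
    return W1 @ W1.T - W2.T @ W2

Q0 = Q(W1, W2)
lr = 1e-3
for _ in range(2000):
    G = (W2 @ W1 @ X - Y) @ X.T / X.shape[1]   # gradient w.r.t. the product W2 @ W1
    W1, W2 = W1 - lr * W2.T @ G, W2 - lr * G @ W1.T
print(np.linalg.norm(Q(W1, W2) - Q0))          # small, up to discretization error
```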

[arXiv] [github]

Quiver neural networks.
With R. Walters.
Preprint, arXiv:2207.12773.

Abstract: We develop a uniform theoretical approach towards the analysis of various neural network connectivity architectures by introducing the notion of a quiver neural network. Inspired by quiver representation theory in mathematics, this approach gives a compact way to capture elaborate data flows in complex network architectures. As an application, we use parameter space symmetries to prove a lossless model compression algorithm for quiver neural networks with certain non-pointwise activations known as rescaling activations. In the case of radial rescaling activations, we prove that training the compressed model with gradient descent is equivalent to training the original model with projected gradient descent.
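
A toy rendering of the quiver viewpoint (a minimal made-up example, not the construction or notation from the paper): the architecture is a directed graph with a vector space attached to each vertex and a linear map attached to each arrow, and a forward pass composes these maps along the graph.

```python
import numpy as np

# Tiny quiver-style network: vertices carry dimensions, arrows carry weight
# matrices, and the forward pass sums incoming contributions at each vertex.
# The vertex names, dimensions, and pointwise tanh here are placeholders;
# the compression results in the paper concern rescaling activations instead.
dims = {"in": 4, "h1": 6, "h2": 6, "out": 2}
arrows = [("in", "h1"), ("in", "h2"), ("h1", "out"), ("h2", "out")]

rng = np.random.default_rng(0)
maps = {(s, t): rng.normal(size=(dims[t], dims[s])) for s, t in arrows}

def forward(x, activation=np.tanh):
    values = {"in": x}
    for v in ["h1", "h2", "out"]:              # topological order of the vertices
        incoming = sum(maps[s, t] @ values[s] for s, t in arrows if t == v)
        values[v] = incoming if v == "out" else activation(incoming)
    return values["out"]

print(forward(np.ones(4)).shape)               # (2,)
```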

[arXiv] [github]

Universal approximation and model compression for radial neural networks.
With T. van Laarhoven and R. Walters.
Preprint, arXiv:2107.02550.

Abstract: We introduce a class of fully-connected neural networks whose activation functions, rather than being pointwise, rescale feature vectors by a function depending only on their norm. We call such networks radial neural networks, extending previous work on rotation equivariant networks that considers rescaling activations in less generality. We prove universal approximation theorems for radial neural networks, including in the more difficult cases of bounded widths and unbounded domains. Our proof techniques are novel, distinct from those in the pointwise case. Additionally, radial neural networks exhibit a rich group of orthogonal change-of-basis symmetries on the vector space of trainable parameters. Factoring out these symmetries leads to a practical lossless model compression algorithm. Optimization of the compressed model by gradient descent is equivalent to projected gradient descent for the full model.
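
A minimal sketch of a radial rescaling activation and forward pass (the shifted-ReLU-of-the-norm scaling below is one illustrative choice, not necessarily the one used in the paper): the activation multiplies the whole feature vector by a factor that depends only on its norm.

```python
import numpy as np

# Radial rescaling activation: rescale v by a function of ||v|| only.
def radial_rescale(v, shift=1.0, eps=1e-12):
    r = np.linalg.norm(v)
    return (max(r - shift, 0.0) / (r + eps)) * v   # zero inside a ball, shrunk outside

# Toy radial network: fully-connected layers with the activation applied to
# the entire feature vector (widths are arbitrary).
rng = np.random.default_rng(0)
widths = [5, 8, 8, 3]
weights = [rng.normal(size=(m, n)) for n, m in zip(widths[:-1], widths[1:])]

def forward(x):
    h = x
    for W in weights[:-1]:
        h = radial_rescale(W @ h)                  # non-pointwise: acts on the whole vector
    return weights[-1] @ h                         # final layer is linear

print(forward(np.ones(5)).shape)                   # (3,)
```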

[arXiv] [github]




Notes and Projects

Gradient flow and conserved quantities.

These are notes stemming from and supplementing the paper Symmetries, flat minima, and the conserved quantities of gradient flow above (arXiv link). We explain several different characterizations of conserved quantities, provide examples, and explore conserved quantities in the context of a loss function that is invariant under a group action. Of particular interest are orthogonal group actions, which appear in the study of radial neural networks. We provide a description of the stabilizer of a generic point in the parameter space of a radial neural network, which is related to the compression of the network. We also consider a particular diagonalizable action.
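
For orientation, a minimal version of the setup, in simplified notation: a function Q on parameter space is a conserved quantity of the gradient flow \dot{\theta} = -\nabla L(\theta) if it is constant along every trajectory,

\[
\frac{d}{dt}\, Q(\theta(t)) \;=\; \langle \nabla Q(\theta(t)),\, \dot{\theta}(t) \rangle \;=\; -\langle \nabla Q(\theta(t)),\, \nabla L(\theta(t)) \rangle \;=\; 0,
\]

and if the loss is invariant under a one-parameter group of linear transformations, L(e^{sA}\theta) = L(\theta) for all s, then differentiating at s = 0 gives the constraint \langle \nabla L(\theta), A\theta \rangle = 0 at every point, which is the starting point for producing such Q.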

[pdf]

Kaggle.

Small learning projects for gaining familiarity with PyTorch. One is a computer vision analysis for distinguishing bees from wasps using a convolutional neural network (link to the Kaggle dataset). Another is an NLP sentiment analysis of IMDB movie reviews (link to the Kaggle dataset).
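
A minimal sketch of the kind of convolutional model used for the first project (the actual architecture, input resolution, and number of classes in the repository may differ):

```python
import torch
from torch import nn

# Small CNN for binary image classification; 64x64 RGB inputs are assumed here.
class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
logits = model(torch.randn(8, 3, 64, 64))   # batch of 8 fake images
print(logits.shape)                         # torch.Size([8, 2])
```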

[github]

Bias-Variance.

An analysis of the bias-variance tradeoff in supervised statistical learning. We examine both the regression and classification settings.
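
For reference, the standard decomposition in the regression setting, with y = f(x) + \varepsilon, \mathbb{E}[\varepsilon] = 0, \operatorname{Var}(\varepsilon) = \sigma^2, and an estimator \hat{f} fit on a random training set:

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
\;=\;
\underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
\;+\;
\underbrace{\operatorname{Var}\big(\hat{f}(x)\big)}_{\text{variance}}
\;+\;
\underbrace{\sigma^2}_{\text{irreducible error}}.
\]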

[pdf]

Principal Component Analysis.

Principal component analysis is a technique for finding a new ordered basis (or partial basis) of the predictor space in such a way that most of the variability in the data can be captured in fewer dimensions. In these expository notes, we first provide a formulation of principal component analysis from the point of view of finding directions that maximize variability. We then explain the relationship between principal component analysis and the singular value decomposition.
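
A short computational illustration of the SVD connection (synthetic data, not code from the notes): center the data matrix, take its singular value decomposition, and read off the principal directions and the variance captured along each.

```python
import numpy as np

# PCA via the SVD of the centered data matrix (rows = samples, columns = predictors).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data

Xc = X - X.mean(axis=0)                       # center each predictor
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                               # rows: principal directions (new ordered basis)
explained_variance = S**2 / (X.shape[0] - 1)  # variance captured along each direction
scores = Xc @ Vt.T                            # data expressed in the new basis
```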

[pdf]

Principal Component Analysis (alternative take).

A slightly different perspective on principal component analysis, phrased in terms of covariance matrices.
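
For reference, the covariance-matrix formulation in brief (notation mine): with a centered data matrix X_c of shape n x p, the principal directions are the orthonormal eigenvectors of the sample covariance

\[
\Sigma \;=\; \frac{1}{n-1}\, X_c^{\top} X_c ,
\]

ordered by decreasing eigenvalue, and the variance of the data along a unit direction v is v^{\top} \Sigma v.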

[pdf]