This page lists publications, preprints, and projects related to my work in machine learning. Comments and questions welcome.
Symmetries, flat minima, and the conserved quantities of gradient flow.
With B. Zhao, R. Walters, R. Yu, and N. Dehmamy. International Conference on Learning Representations (ICLR), 2023.
Quiver neural networks.
With R. Walters. Preprint, arXiv:2207.12773.
Universal approximation and model compression for radial neural networks.
With T. van Laarhoven and R. Walters. Preprint, arXiv:2107.02550.
Gradient flow and conserved quantities.
These are notes stemming from and supplementing the paper Symmetries, flat minima, and the conserved quantities of gradient flow above (arXiv:2210.17216). We explain several different characterizations of conserved quantities, provide examples, and explore conserved quantities in the context of a loss function that is invariant under a group action. Of particular interest are orthogonal group actions, which appear in the study of radial neural networks. We provide a description of the stabilizer of a generic point in the parameter space of a radial neural network, which is related to the compression of the network. We also consider a particular diagonalizable action.
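As a small numerical illustration of a conserved quantity (my own sketch, not an excerpt from the notes): for a two-layer linear network whose loss depends only on the product W2 W1, the loss is invariant under (W2, W1) -> (W2 g^{-1}, g W1), and the quantity Q = W2^T W2 - W1 W1^T is conserved along gradient flow. The snippet below checks that small-step gradient descent approximately preserves Q; all names and dimensions are made up for the example.

```python
import numpy as np

# Illustrative check: for L(W2, W1) = ||W2 W1 X - Y||^2 / 2, the loss depends
# only on the product W2 W1, and Q = W2^T W2 - W1 W1^T is conserved along
# gradient flow.  Small-step gradient descent approximately preserves Q.

rng = np.random.default_rng(0)
n, h, d, m = 5, 4, 3, 20            # output dim, hidden dim, input dim, samples
X = rng.standard_normal((d, m))
Y = rng.standard_normal((n, m))
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((n, h))

def Q(W2, W1):
    return W2.T @ W2 - W1 @ W1.T    # candidate conserved quantity (h x h)

Q0 = Q(W2, W1)
lr = 1e-3
for _ in range(2000):
    R = W2 @ W1 @ X - Y             # residual
    gW2 = R @ (W1 @ X).T            # dL/dW2
    gW1 = W2.T @ R @ X.T            # dL/dW1
    W2 -= lr * gW2
    W1 -= lr * gW1

drift = np.abs(Q(W2, W1) - Q0).max()
print(f"max |Q(t) - Q(0)| after training: {drift:.2e}")  # small; vanishes as lr -> 0
```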
Backpropagation.
These notes are an exploration of the backpropagation algorithm for computing the gradient of the loss function with respect to the parameters of a neural network. We provide theoretical justification for the algorithm, give several explicit pseudo-code implementations, discuss backpropagation in batches, and include a list of exercises.
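As a rough companion (my own sketch, not the pseudo-code from the notes), here is a minimal NumPy implementation of the forward and backward pass for a two-layer network with a ReLU hidden layer and squared-error loss on a batch of inputs; the layer sizes and data are arbitrary placeholders.

```python
import numpy as np

# Minimal sketch of backpropagation for a two-layer network with a ReLU
# hidden layer and squared-error loss, processing a batch of inputs.

def forward(params, X):
    W1, b1, W2, b2 = params
    Z1 = X @ W1 + b1              # pre-activation of hidden layer
    A1 = np.maximum(Z1, 0.0)      # ReLU
    Yhat = A1 @ W2 + b2           # network output
    return Z1, A1, Yhat

def backward(params, X, Y):
    W1, b1, W2, b2 = params
    Z1, A1, Yhat = forward(params, X)
    m = X.shape[0]
    loss = np.sum((Yhat - Y) ** 2) / (2 * m)
    dYhat = (Yhat - Y) / m        # dL/dYhat
    dW2 = A1.T @ dYhat            # dL/dW2
    db2 = dYhat.sum(axis=0)       # dL/db2
    dA1 = dYhat @ W2.T            # backpropagate through the second layer
    dZ1 = dA1 * (Z1 > 0)          # backpropagate through the ReLU
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((8, 3)), rng.standard_normal((8, 2))
params = [rng.standard_normal((3, 5)), np.zeros(5),
          rng.standard_normal((5, 2)), np.zeros(2)]
loss, grads = backward(params, X, Y)
print(loss, [g.shape for g in grads])
```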
EM algorithm.
These notes provide an exposition of the expectation-maximization (EM) algorithm for clustering. One can regard this algorithm as an unsupervised learning analogue of Linear Discriminant Analysis. The main reference for these notes is Section 9.2 of the book "Applied Machine Learning" by David Forsyth (link). We focus on the theoretical justification for the algorithm rather than implementation details.
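For concreteness, here is an illustrative sketch of one EM iteration (my own, not tied to Forsyth's notation) for a mixture of Gaussians with fixed identity covariance, so that only the means and mixing weights are re-estimated.

```python
import numpy as np

# Illustrative EM step for a mixture of k Gaussians with fixed identity
# covariance: only the means and mixing weights are learned.

def em_step(X, means, weights):
    # E-step: responsibilities r[i, j] = P(cluster j | point i).
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # squared distances
    log_r = np.log(weights)[None, :] - 0.5 * d2
    log_r -= log_r.max(axis=1, keepdims=True)                    # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate means and mixing weights from the responsibilities.
    Nk = r.sum(axis=0)
    means = (r.T @ X) / Nk[:, None]
    weights = Nk / len(X)
    return means, weights, r

# Usage: two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
means = rng.standard_normal((2, 2))
weights = np.array([0.5, 0.5])
for _ in range(20):
    means, weights, r = em_step(X, means, weights)
print(means.round(2), weights.round(2))
```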
Notes on "Graph Neural Networks are Dynamic Programmers".
These notes delve into the technical aspects of the paper "Graph Neural Networks are Dynamic Programmers" (NeurIPS 2022, arXiv:2203.15544). We formulate integral transforms using bags and lists. The update step of a dynamic programming algorithm can be interpreted as an integral transform; a prominent example is the Bellman-Ford algorithm. Furthermore, the message-passing step of a graph neural network can also be stated in terms of a certain integral transform.
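To make the analogy concrete, here is a small sketch (my own, not from the paper) of the Bellman-Ford relaxation round written as message passing: each node aggregates messages d[u] + w(u, v) from its in-neighbors with a min, mirroring the aggregate step of a GNN layer.

```python
import math

# Sketch: one Bellman-Ford relaxation round written as a message-passing
# step.  Each node aggregates messages d[u] + w(u, v) from its in-neighbours
# with min, mirroring a GNN aggregation.

def bellman_ford(num_nodes, edges, source):
    # edges: list of (u, v, weight)
    d = [math.inf] * num_nodes
    d[source] = 0.0
    for _ in range(num_nodes - 1):              # at most n - 1 rounds
        messages = {v: [] for v in range(num_nodes)}
        for u, v, w in edges:                   # "message" along each edge
            messages[v].append(d[u] + w)
        # "aggregate": keep the current value or the best incoming message
        d = [min([d[v]] + messages[v]) for v in range(num_nodes)]
    return d

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
print(bellman_ford(4, edges, source=0))         # [0.0, 3.0, 1.0, 4.0]
```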
Principal Component Analysis.
Principal component analysis is a technique for finding a new ordered basis (or partial basis) of the predictor space in such a way that most of the variability in the data can be captured in fewer dimensions. In these expository notes, we first provide a formulation of principal component analysis from the point of view of finding directions that maximize variability. We then explain the relationship between principal component analysis and the singular value decomposition.
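As a quick numerical companion (my own sketch, separate from the notes), the following checks that the principal directions obtained from the SVD of the centered data matrix agree, up to sign, with the eigenvectors of the sample covariance matrix, and that the singular values recover the same variances.

```python
import numpy as np

# Sketch: PCA via SVD of the centered data matrix agrees (up to sign) with
# the eigendecomposition of the sample covariance matrix.

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic data
Xc = X - X.mean(axis=0)                                        # center the data

# Route 1: SVD of the centered data.  Rows of Vt are principal directions;
# singular values give variances via var_i = s_i^2 / (n - 1).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = s**2 / (len(X) - 1)

# Route 2: eigendecomposition of the sample covariance matrix.
C = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]               # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.allclose(var_svd, eigvals))                          # True
print(np.allclose(np.abs(Vt), np.abs(eigvecs.T)))             # True up to sign
```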
Geometric Relational Algebra.
These notes illustrate an abstract take on relational algebra inspired by constructions from category theory and algebraic geometry. For every header, we define a category whose objects are relations with that header. Relational algebra operations such as union, join, and product are functors between the appropriate categories. We also use pushforwards and pullbacks to give interpretations of aggregate functions, group by, and window functions.
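To ground the operations in something executable (a toy model of my own, not the categorical formalism of the notes), a relation can be modeled as a set of rows over a fixed header; natural join matches rows on shared attributes, and "group by" pushes rows forward along a projection and aggregates each fiber.

```python
# Toy model: a relation is a list of dict-rows over a fixed header.

def natural_join(R, S):
    out = []
    for r in R:
        for s in S:
            shared = set(r) & set(s)                 # common attributes
            if all(r[a] == s[a] for a in shared):
                out.append({**r, **s})               # merge matching rows
    return out

def group_by(R, keys, aggregate):
    fibers = {}
    for r in R:
        k = tuple(r[a] for a in keys)                # pushforward along projection
        fibers.setdefault(k, []).append(r)
    return {k: aggregate(rows) for k, rows in fibers.items()}

employees = [{"name": "ada", "dept": 1}, {"name": "kurt", "dept": 1},
             {"name": "emmy", "dept": 2}]
depts = [{"dept": 1, "dname": "math"}, {"dept": 2, "dname": "cs"}]

print(natural_join(employees, depts))
print(group_by(employees, ["dept"], aggregate=len))  # {(1,): 2, (2,): 1}
```

Here the dictionary of fibers plays the role of the pushforward, and the aggregate function is applied fiberwise, which is the shape of the group-by construction described above.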
Kaggle.
Minor learning projects for gaining familiarity with PyTorch. One of the projects is a computer vision analysis for distinguishing bees from wasps using a convolutional neural network (link to the Kaggle dataset). Another is an NLP sentiment analysis of IMDB movie reviews (link to the Kaggle dataset).
Bias-Variance.
An analysis of the bias-variance tradeoff in supervised statistical learning. We examine both the regression and classification settings.
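For reference, here is the standard decomposition in the regression setting with squared-error loss (the notes may use different notation); the expectation is over training sets and noise at a fixed test input x, assuming y = f(x) + eps with mean-zero noise of variance sigma^2.

```latex
% Standard bias-variance decomposition for squared error, assuming
% y = f(x) + \varepsilon, \; \mathbb{E}[\varepsilon] = 0, \; \operatorname{Var}(\varepsilon) = \sigma^2.
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```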
Principal Component Analysis (alternative take).
A slightly different perspective on principal component analysis in terms of covariance matrices.