Stochastic Gradient Descent (SGD) and its variants are almost universally used to train neural networks and to fit a variety of other parametric models.
These algorithms, especially when applied to deep learning, exhibit many unexpected and often beneficial behaviors that have baffled practitioners and theoreticians for decades. In this talk, we introduce modified equations as a novel analytical framework for studying stochastic optimization algorithms. The theory combines ideas originally developed in the numerical analysis of differential equations with stochastic calculus and optimization to study problems from machine learning. If time permits, we will discuss an application of modified equations to the derivation of optimal hyperparameter schedules for SGD.
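For orientation, a commonly used first-order stochastic modified equation (a standard form from this literature, sketched here for illustration and not necessarily the exact formulation of the talk) approximates the SGD iteration $x_{k+1} = x_k - \eta\, g_k(x_k)$, where $g_k$ is an unbiased stochastic gradient of the objective $f$ and $\eta$ the learning rate, by the Itô SDE

\[
  \mathrm{d}X_t = -\nabla f(X_t)\,\mathrm{d}t + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,\mathrm{d}W_t ,
\]

where $\Sigma(x)$ denotes the covariance of the stochastic gradient at $x$ and $W_t$ is a standard Wiener process. The learning rate plays the role of the discretization step size, which is why the noise term scales like $\sqrt{\eta}$; all symbols ($f$, $\eta$, $\Sigma$, $W_t$) are introduced here for illustration only.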