Introduction
Distributed Training
The goal of distributed training is to significantly reduce the training time of deep learning models without degrading their performance.
Pipelines
Motivation
We consider distributed optimization under communication constraints for training deep learning models. Our method differs from the state-of-the-art parameter-averaging scheme EASGD in a number of ways (a minimal sketch contrasting the two update rules follows this list):
- objective formulation that does not change the location of stationary points compared to the original optimization problem
- avoiding the convergence slowdowns caused by pulling local workers that are descending toward different local minima to the average of their parameters
- breaking the curse of symmetry - the phenomenon of being trapped in poorly generalizing, sub-optimal solutions in a symmetric non-convex landscape
- communication efficiency and alignment with current hardware architecture
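To make the contrast concrete, here is a minimal sketch of the two coupling rules on flat parameter tensors. The function names, the lr/pull values, and the sequential center update are illustrative simplifications, not the repository's API: the point is only that an elastic-averaging scheme pulls every worker toward a shared center (roughly the average), whereas a leader scheme pulls every worker toward the current best worker.

```python
import torch

def easgd_style_step(workers, grads, center, lr=0.1, pull=0.01):
    """Simplified elastic-averaging update: each worker is pulled toward a
    shared center variable, and the center drifts toward the workers."""
    for w, g in zip(workers, grads):
        w -= lr * g + pull * (w - center)   # gradient step + elastic pull to the center
        center += pull * (w - center)       # center moves toward the workers

def lsgd_style_step(workers, grads, losses, lr=0.1, pull=0.01):
    """Simplified leader-style update: each worker is pulled toward the
    current best (lowest-loss) worker instead of toward an average."""
    best = min(range(len(workers)), key=lambda i: losses[i])
    leader = workers[best].clone()          # snapshot the leader's parameters
    for w, g in zip(workers, grads):
        w -= lr * g + pull * (w - leader)   # gradient step + pull toward the leader
```

Pulling toward the leader rather than the mean avoids dragging workers toward a point that none of them is actually descending to.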
Dealing with the curse of symmetry
- From EA(S)GD to L(S)GD
- An illustration of the "curse of symmetry"
- An example of a highly non-convex problem (a toy sketch of such a landscape follows this list)
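The illustrations themselves live in the repository; as a stand-in, the toy double-well function below shows the failure mode these figures point at. It is an assumed example for exposition, not one taken from the paper: in a symmetric non-convex landscape, two workers can sit in symmetric minima whose average is a poor point between them, which is exactly the situation a parameter-averaging pull makes worse.

```python
def double_well(x):
    """Symmetric, non-convex toy landscape with minima at x = +1 and x = -1
    and a local maximum exactly at their average, x = 0."""
    return (x ** 2 - 1) ** 2

w1, w2 = 1.0, -1.0                       # two workers, each in one of the symmetric minima
avg = (w1 + w2) / 2.0                    # parameter averaging lands exactly between them
print(double_well(w1), double_well(w2))  # 0.0 0.0 -> both workers are at minima
print(double_well(avg))                  # 1.0     -> their average sits on the bump
```

A leader-based pull moves both workers toward whichever of them currently has the lower loss, instead of toward the bump between them.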
Benchmarks
4 workers (1 server with 4 GTX-1080 GPUs)
12 workers (3 servers, each with 4 GTX-1080 GPUs)
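For readers mapping these worker counts onto processes, the following is a generic sketch of how such a topology could be initialized with torch.distributed, one process per GPU. The helper name, master address, and port are placeholders, and this is not the repository's launch code; the scripts in the GitHub repository are the authoritative reference.

```python
import os
import torch
import torch.distributed as dist

def init_worker(node_rank, local_rank, gpus_per_node=4, num_nodes=3,
                master_addr="10.0.0.1", master_port="29500"):
    """Hypothetical setup for a 12-worker run (3 nodes x 4 GPUs each);
    the 4-worker configuration corresponds to num_nodes=1."""
    os.environ["MASTER_ADDR"] = master_addr          # placeholder address of node 0
    os.environ["MASTER_PORT"] = master_port          # placeholder rendezvous port
    world_size = num_nodes * gpus_per_node           # total number of workers (12 or 4)
    rank = node_rank * gpus_per_node + local_rank    # globally unique worker id
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)                # bind this process to one GPU
    return rank, world_size
```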
LSGD Paper
Paper: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
Authors: Yunfei Teng*, Wenbo Gao*, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller
Arxiv: https://arxiv.org/abs/1905.10395
Github: https://github.com/yunfei-teng/LSGD
Poster: https://github.com/yunfei-teng/LSGD/blob/master/docs/LSGD_Poster_NeurIPS2019.pdf