LSGD¶
Multi-Leader Setting¶
We consider distributed optimization under communication constraints for training deep learning models. GPUs within the same node are connected through PCIe, while GPUs on different nodes communicate over TCP/IP. Communication over PCIe is significantly faster and more stable than over TCP/IP.
We propose a multi-leader setting that is well aligned with this hardware architecture. Workers are divided into groups according to this physical structure, and a local leader is elected within each group along with a global leader across all groups, so that every worker is assigned both a local and a global leader. A minimal sketch of this grouping and leader election is given below.
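The following PyTorch-style sketch illustrates one way to form node-aligned groups and elect leaders. It assumes that ranks on the same node are contiguous, that leaders are chosen as the workers with the lowest current loss, and that `assign_groups` and `elect_leaders` are hypothetical helper names rather than part of any released API.

```python
import torch
import torch.distributed as dist

def assign_groups(world_size, gpus_per_node):
    """Group worker ranks by physical node: ranks on the same node share
    fast PCIe links, while different groups talk over slower TCP/IP.
    Assumes ranks on a node are contiguous (a common launch convention)."""
    return [list(range(start, start + gpus_per_node))
            for start in range(0, world_size, gpus_per_node)]

def elect_leaders(local_loss, group, device):
    """Gather every worker's loss, then pick the lowest-loss rank within
    this worker's group (local leader) and across all workers (global
    leader). Leader choice by lowest loss is an assumption of this sketch."""
    world_size = dist.get_world_size()
    losses = torch.zeros(world_size, device=device)
    losses[dist.get_rank()] = local_loss
    # Each rank fills only its own slot, so a SUM reduce yields all losses.
    dist.all_reduce(losses, op=dist.ReduceOp.SUM)

    global_leader = int(torch.argmin(losses).item())
    local_leader = group[int(torch.argmin(losses[group]).item())]
    return local_leader, global_leader
```

In this sketch, followers would subsequently pull toward the parameters of their local and global leaders during the update step; the exact pulling rule is not shown here.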