tyler.johnson

The downside of synchronous training is that it becomes less efficient as the number of nodes increases (it does not scale well). This is due to the communication required to synchronize gradients: because every node assumes it will receive updated parameters from every other node after every batch, communication cost becomes the major bottleneck in this setup. This means that reducing the latency of the communication, or reducing how much communication is required, are the key ways to improve the efficiency of synchronous training.
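
To make the cost concrete, here is a minimal sketch of one synchronous data-parallel step in PyTorch, assuming `torch.distributed` has already been initialized (e.g. via `init_process_group`) and every worker holds an identical replica of `model`. The per-parameter `all_reduce` is the point where every node waits on every other node, which is exactly the communication described above:

```python
import torch
import torch.distributed as dist

def synchronous_step(model, optimizer, loss_fn, inputs, targets):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Every worker blocks here until all gradients are exchanged;
    # this collective is the cost that grows with node count.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size  # average gradients across workers
    optimizer.step()
    return loss.item()
```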

andykhuu

To reiterate the comment above, the problem with synchronous training is that it's incredibly inefficient to serially update an array of weights one element at a time. A good way to remove this bottleneck is to find ways to parallelize updates while minimizing communication.
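
One concrete version of "minimizing communication" is to batch the exchange itself: flatten all gradients into a single contiguous buffer so that one large all-reduce replaces many small ones (roughly the bucketing idea behind PyTorch's DistributedDataParallel). A hedged sketch, assuming the same initialized process group as in the snippet above:

```python
import torch
import torch.distributed as dist

def allreduce_gradients_flat(model):
    # Collect every parameter gradient that exists on this worker.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    # One contiguous buffer -> one collective call instead of one per tensor.
    flat = torch.cat([g.reshape(-1) for g in grads])
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)
    flat /= dist.get_world_size()
    # Scatter the averaged values back into each .grad tensor in place.
    offset = 0
    for g in grads:
        n = g.numel()
        g.copy_(flat[offset:offset + n].view_as(g))
        offset += n
```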
