Slide 45 of 79

pslui88

This slide illustrates partitioning the network in the horizontal direction: one worker node gets the upper half of all the parameters, and another worker node gets the bottom half. If we use small spatial convolutions and make the fully-connected layers less densely connected across the two worker nodes, the nodes need to exchange fewer activations with each other, which increases parallelism.
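To make the "less connected across the two worker nodes" point concrete, here is a minimal sketch on a single fully-connected layer. It assumes two workers, a tiny 4x4 weight matrix, and plain Python lists; the helper names (`matvec`, `split_rows`, `cross_worker_inputs`) are hypothetical and not taken from the slide or any framework. Each worker owns half the output neurons and half the input activations, and the sketch counts how many inputs must cross the worker boundary for a dense layer versus a block-diagonal ("less connected") one.

```python
# Hypothetical sketch: horizontal (parameter-wise) split of one dense layer
# across two workers. Assumptions: 2 workers, 4 inputs, 4 outputs, no bias.

def matvec(weights, x):
    """Dense layer without bias: y[i] = sum_j weights[i][j] * x[j]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def split_rows(weights, n_workers):
    """Give each worker a contiguous block of output neurons (rows)."""
    step = len(weights) // n_workers
    return [weights[k * step:(k + 1) * step] for k in range(n_workers)]

def cross_worker_inputs(weights, n_workers):
    """Count (worker, input) pairs where a worker needs an input it does not own.

    Input j is owned by the worker whose block contains j; any nonzero
    W[i][j] linking a row and column owned by different workers forces
    input j to be sent to the row's worker."""
    in_step = len(weights[0]) // n_workers
    out_step = len(weights) // n_workers
    needed = set()
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if w != 0 and i // out_step != j // in_step:
                needed.add((i // out_step, j))
    return len(needed)

dense = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
# "Less connected": zero the weights that link one worker's inputs to the
# other worker's outputs, leaving a block-diagonal matrix.
block_diag = [[1, 2, 0, 0],
              [5, 6, 0, 0],
              [0, 0, 11, 12],
              [0, 0, 15, 16]]
x = [1.0, 2.0, 3.0, 4.0]

# Each worker computes its own output rows; concatenating them reproduces
# the full layer's output.
parts = [matvec(block, x) for block in split_rows(dense, 2)]
assert parts[0] + parts[1] == matvec(dense, x)

print(cross_worker_inputs(dense, 2))       # 4 inputs must cross the boundary
print(cross_worker_inputs(block_diag, 2))  # 0: each worker uses only its own half
```

With the block-diagonal weights, the two workers can compute this layer without exchanging anything, which is the kind of reduced cross-node connectivity the comment describes.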

pintos

This is instead of slicing the network in half vertically, which would create a sequential dependency between the node computing the earlier layers and the node computing the later ones. Pipelining would help mitigate this, but transferring a 256x27x27 layer output between the nodes would still be unreasonably expensive.
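A quick back-of-the-envelope calculation shows why that transfer is costly. The sketch below assumes 32-bit floats and a mini-batch of 128 images; neither number comes from the slide, so treat the totals as illustrative.

```python
# Size of one 256x27x27 activation volume shipped between nodes.
# Assumptions (not from the slide): float32 values, mini-batch of 128.

CHANNELS, HEIGHT, WIDTH = 256, 27, 27
BYTES_PER_VALUE = 4   # assumed float32
BATCH_SIZE = 128      # assumed mini-batch size

values_per_image = CHANNELS * HEIGHT * WIDTH
bytes_per_image = values_per_image * BYTES_PER_VALUE
bytes_per_batch = bytes_per_image * BATCH_SIZE

print(values_per_image)                 # 186624
print(bytes_per_image)                  # 746496 (about 0.75 MB)
print(round(bytes_per_batch / 1e6, 1))  # 95.6 (MB per mini-batch, forward pass)
```

In training, the backward pass also sends gradients of the same shape in the other direction, so a vertical split pays this cost twice per mini-batch, pipelined or not.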
