Slide 29 of 79
haiyuem

The problem here is that the synchronization cost is high, especially when mini-batches are small: at the end of each iteration, all of the workers' partial gradients must be summed into a single total gradient before the next iteration can begin.
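
To make that synchronization point concrete, here is a minimal sketch assuming a toy linear least-squares model. The worker count, learning rate, and function names are my own placeholders, not from the lecture, and on a real cluster the `sum(partials)` line would be an all-reduce across machines rather than a local loop:

```python
import numpy as np

def partial_gradient(w, x_shard, y_shard):
    # Mean-squared-error gradient for a linear model on one worker's shard.
    residual = x_shard @ w - y_shard
    return x_shard.T @ residual / len(y_shard)

def sgd_step(w, x_batch, y_batch, num_workers=4, lr=0.01):
    # Data parallelism: split the mini-batch across the workers.
    x_shards = np.array_split(x_batch, num_workers)
    y_shards = np.array_split(y_batch, num_workers)
    partials = [partial_gradient(w, xs, ys)
                for xs, ys in zip(x_shards, y_shards)]
    # The synchronization point: every partial gradient must be combined
    # (an all-reduce on a real cluster) before any worker can update weights.
    grad = sum(partials) / num_workers
    return w - lr * grad

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
y = x @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(100):
    w = sgd_step(w, x, y)
print(w)  # should approach [1.0, -2.0, 0.5]
```

The small-batch problem shows up directly here: shrinking the mini-batch shrinks each worker's `partial_gradient` work, but the combine step costs the same regardless, so synchronization comes to dominate each iteration.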

trip

The tradeoff developers must work with here is that, as batch size increases, learning slows: each pass over the data produces fewer parameter updates. In lecture, Kayvon said that batch sizes for image processing should remain in the realm of 10 to 1000, even though there might be billions of images in the total set!
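
A quick back-of-the-envelope sketch of that tradeoff (the dataset size and batch sizes here are made-up numbers, not from the lecture):

```python
# Updates per pass over the data for several batch sizes: larger batches
# amortize synchronization cost across more work per iteration, but yield
# fewer parameter updates per epoch, which is why learning slows.
dataset_size = 1_000_000_000  # "billions of images"
for batch_size in (10, 100, 1_000, 100_000):
    updates_per_epoch = dataset_size // batch_size
    print(f"batch={batch_size:>7,}: {updates_per_epoch:>11,} updates/epoch")
```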
