Slide 52 of 55
user1234

Increasing ParO decreases the number of iterations needed for Pipe.Reduce. When ParO=2, there is still enough memory bandwidth, so the runtime drops by a factor of 2. When ParO=4, however, we run out of memory bandwidth, so the time needed for DRAM transfers increases. As a result, we get no further performance improvement even though the iterations for Pipe.Reduce decrease further.
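The saturation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual accelerator or its toolchain: the function name, its parameters, and all the numbers are hypothetical, and it assumes each round runs ParO outer iterations at once and finishes when both the compute and that round's DRAM transfer are done.

```python
def estimated_runtime(iterations, par_o, compute_per_iter, bytes_per_iter, peak_bandwidth):
    """Toy runtime model (hypothetical): par_o outer iterations run per round."""
    # Number of rounds needed to cover all iterations (ceiling division).
    rounds = -(-iterations // par_o)
    # Each round must fetch data for par_o iterations; the transfer
    # cannot go faster than the peak memory bandwidth.
    transfer_per_round = par_o * bytes_per_iter / peak_bandwidth
    # A round ends only when both compute and its transfer have finished.
    round_time = max(compute_per_iter, transfer_per_round)
    return rounds * round_time


# Made-up numbers, chosen so that bandwidth is exactly used up at par_o = 2.
for par_o in (1, 2, 4):
    t = estimated_runtime(iterations=16, par_o=par_o,
                          compute_per_iter=2.0, bytes_per_iter=1.0,
                          peak_bandwidth=1.0)
    print(f"ParO={par_o}: estimated runtime {t}")
```

With these made-up numbers, going from ParO=1 to ParO=2 halves the estimate, while going from 2 to 4 leaves it unchanged: the number of rounds halves, but each round's transfer time doubles.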

marwan

I don't get why doubling ParO didn't cause the bandwidth to limit our performance. Does this mean our bandwidth was enough for two iterations in parallel? If that is the case, I have a silly question: could we use a ParO value of 3, if the bandwidth were enough and the number of iterations were divisible by 3?
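To make the divisibility part of the question concrete, here is a small sketch. It assumes, hypothetically, that the outer iterations are simply split into rounds of ParO parallel lanes; whether the actual toolchain accepts a factor that does not divide the iteration count is not stated in this thread, and the helper names are made up for illustration.

```python
def rounds_needed(iterations, par_o):
    """Rounds required when par_o iterations run per round (ceiling division)."""
    return -(-iterations // par_o)


def idle_lanes(iterations, par_o):
    """Lane slots left unused in the final round (hypothetical model)."""
    return rounds_needed(iterations, par_o) * par_o - iterations


# 12 iterations divide evenly by 3; 16 iterations do not.
for iterations in (12, 16):
    for par_o in (2, 3, 4):
        print(iterations, par_o,
              rounds_needed(iterations, par_o),
              idle_lanes(iterations, par_o))
```

Under this assumption, a factor of 3 wastes no lanes when the iteration count is a multiple of 3, and leaves some lanes idle in the last round otherwise.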

a7hu

When ParO goes from 2 to 4, the DRAM transfer time doubles because we run out of memory bandwidth. The system is memory bound, so there is no further performance gain as outer parallelization increases.
