Slide 13 of 63
ishangaur

In cases where the workload does not divide easily by the gang size and there is some leftover work, are there some principles we can use to think about whether it would be better to add another instance or just distribute the extra work among some of the existing instances?

lonelymoon

From my understanding, this slide is a good example of controlling the mapping between program instances and elements of the output array. You do this by changing how 'idx' is computed inside the for loop in terms of programIndex. As described in class and on the next slides, even though this configuration looks simpler, it is less efficient than the original one, because at each step the instances in the gang access non-contiguous memory locations, so a single instruction no longer operates on one contiguous block of values.
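
To make the two mappings concrete, here is a small Python sketch (not ISPC, and not the code from the slide) that simulates which element each program instance touches. It assumes the interleaved version computes idx = i + programIndex with i stepping by programCount, and the blocked version computes idx = programIndex * count + i. The function names and the ceiling-divided count are my own assumptions for illustration.

```python
def interleaved_indices(n, program_count):
    """Element indices per instance when idx = i + programIndex,
    with i stepping through 0, programCount, 2*programCount, ..."""
    return [
        [i + program_index
         for i in range(0, n, program_count)
         if i + program_index < n]          # guard for leftover elements
        for program_index in range(program_count)
    ]


def blocked_indices(n, program_count):
    """Element indices per instance when idx = programIndex * count + i,
    where count is the (assumed) per-instance block size, rounded up."""
    count = -(-n // program_count)          # ceiling division
    return [
        [program_index * count + i
         for i in range(count)
         if program_index * count + i < n]  # guard for leftover elements
        for program_index in range(program_count)
    ]


def addresses_per_step(assignment, step):
    """Indices touched by the whole gang at one loop iteration."""
    return [inst[step] for inst in assignment if step < len(inst)]


if __name__ == "__main__":
    n, program_count = 16, 4
    print("interleaved, step 0:",
          addresses_per_step(interleaved_indices(n, program_count), 0))
    print("blocked,     step 0:",
          addresses_per_step(blocked_indices(n, program_count), 0))
```

With 16 elements and a gang of 4, the interleaved mapping has the gang touching elements 0, 1, 2, 3 in the first iteration, while the blocked mapping has it touching 0, 4, 8, 12. That stride is the non-contiguous access mentioned above. The bounds checks are only there so the sketch still covers every element when n is not a multiple of the gang size.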
