Slide 38 of 92
tyler.johnson

What kinds of things could enable parallelization for low arithmetic intensity operations? Is there any hope of memory seeing the same types of parallelization improvements that are developed for CPUs and GPUs? This seems like a major hurdle and bottleneck that will hold back all hardware improvements.

rmjones

Recall that the element-wise vector multiply example from the end of lecture 2 had an arithmetic intensity of 1/3 (one multiply vs. two loads and a store), or 1/12 of a multiply per byte if you count the bytes loaded and stored (assuming 4-byte floats). This contrasts starkly with the NVIDIA GTX 1080 GPU, whose ratio of compute capability to available bandwidth is 12.8 multiplies per byte, i.e. 1.28 x 10^10 multiplies per GB (2560 MULs per clock times 1.6 x 10^9 clocks per second, divided by a bandwidth of 320 GB per second). In other words, the GPU can issue multiplies roughly 150 times faster than memory can feed this kernel.
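
For anyone who wants to check the arithmetic, here is a rough sketch in Python that puts both numbers in the same unit (multiplies per byte). The hardware figures are the ones quoted in this comment, not measurements, and the 4-byte element size is an assumption. The function names `kernel_intensity` and `machine_balance` are made up for illustration.

```python
# Figures quoted in the comment above (assumed, not measured).
MULS_PER_CLOCK = 2560          # multiplies issued per clock on the GTX 1080
CLOCK_HZ = 1.6e9               # clocks per second
BANDWIDTH_BYTES_PER_S = 320e9  # 320 GB/s of memory bandwidth
BYTES_PER_ELEMENT = 4          # assumption: 32-bit floats


def kernel_intensity(ops, loads, stores, bytes_per_element=BYTES_PER_ELEMENT):
    """Arithmetic operations performed per byte moved to or from memory."""
    return ops / ((loads + stores) * bytes_per_element)


def machine_balance(muls_per_clock, clock_hz, bandwidth_bytes_per_s):
    """Multiplies the hardware can issue per byte of memory traffic."""
    return muls_per_clock * clock_hz / bandwidth_bytes_per_s


# Element-wise multiply: one multiply, two loads, one store per element.
intensity = kernel_intensity(ops=1, loads=2, stores=1)
balance = machine_balance(MULS_PER_CLOCK, CLOCK_HZ, BANDWIDTH_BYTES_PER_S)

# How many times more math per byte the kernel would need to keep the
# multipliers busy instead of waiting on memory.
shortfall = balance / intensity

print(f"kernel intensity: {intensity:.4f} multiplies per byte")
print(f"machine balance:  {balance:.1f} multiplies per byte")
print(f"shortfall:        {shortfall:.1f}x")
```

Any kernel whose intensity falls below the machine balance is limited by bandwidth rather than by compute, which is why adding more multipliers does nothing for this example.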
