suninhouse

The benefit of caching depends on how many times a piece of data will be re-used.

nanoxh

GPUs also have shared memory on chip, and it is shared within a thread block. Accessing shared memory takes about 5 clock cycles. If your code within a block reuses the same memory, you can preload the data into shared memory and reduce memory latency further. It does require careful code design; there are many NVIDIA blog posts on how to use shared memory.
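
For concreteness, here is a minimal sketch of that preloading pattern, using a simple 3-point averaging stencil. The kernel name `blur3`, the block size, and the boundary handling are all made up for illustration; the point is just that each block stages its tile (plus halo) into `__shared__` memory once, so the repeated reads per output element hit on-chip memory instead of DRAM:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256

__global__ void blur3(const float* in, float* out, int n) {
    // Tile for this block's elements, plus one halo slot on each side.
    __shared__ float tile[BLOCK + 2];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;  // shift by 1 to leave room for the left halo

    // Each thread loads its own element into shared memory once.
    if (gid < n) tile[lid] = in[gid];
    // The first and last threads in the block also load the halo elements.
    if (threadIdx.x == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;

    __syncthreads();  // all loads must complete before any thread reads

    // Three reads per output, all served from shared memory.
    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; i++) in[i] = (float)i;

    blur3<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[10] = %f\n", out[10]);  // expect 10.0 = (9 + 10 + 11) / 3
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Without the shared-memory tile, each output would issue three separate global loads; with it, each input element is fetched from DRAM only once per block.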

mvpatel2000

One reason for this design might be that CPUs do a lot more context switching, so they need larger caches, while GPUs often stream large amounts of data through small per-element operations. That makes bandwidth a more important factor than cache capacity.

Another tradeoff appears to be in the choice to use GDDR5 DRAM instead of DDR4 DRAM. As best I can tell, GDDR5 supports higher bandwidth but has higher latency, which is generally not acceptable for a CPU. It also appears to draw more power than DDR4, which might be too expensive for general use when the improved RAM might not provide substantial benefits.

cmchiang

So the PCIe 3.0 x16 bandwidth (15.754 GB/s) is much lower than the memory bandwidth to DDR4 RAM (76 GB/s)? Just found this after finishing the CUDA Assignment.
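
You can check the PCIe side of that yourself. Here is a rough sketch (the buffer size is arbitrary, and this is not from the assignment) that times a host-to-device copy with CUDA events; the measured number should come in under the ~15.754 GB/s PCIe 3.0 x16 peak, and far under DRAM bandwidth:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;  // 256 MB transfer
    float *h, *d;
    cudaMallocHost(&h, bytes);  // pinned host memory, needed for full PCIe speed
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Report bytes / seconds in GB/s.
    printf("H2D bandwidth: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

This gap is why minimizing host-device transfers matters so much more than it first appears when writing CUDA code.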
