Parallel Graph Processing Frameworks + How DRAM Works

Previous | Next --- Slide 48 of 81

jyeung27

Note from lecture: Implementing compression helps, but does not help if we are not saturating the machine.

haiyuem

With the same computer system, compressing data helps with 40 cores because there are 40x loads and the program is memory bandwidth bound. It does not help with only one core because arithmetic intensity is already relatively high.

wzz

Key idea here: when we have a pipelined program with many sequential operations, the entire program throughput depends on the step with the smallest throughput. In this case, the bottleneck step is memory. When this happens, we can sacrifice throughput in other steps (computation) to increase through in the bottleneck step (memory), and our overall throughput will be higher.

a7hu

Notice that when running on only 1 core, the original Ligra performs better than the algorithm with compressed graph. On the other hand, when running with 40 cores, the algorithm with compressed graph perform better than original Ligra. This is because when running on 1 core, the arithmetic intensity of the original Ligra is high enough to saturate the compute resource. Because core can only execute certain amount of arithmetic operations, and adding graph compression increases the amount of computation work, so the system is bound by computation bandwidth and increasing the amount of computation work only makes performance worse. With 40 cores, the amount of computation resources is now 40 times more. However, the memory system remains the same, and to ensure the 40 cores fully occupied with compute work, we need 40x memory bandwidth. In this case, the performance is bound by memory bandwidth. As a result, compressing the graph effectively helps performance when the system is bound by memory bandwidth.

Please log in to leave a comment.