Parallel Graph Processing Frameworks + How DRAM Works

Previous | Next --- Slide 47 of 81

jgrace

This method of compressing the graph is useful in that it groups vertices into sections that require the same amount of bytes for their representations. In other words, it repeats the number of bytes needed to represent adjacent vertex differences so that these numbers can be stored as small low precision offsets rather than their original large numbers.

itoen

Run-link encoding (RLE on the next slide) is a compression scheme that stores data that has a lot of runs (contiguous elements that are the same) by storing the element followed by how many times they are repeated. For example, 10 10 10 10 10 would be stored as 10 5.

I was initially confused why this was an RLE, as the differences weren't the one that had runs. But the runs stored here are the sizes of the elements. Hence we can put the bytes of the differences right next to each other as we know the sizes.

I think it's not really obvious why we would expect the differences to have runs of the same number of bytes. This example uses really nice edge numbers to show long runs of the same number of bytes, but if we took a random graph with random edges, and choose some vertex, I wonder how good this scheme is - perhaps there's a statistical result on this?

xhe17

For my current understanding of this edge list compression algorithm, there are two benefits: 1. by storing the difference between each pair of adjacent sorted values, we are able to monotonically reduce the number of bytes needs to store the whole information. 2. due to the way edges are numbered in a graph, the difference between adjacent edge values tend to form clusters of values of similar size, which could be exploited by us using "encode deltas" methods to further reduce the needed memory to store all the information.

Please log in to leave a comment.