Slide 18 of 81
nickbowman

To summarize what's going on with this slide: we see an interesting optimization to PageRank, where convergence is determined on a per-vertex basis rather than globally. The idea is that we only continue updating the PageRank of nodes that have at least one neighbor whose value changed substantially in the last iteration (old and new values differ by more than some threshold). In practice, this is enforced via the signal mechanism, which marks the nodes for which we want to keep computing updates. By scheduling computation only for nodes that have not yet converged, we improve overall efficiency.
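Here's a rough Python sketch of that signal-driven scheme, just to make the idea concrete. The function name `pagerank_signal`, the edge-list representation, and the parameter defaults are my own choices, not from the slide; the point is only that a vertex re-signals its out-neighbors when its own value changes by more than the threshold.

```python
from collections import defaultdict

def pagerank_signal(edges, num_nodes, damping=0.85, tol=1e-6, max_iters=100):
    """PageRank where only signaled (not-yet-converged) vertices recompute.

    A vertex signals its out-neighbors only when its own value changed
    by more than `tol` in the last update, so converged regions of the
    graph stop doing work.
    """
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append(v)
        in_edges[v].append(u)

    rank = [1.0 / num_nodes] * num_nodes
    active = set(range(num_nodes))  # every vertex is signaled initially
    for _ in range(max_iters):
        if not active:
            break  # nothing signaled: all vertices have converged
        next_active = set()
        new_rank = list(rank)
        for v in active:
            total = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            new_rank[v] = (1 - damping) / num_nodes + damping * total
            if abs(new_rank[v] - rank[v]) > tol:
                # this vertex changed substantially: signal its neighbors
                next_active.update(out_edges[v])
        rank = new_rank
        active = next_active
    return rank
```

On a small graph with no dangling nodes the ranks still sum to roughly 1 at convergence, but vertices whose inputs stop changing drop out of the active set early, which is exactly the savings being discussed here.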

haiyuem

@nickbowman Thanks! I was about to ask a question about this slide, but your summary makes sense to me. I can see how this implementation is better than the PageRank in our assignment, because it avoids doing work on vertices that have already converged.

zecheng

So basically, the code on this page computes an approximate PageRank instead of the exact PageRank values?

ChrisGabor

@zecheng, I think any iterative PageRank algorithm is approximate and iterates until convergence, since the values of adjacent vertices are all initialized to an incorrect starting value. There is an algebraic formula for the exact PageRank, but inverting large matrices is often more expensive than power-iteration methods. This slide shows the same iterative algorithm we had in Assignment 4, except that it only processes edges from vertices whose values changed enough that their neighbors need updating.
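To make the algebraic version concrete: with damping factor d, N vertices, and column-stochastic transition matrix M, the exact ranks satisfy (I - dM) r = ((1-d)/N) 1, so you can get them from a single linear solve instead of iterating. Here's a stdlib-only sketch (the helper names `solve` and `exact_pagerank` are mine, and the naive Gaussian elimination is only sensible for tiny graphs — for large graphs this solve is exactly the expensive part @ChrisGabor mentions):

```python
def solve(A, b):
    # Naive Gaussian elimination with partial pivoting; fine for tiny systems.
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def exact_pagerank(edges, n, d=0.85):
    # M[v][u] = 1/outdeg(u) if there is an edge u -> v (column-stochastic).
    outdeg = [0] * n
    for u, _ in edges:
        outdeg[u] += 1
    M = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        M[v][u] = 1.0 / outdeg[u]
    # Solve (I - d*M) r = ((1 - d) / n) * 1 for the exact rank vector.
    A = [[(1.0 if i == j else 0.0) - d * M[i][j] for j in range(n)] for i in range(n)]
    b = [(1 - d) / n] * n
    return solve(A, b)
```

Since (I - dM) is nonsingular whenever d < 1, the solve always succeeds, and (absent dangling nodes) the resulting ranks sum to exactly 1 — this is the fixed point that the iterative versions approach.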
