stanwie

How is row_starts calculated?

cyb

@stanwie It's similar to the idea of "start-flag" of nested sequences on slide 28: row_starts[i] is the sum of the number of non-zero elements in rows 0, 1, ..., i - 1. Source: https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)

pslui88

This slide shows an example of a problem that can be represented in a data-dense format, which can then be conducive to parallelizing. This problem is sparse matrix-vector multiplication. Rather than storing the full matrix in memory, which is largely wasteful due to all the duplicate zeros, this slide shows that you can store the non-zero values in a 2D array, as well as a corresponding 2D array that holds the column indices of each non-zero value.

trip

I'm afraid I'm still not sure about what row_starts is. I'm not sure it can be the sum of the number of nonzero elements in rows because I don't see how the numbers here correlate to that idea, and I rewatched lecture and Kayvon described it as "the location of the start of every element in the sparse matrix" so that you can know the number of nonzeroes per row. I totally buy that the first element in this matrix starts at row 0, but isn't the second 'element' in the sparse matrix [2]? Would that mean that row_starts[1] applies to it, because its value is also 2, and I don't see how the value 2 shows anything about the location [1,1] in this matrix.

It looks like there's a 1:1 correspondence between singleton elements in the matrix and elements in row_starts, but in that case, would [2] correspond to the value 3? I'd love some more clarification here :)

nanoxh

@trip, it looks like row_starts is a quick way to access each row. If you just want ith row, then you can look at the ith element in row_starts, say it is n. Then the nth element in values will be this row (if we make values a 1D vector). Yes it is also easy to know number of non-zero elements in ith row, it is just row_starts[i+1]-row_starts[i].

trip

@nanoxh ahhh that makes a lot of sense, thanks! I was really confused by the fact that elements in subsequences are considered individually by row_starts. Thanks for the clarification :)

nanoxh

@trip You are welcome! Glad it helped. :)

wanze

I was also initially confused by the row_start array.

To serve as an example to the explanations above. In [0, 2, 3, 4, ...], 0 means that the non-zero elements in row at index 0 can be found at 0th element in "values" array, which is [3,1]. And 2 means that the row at index 1's non-zero values can be found at element at index 2 in the "values" array, which is [2]. Similarly, row at index 2's non-zero value is [4], which is positioned at index 3 in the "values" array, etc.