Slide 71 of 81
nanoxh

Here the data is interleaved across the DRAM chips so that all the buses will be busy when we load in data.

jlara

In this example, 8 DRAM chips transmit 64 bits in parallel to the memory controller, with each chip reading 8 contiguous bits from its own storage. Note that in contrast to the last slide, physical memory is interleaved across the chips at byte granularity: a read of 64 bits (which is contiguous in the application's address space) actually accesses all 8 chips at once, with each chip supplying one eighth of the data. This allows for maximum parallelization of data access in the most common use case.

arkhan

How does the memory controller know what data to interleave and in what way? Does it have to coordinate with the MMU in order to achieve this level of arbitrary placement?

msere

I would think that the interleaving isn't decided on by the memory controller, but is inherent in the layout of the RAM. My assumption is that a physical address resides on DRAM chip number addr % 8, and addr / 8 gives the byte's location within that chip, or something along those lines.
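A minimal sketch of that assumed mapping (the `addr % 8` scheme is @msere's guess, not a confirmed controller design; real controllers typically fold in bank/row/column bits as well):

```python
# Assumed byte-interleaved layout across 8 DRAM chips:
# chip = addr % 8 selects which chip holds the byte,
# offset = addr // 8 is the byte's location within that chip.

def chip_of(addr):
    return addr % 8

def offset_in_chip(addr):
    return addr // 8

# An aligned 64-bit (8-byte) read touches all 8 chips at the same offset:
addrs = range(0x40, 0x48)
print([(chip_of(a), offset_in_chip(a)) for a in addrs])
# chips 0..7 each appear exactly once, all at offset 8
```

Under this mapping, any aligned 8-byte read keeps all eight chips (and all 64 data pins) busy at once, which matches the slide's picture.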

lfu

@arkhan I was also curious about this, since it would seem like a lot of overhead to move the data into this format if, for example, all of the addresses were in one bank. I think @msere's explanation makes a lot of sense, though I could see some cases where this approach could go awry. As a contrived example, consider a program that loads only bytes that fall on the first chip. Would this contrived program be 8x more bandwidth-bound than a normal program that utilizes contiguous data (and thus keeps all pins active), since only 8 of the 64 pins are active?
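The contrived pattern can be sketched under the same assumed `addr % 8` mapping (an assumption from the discussion above, not a confirmed design): touching only every 8th byte lands every access on a single chip.

```python
# Assumed byte-interleaved mapping from the discussion above.
def chip_of(addr):
    return addr % 8

# A stride-8 access pattern: bytes 0, 8, 16, ..., 56.
stride_8 = [chip_of(a) for a in range(0, 64, 8)]
print(set(stride_8))
# {0} -- every access hits chip 0, so only that chip's 8 data pins are used
```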

msere

@lfu I don't think that particular pattern would be an issue, since the smallest granularity we load from memory is a cache line, which coincides with a read that draws from all 8 chips; in practice, the subsequent reads would be served from the cache. However, it does seem reasonable that the MMU might try to, for example, put data accessed within a small time window onto the same row, even if their addresses wouldn't normally align that way, in order to minimize the cost of precharges and row activations.
