Previous | Next --- Slide 35 of 49
Back to Lecture Thumbnails
mziv

In essence, map and filter only require operating on one small piece of data; they have a very narrow dependency. groupByKey and sort, however, require the entire set of work to finish before they can output an answer - they have very wide dependencies that you can't break up into smaller chunks.

fizzbuzz

^^ The fact that map and filter only need a small amount of information and that they can be executed independently across a sequence is captured well by their type signatures. map f :: (a -> b) -> [a] -> [b] tells you that the function you provide cannot possibly care about anything but one element of the array, because all it has is an a. This is especially true in a pure functional language where your functions are guaranteed to be side effect free. Ditto filter :: (a -> Bool) -> [a] -> [a].

You'll never, on the other hand, see such a signature for sort. It's not possible because the function we'd be asking for would have to decide where an element is going without looking at anything but that element. I.e., given the list [_, _, _, 5, _, _, _] where the _ are unknown, tell me where 5 goes in sorted order.

xhe17

The answer of this question is in page 37, which indicates the data both map and filter function applied on could be splitted into smaller chuncks of data to be processed by different nodes, where we have "narrow dependency" among different chuncks of data, as no communication is required among different nodes of a cluster. Thus map and filter could be fused on the data they are applied on. However this is not the case for groupByKey and Sort.

Please log in to leave a comment.