Slide 22 of 49
jlara

The key benefit of Spark for cluster-scale computation is how it handles intermediate datasets. Frameworks that materialize every intermediate result to files (for example, between successive MapReduce jobs) spend much of their runtime on disk I/O just to pass data from one stage to the next, and that I/O often becomes the dominant bottleneck. Spark avoids this by keeping intermediates in memory, so subsequent stages can operate on the data much more quickly.
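The contrast can be sketched in plain Python (this is not Spark's API; the helper names and file paths below are illustrative, chosen only to show the pattern):

```python
# Illustrative sketch, not Spark: contrast a pipeline that materializes
# each intermediate dataset to a file with one that keeps it in memory.
import json
import os
import tempfile

data = list(range(10))

def stage_to_file(values, fn, path):
    # File-based (MapReduce-style) stage: serialize the stage's output to
    # disk, then read it back before the next stage can run. The
    # serialize/write/read/deserialize round trip is the overhead Spark
    # tries to eliminate.
    with open(path, "w") as f:
        json.dump([fn(v) for v in values], f)
    with open(path) as f:
        return json.load(f)

tmp = tempfile.mkdtemp()
squared = stage_to_file(data, lambda x: x * x, os.path.join(tmp, "stage1.json"))
shifted = stage_to_file(squared, lambda x: x + 1, os.path.join(tmp, "stage2.json"))

# In-memory (Spark-style) stages: intermediates stay as in-memory
# collections, so chained stages skip the disk round trip entirely.
squared_mem = [x * x for x in data]
shifted_mem = [x + 1 for x in squared_mem]

assert shifted == shifted_mem  # same result, very different I/O cost
```

In actual Spark code, keeping a reused intermediate in memory is an explicit request, e.g. `rdd.cache()` or `rdd.persist()`, which tells Spark to retain the computed partitions rather than recompute or spill them.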
