Memory usage for transformations on RDDs in Alluxio/Tachyon for Spark

Let's say we create an RDD from Alluxio memory:

rdd1 = sc.textFile("alluxio://.../file1.txt")
rdd2 = rdd1.map(...)

Does rdd2 reside in Alluxio or on Spark's heap?

Also, would an operation like pairRDD1.join(pairRDD2), with both pair RDDs in Alluxio, create the new RDD in Alluxio or on the Spark heap?

The reason for the second question is that I need to join two large RDDs, both in Alluxio. Would the join use Alluxio's memory, or would the RDDs get pulled into Spark memory for the join? And where would the resulting RDD reside?

Spark transformations are evaluated lazily. That means map() is not evaluated until a result is required, and defining it does not by itself consume Spark memory for the data. An RDD's data is only kept in Spark memory if you explicitly call cache() (or persist()) on the RDD.
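To make the lazy behaviour concrete, here is a toy model in plain Python. It is not Spark and does not use any Spark API; `LazyDataset`, `parse`, and the sample data are made up for illustration. It only mimics the idea that a transformation records how to compute data, an action runs the computation, and cache() is what makes the result stick around.

```python
class LazyDataset:
    """Toy stand-in for an RDD: stores how to compute data, not the data itself."""

    def __init__(self, compute):
        self._compute = compute     # zero-argument function returning an iterator
        self._want_cache = False    # set by cache()
        self._cached = None         # filled the first time an action runs after cache()

    def map(self, fn):
        # Transformation: returns a new lazy dataset and runs nothing yet.
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def cache(self):
        # Only marks the dataset; the data is stored when an action first runs.
        self._want_cache = True
        return self

    def _iterate(self):
        if self._cached is not None:
            return iter(self._cached)
        if self._want_cache:
            self._cached = list(self._compute())
            return iter(self._cached)
        return self._compute()

    def collect(self):
        # Action: triggers the actual computation.
        return list(self._iterate())


calls = []  # records every time the mapped function really runs


def parse(line):
    calls.append(line)
    return line.upper()


source = LazyDataset(lambda: iter(["a", "b"]))

mapped = source.map(parse)
calls_before_action = len(calls)   # map() alone has not run parse

first = mapped.collect()
second = mapped.collect()
calls_without_cache = len(calls)   # recomputed on each action

calls.clear()
cached = source.map(parse).cache()
cached.collect()
cached.collect()
calls_with_cache = len(calls)      # computed once, then served from the stored copy
```

In this toy, the uncached dataset holds no data between actions, which is the point the answer makes about map() on a real RDD.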

Therefore, when you join two RDDs read from Alluxio, only the source data of the RDDs is held in memory, in Alluxio. During the join, Spark uses whatever memory it needs to execute the join.

Where the resulting RDD resides depends on what you do with it. If you write the resulting RDD out to a file, it is not fully materialized in Spark memory but is written out to the file. If that file is in Alluxio, the data will be in Alluxio memory, not Spark memory. The resulting RDD will only be in Spark memory if you explicitly call cache().
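A rough PySpark sketch of the two options described above might look like the following. The tab-separated "key, value" line format, the helper names (`to_pair`, `format_joined`, `join_and_write`), and the `keep_in_spark` flag are assumptions made for illustration; `sc` is an existing SparkContext and the paths are whatever URIs you already use.

```python
def to_pair(line):
    """Split a 'key<TAB>value' line into a (key, value) tuple (assumed input format)."""
    key, _, value = line.partition("\t")
    return key, value


def format_joined(record):
    """Turn (key, (left_value, right_value)) back into one tab-separated line."""
    key, (left_value, right_value) = record
    return "\t".join([key, left_value, right_value])


def join_and_write(sc, left_path, right_path, out_path, keep_in_spark=False):
    """Join two text-backed pair RDDs and write the result to out_path.

    sc is an existing SparkContext; all paths are supplied by the caller.
    """
    left = sc.textFile(left_path).map(to_pair)     # lazy: nothing is read yet
    right = sc.textFile(right_path).map(to_pair)   # lazy
    joined = left.join(right)                      # lazy: only the lineage is recorded

    if keep_in_spark:
        # Explicitly ask Spark to keep the computed partitions in its own memory.
        joined.cache()

    # Action: this triggers the read, the join, and the write.
    joined.map(format_joined).saveAsTextFile(out_path)
    return joined
```

Called with alluxio:// URIs for the inputs and the output, and with `keep_in_spark` left at False, the joined data ends up in the output location rather than in Spark's cache; Spark still uses execution memory while the join runs.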
