Configuring Executor and Driver memory in Spark-on-Yarn

I am confused about how to configure executor and driver memory in Spark 1.5.2.

My environment settings are as follows:

3 Node MAPR Cluster - Each Node: Memory 256G, 16 CPU 
Hadoop 2.7.0 
Spark 1.5.2 - Spark-on-Yarn

Input data information:

460 GB Parquet format table from Hive

I'm using spark-sql to query the Hive context with Spark-on-YARN, but it is a lot slower than Hive, and I am not sure what the right memory configuration for Spark is.

These are my configs:

    export SPARK_DAEMON_MEMORY=1g
    export SPARK_WORKER_MEMORY=88g

    spark.executor.memory              2g
    spark.logConf                      true
    spark.eventLog.dir                 maprfs:///apps/spark
    spark.eventLog.enabled             true
    spark.serializer                   org.apache.spark.serializer.KryoSerializer
    spark.driver.memory                5g
    spark.kryoserializer.buffer.max    1024m

How can I avoid Spark's "java.lang.OutOfMemoryError: Java heap space" and "GC overhead limit exceeded" exceptions?

I really appreciate your assistance with this!

At first glance, your executors are running out of memory. I would suggest increasing their memory.

Note that SPARK_WORKER_MEMORY is only used in standalone mode. SPARK_EXECUTOR_MEMORY is used in YARN mode.
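In YARN mode the executor size can also be set through the spark.executor.memory property, either in the defaults file or on the command line as a `--conf key=value` pair. The sketch below only builds such a command line. `build_args` is a hypothetical helper, the values are placeholders copied from the question, and `yarn-client` is assumed to be the master you want on Spark 1.5.

```python
# Build a spark-sql command line that sets Spark properties via --conf pairs.
# build_args is a hypothetical helper; the values are placeholders.

def build_args(settings, master="yarn-client"):
    """Return an argv list: spark-sql --master <master> --conf k=v ..."""
    args = ["spark-sql", "--master", master]
    for key in sorted(settings):  # sorted for a stable, readable order
        args += ["--conf", "%s=%s" % (key, settings[key])]
    return args


if __name__ == "__main__":
    settings = {
        "spark.executor.memory": "2g",
        "spark.logConf": "true",
    }
    print(" ".join(build_args(settings)))
```

Printing the list makes it easy to check the final command before running it.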

If you are not running anything else on the cluster, you could try the following config:

    spark.executor.memory      16g
    spark.executor.cores       1
    spark.executor.instances   40
    # make the driver memory bigger if the expected final result dataset is larger
    spark.driver.memory        5g
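As a rough sanity check on these numbers, the sketch below estimates how much memory YARN would have to provide for 40 executors of 16g on the 3-node, 256 GB-per-node cluster from the question. It assumes the Spark 1.5 default for spark.yarn.executor.memoryOverhead (10% of executor memory, with a 384 MB minimum), ignores YARN's rounding of container sizes, and assumes executors are spread evenly across nodes. The function names are illustrative only.

```python
# Rough YARN footprint estimate for a Spark-on-YARN executor layout.
# Assumptions (not from the original post): overhead = max(384 MB, 10% of
# executor memory), no container-size rounding, executors spread evenly.

MIN_OVERHEAD_MB = 384


def container_mb(executor_mb):
    """Executor heap plus the assumed off-heap overhead, in MB."""
    overhead = max(MIN_OVERHEAD_MB, executor_mb // 10)
    return executor_mb + overhead


def footprint(executor_mb, instances, nodes, node_mb):
    """Return (total MB requested, MB on the busiest node, fits per node?)."""
    per_container = container_mb(executor_mb)
    busiest = -(-instances // nodes)  # ceiling division
    busiest_mb = busiest * per_container
    return instances * per_container, busiest_mb, busiest_mb <= node_mb


if __name__ == "__main__":
    total, busiest_mb, fits = footprint(16 * 1024, 40, 3, 256 * 1024)
    print("total requested: %d MB" % total)
    print("busiest node:    %d MB of %d MB" % (busiest_mb, 256 * 1024))
    print("fits per node:   %s" % fits)
```

Under these assumptions each container is about 17.6 GB, and the busiest node holds 14 of them, roughly 246 GB of its 256 GB. The layout therefore only fits if nearly all of each node's memory is made available to YARN. If the NodeManagers are configured with less, lower the executor count or size accordingly.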

I do not recommend setting a large executor memory, because that typically increases GC time. Another thing I notice is that those instances are memory optimized. Think carefully about whether that fits your case.
