
spark emr job, executors being lost

Hi, I have a spark-submit job/step that runs continuously without failing but keeps losing executors. The data grows every day; initially the job ran fine, but now, with 400 GB of data in S3, it seems (at least I think) that there aren't enough resources.

I am using 18 r3.8xlarge core instances for this.

                "EMR_MasterInstanceType": "r3.xlarge",
                "EMR_CoreInstanceType": "r3.8xlarge",
                "EMR_CoreInstanceCount": "18",

"Step2_Spark_Command": "command-runner.jar,spark-submit,--class,com.lex.rex.link.modules.multipart_files.files,--name,\\\\\\"Multipart Files Module\\\\\\",--master,yarn,--deploy-mode,client,--executor-memory,22G,--executor-cores,4,--conf,spark.sql.shuffle.partitions=320,/home/hadoop/linking.jar,jobId=#{myJobId},environment=test",
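For readability, here is the same step command with the JSON escaping removed and one argument per line (verbatim from the config above; `#{myJobId}` is a pipeline placeholder and stays as-is):

```shell
spark-submit \
  --class com.lex.rex.link.modules.multipart_files.files \
  --name "Multipart Files Module" \
  --master yarn \
  --deploy-mode client \
  --executor-memory 22G \
  --executor-cores 4 \
  --conf spark.sql.shuffle.partitions=320 \
  /home/hadoop/linking.jar jobId=#{myJobId} environment=test
```

Note that neither `--num-executors` nor any `spark.dynamicAllocation.*` setting appears here, which matters for the sizing questions below.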

Any thoughts or insight? Is the current configuration sufficient?

If I am using 18 EC2 instances of r3.8xlarge with 22G of executor memory, would I have 396G of RAM for in-memory processing? Is my assumption even correct?

400 GB > 396 GB: is that why it's failing?

I'm also wondering: what is the number of executors?
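The 396 GB figure (18 × 22 GB) would only hold if exactly one executor ran per node. With `--num-executors` unset, YARN can pack several executors onto each r3.8xlarge (32 vCPUs, 244 GiB RAM), or, if dynamic allocation is off, Spark may launch only its default of 2 executors. A minimal back-of-envelope sketch, assuming ~241 GiB of node memory is exposed to YARN (an assumed EMR default, not taken from the question) and Spark's usual max(384 MiB, 10%) per-executor memory overhead:

```python
# Back-of-envelope executor packing for 18 x r3.8xlarge with
# --executor-memory 22G --executor-cores 4.
EXECUTOR_MEM_GIB = 22
EXECUTOR_CORES = 4
NODE_VCPUS = 32                 # r3.8xlarge
NODE_YARN_MEM_GIB = 241         # assumed YARN-visible memory per node
NODES = 18

# YARN container size = executor heap + overhead (max of 384 MiB or 10%).
overhead_gib = max(0.384, 0.10 * EXECUTOR_MEM_GIB)
container_gib = EXECUTOR_MEM_GIB + overhead_gib

by_cores = NODE_VCPUS // EXECUTOR_CORES            # limited by vCPUs
by_mem = int(NODE_YARN_MEM_GIB // container_gib)   # limited by memory
per_node = min(by_cores, by_mem)                   # cores bind here

print(f"executors per node: {per_node}")
print(f"max executors in cluster: {per_node * NODES}")
print(f"total executor heap: {per_node * NODES * EXECUTOR_MEM_GIB} GiB")
```

Under these assumptions the cluster could host about 8 executors per node, i.e. up to ~144 executors and ~3 TiB of executor heap, far more than 396 GB; the real constraint is whether the job actually requests that many executors.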

Assuming the error message is ExecutorLostFailure (executor lost):

This is most often caused by inadequate resources, which degrades executor performance, for example through heavy GC pauses. You can increase the application's resources by raising the number of executors. Failing that, you can do some tuning, e.g. increase the values of spark.shuffle.io.retryWait and spark.shuffle.io.maxRetries.
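A sketch of how those suggestions could be applied to the original command: an explicit executor count plus raised shuffle-retry settings. The values here are illustrative assumptions, not measured recommendations (the defaults are 3 retries and a 5s wait):

```shell
spark-submit \
  --class com.lex.rex.link.modules.multipart_files.files \
  --master yarn \
  --deploy-mode client \
  --num-executors 100 \
  --executor-memory 22G \
  --executor-cores 4 \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.shuffle.io.retryWait=60s \
  --conf spark.sql.shuffle.partitions=320 \
  /home/hadoop/linking.jar jobId=#{myJobId} environment=test
```

With longer waits and more retries, executors survive transient shuffle-fetch failures (e.g. during long GC pauses on the serving side) instead of being marked lost.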
