PySpark memory issue during a join
I am using Spark 2.1.0. I have two dataframes, each no more than 3 MB. When I run an inner join on the two dataframes, all of my transformation logic works perfectly. But when I use a right outer join on the same two dataframes, I get the error below.
Error
17/08/02 02:29:53 ERROR cluster.YarnScheduler: Lost executor 337 on ip-172-
21-1-105.eu-west-1.compute.internal: Container killed by YARN for exceeding
memory limits. 1.5 GB of 1.5 GB physical memory used. Consider boosting
spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 WARN scheduler.TaskSetManager: Lost task 34.0 in stage
283.0 (TID 11396, ip-172-21-1-105.eu-west-1.compute.internal, executor 337):
ExecutorLostFailure (executor 337 exited caused by one of the running tasks)
Reason: Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5
GB physical memory used. Consider boosting
spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 WARN server.TransportChannelHandler: Exception in
connection from /172.21.1.105:50342
java.io.IOException: Connection reset by peer
I tried these alternatives: 1) df.coalesce(xvalue).show() 2) setting the executor memory. Nothing worked.
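The error message itself suggests boosting spark.yarn.executor.memoryOverhead, which is the off-heap headroom YARN accounts for on top of the executor heap. A sketch of raising it at submit time (the 512 MB value and the my_job.py script name are illustrative assumptions, not from the question; tune the numbers for your cluster):

```shell
# Raise the per-executor off-heap overhead that YARN enforces.
# 512 (MB) is an illustrative value -- tune it for your cluster.
spark-submit \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --executor-memory 2g \
  my_job.py
```

With only 1.5 GB of total physical memory per container, even a modest overhead bump can be the difference between YARN killing the container and the task finishing.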
This issue has been pending for the past few weeks. Can anyone please let me know where I am going wrong?
Could you please share the details of the datasets?
Have you tried a leftOuterJoin? Does it also give you the same error?
Regards,
Neeraj