
Pyspark Memory Issue during join condition

I am using Spark 2.1.0. I have two dataframes, each no more than 3 MB. When I run an inner join on the two dataframes, all of my transformation logic works perfectly. But when I use a right outer join on the same two dataframes, I get the error below.

Error

Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 ERROR cluster.YarnScheduler: Lost executor 337 on ip-172-
21-1-105.eu-west-1.compute.internal: Container killed by YARN for exceeding 
memory limits. 1.5 GB of 1.5 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 
283.0 (TID 11396, ip-172-21-1-105.eu-west-1.compute.internal, executor 337): 
ExecutorLostFailure (executor 337 exited caused by one of the running tasks) 
Reason: Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 
GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 WARN server.TransportChannelHandler: Exception in 
connection from /172.21.1.105:50342
java.io.IOException: Connection reset by peer
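
The overhead the log points at is separate from the executor heap (spark.executor.memory), which is one reason raising the heap alone often does not help here. A minimal sketch of boosting it when building the session, assuming a driver-side script; spark.yarn.executor.memoryOverhead is the Spark 2.1-on-YARN property name (value in megabytes), and the app name and numbers are illustrative, not tuned values:

from pyspark.sql import SparkSession

# spark.yarn.executor.memoryOverhead is the off-heap headroom YARN
# accounts for on top of the executor heap; in Spark 2.1 it is in MB.
# The app name and the values below are illustrative assumptions.
spark = (SparkSession.builder
         .appName("right-outer-join-debug")
         .config("spark.executor.memory", "2g")
         .config("spark.yarn.executor.memoryOverhead", "1024")
         .getOrCreate())

The same properties can also be passed at launch time, e.g. spark-submit --conf spark.yarn.executor.memoryOverhead=1024, which guarantees they are in place before any executor is requested.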

I tried these alternatives:

  1. df.coalesce(xvalue).show()
  2. Setting the executor memory

Nothing worked.
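
Since both dataframes are reportedly under 3 MB, another avenue worth trying is a broadcast hint, so the small side never goes through a shuffle. This is a suggestion on my part, not something from the original post; left_df, right_df, and the join column "key" are hypothetical stand-ins:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-sketch").getOrCreate()

# Tiny stand-ins for the poster's two sub-3 MB dataframes; "key" is an
# assumed join column name, not one from the original post.
left_df = spark.createDataFrame([(1, "a"), (2, "b")], ["key", "left_val"])
right_df = spark.createDataFrame([(2, "x"), (3, "y")], ["key", "right_val"])

# For a right outer join only the left side can be broadcast, because
# every right-side row must be preserved in the output.
result = broadcast(left_df).join(right_df, on="key", how="right_outer")
result.show()

If result.explain() still shows a SortMergeJoin, the hint could not be applied, and the shuffle path rather than the join type itself is the place to investigate.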

This issue has been pending for the past few weeks. Can anyone please let me know where I am going wrong?

Could you please share some details about the datasets?

  1. How many rows and columns are in each dataset?

Have you tried a leftOuterJoin? Does it also give you the same error?
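
One cheap experiment along those lines: a right outer join is the mirror image of a left outer join with the operands swapped, so comparing the two tells you whether the join type or the data is at fault. Reusing the hypothetical left_df and right_df from the earlier sketch:

# Same set of rows either way; only the column order differs.
a = left_df.join(right_df, on="key", how="right_outer")
b = right_df.join(left_df, on="key", how="left_outer")
a.show()
b.show()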

Regards,

Neeraj
