简体   繁体   English

Spark:将大型数据帧写入镶木地板文件时出现 LeaseExpiredException

[英]Spark: LeaseExpiredException while writing large dataframe to parquet files

I have a large dataframe which I am writing to parquet files in HDFS.我有一个大数据框,我正在写入 HDFS 中的镶木地板文件。 Getting the below exception from logs :从日志中获取以下异常:

2018-10-15 18:31:32 ERROR Executor:91 - Exception in task 41.0 in stage 0.0 (TID 1321)
org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:369)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /home/prod_out/20181007/_temporary/0/_temporary/attempt_20181015183108_0000_m_000041_0/part-00041-1185b10b-bcb1-4b7e-b732-dd6f71322b7d-c000.snappy.parquet (inode 33628528083): File does not exist. Holder DFSClient_NONMAPREDUCE_179567941_77 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3481)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3284)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3122)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3082)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:822)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)

    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1455)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1251)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
2018-10-15 18:32:06 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 2189

Googled about it but couldn't find any concrete solution.谷歌搜索,但找不到任何具体的解决方案。 Set the speculation false: conf.set("spark.speculation","false")将推测设置为 false:conf.set("spark.speculation","false")
But still didn't help.但仍然没有帮助。 It's finishing few tasks, generating few part files and then abruptly stops with this error.它完成了几个任务,生成了几个零件文件,然后突然停止并出现此错误。

Details: Spark version : 2.3.1 (This was not happening in 1.6x).详细信息: Spark 版本:2.3.1(这在 1.6x 中没有发生)。
There is only one session running, which rules out the possibility of the same location being accessed by a different session.只有一个会话在运行,这就排除了不同会话访问同一位置的可能性。

Any pointers?任何指针?

Thanks!谢谢!

Actually the issue is because before spark writes the data into specified hdfs location, it uploads the data into temporary location.This two stage mechanism is the used to ensure consistency of the final data set when working with file systems.其实问题是因为s​​park在将数据写入指定的hdfs位置之前,它会将数据上传到临时位置。这两个阶段机制用于确保在使用文件系统时最终数据集的一致性。 In case of successful write the data is moved from temporary location.在成功写入的情况下,数据将从临时位置移动。 And in case of unsuccessful write the data is removed from the temporary location.如果写入不成功,数据将从临时位置中删除。 In your case there might be a different executor thread making changes to the temporary location.在您的情况下,可能会有不同的执行程序线程对临时位置进行更改。 And once the original executor thread looks to the temporary location, it is not available and hdfs lease exception is thrown.并且一旦原始执行器线程查找到临时位置,它将不可用并抛出 hdfs 租用异常。 In order to avoid this exception,为了避免这个异常,

  1. Make sure you are not using any parallel collections.确保您没有使用任何并行集合。
  2. Avoid multi-threading if applicable如果适用,避免多线程
  3. spark.conf.set("spark.speculation","false") spark.conf.set("spark.speculation","false")

It may be useful to you this solution: java.lang.OutOfMemoryError: Unable to acquire 100 bytes of memory, got 0这个解决方案可能对你有用: java.lang.OutOfMemoryError:无法获得100字节的内存,得到0

In my case I couldn't write orc files.就我而言,我无法编写 orc 文件。 I removed coalesce option and then it worked!我删除了合并选项,然后它起作用了!

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM