Spark & HBase: java.io.IOException: Connection reset by peer

I would appreciate it if you could help me.

While implementing Spark Streaming from Kafka to HBase (code is attached; a rough sketch is shown below), we ran into "java.io.IOException: Connection reset by peer" (full log is attached).
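For reference, the job follows roughly this pattern — a minimal sketch, not the exact attached code; broker, topic, table, and column names are placeholders, and the exact HBaseContext API can vary between hbase-spark versions. It uses the spark-streaming-kafka-0-10 integration and the hbase-spark module:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",             // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hbase-writer")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // hbase-spark opens HBase connections inside the executors and writes
    // each record as a Put; this sketch assumes non-null Kafka keys
    val hbaseContext = new HBaseContext(ssc.sparkContext, HBaseConfiguration.create())
    hbaseContext.streamBulkPut[(String, String)](
      stream.map(r => (r.key, r.value)),
      TableName.valueOf("events"),
      { case (k, v) =>
        new Put(Bytes.toBytes(k))
          .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(v))
      })

    ssc.start()
    ssc.awaitTermination()
  }
}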

The issue only appears when we write to HBase with the dynamic allocation option enabled in the Spark settings. If we write the data to HDFS (a Hive table) instead of HBase, or if dynamic allocation is disabled, no errors occur.

We have tried changing the ZooKeeper connections, the Spark executor idle timeout, and the network timeout. We have also tried switching the shuffle block transfer service (NIO), but the error is still there. If we cap the min/max executor count for dynamic allocation below 80, there are no problems either.

What might the problem be? There are many almost identical problems on Jira and Stack Overflow, but nothing has helped.

Versions:

HBase 1.2.0-cdh5.14.0
Kafka  3.0.0-1.3.0.0.p0.40
SPARK2 2.2.0.cloudera2-1.cdh5.12.0.p0.232957
hbase-client/hbase-spark(org.apache.hbase) 1.2.0-cdh5.11.1

Spark settings:

--num-executors=80
--conf spark.sql.shuffle.partitions=200
--conf spark.driver.memory=32g
--conf spark.executor.memory=32g
--conf spark.executor.cores=4

Cluster: 1+8 nodes, 70 CPUs, 755 GB RAM, 10x HDD.

Log:

18/04/09 13:51:56 INFO cluster.YarnClusterScheduler: Executor 717 on lang32.ca.sbrf.ru killed by driver.
18/04/09 13:51:56 INFO storage.BlockManagerMaster: Removed 717 successfully in removeExecutor
18/04/09 13:51:56 INFO spark.ExecutorAllocationManager: Existing executor 717 has been removed (new total is 26)
18/04/09 13:51:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 705.
18/04/09 13:51:56 INFO scheduler.DAGScheduler: Executor lost: 705 (epoch 45)
18/04/09 13:51:56 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 705 from BlockManagerMaster.
18/04/09 13:51:56 INFO cluster.YarnClusterScheduler: Executor 705 on lang32.ca.sbrf.ru killed by driver.
18/04/09 13:51:56 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(705, lang32.ca.sbrf.ru, 22805, None)
18/04/09 13:51:56 INFO spark.ExecutorAllocationManager: Existing executor 705 has been removed (new total is 25)
18/04/09 13:51:56 INFO storage.BlockManagerMaster: Removed 705 successfully in removeExecutor
18/04/09 13:51:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 716.
18/04/09 13:51:56 INFO scheduler.DAGScheduler: Executor lost: 716 (epoch 45)
18/04/09 13:51:56 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 716 from BlockManagerMaster.
18/04/09 13:51:56 INFO cluster.YarnClusterScheduler: Executor 716 on lang32.ca.sbrf.ru killed by driver.
18/04/09 13:51:56 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(716, lang32.ca.sbrf.ru, 28678, None)
18/04/09 13:51:56 INFO spark.ExecutorAllocationManager: Existing executor 716 has been removed (new total is 24)
18/04/09 13:51:56 INFO storage.BlockManagerMaster: Removed 716 successfully in removeExecutor
18/04/09 13:51:56 WARN server.TransportChannelHandler: Exception in connection from /10.116.173.65:57542
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:748)
18/04/09 13:51:56 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from /10.116.173.65:57542 is closed
18/04/09 13:51:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 548.

Please see my related answer here: What are possible reasons for receiving TimeoutException: Futures timed out after [n seconds] when working with Spark

It also took me a while to understand why Cloudera states the following:

Dynamic allocation and Spark Streaming

If you are using Spark Streaming, Cloudera recommends that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.

Reference: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_dynamic_allocation_streaming
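On the submit command line this corresponds to (same flag style as the settings above):

--conf spark.dynamicAllocation.enabled=false

With dynamic allocation off, the fixed --num-executors value takes effect instead.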

Try setting these two parameters; example values follow below. Also try caching the DataFrame before writing to HBase.

spark.network.timeout

spark.executor.heartbeatInterval
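For example, submit-time values in the same style as above (illustrative starting points, not tuned recommendations; keep spark.network.timeout comfortably larger than spark.executor.heartbeatInterval):

--conf spark.network.timeout=800s
--conf spark.executor.heartbeatInterval=60s

And a minimal sketch of caching before the HBase write, so tasks retried after an executor loss re-read cached blocks instead of recomputing the whole lineage (df and the write step are placeholders for the actual job):

import org.apache.spark.storage.StorageLevel

val cached = df.persist(StorageLevel.MEMORY_AND_DISK)  // df: the DataFrame about to be written
cached.count()       // materialize the cache before the write
// ... write `cached` to HBase here ...
cached.unpersist()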
