Storm topology stops emitting after StormServerHandler Netty error caused by OptionalDataException

We have a Storm cluster running on 3 nodes with multiple topologies. We use apache-storm-1.2.2 and Java 1.8.0_162.

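The problem we currently face is that random topologies stop emitting after an error occurs and the Netty server becomes unavailable. This can happen after a few hours or after a few days.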

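Since we have not changed the logic our Storm bolts use to emit or execute data, we currently have no idea how such an error can be thrown. It is also unclear why the whole topology stops working after such an error.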

There seems to be a problem with the deserialization of some HashMap, but we cannot figure out how this happens.

Here is the error from one of the workers that led to the failure:

2019-09-24 14:31:02.414 o.a.s.m.n.StormServerHandler Netty-server-localhost-6727-worker-2 [ERROR] server errors in handling the request
java.lang.RuntimeException: java.io.OptionalDataException
        at org.apache.storm.serialization.SerializableSerializer.read(SerializableSerializer.java:58) ~[storm-core-1.2.2.jar:1.2.2]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) ~[kryo-3.0.3.jar:?]
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) ~[kryo-3.0.3.jar:?]
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) ~[kryo-3.0.3.jar:?]
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:689) ~[kryo-3.0.3.jar:?]
        at org.apache.storm.serialization.KryoValuesDeserializer.deserializeFrom(KryoValuesDeserializer.java:37) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.serialization.KryoTupleDeserializer.deserialize(KryoTupleDeserializer.java:50) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.DeserializingConnectionCallback.recv(DeserializingConnectionCallback.java:56) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.netty.Server.enqueue(Server.java:134) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.netty.Server.received(Server.java:255) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.messaging.netty.StormServerHandler.messageReceived(StormServerHandler.java:61) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [storm-core-1.2.2.jar:1.2.2]
        at org.apache.storm.shade.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [storm-core-1.2.2.jar:1.2.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
Caused by: java.io.OptionalDataException
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1587) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427) ~[?:1.8.0_162]
        at java.util.HashMap.readObject(HashMap.java:1407) ~[?:1.8.0_162]
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_162]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_162]
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567) ~[?:1.8.0_162]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427) ~[?:1.8.0_162]
        at org.apache.storm.serialization.SerializableSerializer.read(SerializableSerializer.java:56) ~[storm-core-1.2.2.jar:1.2.2]
        ... 32 more

After this error is thrown, all other workers of this particular topology stop working, and we have to kill and redeploy the topology.

The logs of the other workers either show nothing useful or contain the following:

2019-09-24 14:31:03.012 o.a.s.m.n.Client Thread-32-disruptor-worker-transfer-queue [ERROR] connection to Netty-Client-HOSTE_NAME/IP:PORT is unavailable
2019-09-24 14:31:03.053 o.a.s.m.n.Client client-worker-1 [WARN] Re-connection to HOSTE_NAME/IP:PORT was successful but 4 messages has been lost so far

Based on what you posted, the following points may help you narrow down the problem:

Regarding your topology going idle, you should try upgrading to at least Storm 1.2.3. You may be affected by https://issues.apache.org/jira/browse/STORM-3055.

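Your stack trace shows that you are using Java serialization for some tuples. I think what is happening here is that one of your workers serialized a tuple, the tuple was sent to another worker, and deserialization of that tuple then failed. See https://docs.oracle.com/javase/7/docs/api/java/io/OptionalDataException.html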
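For reference, here is a minimal standalone sketch of one documented way `OptionalDataException` arises: the reader asks for an object while the next element in the stream is primitive data. This is plain JDK code, not Storm code, and the class and method names are hypothetical. It only illustrates the mismatch between what was written and what the reader expects, and does not claim to reproduce the failure in your topology.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OptionalDataException;

final class OptionalDataDemo {

    /** Writes a single int (primitive block data) into a Java serialization stream. */
    static byte[] writePrimitiveOnly() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(42);
        }
        return bytes.toByteArray();
    }

    /**
     * Tries to read an object from the payload. Returns the exception if the
     * reader finds primitive data where it expected an object, otherwise null.
     */
    static OptionalDataException readAsObject(byte[] payload)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            in.readObject();
            return null;
        } catch (OptionalDataException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        OptionalDataException e = readAsObject(writePrimitiveOnly());
        if (e != null) {
            // eof and length are public fields describing what the reader found.
            System.out.println("eof=" + e.eof + ", length=" + e.length);
        }
    }
}
```

In your trace the exception is raised inside `HashMap.readObject`, so the reader found something other than the key or value object it expected at that position in the stream.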

First, I think you should try turning off Java serialization. It is slow, and you probably don't want it in a production topology anyway. See https://storm.apache.org/releases/1.2.3/Serialization.html
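As a rough sketch of what turning it off looks like: the Serialization page linked above describes the `topology.fall.back.on.java.serialization` and `topology.kryo.register` settings. The example below builds them as a plain map so that it runs without Storm on the classpath. The class name `com.example.MyPayload` is a placeholder, not something from your topology. In a real topology you would normally set the same keys on Storm's `Config` object (for example via `setFallBackOnJavaSerialization(false)` and `registerSerialization(...)`), so treat this as an illustration of the settings rather than the exact code to deploy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SerializationConfigSketch {

    /**
     * Builds topology settings that disable the Java serialization fallback
     * and register the given classes with Kryo by name.
     */
    static Map<String, Object> buildConf(List<String> classNames) {
        Map<String, Object> conf = new HashMap<>();
        // With the fallback off, an unregistered type is reported as an error
        // instead of being silently handled by Java serialization.
        conf.put("topology.fall.back.on.java.serialization", false);
        // Each entry is the fully qualified name of a class carried in tuples.
        conf.put("topology.kryo.register", new ArrayList<>(classNames));
        return conf;
    }

    public static void main(String[] args) {
        // "com.example.MyPayload" is a hypothetical class name.
        System.out.println(buildConf(Arrays.asList("com.example.MyPayload")));
    }
}
```

With the fallback disabled, every custom type you emit has to be registered, which also makes it obvious which of your classes currently rely on Java serialization.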

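Second, the next time your topology hangs (assuming that upgrading to 1.2.3 does not fix it), I would use jstack to get a thread dump of the idle workers. That should tell you why the topology is not doing anything.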
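jstack is run from the command line against the worker's process id (`jstack <pid>`). If attaching to the process is inconvenient, a similar dump can be produced from inside a JVM with the standard `ThreadMXBean` API. The sketch below is that swapped-in alternative, not jstack itself; the class name is hypothetical and nothing in it is Storm-specific.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpSketch {

    /** Returns the name, state and stack trace of every live thread in this JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

Whichever way you obtain the dump, look at what the executor and transfer threads of the idle workers are blocked or waiting on.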

Finally, you could try temporarily logging the byte arrays at https://github.com/apache/storm/blob/5b01c78d053b2b8e2b7e6a4a8acd4c822924bbf9/storm-core/src/jvm/org/apache/storm/serialization/SerializableSerializer.java#L51 ; maybe they will tell you what is going wrong when you see the exception. The exception you are getting indicates that the byte array does not contain the data expected for a serialized HashMap. To add this logging you will need to edit and build Storm; see https://github.com/apache/storm/blob/master/DEVELOPER.md#building for how to do that.
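The kind of logging meant here could look roughly like the sketch below. It is standalone JDK code, not a patch to `SerializableSerializer`, and the class and method names are hypothetical. It hex-encodes the payload and prints it only when deserialization fails, so the bytes can be inspected afterwards. A well-formed Java serialization stream starts with the magic number `aced` followed by the version `0005`, which gives you a quick first check on a logged payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

final class PayloadLoggingSketch {

    /** Encodes the bytes as lowercase hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Serializes a value with standard Java serialization. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Deserializes the payload and logs the raw bytes if reading fails. */
    static Object deserializeWithLogging(byte[] payload)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException | RuntimeException e) {
            System.err.println("Failed to deserialize " + payload.length
                    + " bytes: " + toHex(payload));
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        byte[] payload = serialize(map);
        System.out.println(toHex(payload).substring(0, 8));
        System.out.println(deserializeWithLogging(payload));
    }
}
```

Logging only on failure keeps the output manageable, since the error shows up only once every few hours or days.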
