RecordTooLargeException in Kafka Streams join
I have a KStream x KStream join which is breaking down with the following exception.
Exception in thread "my-clicks-and-recs-join-streams-4c903fb1-5938-4919-9c56-2c8043b86986-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_15, processor=KSTREAM-SOURCE-0000000001, topic=my_outgoing_recs_prod, partition=15, offset=9248896
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:203)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:679)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:557)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_15] exception caught when producing
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:136)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:87)
at org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:59)
at org.apache.kafka.streams.state.internals.ChangeLoggingSegmentedBytesStore.put(ChangeLoggingSegmentedBytesStore.java:59)
at org.apache.kafka.streams.state.internals.MeteredSegmentedBytesStore.put(MeteredSegmentedBytesStore.java:105)
at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:107)
at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:100)
at org.apache.kafka.streams.kstream.internals.KStreamJoinWindow$KStreamJoinWindowProcessor.process(KStreamJoinWindow.java:64)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:47)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:82)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:189)
... 3 more
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
I am joining a Click topic with a Recommendation topic. The Click objects are really small (less than a KB). A Recommendation, on the other hand, might be big, occasionally bigger than 1 MB.
I Googled for the exception and found (here) that I need to set max.request.size in the producer configs.
What I don't understand is: where does a producer come into the picture in a streams join? The topic in the exception above, topic=my_outgoing_recs_prod, is the recommendations topic, not the final joined topic. Isn't the streaming application supposed to just "consume" from it?
Nevertheless, I tried setting the property as config.put("max.request.size", "31457280");, which is 30 MB. I don't expect the recommendations record to exceed that limit. Still, the code is crashing.
I cannot change the configs in the Kafka cluster but, if needed, I can change the properties of the relevant topics in Kafka.
Could someone suggest what else I can try?
If nothing works, I am willing to ignore such oversized messages. However, I don't know a way of handling this RecordTooLargeException.
My code to perform the join is as follows.
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, JOINER_ID + "-" + System.getenv("HOSTNAME"));
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, booststrapList);
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass().getName());
config.put("max.request.size", "314572800");
config.put("message.max.bytes", "314572800");
config.put("max.message.bytes", "314572800");
KStreamBuilder builder = new KStreamBuilder();
KStream<String, byte[]> clicksStream = builder.stream(TopologyBuilder.AutoOffsetReset.LATEST, Serdes.String(), Serdes.ByteArray(), clicksTopic);
KStream<String, byte[]> recsStream = builder.stream(TopologyBuilder.AutoOffsetReset.LATEST, Serdes.String(), Serdes.ByteArray(), recsTopic);
KStream<String, ClickRec> join = clicksStream.join(
recsStream,
(click, recs) -> new ClickRec(click, recs),
JoinWindows.of(windowMillis).until(3*windowMillis));
join.to(Serdes.String(), JoinSerdes.CLICK_SERDE, jointTopic);
KafkaStreams streams = new KafkaStreams(builder, config);
streams.cleanUp();
streams.start();
ClickRec is the joined object (which is far smaller than a Recommendation object, and I don't expect it to be bigger than a few KBs).
Where do I put a try...catch in the code above to recover from such occasionally oversized objects?
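Update: one option I found while digging around — this does not exist in the version I am on, but if upgrading is possible, Kafka Streams 1.1+ added a pluggable default.production.exception.handler (KIP-210) for exactly this situation. A sketch (the class name is my own; the interface and config key are from the 1.1 API):

```java
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Skips records that are too large instead of killing the stream thread.
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
    @Override
    public void configure(final Map<String, ?> configs) {}

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        return (exception instanceof RecordTooLargeException)
                ? ProductionExceptionHandlerResponse.CONTINUE  // drop this record and keep going
                : ProductionExceptionHandlerResponse.FAIL;     // fail on anything else
    }
}
```

It would be registered via config.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, IgnoreRecordTooLargeHandler.class);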
There are multiple configs at different levels:

- message.max.bytes (default is 1000012), a broker config (cf. http://kafka.apache.org/documentation/#brokerconfigs)
- max.message.bytes (default is 1000012), a topic config (cf. http://kafka.apache.org/documentation/#topicconfigs)
- max.request.size (default is 1048576), a producer config (cf. http://kafka.apache.org/documentation/#producerconfigs)

Your stack trace indicates that you need to change the setting at the broker or topic level:
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
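Since you said you can change the properties of individual topics, the topic-level limit can be raised with the stock CLI tooling. A sketch (adapt the path and connection flag to your installation; newer tool versions use --bootstrap-server instead of --zookeeper). Note that the failing write here goes to the changelog topic Kafka Streams created for the join state — its name is derived from your application.id and ends in "-changelog", so the topic name below is only an illustrative placeholder:

```shell
# Raise max.message.bytes on the affected (changelog) topic to ~30 MB.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics \
  --entity-name my-clicks-and-recs-join-KSTREAM-JOINTHIS-0000000004-store-changelog \
  --add-config max.message.bytes=31457280
```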
Maybe you also need to increase the producer setting.
Why do you need this in the first place?
As you perform a KStream-KStream join, the join operator builds up state (it must buffer records from both streams in order to compute the join). State is by default backed by a Kafka topic -- the local state is basically a cache while the Kafka topic is the source of truth. Thus, all your records will be written to this "changelog topic" that Kafka Streams creates automatically.
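From the application side, both the internal topics Streams creates and its embedded producer can be given larger limits through prefixed config keys. A sketch, assuming a Kafka version (roughly 0.11+) where StreamsConfig.topicPrefix() and StreamsConfig.producerPrefix() are available; the 31457280 value is illustrative:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties config = new Properties();
// "topic."-prefixed settings are applied to the internal
// (changelog/repartition) topics that Kafka Streams creates:
config.put(StreamsConfig.topicPrefix("max.message.bytes"), "31457280");
// "producer."-prefixed settings go to the embedded producer that
// writes the changelog and output topics:
config.put(StreamsConfig.producerPrefix("max.request.size"), "31457280");
```

Note that Streams applies the topic-level settings only when it creates the internal topics; changelog topics that already exist have to be altered externally (e.g. with kafka-configs.sh).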