Failing Apache Beam Pipeline when consuming events through KafkaIO on Flink runner
I have a Beam pipeline with several stages that consumes data through KafkaIO; the code looks like the following:
pipeline.apply("Read Data from Stream", StreamReader.read())
        .apply("Decode event and extract relevant fields", ParDo.of(new DecodeExtractFields()))
        .apply(...);
The StreamReader.read() method implementation:
public static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
            .withBootstrapServers(Constants.BOOTSTRAP_SERVER)
            .withTopics(Constants.KAFKA_TOPICS)
            .withConsumerConfigUpdates(Constants.CONSUMER_PROPERTIES)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Line-A: .withMaxReadTime(Duration.standardDays(10))
            .withLogAppendTime();
}
When running the pipeline on the Direct Runner, it runs without throwing any errors. But in my case I have to use the Flink Runner, and when the pipeline runs on the Flink Runner it throws the following error:
Exception in thread "main" java.lang.RuntimeException: Error while translating UnboundedSource: org.apache.beam.sdk.io.kafka.KafkaUnboundedSource@14b31e37
at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:250)
at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$ReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:336)
at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:161)
....
at Main.main(Main.java:6)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @2c34f934
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178)
at java.base/java.lang.reflect.Field.setAccessible(Field.java:172)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:106)
The error can be resolved by un-commenting Line-A in the StreamReader.read() method above, but per the documentation, withMaxReadTime(...) should not be used for anything other than testing/demos.
The pipeline is instantiated like this:
PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
pipelineOptions.setRunner(FlinkRunner.class);
Pipeline pLine = Pipeline.create(pipelineOptions);
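For reference, a common alternative way to instantiate the pipeline (a sketch, assuming the beam-runners-flink dependency is on the classpath; the --parallelism value is only illustrative) is to parse the runner and Flink-specific settings from command-line arguments via FlinkPipelineOptions:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// e.g. run with: --runner=FlinkRunner --parallelism=4
FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
options.setRunner(FlinkRunner.class);
Pipeline pLine = Pipeline.create(options);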
Questions:
Why does this error occur on the Flink Runner, and why does un-commenting Line-A resolve it? If possible, please provide some resources on this.
The error appears to be not in Beam but in Flink's closure cleaner, which modifies private parts of user or SDK code. This appears to be a known issue with recent versions of Java and Flink. See the existing question "Error message on example flink job: Unable to make field private final byte[] java.lang.String.value accessible".
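A workaround commonly suggested for this kind of InaccessibleObjectException on Java 9+ (my addition, not part of the original answer) is to open java.base/java.lang to unnamed modules when launching the JVM; the module and package names come straight from the exception message. A minimal sketch, assuming the pipeline is launched as a standalone jar (my-beam-pipeline.jar is a placeholder):

# Opens java.lang for the reflective access used by Flink's ClosureCleaner.
java --add-opens java.base/java.lang=ALL-UNNAMED -jar my-beam-pipeline.jar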
Why does the commented line change things? Normally, when reading from Kafka, you read the stream as an unbounded read. When you specify withMaxReadTime, it becomes a bounded read, so the translation to the underlying Flink operators is different.
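To make that concrete, here is a minimal sketch (the broker address and topic are placeholders, not from the original post): the same KafkaIO source is unbounded by default, and capping it with withMaxReadTime, or alternatively withMaxNumRecords, is what turns it into a bounded read.

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

// Unbounded by default: translated via Flink's unbounded-source path.
KafkaIO.Read<String, String> unbounded =
        KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092")   // placeholder broker
                .withTopic("events")                      // placeholder topic
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class);

// Bounded: capping read time (or record count) changes the translation.
// Per the KafkaIO docs these caps are meant for testing/demos, not production.
KafkaIO.Read<String, String> bounded =
        unbounded
                .withMaxReadTime(Duration.standardMinutes(5))
                .withMaxNumRecords(1_000_000);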