Failing Apache Beam Pipeline when consuming events through KafkaIO on Flink runner

I have a Beam pipeline with several stages that consumes data through KafkaIO, and the code looks like below:

pipeline.apply("Read Data from Stream", StreamReader.read())
        .apply("Decode event and extract relevant fields", ParDo.of(new DecodeExtractFields()))
        .apply(...);
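For context, DecodeExtractFields is a DoFn along the lines of the sketch below (hypothetical; the exact extraction logic is omitted and not relevant to the failure). Since read() below does not call withoutMetadata(), the DoFn receives KafkaRecord<K, V> elements rather than plain KVs:

import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical sketch only -- the real decoding/extraction logic is omitted.
public static class DecodeExtractFields extends DoFn<KafkaRecord<String, String>, String> {
    @ProcessElement
    public void processElement(@Element KafkaRecord<String, String> record,
                               OutputReceiver<String> out) {
        // Placeholder: decode the value and emit only the relevant fields.
        out.output(record.getKV().getValue());
    }
}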

The StreamReader.read() method implementation:

public static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
            .withBootstrapServers(Constants.BOOTSTRAP_SERVER)
            .withTopics(Constants.KAFKA_TOPICS)
            .withConsumerConfigUpdates(Constants.CONSUMER_PROPERTIES)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
  //Line-A  .withMaxReadTime(Duration.standardDays(10))
            .withLogAppendTime();
}

When running the pipeline on the Direct Runner, it runs without throwing any errors. But in my case, I have to use the Flink Runner, and when the pipeline runs on the Flink Runner it throws the following error:

Exception in thread "main" java.lang.RuntimeException: Error while translating UnboundedSource: org.apache.beam.sdk.io.kafka.KafkaUnboundedSource@14b31e37
    at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:250)
    at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$ReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:336)
    at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:161)
....
    at Main.main(Main.java:6)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @2c34f934
    at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
    at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
    at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178)
    at java.base/java.lang.reflect.Field.setAccessible(Field.java:172)
    at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:106)

The error can be resolved by un-commenting Line-A in the StreamReader.read() method above, but according to the documentation, withMaxReadTime(...) should not be used for anything other than testing/demos.

The pipeline instantiation is done like this:

PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
pipelineOptions.setRunner(FlinkRunner.class);
Pipeline pLine = Pipeline.create(pipelineOptions);
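(Side note: an equivalent way to configure the runner, assuming the beam-runners-flink artifact is on the classpath, is through the Flink-specific options interface, which also exposes runner settings such as parallelism:)

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

FlinkPipelineOptions flinkOptions = PipelineOptionsFactory.as(FlinkPipelineOptions.class);
flinkOptions.setRunner(FlinkRunner.class);
flinkOptions.setParallelism(2); // optional Flink-specific setting
Pipeline pLine = Pipeline.create(flinkOptions);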

Questions:

  1. Why does this error occur?
  2. How do I solve this problem?

If possible, please provide some resources on this.

The error appears to be not in Beam but in Flink's closure cleaner, which modifies private parts of user or SDK code. This appears to be a known issue with recent versions of Java and Flink. See: Error message on example flink job: Unable to make field private final byte[] java.lang.String.value accessible

Why does the commented line change things? Normally, when reading from Kafka, you read the stream in an unbounded read. When you specify withMaxReadTime, this becomes a bounded read, so the translation to the underlying Flink operators is different.
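In practical terms, the InaccessibleObjectException comes from the JDK's strong encapsulation of java.base internals (enforced by default since Java 16), which blocks the reflective access Flink's ClosureCleaner performs. A common workaround, matching the does not "opens java.lang" hint in the message, is to re-open that package to the unnamed module when launching the job (the jar name below is a placeholder):

java --add-opens java.base/java.lang=ALL-UNNAMED -jar beam-flink-pipeline.jar

Running on an older JDK (e.g. Java 11), where this kind of reflective access is still permitted by default, also avoids the error without changing the pipeline code.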
