
Class loading for Flink Kafka Source Timestamp Extractor

I'm trying to deploy a Flink job to a cluster based on the flink:1.4.1-hadoop27-scala_2.11-alpine image. The job uses a Kafka connector source (flink-connector-kafka-0.11) to which I'm trying to assign timestamps and watermarks. My code is very similar to the Scala example in the Flink Kafka connector documentation, but with FlinkKafkaConsumer011:

val myConsumer = new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)
myConsumer.assignTimestampsAndWatermarks(new CustomWatermarkEmitter())

This works great when running locally from my IDE. However, in the cluster environment I get the following error:

java.lang.ClassNotFoundException: com.my.organization.CustomWatermarkEmitter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1863)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2037)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:521)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)

I'm building my job as a fat jar, which I have verified contains this class. Does this example from the documentation only work if the CustomWatermarkEmitter class is in the /opt/flink/lib/ folder?
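As a sanity check that the fat jar really does contain the class, the entry can be looked up programmatically with the JDK's `java.util.jar` API. This is a self-contained sketch, not Flink code: `buildDemoJar` fabricates a throwaway jar with a fake entry purely for demonstration; in practice you would point `containsClass` at the assembled fat jar instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class CheckJar {
    /** Returns true if the jar at jarPath contains the .class entry for className. */
    public static boolean containsClass(Path jarPath, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.getEntry(entry) != null;
        }
    }

    /** Builds a throwaway jar with one fake class entry, for demonstration only. */
    public static Path buildDemoJar() throws IOException {
        Path jar = Files.createTempFile("fatjar-demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("com/my/organization/CustomWatermarkEmitter.class"));
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws Exception {
        Path jar = buildDemoJar();
        System.out.println(containsClass(jar, "com.my.organization.CustomWatermarkEmitter")); // true
        System.out.println(containsClass(jar, "com.my.organization.Missing")); // false
        Files.delete(jar);
    }
}
```

If the entry is present in the jar and the error still occurs, the problem is on the class-loading side rather than the packaging side.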

This is the way I had to solve the issue. But having to build this class separately and place it in /opt/flink/lib complicates my build process significantly, so I was wondering if this is the way it's supposed to be solved, or if there are other ways around this problem?

For example, this section in the Flink documentation hints at having to manually provide some sources with a UserCodeClassLoader, including the provided Kafka source?

As far as I can see in org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher, it does seem to use a userCodeClassLoader internally:

case PERIODIC_WATERMARKS: {
    for (Map.Entry<KafkaTopicPartition, Long> partitionEntry : partitionsToInitialOffsets.entrySet()) {
        KPH kafkaHandle = createKafkaPartitionHandle(partitionEntry.getKey());

        AssignerWithPeriodicWatermarks<T> assignerInstance =
                watermarksPeriodic.deserializeValue(userCodeClassLoader);

        KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> partitionState =
                new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
                        partitionEntry.getKey(),
                        kafkaHandle,
                        assignerInstance);

        partitionState.setOffset(partitionEntry.getValue());

        partitionStates.add(partitionState);
    }
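To see why the class loader passed to `deserializeValue` matters, here is a small self-contained Java sketch (not Flink code; `MyEmitter` and the helper names are made up). It mimics the idea behind Flink's `InstantiationUtil.ClassLoaderObjectInputStream`: serialized user code can only be deserialized if the resolving class loader can actually see the user's classes, which is exactly what fails on the TaskManager.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class ClassLoaderDemo {
    // Stand-in for the user's CustomWatermarkEmitter: any serializable user class.
    public static class MyEmitter implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Resolves classes against an explicitly supplied class loader, in the
    // spirit of Flink's InstantiationUtil.ClassLoaderObjectInputStream.
    static class ClassLoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader classLoader;

        ClassLoaderObjectInputStream(InputStream in, ClassLoader classLoader) throws IOException {
            super(in);
            this.classLoader = classLoader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            return Class.forName(desc.getName(), false, classLoader);
        }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserializeWith(byte[] bytes, ClassLoader classLoader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ClassLoaderObjectInputStream(new ByteArrayInputStream(bytes), classLoader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new MyEmitter());

        // With a loader that knows the user class, deserialization succeeds:
        System.out.println(deserializeWith(bytes, ClassLoaderDemo.class.getClassLoader())
                .getClass().getSimpleName());

        // With a loader that cannot see user code (here: the parent of the
        // system class loader), it fails with ClassNotFoundException:
        try {
            deserializeWith(bytes, ClassLoader.getSystemClassLoader().getParent());
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

So if the fetcher resolves the assigner with a class loader that does not include the submitted jar, the ClassNotFoundException above is exactly what you would expect.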

EDIT:

I have created a simple project where this issue can be reproduced: https://github.com/lragnarsson/flink-kafka-classpath-problem

In order to reproduce, you need Docker and docker-compose.

Just do:

  1. git clone https://github.com/lragnarsson/flink-kafka-classpath-problem.git
  2. cd flink-kafka-classpath-problem/docker
  3. docker-compose build
  4. docker-compose up
  5. Go to localhost:8081 in your browser
  6. Submit the included jar file from target/scala-2.11/flink-kafka-classpath-problem-assembly-0.1-SNAPSHOT.jar

This should lead to the exception java.lang.ClassNotFoundException: se.ragnarsson.lage.MyTimestampExtractor.

I think you've stumbled on a bug introduced in Flink 1.4.1: https://issues.apache.org/jira/browse/FLINK-8741

It will be fixed shortly in 1.4.2. You can try testing it on 1.4.2-rc2: https://github.com/apache/flink/tree/release-1.4.2-rc2
