
kafka connect - ExtractTopic transformation with hdfs sink connector throws NullPointerException

I am using the Confluent HDFS sink connector 5.0.0 with Kafka 2.0.0, and I need to use the ExtractTopic transformation ( https://docs.confluent.io/current/connect/transforms/extracttopic.html ). My connector works fine, but when I add this transformation I get a NullPointerException, even on a simple data sample with only two attributes.

ERROR Task hive-table-test-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:482)
java.lang.NullPointerException
    at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:352)
    at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:109)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:464)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:265)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748) 

Here is the configuration of the connector:

name=hive-table-test
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=hive_table_test

key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=${env.SCHEMA_REGISTRY_URL}
value.converter.schema.registry.url=${env.SCHEMA_REGISTRY_URL}
schema.compatibility=BACKWARD

# HDFS configuration
# hdfs.url is deprecated in favor of store.url in later versions, but store.url does not work yet
hdfs.url=${env.HDFS_URL}
hadoop.conf.dir=/etc/hadoop/conf
hadoop.home=/opt/cloudera/parcels/CDH/lib/hadoop
topics.dir=${env.HDFS_TOPICS_DIR}

# Connector configuration
format.class=io.confluent.connect.hdfs.avro.AvroFormat
flush.size=100
rotate.interval.ms=60000

# Hive integration
hive.integration=true
hive.metastore.uris=${env.HIVE_METASTORE_URIS}
hive.conf.dir=/etc/hive/conf
hive.home=/opt/cloudera/parcels/CDH/lib/hive
hive.database=kafka_connect

# Transformations
transforms=InsertMetadata, ExtractTopic
transforms.InsertMetadata.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertMetadata.partition.field=partition
transforms.InsertMetadata.offset.field=offset

transforms.ExtractTopic.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ExtractTopic.field=name
transforms.ExtractTopic.skip.missing.or.null=true

I am using Schema Registry, the data is in Avro format, and I am sure the given attribute (name) is not null. Any suggestions? What I need is basically to extract the content of a given field and use it as the topic name.

EDIT:

It happens even with simple JSON like this, serialized in Avro format:

{
   "attr": "tmp",
   "name": "topic1"
}
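
For reference, the Avro value schema registered for this sample would look roughly like this (the record name and the assumption that both fields are plain strings are mine, not from the original post):

{
  "type": "record",
  "name": "HiveTableTest",
  "fields": [
    {"name": "attr", "type": "string"},
    {"name": "name", "type": "string"}
  ]
}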

The short answer: it happens because you change the name of the topic in your transformation.

The HDFS connector has a separate TopicPartitionWriter for each topic partition. When the SinkTask that is responsible for processing messages is created, a TopicPartitionWriter is created for each assigned partition in the open(...) method.

When it processes SinkRecords, it looks up the TopicPartitionWriter based on the record's topic name and partition number and tries to append the record to its buffer. In your case it couldn't find any writer for the message: the topic name had been changed by the transformation, and no TopicPartitionWriter was ever created for that (topic, partition) pair.
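
To make the failure concrete, here is a minimal sketch of that lookup, not the actual Confluent source, with TopicPartitionWriter stubbed out:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;

public class DataWriterSketch {

    // Stand-in for Confluent's per-partition writer
    static class TopicPartitionWriter {
        void buffer(SinkRecord record) { /* append to in-memory buffer */ }
    }

    // Writers are registered in open(...) only for the ORIGINAL subscribed
    // topic partitions, e.g. ("hive_table_test", 0)
    private final Map<TopicPartition, TopicPartitionWriter> topicPartitionWriters = new HashMap<>();

    public void write(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // After ExtractTopic runs, record.topic() holds the value of the
            // "name" field (e.g. "topic1"), not the subscribed topic name
            TopicPartition tp = new TopicPartition(record.topic(), record.kafkaPartition());
            // get(tp) misses and returns null, so this line throws the
            // NullPointerException seen at DataWriter.write(...)
            topicPartitionWriters.get(tp).buffer(record);
        }
    }
}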

The SinkRecords passed to HdfsSinkTask::put(Collection<SinkRecord> records) already have their topic and partition set, so you don't have to apply any transformations that change them.

I think io.confluent.connect.transforms.ExtractTopic should rather be used with a SourceConnector.
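
On the source side the transformation runs before records are written to Kafka, so renaming the topic is safe. A minimal sketch, assuming a JDBC source as an example; the connector class is real, but the name, connection settings, and topic prefix here are hypothetical:

name=example-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=${env.JDBC_URL}
topic.prefix=raw_
mode=incrementing
incrementing.column.name=id

# Route each record to the topic named by its "name" field
transforms=ExtractTopic
transforms.ExtractTopic.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ExtractTopic.field=name
transforms.ExtractTopic.skip.missing.or.null=true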
