kafka connect - ExtractTopic transformation with hdfs sink connector throws NullPointerException
I am using Confluent HDFS sink connector 5.0.0 with Kafka 2.0.0 and I need to use the ExtractTopic transformation ( https://docs.confluent.io/current/connect/transforms/extracttopic.html ). My connector works fine, but when I add this transformation I get a NullPointerException, even on a simple data sample with only 2 attributes.
ERROR Task hive-table-test-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:482)
java.lang.NullPointerException
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:352)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:109)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:265)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is the configuration of the connector:
name=hive-table-test
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=hive_table_test
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=${env.SCHEMA_REGISTRY_URL}
value.converter.schema.registry.url=${env.SCHEMA_REGISTRY_URL}
schema.compatibility=BACKWARD
# HDFS configuration
# Use store.url instead of hdfs.url (deprecated) in later versions. The store.url property does not work yet
hdfs.url=${env.HDFS_URL}
hadoop.conf.dir=/etc/hadoop/conf
hadoop.home=/opt/cloudera/parcels/CDH/lib/hadoop
topics.dir=${env.HDFS_TOPICS_DIR}
# Connector configuration
format.class=io.confluent.connect.hdfs.avro.AvroFormat
flush.size=100
rotate.interval.ms=60000
# Hive integration
hive.integration=true
hive.metastore.uris=${env.HIVE_METASTORE_URIS}
hive.conf.dir=/etc/hive/conf
hive.home=/opt/cloudera/parcels/CDH/lib/hive
hive.database=kafka_connect
# Transformations
transforms=InsertMetadata, ExtractTopic
transforms.InsertMetadata.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertMetadata.partition.field=partition
transforms.InsertMetadata.offset.field=offset
transforms.ExtractTopic.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ExtractTopic.field=name
transforms.ExtractTopic.skip.missing.or.null=true
I am using Schema Registry, the data is in Avro format, and I am sure the given attribute "name" is not null. Any suggestions? What I need is basically to extract the content of a given field and use it as the topic name.
EDIT:
It happens even on a simple JSON like this in Avro format:
{
"attr": "tmp",
"name": "topic1"
}
The short answer is: you changed the name of the topic in your transformation.
The HDFS connector keeps a separate TopicPartitionWriter for each topic partition. When the SinkTask responsible for processing messages is created, a TopicPartitionWriter is created for each assigned partition in the open(...) method.
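That writer bookkeeping can be sketched roughly like this (a hypothetical simplification, not the actual Confluent code; plain strings stand in for the real TopicPartitionWriter objects):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of the sink's open() phase: one writer is
// registered per assigned topic partition, keyed by "topic-partition".
public class OpenSketch {
    static Map<String, String> writers = new HashMap<>();

    static void open(String topic, int[] partitions) {
        for (int p : partitions) {
            // In the real connector this would be a TopicPartitionWriter
            writers.put(topic + "-" + p, "writer for " + topic + "-" + p);
        }
    }

    public static void main(String[] args) {
        // The sink is subscribed to hive_table_test, so only its
        // partitions ever get a writer
        open("hive_table_test", new int[]{0});
        System.out.println(writers.keySet()); // prints: [hive_table_test-0]
    }
}
```

Note that writers exist only for the topics the connector is actually subscribed to, which is what matters in the next step.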
When it processes SinkRecords, it looks up the TopicPartitionWriter based on the topic name and partition number, and tries to append the record to its buffer. In your case it couldn't find any writer for the message: the topic name was changed by the transformation, and no TopicPartitionWriter was ever created for that (topic, partition) pair.
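The failing lookup can be illustrated like this (again a hypothetical simplification, not the actual connector code; strings stand in for real writer objects):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of why put() NPEs: writers are keyed by the
// ORIGINAL topic, but the record arrives with its topic already rewritten
// by ExtractTopic, so the map lookup misses.
public class LookupSketch {
    static Map<String, String> writers = new HashMap<>();

    static String lookupWriter(String topic, int partition) {
        return writers.get(topic + "-" + partition); // null when no writer was created
    }

    public static void main(String[] args) {
        // Created in open() for the subscribed topic:
        writers.put("hive_table_test-0", "writer");

        // ExtractTopic replaced the topic with the field value "topic1",
        // so the connector dereferences null further down the call stack.
        System.out.println(lookupWriter("topic1", 0)); // prints: null
    }
}
```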
SinkRecords that are passed to HdfsSinkTask::put(Collection<SinkRecord> records) already have the topic and partition set, so you don't have to apply any transformations there.
I think io.confluent.connect.transforms.ExtractTopic should rather be used with a SourceConnector.
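On the source side the SMT rewrites the topic per record before anything is produced to Kafka, so downstream sinks create their writers for the final topic names from the start. A sketch of such a configuration, where com.example.MySourceConnector is a placeholder for any source that emits structured records containing a "name" field:

```properties
# Hypothetical source connector config; connector class and topic are placeholders
name=my-source-test
connector.class=com.example.MySourceConnector
tasks.max=1
topic=raw_events
# ExtractTopic rewrites the topic per record BEFORE it reaches Kafka,
# so sink connectors only ever see the final topic names
transforms=ExtractTopic
transforms.ExtractTopic.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ExtractTopic.field=name
transforms.ExtractTopic.skip.missing.or.null=true
```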