Kafka Connect S3 Sink add MetaData
I am trying to add metadata to the output that is written from Kafka into the S3 bucket.
Currently, the output contains only the values of the messages from the Kafka topic.
I want each record wrapped with the following metadata: topic, timestamp, partition, offset, key, value.
Example:
{
"topic":"some-topic",
"timestamp":"some-timestamp",
"partition":"some-partition",
"offset":"some-offset",
"key":"some-key",
"value":"the-orig-value"
}
Note: when I fetch the messages through a consumer, all of the metadata is fetched, as I wished.
My connector configuration:
{
"name" : "test_s3_sink",
"config" : {
"connector.class" : "io.confluent.connect.s3.S3SinkConnector",
"errors.log.enable" : "true",
"errors.log.include.messages" : "true",
"flush.size" : "10000",
"format.class" : "io.confluent.connect.s3.format.json.JsonFormat",
"name" : "test_s3_sink",
"rotate.interval.ms" : "60000",
"s3.bucket.name" : "some-bucket-name",
"storage.class" : "io.confluent.connect.s3.storage.S3Storage",
"topics" : "some.topic",
"topics.dir" : "some-dir"
}
}
Thanks.
Currently, the output is just the values from the messages from the kafka topic.
Correct, this is the documented behavior. There is a setting for including the key data, if you wanted that as well, but there are no settings to get the rest of the metadata.
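As a sketch, newer versions of the Confluent S3 sink expose a `store.kafka.keys` option (paired with a `keys.format.class`) that writes record keys out to separate files alongside the values; verify these property names against the docs for the connector version you are running before relying on them:

```json
{
  "store.kafka.keys": "true",
  "keys.format.class": "io.confluent.connect.s3.format.json.JsonFormat"
}
```

Note that this stores keys next to the values rather than wrapping them into the value payload shown in the question.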
For the record timestamp, you could edit your producer code to simply add it as part of your record values (and everything else, for that matter, if you're able to query the topic's next offset every time you produce).
For topic and partition, those are part of the S3 object path, so whatever you're reading the files with should be able to parse out that information. The starting offset is also part of the filename; add the line number within the file to it to get the (approximate) offset of a record.
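A minimal sketch of that recovery, assuming the connector's default partitioner and file-naming scheme (`<topics.dir>/<topic>/partition=<p>/<topic>+<partition>+<startOffset>.<ext>`; check your bucket layout, since a custom partitioner changes the path):

```python
import re

# Default S3 sink object names end in <topic>+<kafkaPartition>+<startOffset>.<extension>,
# e.g. some-dir/some.topic/partition=3/some.topic+3+0000012345.json
FILENAME_RE = re.compile(r"(?P<topic>.+)\+(?P<partition>\d+)\+(?P<start_offset>\d+)\.\w+$")

def record_metadata(s3_key: str, line_number: int) -> dict:
    """Recover topic, partition, and approximate offset for the record found
    at `line_number` (0-based) inside the given S3 object."""
    filename = s3_key.rsplit("/", 1)[-1]
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected S3 sink filename: {filename}")
    return {
        "topic": m.group("topic"),
        "partition": int(m.group("partition")),
        # The filename carries the offset of the first record in the file,
        # so the Nth line is approximately startOffset + N.
        "offset": int(m.group("start_offset")) + line_number,
    }

print(record_metadata("some-dir/some.topic/partition=3/some.topic+3+0000012345.json", 7))
# → {'topic': 'some.topic', 'partition': 3, 'offset': 12352}
```

This is only approximate when a file mixes in control records or when the format writes more than one record per line.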
Alternatively, you can use a Connect transform such as this Archive one, which relocates the Kafka record metadata (except offset and partition) into the Connect Struct value, so that the sink connector then writes all of it to the files:
https://github.com/jcustenborder/kafka-connect-transform-archive
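A sketch of wiring that transform into the sink configuration above (the alias `archive` is arbitrary; the fully-qualified class name is taken from the linked project's README, so verify it against the version you install):

```json
{
  "transforms": "archive",
  "transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive"
}
```

With this in place, each written value becomes a struct containing the original key, value, topic, and timestamp fields instead of the bare value.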
Either way, ConnectRecord has no offset field; a SinkRecord does, but I think that's too late in the API for transforms to access it.