
Kafka Connect S3 Sink: add metadata

I am trying to add metadata to the output that Kafka Connect writes from Kafka into the S3 bucket.

Currently, the output contains only the values of the messages from the Kafka topic.

I want each record wrapped with the following metadata: topic, timestamp, partition, offset, key, value. Example:

{
    "topic":"some-topic",
    "timestamp":"some-timestamp",
    "partition":"some-partition",
    "offset":"some-offset",
    "key":"some-key",
    "value":"the-orig-value"
}

Note: when I fetch the messages through a consumer, I get all the metadata, as I wished.

My connector configuration:

{
  "name" : "test_s3_sink",
  "config" : {
    "connector.class" : "io.confluent.connect.s3.S3SinkConnector",
    "errors.log.enable" : "true",
    "errors.log.include.messages" : "true",
    "flush.size" : "10000",
    "format.class" : "io.confluent.connect.s3.format.json.JsonFormat",
    "name" : "test_s3_sink",
    "rotate.interval.ms" : "60000",
    "s3.bucket.name" : "some-bucket-name",
    "storage.class" : "io.confluent.connect.s3.storage.S3Storage",
    "topics" : "some.topic",
    "topics.dir" : "some-dir"
  }
}

Thanks.

Currently, the output is just the values from the messages from the Kafka topic.

Correct, this is the documented behavior. There is a setting for including the key data that you're currently missing, if you want that as well, but there are no settings for getting the rest of the metadata.
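As a sketch, newer versions of the Confluent S3 sink expose properties for storing keys (and headers) alongside values; whether these are available depends on your connector version, so verify against its documentation before relying on them:

  "store.kafka.keys" : "true",
  "keys.format.class" : "io.confluent.connect.s3.format.json.JsonFormat",

These would go inside the "config" block above. As I understand it, the keys are written to separate files next to the value files, not merged into the value itself.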

For the record timestamp, you could edit your producer code to simply add it as part of your records (and everything else, for that matter, if you're able to query for the next offset of the topic every time you produce), as sketched below.
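A minimal sketch of that idea in plain Java; the topic name and the wrapper field layout are illustrative, not part of any connector API:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WrappedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String key = "some-key";
            String value = "the-orig-value";
            // Wrap the original value with the metadata known at produce time.
            // Partition and offset are assigned by the broker, so they are only
            // available afterwards, via the RecordMetadata returned by send().
            String wrapped = String.format(
                "{\"topic\":\"%s\",\"timestamp\":%d,\"key\":\"%s\",\"value\":\"%s\"}",
                "some.topic", System.currentTimeMillis(), key, value);
            producer.send(new ProducerRecord<>("some.topic", key, wrapped));
        }
    }
}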

For topic and partition, those are part of the S3 object path, so whatever you're reading the files with should be able to parse out that information. The starting offset is also part of the filename; add the line number within the file to get the (approximate) offset of the record.
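To make that concrete: with the configuration above and the connector's default partitioner, an object key would look roughly like this (the zero-padded number is the starting offset of the file):

  some-dir/some.topic/partition=3/some.topic+3+0000012345.json

A record on line N of that file then has offset 12345 + N - 1, assuming no gaps; transactional markers or compaction can introduce gaps, hence "approximate".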


Or, you can use a Connect transform such as this archive one, which relocates the Kafka record metadata (except offset and partition) into the Connect Struct value, so that the sink connector then writes all of it to the files:

https://github.com/jcustenborder/kafka-connect-transform-archive
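Wiring that transform into the connector configuration would look something like this; the class name comes from that repository, so verify it against the version you install:

  "transforms" : "archive",
  "transforms.archive.type" : "com.github.jcustenborder.kafka.connect.archive.Archive",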


Either way, ConnectRecord has no offset field; SinkRecord does, but I think that's too late in the API for transforms to access it.

