How to configure the Kafka S3 sink connector for JSON using its fields AND time-based partitioning?

I have JSON coming in like this:

{
    "app" : "hw",
    "content" : "hello world",
    "time" : "2018-05-06 12:53:04"
}

I wish to push it to S3 using the following path format:

/upper-directory/$jsonfield1/$jsonfield2/$date/$HH

I know I can achieve:

/upper-directory/$date/$HH

with TimeBasedPartitioner and topics.dir, but how do I include the two JSON fields as well?

You need to write your own Partitioner that combines the behavior of the time-based and field partitioners.

That means creating a new Java project, using the source code as a reference point, building a JAR from the project, and then copying that JAR into the kafka-connect-storage-common directory on every server running Kafka Connect, where the S3 connector picks it up. After you've copied the JAR, you will need to restart the Connect process.
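As a rough illustration of the path logic such a partitioner would need, here is a minimal, self-contained sketch. It deliberately does not use the Kafka Connect classes: FieldAndTimePath and its encode method are hypothetical names invented for this example, the record is modeled as a plain Map, and the input time format is assumed from the sample JSON in the question. In a real custom partitioner, equivalent logic would presumably live in the method that encodes a record into a partition path (encodePartition in the storage-common partitioners), with the field names and formats read from the connector configuration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Kafka Connect): builds
 * "<field1>/<field2>/<date>/<HH>" from a single record.
 */
final class FieldAndTimePath {
    // Assumed format of the "time" field, taken from the sample JSON.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    // Date directory followed by a two-digit hour directory.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd/HH");

    private FieldAndTimePath() {}

    static String encode(Map<String, ?> record, List<String> fieldNames, String timeField) {
        List<String> parts = new ArrayList<>();
        // Field-based part: one path segment per configured field, in order.
        for (String name : fieldNames) {
            Object value = record.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing field: " + name);
            }
            parts.add(value.toString());
        }
        // Time-based part: parse the record's own timestamp, then format date and hour.
        Object rawTime = record.get(timeField);
        if (rawTime == null) {
            throw new IllegalArgumentException("Missing time field: " + timeField);
        }
        LocalDateTime timestamp = LocalDateTime.parse(rawTime.toString(), INPUT);
        parts.add(OUTPUT.format(timestamp));
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        Map<String, String> record = Map.of(
                "app", "hw",
                "content", "hello world",
                "time", "2018-05-06 12:53:04");
        // prints hw/hello world/2018-05-06/12
        System.out.println(encode(record, List.of("app", "content"), "time"));
    }
}
```

The sketch only produces the part of the path below the upper directory. Field values are used verbatim, so values containing spaces or slashes (such as "hello world" here) would end up in the S3 key as-is; a real partitioner may want to sanitize them.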

Note: there's already a PR that is trying to add this - https://github.com/confluentinc/kafka-connect-storage-common/pull/73/files
