

Partitioning with key in Kafka Connect S3 sink

Can we partition our output in the S3 sink connector by key? How can we configure the connector to keep only the latest 10 records for each key, or only the data from the last 10 minutes? Or can we partition by both key and time period?

By default, the S3 sink does not store record keys; you'd need to set store.kafka.keys=true for that. Even then, the keys are written to their own files, separate from the values, within whatever partitioner you've configured.
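
For reference, a minimal sketch of a sink configuration with key storage enabled (the connector name, topic, bucket, and region are placeholder assumptions; the partitioner shown is the connector's default):

    name=s3-sink-example
    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=my-topic
    s3.bucket.name=my-bucket
    s3.region=us-east-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.json.JsonFormat
    partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
    flush.size=1000
    # write record keys to their own files alongside the values
    store.kafka.keys=true
    keys.format.class=io.confluent.connect.s3.format.json.JsonFormat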

Otherwise, the FieldPartitioner only uses the value of the record, so you'd need an SMT to move the record key into the value in order to partition on it.
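
There is no stock Apache Kafka SMT that copies the key into the value, so in practice that means a small custom transform. Below is a minimal sketch, assuming schemaless (Map-based) values; the class name KeyToValueField and the target field name record_key are made up for illustration:

    package com.example;

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.transforms.Transformation;

    // Hypothetical SMT: copies the record key into a field of the value so
    // the FieldPartitioner can partition on it. Handles schemaless Map values only.
    public class KeyToValueField<R extends ConnectRecord<R>> implements Transformation<R> {

        private static final String FIELD = "record_key"; // illustrative field name

        @SuppressWarnings("unchecked")
        @Override
        public R apply(R record) {
            if (!(record.value() instanceof Map)) {
                return record; // pass through anything we can't handle
            }
            Map<String, Object> updated = new HashMap<>((Map<String, Object>) record.value());
            updated.put(FIELD, record.key());
            // keep everything else as-is; the value schema stays null for schemaless data
            return record.newRecord(record.topic(), record.kafkaPartition(),
                    record.keySchema(), record.key(), null, updated, record.timestamp());
        }

        @Override
        public ConfigDef config() {
            return new ConfigDef();
        }

        @Override
        public void close() {
        }

        @Override
        public void configure(Map<String, ?> configs) {
        }
    }

Wired into the connector config, it would pair with the FieldPartitioner like this (again, names are illustrative):

    transforms=keyToValue
    transforms.keyToValue.type=com.example.KeyToValueField
    partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
    partition.field.name=record_key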

Last I checked, there is still an open PR on GitHub for a combined Field and Time partitioner.
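
Until something like that is merged, time-only partitioning does work out of the box. A sketch of a TimeBasedPartitioner setup with 10-minute buckets (all values are illustrative):

    partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
    # one S3 prefix per 10 minutes of data
    partition.duration.ms=600000
    path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/'minute'=mm
    locale=en-US
    timezone=UTC
    timestamp.extractor=Record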


The S3 sink doesn't window or compact any data; it dumps and stores everything. You'll need an external process, such as a Lambda function, to clean up data over time.
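
For coarse retention, an S3 lifecycle rule can also expire old objects, though lifecycle expiration only has day-level granularity; anything finer, such as the 10-minute window asked about, would need a scheduled Lambda or similar job. A sketch of a lifecycle configuration in the JSON format accepted by aws s3api put-bucket-lifecycle-configuration (the prefix and rule ID are assumptions; topics/ is the sink's default topics.dir):

    {
      "Rules": [
        {
          "ID": "expire-old-sink-output",
          "Status": "Enabled",
          "Filter": { "Prefix": "topics/" },
          "Expiration": { "Days": 1 }
        }
      ]
    }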
