

Partitioning with key in Kafka Connect S3 sink

Can we partition our output in the S3 sink connector by key? How can we configure the connector to keep only the latest 10 records for each key, or only the data from the last 10 minutes? Or can we partition by both key and time period?

By default, the S3 sink does not store record keys; you'd need to set store.kafka.keys=true for that. Even then, the keys are written to their own files, separate from the values, within whatever partitioner you've configured.
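
For reference, a minimal sketch of a sink configuration with key storage enabled (the connector name, topic, bucket, and region are placeholder assumptions; the partitioner shown is the connector's default):

    name=s3-sink-example
    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=my-topic
    s3.bucket.name=my-bucket
    s3.region=us-east-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.json.JsonFormat
    partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
    flush.size=1000
    # write record keys to their own files alongside the values
    store.kafka.keys=true
    keys.format.class=io.confluent.connect.s3.format.json.JsonFormat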

Otherwise, the FieldPartitioner only uses the value of the record, so you'd need an SMT to move the record key into the value in order to partition on it.
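
There is no stock Apache Kafka SMT that copies the key into the value, so in practice that means a small custom transform. Below is a minimal sketch, assuming schemaless (Map-based) values; the class name KeyToValueField and the target field name record_key are made up for illustration:

    package com.example;

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.transforms.Transformation;

    // Hypothetical SMT: copies the record key into a field of the value so
    // the FieldPartitioner can partition on it. Handles schemaless Map values only.
    public class KeyToValueField<R extends ConnectRecord<R>> implements Transformation<R> {

        private static final String FIELD = "record_key"; // illustrative field name

        @SuppressWarnings("unchecked")
        @Override
        public R apply(R record) {
            if (!(record.value() instanceof Map)) {
                return record; // pass through anything we can't handle
            }
            Map<String, Object> updated = new HashMap<>((Map<String, Object>) record.value());
            updated.put(FIELD, record.key());
            // keep everything else as-is; the value schema stays null for schemaless data
            return record.newRecord(record.topic(), record.kafkaPartition(),
                    record.keySchema(), record.key(), null, updated, record.timestamp());
        }

        @Override
        public ConfigDef config() {
            return new ConfigDef();
        }

        @Override
        public void close() {
        }

        @Override
        public void configure(Map<String, ?> configs) {
        }
    }

Wired into the connector config, it would pair with the FieldPartitioner like this (again, names are illustrative):

    transforms=keyToValue
    transforms.keyToValue.type=com.example.KeyToValueField
    partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
    partition.field.name=record_key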

Last I checked, there is still an open PR on GitHub for a combined Field and Time partitioner.
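
Until something like that is merged, time-only partitioning does work out of the box. A sketch of a TimeBasedPartitioner setup with 10-minute buckets (all values are illustrative):

    partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
    # one S3 prefix per 10 minutes of data
    partition.duration.ms=600000
    path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/'minute'=mm
    locale=en-US
    timezone=UTC
    timestamp.extractor=Record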


The S3 sink doesn't window or compact any data; it dumps and stores everything. You'll need an external process, such as a Lambda function, to clean up data over time.
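
For coarse retention, an S3 lifecycle rule can also expire old objects, though lifecycle expiration only has day-level granularity; anything finer, such as the 10-minute window asked about, would need a scheduled Lambda or similar job. A sketch of a lifecycle configuration in the JSON format accepted by aws s3api put-bucket-lifecycle-configuration (the prefix and rule ID are assumptions; topics/ is the sink's default topics.dir):

    {
      "Rules": [
        {
          "ID": "expire-old-sink-output",
          "Status": "Enabled",
          "Filter": { "Prefix": "topics/" },
          "Expiration": { "Days": 1 }
        }
      ]
    }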
