Kafka Connect: can multiple standalone connectors write to the same HDFS directory?

For our pipeline, we have about 40 topics (10-25 partitions each) that we want to write into the same HDFS directory using HDFS 3 Sink Connectors in standalone mode (distributed doesn't work for our current setup). We have tried running all the topics on one connector, but we encounter problems recovering offsets whenever it needs to be restarted.

If we divide the topics among different standalone connectors, can they all write into the same HDFS directory? Since the connectors organize all files in HDFS by topic, I don't think this should be an issue, but I'm wondering if anyone has experience with this setup.

Basic example: Connector-1 config

name=connect-1
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic1
hdfs.url=hdfs://kafkaOutput

Connector-2 config

name=connect-2
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic2
hdfs.url=hdfs://kafkaOutput

"distributed doesn't work for our current setup"

You should be able to run connect-distributed on the exact same nodes where connect-standalone is run.
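
As a rough illustration of that point, a distributed worker file differs from a standalone one mainly in a few extra settings. The sketch below is only an assumption about what such a file could look like: the broker address, group id, internal topic names, replication factors and plugin path are all placeholders, not values from the question.

# connect-distributed.properties (hypothetical values throughout)
# Assumed broker address
bootstrap.servers=localhost:9092
# Workers that share the same group.id form one Connect cluster
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Internal topics that hold connector offsets, configs and status
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
# A replication factor of 1 is only suitable for a single-broker test setup
offset.storage.replication.factor=1
config.storage.replication.factor=1
status.storage.replication.factor=1
# Assumed location of the installed HDFS 3 connector plugin
plugin.path=/usr/share/java

In distributed mode the connector configs are submitted through the worker's REST API rather than passed as properties files on the command line.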

"We have tried running all the topics on one connector but encounter problems recovering offsets if it needs to be restarted"

Yeah, I would suggest not bundling all topics into one connector.

"If we divide the topics among different standalone connectors, can they all write into the same HDFS directory?"

That is my personal recommendation, and yes, they can, because the HDFS path is named by the topic name and further split by the partitioning scheme.
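
To make the layout concrete, here is connect-1 from the question with the path-related settings written out. This is a sketch that assumes the connector defaults (topics.dir=topics and the default partitioner); the flush.size value is a made-up placeholder, and the offsets in the example file names are illustrative only.

name=connect-1
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic1
hdfs.url=hdfs://kafkaOutput
# Assumed: top-level directory under hdfs.url (believed to be the default)
topics.dir=topics
# Assumed: default partitioner, one directory per Kafka partition
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
# Hypothetical value: number of records written per file
flush.size=1000

# Under these assumptions, connect-1 and connect-2 would write to sibling directories:
#   hdfs://kafkaOutput/topics/topic1/partition=0/topic1+0+0000000000+0000000999.avro
#   hdfs://kafkaOutput/topics/topic2/partition=0/topic2+0+0000000000+0000000999.avro

Because both the directory and the file name start with the topic, two connectors that share hdfs.url and topics.dir but consume disjoint topics should not write to the same paths.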


Note: This answer also applies to all other storage connectors (S3 & GCS).
