
Kafka Connect: can multiple standalone connectors write to the same HDFS directory?

For our pipeline, we have about 40 topics (10-25 partitions each) that we want to write into the same HDFS directory using HDFS 3 Sink Connectors in standalone mode (distributed doesn't work for our current setup). We have tried running all the topics on one connector but encounter problems recovering offsets if it needs to be restarted.

If we divide the topics among different standalone connectors, can they all write into the same HDFS directory? Since the connectors then organize all files in HDFS by topic, I don't think this should be an issue but I'm wondering if anyone has experience with this setup.

Basic example: Connector-1 config

name=connect-1
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic1
hdfs.url=hdfs://kafkaOutput

Connector-2 config

name=connect-2
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
topics=topic2
hdfs.url=hdfs://kafkaOutput
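For reference, a single standalone worker can host both connectors at once by passing multiple property files on the command line. This is only a sketch; the script name follows the Apache Kafka distribution and the file paths are assumptions:

```shell
# One standalone worker, two connector configs (paths are illustrative).
# Each connector gets its own tasks, but they share the worker's
# offset storage file configured in worker.properties.
bin/connect-standalone.sh config/worker.properties \
    config/connect-1.properties \
    config/connect-2.properties
```

Running them as separate worker processes instead (one property file each) also works, as long as each worker has its own `offset.storage.file.filename`.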

distributed doesn't work for our current setup

You should be able to run connect-distributed on the exact same nodes where connect-standalone runs.
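As a sketch of what that would look like (the script name follows the Apache Kafka distribution; the worker config path and the default REST port 8083 are assumptions), you start the distributed worker on the same host and then register each connector over the Connect REST API instead of a local properties file:

```shell
# Start a distributed worker on the node (config path is illustrative).
bin/connect-distributed.sh config/worker-distributed.properties

# Register the first connector via the REST API; the JSON mirrors the
# standalone connect-1 config from the question.
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{
        "name": "connect-1",
        "config": {
          "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
          "topics": "topic1",
          "hdfs.url": "hdfs://kafkaOutput"
        }
      }'
```

In distributed mode, offsets are stored in Kafka topics rather than a local file, which sidesteps the offset-recovery problem described above.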

We have tried running all the topics on one connector but encounter problems recovering offsets if it needs to be restarted

Yeah, I would suggest not bundling all topics into one connector.

If we divide the topics among different standalone connectors, can they all write into the same HDFS directory?

That is my personal recommendation, and yes, they can, because the HDFS path is named by the topic name and further split by the partitioning scheme.
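To illustrate why the connectors don't collide, here is the kind of layout the HDFS sink produces under a shared `hdfs.url`. This is schematic only: it assumes the default `topics.dir` of `topics`, the default partitioner, and Avro output; the offset ranges in the file names are made up:

```shell
hdfs dfs -ls -R /topics
# Schematic output -- each connector writes only under its own topics:
#   /topics/topic1/partition=0/topic1+0+0000000000+0000000999.avro
#   /topics/topic2/partition=0/topic2+0+0000000000+0000000999.avro
```

Because file names embed the topic, Kafka partition, and offset range, two connectors writing different topics into the same base directory never touch the same paths.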


Note: The same applies to the other storage connectors (S3 & GCS).

