
Confluent Kafka Connect: Run multiple sink connectors in a synchronous way

We are using the Kafka Connect S3 sink connector to consume from Kafka and load data into S3 buckets. Now I want to load the data from those S3 buckets into AWS Redshift using the COPY command, and for that I'm creating my own custom connector. The use case is: load the data written to S3 into Redshift synchronously, then the next time around the S3 connector should replace the existing file and our custom connector should load it into Redshift again. How can I do this with Confluent Kafka Connect, or is there a better approach for the same task? Thanks in advance!

If you want to get the data into Redshift, you should probably just use the JDBC Sink Connector and download the Redshift JDBC driver into the kafka-connect-jdbc directory.
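
For reference, a minimal sketch of registering such a connector through the Kafka Connect REST API. The worker URL, topic name, table naming, and Redshift connection details below are placeholders for illustration, not values from the question:

```python
import json
import requests

# Hypothetical JDBC Sink configuration targeting Redshift.
# Assumes the Redshift JDBC driver jar has already been dropped
# into the kafka-connect-jdbc plugin directory on the worker.
connector = {
    "name": "redshift-jdbc-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "orders",
        "connection.url": "jdbc:redshift://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev",
        "connection.user": "awsuser",
        "connection.password": "********",
        "insert.mode": "insert",
        "auto.create": "true",   # let the connector create the table if it is missing
        "pk.mode": "none",
    },
}

# Register the connector with the Connect REST API (default port 8083).
resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```

This skips S3 entirely: records flow straight from the topic into Redshift, so there is no separate COPY step to coordinate.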

Otherwise, rather than writing a connector, you could use an S3 event notification to trigger a Lambda function that performs the Redshift upload.
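
A minimal sketch of such a Lambda handler, assuming the function is subscribed to ObjectCreated events on the bucket and uses the Redshift Data API; the cluster name, database, user, target table, and IAM role ARN are placeholders:

```python
import urllib.parse

import boto3

redshift_data = boto3.client("redshift-data")


def handler(event, context):
    # One invocation may carry several S3 event records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Issue a COPY of the newly created object into Redshift.
        redshift_data.execute_statement(
            ClusterIdentifier="my-redshift-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=(
                f"COPY my_table FROM 's3://{bucket}/{key}' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
                "FORMAT AS JSON 'auto';"
            ),
        )
```

This keeps the S3 sink connector as-is and makes each new S3 object kick off its own COPY, which is close to the "load after every write" behaviour described in the question.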

Alternatively, if you simply want to query the data sitting in S3, you could use Athena instead, without dealing with any database at all.
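
For example, a query over the S3 data could be submitted through the Athena API roughly like this; the Glue database, table, and results bucket are made up for illustration:

```python
import boto3

athena = boto3.client("athena")

# Submit an asynchronous query against a table defined over the S3 data.
response = athena.start_query_execution(
    QueryString="SELECT count(*) FROM kafka_s3_data WHERE dt = '2020-01-01'",
    QueryExecutionContext={"Database": "my_glue_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
print(response["QueryExecutionId"])
```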


But fundamentally, sink connectors don't communicate with one another. They are independent tasks designed to consume from a topic and write to a destination, not to trigger external, downstream systems.

If you want synchronous behaviour from Kafka to Redshift, then the S3 sink connector is not the right option:

  1. With the S3 sink connector you first land the data in S3 and then have to run a COPY command externally to push it into Redshift (the COPY step is extra overhead).
  2. No custom code or validation can run before the data is pushed to Redshift.
  3. A Redshift sink connector uses the native JDBC library, which is about as fast as the S3 COPY command.
