
How to join two Kafka streams, each having multiple partitions?

I have two Kafka streams, request and event, each partitioned on a common field requestId (its last two digits). I want to join the two streams and write the result to HDFS or the local filesystem. How do I write an efficient consumer that considers only the relevant partitions while joining the two streams?

You should use Kafka's Streams API, Apache Kafka's stream processing library, instead of a hand-written consumer. To write the data to HDFS, you should use Kafka Connect.

For doing the join, look at this question: How to manage Kafka KStream to KStream windowed join?
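To make the windowed-join idea concrete, here is a minimal sketch using the Kafka Streams DSL. It assumes both topics are keyed by requestId (so co-partitioning is handled by Kafka, not by your code) and uses String serdes throughout; the output topic name joined-output and the 5-minute window are placeholder choices, not anything prescribed by the question.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

public class RequestEventJoin {

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Both topics are assumed to be keyed by requestId, so matching
        // records land in the same partition and the join runs locally;
        // no manual partition bookkeeping is needed in application code.
        KStream<String, String> requests =
                builder.stream("request", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> events =
                builder.stream("event", Consumed.with(Serdes.String(), Serdes.String()));

        // Inner join: pair up records with the same requestId whose
        // timestamps fall within 5 minutes of each other.
        KStream<String, String> joined = requests.join(
                events,
                (requestValue, eventValue) -> requestValue + "," + eventValue,
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));

        // Write the joined records to an output topic; Kafka Connect can
        // then sink that topic to HDFS.
        joined.to("joined-output", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "request-event-join");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(buildTopology(), props).start();
    }
}
```

The key point is that the Streams API handles partition assignment, buffering, and fault tolerance for you; your code only expresses the join logic.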

Also check out Confluent's documentation about Kafka Streams and Kafka Connect to get started. If you have further questions, please start a follow-up question (after reading the manual :))

Kafka Streams with Kafka Connect (for HDFS) is a straightforward solution. However, it must be pointed out that the HDFS connector for Kafka Connect is only available with Confluent's distribution of Kafka. Apache Kafka Connect only ships with a file writer, not an HDFS writer.
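For the HDFS side, a minimal Confluent HDFS sink connector configuration might look like the sketch below. The topic name joined-output and the hdfs.url value are placeholders to adapt to your setup; the property keys follow the Confluent HDFS sink connector's documented configuration.

```properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# Topic produced by the Streams join job (placeholder name)
topics=joined-output
# NameNode URL of the target cluster (placeholder)
hdfs.url=hdfs://namenode:8020
# Number of records to accumulate before writing a file to HDFS
flush.size=1000
```

With this running alongside the Streams application, records flow from the join's output topic into HDFS without any custom consumer code.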
