Multiple Kafka Streams vs. one stream consuming multiple topics

Which of the following is the best practice for a production environment?

1: One stream that consumes from multiple topics and writes to multiple topics.

2: Multiple streams (each with a different app.id), each consuming from a different topic and writing to a different topic.
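A minimal sketch of the two layouts, assuming hypothetical topic names (topic-a, topic-b, out-a, out-b) and hypothetical application ids. It uses only java.util.Properties with the real Kafka Streams config keys application.id (the question's app.id) and bootstrap.servers; in a real application each Properties object would be passed, together with a Topology built from a StreamsBuilder, to new KafkaStreams(topology, props). The Deployment record is an illustration, not a Kafka type.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class DeploymentOptions {

    // Real Kafka Streams config keys (StreamsConfig.APPLICATION_ID_CONFIG and
    // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG); the values passed in are hypothetical.
    static Properties streamsConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        return props;
    }

    // Hypothetical helper type (not a Kafka class): one Kafka Streams application,
    // i.e. its config plus the input -> output topic routes of its topology.
    record Deployment(Properties config, Map<String, String> routes) {
        String applicationId() {
            return config.getProperty("application.id");
        }
    }

    // Option 1: a single application (one application.id) whose topology reads both topics.
    static List<Deployment> optionOne(String bootstrap) {
        return List.of(new Deployment(
                streamsConfig("combined-app", bootstrap),
                Map.of("topic-a", "out-a", "topic-b", "out-b")));
    }

    // Option 2: one application per input topic, each with its own application.id.
    static List<Deployment> optionTwo(String bootstrap) {
        return List.of(
                new Deployment(streamsConfig("app-a", bootstrap), Map.of("topic-a", "out-a")),
                new Deployment(streamsConfig("app-b", bootstrap), Map.of("topic-b", "out-b")));
    }

    public static void main(String[] args) {
        for (Deployment d : optionTwo("localhost:9092")) {
            System.out.println(d.applicationId() + " -> " + d.routes());
        }
    }
}
```

The practical difference is that each application.id is its own consumer group, with its own committed offsets, internal topics and state, so option 2 lets the two pipelines be deployed, restarted and scaled independently.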

I am not sure about the first approach: when the amount of data in all topics increases, won't the consumer start to lag?
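Consumer lag can be measured rather than guessed. Per partition it is the log-end offset minus the group's committed offset; for a Kafka Streams application the consumer group id is its application.id, so kafka-consumer-groups.sh --describe --group <application.id> reports it per partition, and the Admin API (listOffsets, listConsumerGroupOffsets) returns the raw offsets. The sketch below only does the arithmetic on hypothetical offset maps; treating a partition with no committed offset as lagging from offset 0 is an assumption (the real starting point depends on auto.offset.reset).

```java
import java.util.Map;

public class ConsumerLag {

    // Lag of one partition = log-end offset (next offset to be written) minus the
    // committed offset (next offset the group will read). A missing committed offset
    // is treated as 0: a simplifying assumption, see auto.offset.reset.
    static long partitionLag(long logEndOffset, Long committedOffset) {
        long committed = committedOffset == null ? 0L : committedOffset;
        return Math.max(0L, logEndOffset - committed);
    }

    // Total lag of one consumer group over all partitions it reads.
    // Keys are hypothetical "topic-partition" strings such as "topic-b-3".
    static long totalLag(Map<String, Long> logEndOffsets, Map<String, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<String, Long> e : logEndOffsets.entrySet()) {
            total += partitionLag(e.getValue(), committedOffsets.get(e.getKey()));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> end = Map.of("topic-a-0", 1_000L, "topic-b-0", 200L, "topic-b-1", 250L);
        Map<String, Long> committed = Map.of("topic-a-0", 900L, "topic-b-0", 200L, "topic-b-1", 240L);
        System.out.println("total lag = " + totalLag(end, committed)); // 100 + 0 + 10 = 110
    }
}
```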

Which factors should I use to decide which of the above approaches best suits my scenario?

Update 1: I have 2 topics. The 1st topic has 1 partition (because I need to maintain ordering). The 2nd topic has 6 partitions.
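With these partition counts, the number of tasks is what bounds parallelism, not the number of applications. A simplified model, assuming the two pipelines are independent (no join between them): Kafka Streams creates one task per input partition of each sub-topology, and unconnected pipelines inside one topology become separate sub-topologies (Topology#describe() shows them for a real topology). Topic names are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class TaskCount {

    // Simplified model: a sub-topology gets one task per partition of its source topics
    // (the maximum partition count if it reads several co-partitioned topics).
    static int tasksForSubTopology(List<String> sourceTopics, Map<String, Integer> partitions) {
        int max = 0;
        for (String topic : sourceTopics) {
            max = Math.max(max, partitions.get(topic));
        }
        return max;
    }

    // Total tasks of one application = sum over its independent sub-topologies.
    static int tasksForApplication(List<List<String>> subTopologies, Map<String, Integer> partitions) {
        int total = 0;
        for (List<String> sources : subTopologies) {
            total += tasksForSubTopology(sources, partitions);
        }
        return total;
    }

    public static void main(String[] args) {
        // Partition counts from the question: topic 1 has 1 partition, topic 2 has 6.
        Map<String, Integer> partitions = Map.of("topic-1", 1, "topic-2", 6);

        // Option 1: one application; the two unconnected pipelines are two sub-topologies.
        int optionOne = tasksForApplication(List.of(List.of("topic-1"), List.of("topic-2")), partitions);

        // Option 2: two applications, each with a single sub-topology.
        int appOne = tasksForApplication(List.of(List.of("topic-1")), partitions);
        int appTwo = tasksForApplication(List.of(List.of("topic-2")), partitions);

        System.out.println("option 1: " + optionOne + " tasks in one app");
        System.out.println("option 2: " + appOne + " + " + appTwo + " tasks");
    }
}
```

Either way the 1-partition topic is handled by exactly one task, which preserves its ordering but also means it cannot be spread over more threads, while the 6-partition topic can use up to six.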

It depends very much on your use case (e.g. what sort of business logic the consumers run, and how they are deployed: standalone apps, clusters, etc.). Your question is more about architecture. Both solutions are viable; which one fits depends on the specifics of your use case.

If your business logic is semantically split into different streams, I would suggest going with the second option.

Regarding the amount of data, keep in mind that most Kafka consumers benefit from a back-pressure mechanism: because they pull (poll) records, they consume only as much as they can process.
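A toy simulation (not Kafka client code) of why pulling acts as back pressure: the consumer is never handed more than it asks for, so a consumer that is too slow shows up as growing lag rather than as overload. The cap per fetch mirrors the real consumer setting max.poll.records (default 500); the tick-based producer and the rates are assumptions for illustration.

```java
public class PullBackPressure {

    // Each tick the producer appends producedPerTick records; the consumer then fetches
    // at most maxPollRecords of the records already available and processes them before
    // the next poll. Returns the lag (unread records) after the given number of ticks.
    static long lagAfter(int ticks, int producedPerTick, int maxPollRecords) {
        long logEnd = 0;   // next offset the producer will write
        long position = 0; // next offset the consumer will read
        for (int t = 0; t < ticks; t++) {
            logEnd += producedPerTick;
            long available = logEnd - position;
            position += Math.min(available, maxPollRecords);
        }
        return logEnd - position;
    }

    public static void main(String[] args) {
        // Consumer keeps up: lag stays at zero.
        System.out.println(lagAfter(10, 400, 500));
        // Producer outpaces consumer: lag grows by 100 records per tick.
        System.out.println(lagAfter(10, 600, 500));
    }
}
```

So lag only grows when the sustained input rate exceeds what the consumers can process; that holds in both options and is addressed by adding partitions and instances or by making the processing faster.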

I would always suggest option 2, because with option 2 you also get fault tolerance: if one of your application instances goes down, the stream partitions handled by that instance are redistributed to the other running instances. If you want parallelism, use the same app.id for all stream-processing instances of the same application.
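The failover and parallelism described here rely on instances sharing one app.id (the application.id setting): Kafka Streams treats them as one consumer group and rebalances the tasks among the live instances. The sketch below uses plain round-robin to show the idea; the real Kafka Streams assignor is sticky and more involved, so the exact placement will differ. Instance names are hypothetical, and the task ids follow Kafka Streams' subtopology_partition format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaskFailover {

    // Simplified round-robin assignment of the tasks of ONE application.id
    // to its currently live instances (not the real Kafka Streams assignor).
    static Map<String, List<String>> assign(List<String> tasks, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no live instances: tasks cannot be assigned");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            String owner = instances.get(i % instances.size());
            assignment.get(owner).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Six tasks, one per partition of a 6-partition input topic.
        List<String> tasks = List.of("0_0", "0_1", "0_2", "0_3", "0_4", "0_5");

        System.out.println(assign(tasks, List.of("instance-1", "instance-2", "instance-3")));
        // After instance-3 fails, a rebalance spreads its tasks over the survivors.
        System.out.println(assign(tasks, List.of("instance-1", "instance-2")));
    }
}
```

Instances started with a different app.id would not take over these tasks, because they belong to a different consumer group.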

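Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.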
