
Can we have strong routing capability with Apache Kafka similar to RabbitMQ?

Posted 2015-03-26 06:11:31 · Tags: java, routing, rabbitmq, amqp, apache-kafka

We are trying to evaluate Kafka as a replacement for RabbitMQ in our software.

We know Kafka's advantages over RabbitMQ: offline consumption, massive persistence, superb performance, low latency, and high throughput.

But we need the kind of granular routing that RabbitMQ's topic exchange provides, so that heterogeneous consumers can each receive only the messages they need.
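For reference, the routing described here works as follows in RabbitMQ's topic exchange. A queue binds with a pattern, and that pattern is matched against each message's dot-separated routing key. In the pattern, `*` stands for exactly one word and `#` stands for zero or more words. The Java below is a simplified re-implementation of that matching rule, included only to illustrate the semantics. It is not RabbitMQ's own code, and the class name `TopicPattern` and the `sensor.*` keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration of AMQP topic-exchange matching (not RabbitMQ's code).
// Patterns and routing keys are dot-separated words:
// "*" matches exactly one word, "#" matches zero or more words.
final class TopicPattern {

    static boolean matches(String pattern, String routingKey) {
        List<String> p = Arrays.asList(pattern.split("\\.", -1));
        List<String> k = Arrays.asList(routingKey.split("\\.", -1));
        return match(p, 0, k, 0);
    }

    private static boolean match(List<String> p, int i, List<String> k, int j) {
        if (i == p.size()) {
            return j == k.size(); // pattern used up: the key must be used up too
        }
        String word = p.get(i);
        if (word.equals("#")) {
            // "#" either matches zero words, or consumes one word and tries again
            return match(p, i + 1, k, j) || (j < k.size() && match(p, i, k, j + 1));
        }
        if (j == k.size()) {
            return false; // key used up, but the pattern still expects a word
        }
        return (word.equals("*") || word.equals(k.get(j))) && match(p, i + 1, k, j + 1);
    }

    public static void main(String[] args) {
        System.out.println(matches("sensor.temperature.*", "sensor.temperature.s42")); // true
        System.out.println(matches("sensor.#", "sensor.humidity.s7"));             // true
        System.out.println(matches("sensor.*", "sensor.humidity.s7"));             // false
    }
}
```

Kafka has no equivalent of this per-message pattern matching inside a topic, and that gap is what the question is about.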

To some extent we can achieve this by using more partitions per broker in Kafka. But that has its own limitations, such as the overhead of topic metadata stored in ZooKeeper znodes and increased latency.

Our use case is filtering data within a partition. Assume one partition receives data from 100 sensors of a similar type. Can a consumer select only a few of those sensors' data and ignore the rest?

We can do the filtering/routing on the application (consumer) side, but that doesn't seem reusable, and it adds overhead on every consumer.
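Here is a minimal sketch of consumer-side filtering written once and shared, so that the logic itself becomes reusable. The Java below packages the filter as a `Predicate`. Everything in it is hypothetical. `SensorReading` stands in for a deserialized record value. The in-memory `batch` stands in for the records a real consumer would get back from `KafkaConsumer.poll(Duration)`. The sketch does not remove the overhead the question mentions: each consumer still fetches and discards the records it does not want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical message type standing in for a deserialized Kafka record value.
final class SensorReading {
    final String sensorId;
    final double value;

    SensorReading(String sensorId, double value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

// Reusable filter that any consumer can apply to the batch it has polled.
// Shipping it as a shared library avoids rewriting the logic per consumer,
// but every consumer still reads the unwanted records before dropping them.
final class SensorFilter implements Predicate<SensorReading> {
    private final Set<String> wantedSensorIds;

    SensorFilter(Set<String> wantedSensorIds) {
        this.wantedSensorIds = Set.copyOf(wantedSensorIds);
    }

    @Override
    public boolean test(SensorReading reading) {
        return wantedSensorIds.contains(reading.sensorId);
    }

    // Keeps only the readings this consumer cares about, preserving their order.
    List<SensorReading> apply(List<SensorReading> batch) {
        List<SensorReading> kept = new ArrayList<>();
        for (SensorReading r : batch) {
            if (test(r)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SensorReading> batch = List.of(
                new SensorReading("s1", 20.5),
                new SensorReading("s2", 21.0),
                new SensorReading("s3", 19.8));
        SensorFilter filter = new SensorFilter(Set.of("s1", "s3"));
        for (SensorReading r : filter.apply(batch)) {
            System.out.println(r.sensorId + " = " + r.value);
        }
    }
}
```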

Is there any way Kafka can provide rich routing capability with an optimal number of partitions?

Thanks, Ashish

1 Answer

Kafka's messaging model is a lot simpler than RabbitMQ's, and users are wise to use the few abstractions it does provide as they were intended. Really, topics are the only level of routing that should ever be done in Kafka. Partitions serve only to scale, to provide ordering (but only within a partition, which is a notable scalability issue if you have an order-dependent application), and to allow concurrent consumers within a topic.
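Here is what "topics are the only level of routing" can look like in practice. The producer chooses the topic from some attribute of the message, and each consumer decides which topics to read. The Java below is an in-memory sketch under that assumption. It is not Kafka client code, and the `TopicRouter` class and the `sensors.<type>` naming scheme are hypothetical. The regex-based `consume` loosely mirrors the real `KafkaConsumer.subscribe(Pattern)` overload, which lets one consumer pick up a family of topics by name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// In-memory stand-in for a cluster: topic name -> messages in append order.
// Routing happens only by choosing the topic; nothing inside a topic is filtered.
final class TopicRouter {
    private final Map<String, List<String>> topics = new LinkedHashMap<>();

    // Producer side: derive the topic from a message attribute (hypothetical scheme).
    void publish(String sensorType, String payload) {
        String topic = "sensors." + sensorType;
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(payload);
    }

    // Consumer side: read everything from each topic whose name fully matches the
    // regex, loosely mirroring a pattern-based subscription.
    List<String> consume(Pattern topicPattern) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : topics.entrySet()) {
            if (topicPattern.matcher(e.getKey()).matches()) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TopicRouter router = new TopicRouter();
        router.publish("temperature", "t1");
        router.publish("humidity", "h1");
        router.publish("temperature", "t2");
        System.out.println(router.consume(Pattern.compile("sensors\\.temperature"))); // [t1, t2]
        System.out.println(router.consume(Pattern.compile("sensors\\..*")));          // [t1, t2, h1]
    }
}
```

The trade-off in this design is that the granularity of routing is fixed by how many topics the producer creates. It is not decided per consumer at read time, as it would be with a RabbitMQ binding pattern.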

The problem with routing at the partition level is that it doesn't scale, because partitions are the element of Kafka that provides scalability (at the messaging layer, at least). Obviously, Kafka is not designed for granular routing. It's designed for persistent, reliable, scalable pub/sub messaging. Nor are partitions designed to scale across the cluster. By their very nature, partitions are local to one or a few Kafka nodes (depending on the topic's replication factor), but Kafka spreads the multiple partitions of a topic across the cluster. This means there is some potential for hot-spotting if messages favor one particular partition instead of being evenly distributed across the topic's partitions (which is why the Kafka producer normally handles partitioning for you).
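The hot-spotting risk can be shown with a small count. The sketch assumes keyed records are spread by hashing the key modulo the partition count. Kafka's default partitioner does this with a murmur2 hash of the serialized key. The Java below substitutes `String.hashCode()` for brevity, so the exact partition numbers will differ from a real cluster. The workload, in which one sensor sends 70 of 100 messages, is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Keyed records go to hash(key) mod numPartitions. Kafka's default partitioner
// uses murmur2 on the serialized key; String.hashCode() is a simplified stand-in.
final class PartitionSkew {

    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions); // always in [0, numPartitions)
    }

    // Counts how many of the given keys land on each partition.
    static int[] countPerPartition(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<String> keys = new ArrayList<>();
        // Hypothetical workload: one chatty sensor plus 30 quiet ones.
        for (int i = 0; i < 70; i++) {
            keys.add("sensor-hot");
        }
        for (int i = 0; i < 30; i++) {
            keys.add("sensor-" + i);
        }
        int[] counts = countPerPartition(keys, numPartitions);
        int hot = partitionFor("sensor-hot", numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p + ": " + counts[p] + (p == hot ? "  <- hot" : ""));
        }
    }
}
```

Every record with the same key always lands on the same partition. At least 70% of the traffic therefore hits a single partition, and so a single leader broker, however many partitions the topic has.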

As for filtering on the client side, I think you're right: that feels like a lot of wasted resources to me, but maybe I just dislike wasted resources too much.

In short, I think you risk digging yourself into a hole if you try to think of Kafka's messaging abstractions in such complex terms. Kafka is very much designed and optimized to distribute load via partitions, so co-opting them for a different use case, even a vaguely similar one, is certainly not ideal.

I have a feeling you can manage your use case within Kafka's feature set. I find that the biggest challenge with complex routing schemes in Kafka's topic framework is avoiding duplicate data across multiple topics. Once you understand how multiple applications can consume from different positions within the same topic, that issue seems to disappear. In this sense, it's important to think of Kafka more as a log than as a queue.
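The "log, not queue" point can be illustrated with a toy model. The Java below is an in-memory sketch, not the Kafka client API. A partition is an append-only list, and each consumer group tracks its own offset; the group names `dashboard` and `archiver` are hypothetical. Reading does not delete anything, so two applications share one copy of the data while sitting at different positions. That is roughly what Kafka's per-group committed offsets and `KafkaConsumer.seek` provide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one topic partition viewed as a log: records are appended once
// and never removed on read. Each consumer group keeps its own offset, so
// several applications can read the same data without copying it to new topics.
final class LogPartition {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    void append(String record) {
        records.add(record);
    }

    // Moves a group's position, e.g. to replay from the start or skip ahead.
    void seek(String group, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        committedOffsets.put(group, offset);
    }

    // Returns up to maxRecords from the group's current position and advances it.
    List<String> poll(String group, int maxRecords) {
        int from = Math.min(committedOffsets.getOrDefault(group, 0), records.size());
        int to = Math.min(records.size(), from + maxRecords);
        List<String> batch = new ArrayList<>(records.subList(from, to));
        committedOffsets.put(group, to);
        return batch;
    }

    public static void main(String[] args) {
        LogPartition log = new LogPartition();
        for (int i = 0; i < 5; i++) {
            log.append("reading-" + i);
        }
        System.out.println(log.poll("dashboard", 2));  // [reading-0, reading-1]
        log.seek("archiver", 3);
        System.out.println(log.poll("archiver", 10));  // [reading-3, reading-4]
        System.out.println(log.poll("dashboard", 10)); // [reading-2, reading-3, reading-4]
    }
}
```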

On a side note, I think your concern about the znodes required to manage partitions is unfounded. If you have enough topics and partitions to exhaust the memory of your ZooKeeper nodes (which takes a ton), you've likely already run into much bigger resource issues.