
Can a Kafka consumer filter messages before polling all of them from a topic?

It is said that consumers can only read the whole topic, and that there is no way to evaluate filters on the brokers to drop messages there.

This implies that we have to consume/receive all messages from a topic and filter them on the client side.

That is too much. I was wondering whether we can filter and receive only specific types of messages, based on something already passed to the brokers, such as the message keys or other metadata.

Judging from the method Consumer.poll(timeout), there seems to be nothing extra we can do.

No, with the Consumer you cannot receive only some of the messages from a topic. The consumer fetches all messages in order.

If you don't want to filter messages in the Consumer, you could use a Streams job. For example, the Streams job would read from your topic and push to another topic only the messages the consumer is interested in. The consumer can then subscribe to this new topic.
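In the Streams DSL such a job is essentially `builder.stream("fruits").filter(...).to("apples")`. Since running that needs a cluster, here is a plain-Java sketch of just the filtering decision; the topic names `fruits`/`apples` and the convention of keying records by fruit name are assumptions for this example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StreamsFilterSketch {

    // The predicate the Streams job would apply to every record:
    // keep only records whose key names the fruit this consumer wants.
    public static boolean isWanted(String key, String wantedKey) {
        return wantedKey.equals(key);
    }

    // Stand-in for builder.stream("fruits").filter(...).to("apples"):
    // applies the predicate to an in-memory batch of key/value records.
    public static Map<String, String> filterBatch(Map<String, String> records, String wantedKey) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : records.entrySet()) {
            if (isWanted(record.getKey(), wantedKey)) {
                kept.put(record.getKey(), record.getValue());
            }
        }
        return kept;
    }
}
```

In the real job, everything the predicate rejects never reaches the `apples` topic, so the downstream consumer only ever polls the records it wants.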

Once records have been pushed into the Kafka cluster, there is not much you can do: whatever you want to filter out, you will always have to bring the chunks of data to the client.

Unfortunately, the only option is to move that logic into the producers; that way you can push the data into multiple topics based on whatever logic you define.
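A sketch of that producer-side routing: the destination topic is chosen per message before `producer.send(...)` is called. The `fruits.<name>` topic-naming convention and the fruit list here are made up for this example:

```java
import java.util.Set;

public class TopicRouter {

    // Fruits that get a dedicated topic; everything else goes to a catch-all
    // topic so no message is silently dropped. (Hypothetical list.)
    private static final Set<String> KNOWN = Set.of("apple", "banana", "cherry");

    // In the producer you would then call:
    //   producer.send(new ProducerRecord<>(topicFor(fruit), key, value));
    public static String topicFor(String fruit) {
        String f = fruit.toLowerCase();
        return KNOWN.contains(f) ? "fruits." + f : "fruits.other";
    }
}
```

With this in place, a consumer interested only in apples subscribes to `fruits.apple` and never sees the rest.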

Each Kafka topic should contain messages that are logically similar, just to stay on topic. Now, sometimes you may have a topic, say fruits , which contains different attributes of a fruit (maybe in JSON format). Producers may push messages about different fruits, but you may want one of your consumer groups to process only apples. Ideally you would have gone with one topic per fruit name, but let's assume that is a fruitless endeavor for some reason (maybe too many topics).

In that case, you can override the default partitioning scheme in Kafka to ignore the key and partition randomly, passing your custom partitioner class through the partitioner.class property of the producer, while putting the fruit name in the message key. This is required because, by default, if you set a key when sending a message it will always go to the same partition, and that might cause partition imbalance.
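A sketch of the partition choice such a custom partitioner would make. In a real producer this logic would live in a class implementing the `org.apache.kafka.clients.producer.Partitioner` interface, registered via the `partitioner.class` property; the method below is a simplified stand-in, not the real interface:

```java
import java.util.concurrent.ThreadLocalRandom;

public class RandomPartitionChooser {

    // Deliberately ignore the record key: the key now carries the fruit name
    // for consumer-side filtering, so letting it drive partitioning (the
    // default hash-of-key behaviour) would pin each fruit to one partition
    // and risk partition imbalance.
    public static int choosePartition(String key, int numPartitions) {
        return ThreadLocalRandom.current().nextInt(numPartitions);
    }
}
```

The key is still delivered with every record, so consumers can filter on it even though it no longer influences placement.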

The idea behind this is that if your Kafka message value is a complex object (JSON, an Avro record, etc.), it may be quicker to filter records based on the key than to parse the whole value and extract the desired field. I don't have any data to support the performance benefit of this approach, though; it's only an intuition.
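That trade-off can be sketched in plain Java: compare the small key first, and only pay for the expensive value parse on records that match. Records are simulated here as key/value string pairs; with a real consumer you would check `record.key()` before deserializing `record.value()`:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class KeyFirstFilter {

    // Counts how many values we actually had to parse.
    public static int parsedCount = 0;

    // Stand-in for the expensive part: a full JSON/Avro parse of the value.
    public static String parseValue(byte[] value) {
        parsedCount++;
        return new String(value, StandardCharsets.UTF_8);
    }

    // Check the cheap key before touching the large value.
    // Each record is a two-element array: [0] = key, [1] = value.
    public static int processBatch(List<String[]> records, String wantedKey) {
        int processed = 0;
        for (String[] record : records) {
            if (!wantedKey.equals(record[0])) {
                continue; // skipped without parsing the value at all
            }
            parseValue(record[1].getBytes(StandardCharsets.UTF_8));
            processed++;
        }
        return processed;
    }
}
```

Only the matching record's value is ever parsed; the rest are rejected by a plain string comparison on the key.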

A Kafka consumer will receive all messages from a topic. But if only a custom message type (MyMessage) needs to be consumed, the filtering can be done in the Deserializer class. If the consumer receives two types of messages, say String and MyMessage, the String messages will be ignored and only the MyMessage messages will be processed.

import java.nio.charset.StandardCharsets;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Deserializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyMessageDeserializer implements Deserializer<MyMessage> {

    private static final Logger logger = LoggerFactory.getLogger(MyMessageDeserializer.class);
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public MyMessage deserialize(String topic, byte[] data) {
        if (data == null) {
            logger.info("Null received at deserializing");
            return null;
        }
        try {
            return objectMapper.readValue(new String(data, StandardCharsets.UTF_8), MyMessage.class);
        } catch (Exception e) {
            // Anything that cannot be parsed as MyMessage (e.g. a plain String) is filtered out here.
            logger.error("Deserialization exception: " + e.getMessage());
            return null;
        }
    }
}
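On the consumer side you register this deserializer and then skip the nulls it returns. A sketch of the relevant configuration; the broker address, group id, and the `com.example` package are placeholders:

```java
import java.util.Properties;

public class ConsumerConfigSketch {

    public static Properties filteringConsumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-message-consumers");      // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The filtering deserializer: anything that is not a MyMessage comes back null.
        props.put("value.deserializer", "com.example.MyMessageDeserializer");
        return props;
    }
}
```

In the poll loop, a record whose `value()` is null is one of the filtered-out messages and should simply be skipped.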

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.
