Filter JSON messages from a Kafka consumer efficiently

I am reading a log of JSON objects from a Kafka stream. Each message has this format:

{"class": "abc.cdf", "object":{....}}
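Because "class" is a Java keyword and getClass() is already defined on java.lang.Object, a POJO for this format needs a Java field with a different name that is mapped to the JSON key. A minimal sketch, assuming Jackson 2.x (com.fasterxml.jackson); the field name className and the getter getClassName() are hypothetical, and the unknown payload is kept as a generic JsonNode:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical POJO for messages shaped like {"class": "...", "object": {...}}.
public class MyRecord {
    // JSON key "class" is bound to the Java field className,
    // because "class" cannot be used as a Java identifier.
    @JsonProperty("class")
    private String className;

    // The payload's schema is not known here, so it is kept as a generic JSON tree.
    @JsonProperty("object")
    private JsonNode object;

    public String getClassName() { return className; }

    public JsonNode getObject() { return object; }

    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        byte[] value = "{\"class\": \"abc.cdf\", \"object\": {\"id\": 1}}"
                .getBytes(StandardCharsets.UTF_8);
        MyRecord rec = mapper.readValue(value, MyRecord.class);
        System.out.println(rec.getClassName()); // prints abc.cdf
    }
}
```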

I am interested in one particular "class" of messages, which makes up only 10% of the messages received. How can I filter messages on this field efficiently, without parsing the entire JSON of every message?

Currently I consume the raw bytes with ByteArrayDeserializer, use Jackson's ObjectMapper to parse each message into a POJO, and then check the "class" field. Sample code for each batch of messages read from Kafka:

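ObjectMapper mapper = new ObjectMapper();  // create once and reuse
for (ConsumerRecord<byte[], byte[]> record : records) {
    // Full deserialization of every message, even though ~90% are discarded
    MyRecord parsedRec = mapper.readValue(record.value(), MyRecord.class);
    // getClassName() stands for a (hypothetical) getter mapped to the JSON "class" field;
    // Object.getClass() returns the Java type and never equals a String
    if (parsedRec == null || !MYCLASSNAME.equals(parsedRec.getClassName()))
        continue;
    ...
}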

Given the volume of the message stream, I want to spend as little time as possible discarding the messages I am not interested in.

One approach would be to parse only the class field and ignore the rest: use a new class, RecordClass, that contains only the class field, and configure the mapper not to fail on unknown properties (here, "object").

ObjectMapper mapper = new ObjectMapper()
      .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
// RecordClass declares only the "class" field; "object" is skipped as an unknown property
RecordClass recordClass = mapper.readValue(record.value(), RecordClass.class);

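Then you parse the complete MyRecord only if RecordClass has the right class. Jackson still has to scan past the "object" value to skip it, but it builds no Java objects for it, so this should be faster in principle; in practice you need to benchmark it, especially since the matching 10% of messages are parsed twice.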
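Putting the two stages together, here is a self-contained sketch, assuming Jackson 2.x and raw byte[] message values (as delivered by ByteArrayDeserializer). TwoStageFilter, filter() and the minimal MyRecord are hypothetical stand-ins; the two ObjectReader instances are built once from the configured mapper and reused:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TwoStageFilter {

    // Stage-one view: only the "class" field is bound; "object" is skipped.
    public static class RecordClass {
        @JsonProperty("class")
        public String className;
    }

    // Stage-two view: the full message (payload kept as a generic tree in this sketch).
    public static class MyRecord {
        @JsonProperty("class")
        public String className;

        @JsonProperty("object")
        public JsonNode object;
    }

    // Unknown properties are ignored, so RecordClass can skip "object".
    private static final ObjectMapper MAPPER = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
    // ObjectReader is immutable and thread-safe, so build each one once and reuse it.
    private static final ObjectReader CLASS_READER = MAPPER.readerFor(RecordClass.class);
    private static final ObjectReader FULL_READER = MAPPER.readerFor(MyRecord.class);

    // Returns fully parsed records whose "class" equals wanted.
    public static List<MyRecord> filter(List<byte[]> values, String wanted) throws IOException {
        List<MyRecord> matches = new ArrayList<>();
        for (byte[] value : values) {
            RecordClass rc = CLASS_READER.readValue(value);
            if (rc == null || !wanted.equals(rc.className)) {
                continue; // uninteresting message: never bound to MyRecord
            }
            matches.add(FULL_READER.readValue(value)); // second, full parse only for matches
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> values = List.of(
                "{\"class\": \"abc.cdf\", \"object\": {\"id\": 1}}".getBytes(StandardCharsets.UTF_8),
                "{\"class\": \"other\", \"object\": {\"id\": 2}}".getBytes(StandardCharsets.UTF_8));
        List<MyRecord> kept = filter(values, "abc.cdf");
        System.out.println(kept.size() + " " + kept.get(0).object.get("id").asInt()); // prints 1 1
    }
}
```

Calling filter(values, "abc.cdf") on the values from one poll() batch returns only the matching records. Whether this beats a single full parse depends on how large "object" is relative to the envelope, so measure it on real traffic.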

Another approach is to have the producer send messages of that specific class to a separate topic, so the filtering happens on the producer side and the consumer subscribes only to that topic.
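Because the producer already knows each message's class when it creates the message, it can pick the topic without any JSON parsing. A minimal sketch of the routing decision; ClassRouter and the topic names are hypothetical, and sending is abstracted as a BiConsumer<String, byte[]> of (topic, payload) so the example needs no Kafka dependency:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical producer-side router: chooses the topic from the class name
// the producer already has in hand, so no JSON is parsed anywhere.
public class ClassRouter {
    private final String interestingClass;          // e.g. "abc.cdf"
    private final String interestingTopic;          // dedicated topic (hypothetical name)
    private final String defaultTopic;              // topic for all other classes (hypothetical)
    private final BiConsumer<String, byte[]> send;  // (topic, payload) -> publish

    public ClassRouter(String interestingClass, String interestingTopic,
                       String defaultTopic, BiConsumer<String, byte[]> send) {
        this.interestingClass = interestingClass;
        this.interestingTopic = interestingTopic;
        this.defaultTopic = defaultTopic;
        this.send = send;
    }

    // Pure routing decision; a null className falls through to the default topic.
    public String topicFor(String className) {
        return interestingClass.equals(className) ? interestingTopic : defaultTopic;
    }

    public void publish(String className, byte[] payload) {
        send.accept(topicFor(className), payload);
    }

    public static void main(String[] args) {
        List<String> sentTo = new ArrayList<>();
        ClassRouter router = new ClassRouter("abc.cdf", "records-abc-cdf", "records-other",
                (topic, payload) -> sentTo.add(topic));
        router.publish("abc.cdf",
                "{\"class\": \"abc.cdf\", \"object\": {}}".getBytes(StandardCharsets.UTF_8));
        router.publish("xyz",
                "{\"class\": \"xyz\", \"object\": {}}".getBytes(StandardCharsets.UTF_8));
        System.out.println(sentTo); // prints [records-abc-cdf, records-other]
    }
}
```

With the Kafka Java client, the sink would typically be (topic, payload) -> producer.send(new ProducerRecord<>(topic, payload)), and the consumer then subscribes only to the dedicated topic, so it never receives the other 90% of messages.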


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM