
Applying Multiple Filters + Write to Multiple Topics in a Loop on Kafka Streams

I have a requirement where I have a list of filters (of the form schema_field='val') and corresponding topics. I need to iterate over that list of filters, apply each one, and then write the filtered record value to its specific topic using KStreams. Is there functionality to do this?

Example:

synchronized (subscriberFilterRequirements) {
    Iterator<SubscriberFilterRequirements> itr = subscriberFilterRequirements.iterator();
    while (itr.hasNext()) {
        SubscriberFilterRequirements req = itr.next();
        log.info("*** Applying transformations on record");
        KStream<String, GenericRecord> subscriberFilteredRecord = filteredRecord;
        if (req.getPipelineSubscriptions().getFiltersql() != null && !req.getPipelineSubscriptions().getFiltersql().isEmpty()) {
            subscriberFilteredRecord = filteredRecord.filter((key, value) -> {
                String[] filter = req.getPipelineSubscriptions().getFiltersql().trim().split("=");
                return value.get(filter[0]).toString().equalsIgnoreCase(filter[1]);
            });
        }
        Schema schema = Utils.getAvroSchema(req.getPipelineSubscriptions().getSubscriberSchemaLocation(),
                    req.getPipelineSubscriptions().getSubscriberSchemaLocationType());
        GenericRecord sinkRecord = new GenericData.Record(schema);
        List<Schema.Field> schemaFieldsList = schema.getFields();
        Iterator<Schema.Field> sinkIterator = schemaFieldsList.iterator();
        subscriberFilteredRecord.map((key, value) -> {
            fillAvroRecord(sinkRecord, sinkIterator, value);
            return new KeyValue<>(key, sinkRecord);
        }).to(req.getPipelineSubscriptions().getKafkaTopic());
    }
}

Currently, the loop's context and the KStream's context are not the same. When streaming starts, the loop executes fine the first time, i.e. the KStream receives the first filter, but from then on the KStream runs like an infinite loop without ever picking up the second filter. I want to inject the rest of the filters, one after another, so that each of them is applied to the record.

Assume you have three filter predicates p1, p2, and p3. You can do:

KStream stream = ...
stream.filter(p1).to("output-1");
stream.filter(p2).to("output-2");
stream.filter(p3).to("output-3");

// or as a loop
Predicate[] predicate = new Predicate[]{p1,p2,p3};
String[] outputTopic = new String[]{"output-1","output-2","output-3"};
for(int i = 0; i < 3; ++i) {
    stream.filter(predicate[i]).to(outputTopic[i]);
}

This should also work via forEach() and a lambda expression if you have a collection of predicate/output-topic pairs.
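For example, a minimal sketch of that variant, assuming the same KStream<String, GenericRecord> named stream as in the question; the output topic names and the "type" field used in the predicates are made up for illustration:

// Predicate is org.apache.kafka.streams.kstream.Predicate; Map is java.util.Map.
// Hypothetical routing table: output topic -> predicate selecting the records for it.
Map<String, Predicate<String, GenericRecord>> routes = Map.of(
        "output-a", (key, value) -> "A".equalsIgnoreCase(String.valueOf(value.get("type"))),
        "output-b", (key, value) -> "B".equalsIgnoreCase(String.valueOf(value.get("type"))));

// Each iteration wires an independent filter -> sink pair into the topology.
// All of them are defined up front, before KafkaStreams#start() is called.
routes.forEach((topic, predicate) -> stream.filter(predicate).to(topic));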

I guess you need to use the branch() method on KStream with multiple predicates (filters), like the following:

Predicate<Object, String>[] branchingPredicates = ...;
KStream<Object, String>[] branchingStreams = kStream.branch(branchingPredicates);

for (int branchingIndex = 0; branchingIndex < branchingStreams.length; branchingIndex++) {
    branchingStreams[branchingIndex].map((k,v) -> { ... }).to(specificKafkaTopic);
}
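Note that in newer Kafka Streams releases (2.8+), branch() is deprecated in favour of split(). A rough sketch of the equivalent wiring, reusing the hypothetical p1/p2/p3 predicates and output topic names from the first answer:

// split() is the replacement for branch() in Kafka Streams 2.8 and later.
// Branched (org.apache.kafka.streams.kstream.Branched) lets each branch be sent
// straight to its own sink topic via withConsumer(...).
stream.split()
      .branch(p1, Branched.withConsumer(branch -> branch.to("output-1")))
      .branch(p2, Branched.withConsumer(branch -> branch.to("output-2")))
      .branch(p3, Branched.withConsumer(branch -> branch.to("output-3")))
      .noDefaultBranch();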
