Applying Multiple Filters + Write to Multiple Topics in a Loop on Kafka Streams

I have a requirement where I have a list of filters (where schema_field='val') and corresponding topics. I need to iterate over that list of filters and apply them, then write the filtered record value to its specific topic using KStreams. Is there functionality to do this?

Example:

synchronized (subscriberFilterRequirements) {
    Iterator<SubscriberFilterRequirements> itr = subscriberFilterRequirements.iterator();
    while (itr.hasNext()) {
        SubscriberFilterRequirements req = itr.next();
        log.info("*** Applying transformations on record");
        KStream<String, GenericRecord> subscriberFilteredRecord = filteredRecord;
        if (req.getPipelineSubscriptions().getFiltersql() != null && !req.getPipelineSubscriptions().getFiltersql().isEmpty()) {
            subscriberFilteredRecord = filteredRecord.filter((key, value) -> {
                String[] filter = req.getPipelineSubscriptions().getFiltersql().trim().split("=");
                return value.get(filter[0]).toString().equalsIgnoreCase(filter[1]);
            });
        }
        Schema schema = Utils.getAvroSchema(req.getPipelineSubscriptions().getSubscriberSchemaLocation(),
                    req.getPipelineSubscriptions().getSubscriberSchemaLocationType());
        GenericRecord sinkRecord = new GenericData.Record(schema);
        List<Schema.Field> schemaFieldsList = schema.getFields();
        Iterator<Schema.Field> sinkIterator = schemaFieldsList.iterator();
        subscriberFilteredRecord.map((key, value) -> {
            fillAvroRecord(sinkRecord, sinkIterator, value);
            return new KeyValue<>(key, sinkRecord);
        }).to(req.getPipelineSubscriptions().getKafkaTopic());
    }
}

Currently, what is happening is that the loop's context and the KStream's context are not the same. When streaming is started, the loop executes fine the first time, i.e., the KStream receives the first filter, and from then on the KStream runs like an infinite loop without picking up the second filter. I want to inject the rest of the filters, one after another, to be applied on the record.

Assume you have 3 filter predicates p1, p2, and p3; you can do:

KStream stream = ...
stream.filter(p1).to("output-1");
stream.filter(p2).to("output-2");
stream.filter(p3).to("output-3");

// or as a loop
Predicate[] predicate = new Predicate[]{p1,p2,p3};
String[] outputTopic = new String[]{"output-1","output-2","output-3"};
for(int i = 0; i < 3; ++i) {
    stream.filter(predicate[i]).to(outputTopic[i]);
}

This should also work via foreach() and a lambda expression, if you have a collection of predicate/output-topic pairs.
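For example, a minimal sketch of that variant, assuming String keys and values and the same stream and predicates p1, p2, p3 as above:

// Sketch: pair each predicate with its output topic.
Map<Predicate<String, String>, String> predicateToTopic = new LinkedHashMap<>();
predicateToTopic.put(p1, "output-1");
predicateToTopic.put(p2, "output-2");
predicateToTopic.put(p3, "output-3");

// Each entry adds one filter + to branch; all branches are wired while the
// topology is being built, before the streams application is started.
predicateToTopic.forEach((predicate, topic) -> stream.filter(predicate).to(topic));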

I guess you need to use the branch method on KStream with multiple predicates (filters), like the following:

Predicate<Object, String>[] branchingPredicates = ...;
KStream<Object, String>[] branchingStreams = kStream.branch(branchingPredicates);

for (int branchingIndex = 0; branchingIndex < branchingStreams.length; branchingIndex++) {
    branchingStreams[branchingIndex].map((k,v) -> { ... }).to(specificKafkaTopic);
}
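Tying this back to the original question, here is a rough, untested sketch of how the predicates and their target topics might be built from the subscriber requirements before branching. Names are reused from the question's snippet, every requirement is assumed to have a non-empty filtersql, and the Avro re-mapping step is omitted for brevity:

// Sketch: one predicate and one target topic per subscriber requirement.
Predicate<String, GenericRecord>[] predicates = new Predicate[subscriberFilterRequirements.size()];
String[] topics = new String[subscriberFilterRequirements.size()];

int i = 0;
for (SubscriberFilterRequirements req : subscriberFilterRequirements) {
    String[] filter = req.getPipelineSubscriptions().getFiltersql().trim().split("=");
    predicates[i] = (key, value) -> value.get(filter[0]).toString().equalsIgnoreCase(filter[1]);
    topics[i] = req.getPipelineSubscriptions().getKafkaTopic();
    i++;
}

// Branch once, then route every sub-stream to its topic while the topology
// is being constructed, before the streams application is started.
KStream<String, GenericRecord>[] branches = filteredRecord.branch(predicates);
for (int b = 0; b < branches.length; b++) {
    branches[b].to(topics[b]);
}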
