Java/Quarkus Kafka Streams Reading/Writing to Same Topic based on a condition

Question

Hello, I have an issue I'm trying to solve. I have a Kafka Streams topology that reads JSON messages from a Kafka topic and deserializes each message into a POJO. Ideally, it then checks the message for a certain boolean flag. If the flag is true, it applies a transformation and writes the message back to the same topic. If the flag is false, I want it to write nothing, but I'm not sure how to go about that. With MP Reactive Messaging I can use an RxJava 2 Flowable and return something like Flowable.empty(), but that approach doesn't seem to be available here.

JsonbSerde<FinancialMessage> financialMessageSerde = new JsonbSerde<>(FinancialMessage.class);

StreamsBuilder builder = new StreamsBuilder();

builder.stream(
        TOPIC_NAME,
        Consumed.with(Serdes.Integer(), financialMessageSerde)
    )
    .mapValues ( 
        message -> checkCondition(message)
    )
    .to (
        TOPIC_NAME,
        Produced.with(Serdes.Integer(), financialMessageSerde)
    );

The logic of the function being called is below.

public FinancialMessage checkCondition(FinancialMessage rawMessage) {
    FinancialMessage receivedMessage = rawMessage;

    if (receivedMessage.compliance_services) {
        receivedMessage.compliance_services = false;

        return receivedMessage;
    }

    else return null;
}

If the boolean is false, it still writes a message, with a JSON body of "null".

I've tried changing the return type of the checkCondition function to a wrapper, like

public Flowable<FinancialMessage> checkCondition (FinancialMessage rawMessage)

and then returning Flowable.just(receivedMessage) from the if branch and Flowable.empty() otherwise, but I can't seem to serialize the Flowable object. This might be a silly question, but is there a better way to go about this?

Answer 1

Note that Kafka messages are immutable and are not deleted after being read. If a single application reads from and writes to the same topic, a message (or, to be more precise, successive copies of it) would be processed indefinitely unless you have a condition that "breaks" the cycle.

Also, if, for example, 5 services read from the same topic, all 5 services get a copy of every event. If one service writes back, the other 4 services and the writing service itself will all read the new message again. Thus, you get considerable data amplification.

If you have different services that should react to the original input message one after another, you could put one topic between each pair of consecutive services to build a real pipeline.

Last, you say that if the boolean flag is true you want to transform the message and emit it (I assume for the next service to consume), and that for false you want to do nothing. I further assume that only a single flag is true for any given message, and that a successful transformation also switches the flag (to enable processing by the next service). In this case, it's best if you can ensure that every original input message starts with the same boolean flag set, so that you can build your pipeline. Then only the corresponding service reads messages with its boolean flag set (you don't even need to check the flag, because the upstream writer ensures that it's set; you could keep the check purely as a sanity check).

If you don't know which boolean flag is set initially and all services read from the same input topic, simply filtering out the message (for example with filter() instead of returning null from mapValues()) is correct. If all services read all messages, 4 services will filter the message out while one service processes it and emits a new message with a different flag set. For this architecture, a single topic might work: once a message has been processed by all services, all of its boolean flags are false, so when it is written back to the input topic every service correctly drops this last copy. However, using a single topic implies a lot of redundant reading and writing.
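The filter-then-transform idea can be sketched in plain Java. This is only a model under stated assumptions: java.util.stream stands in for the KStream so the snippet runs without a broker, FinancialMessage is a cut-down stand-in for the POJO in the question, and the helper names needsCompliance, markHandled and process are made up for illustration. The comment inside shows the shape the same logic would likely take with KStream's filter() and mapValues().

```java
import java.util.List;
import java.util.stream.Collectors;

// Stand-in for the question's POJO; only the one flag is modelled here.
class FinancialMessage {
    boolean compliance_services;

    FinancialMessage(boolean complianceServices) {
        this.compliance_services = complianceServices;
    }
}

class ComplianceFilterSketch {

    // Decides whether this service should handle the message at all
    // (hypothetical helper name).
    static boolean needsCompliance(FinancialMessage message) {
        return message != null && message.compliance_services;
    }

    // The transformation from the question: clear the flag once handled.
    static FinancialMessage markHandled(FinancialMessage message) {
        message.compliance_services = false;
        return message;
    }

    // Plain-Java model of the topology. With Kafka Streams the same shape
    // would be roughly:
    //   builder.stream(TOPIC_NAME, Consumed.with(...))
    //          .filter((key, msg) -> needsCompliance(msg))
    //          .mapValues(msg -> markHandled(msg))
    //          .to(TOPIC_NAME, Produced.with(...));
    static List<FinancialMessage> process(List<FinancialMessage> input) {
        return input.stream()
                .filter(ComplianceFilterSketch::needsCompliance) // drop flag == false
                .map(ComplianceFilterSketch::markHandled)        // transform the rest
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FinancialMessage> written = process(List.of(
                new FinancialMessage(true),
                new FinancialMessage(false)));
        System.out.println(written.size() + " message(s) written back");
    }
}
```

Because the message that is written back has its flag cleared, the same filter drops it when it is read again, which is the condition that breaks the cycle.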

Maybe the best architecture is to have your original input topic plus one additional input topic for each service. You also add a "dispatcher" service that reads from the original input topic and uses branch() to split the KStream into the service input topics according to the boolean flags. This way, each service reads only messages that have its flag set to true. Furthermore, after transforming a message, each service also uses branch() to write it to the input topic of the correct next service. Last, you would want an output topic that each service can write to once a message is fully processed.
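The dispatcher's routing decision can be sketched in plain Java as well. Everything here is an assumption made for illustration: the topic names are invented, fraud_check is a hypothetical second flag (the question only shows compliance_services), and a map of lists stands in for the Kafka topics. In a real topology the targetTopic decision could drive branch() as described above, or be wrapped in a TopicNameExtractor passed to to().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the question's POJO. Only compliance_services appears in the
// question; fraud_check is a hypothetical second flag.
class FlaggedMessage {
    boolean compliance_services;
    boolean fraud_check;

    FlaggedMessage(boolean complianceServices, boolean fraudCheck) {
        this.compliance_services = complianceServices;
        this.fraud_check = fraudCheck;
    }
}

class DispatcherSketch {

    // Hypothetical topic names, one input topic per service plus a final output.
    static final String COMPLIANCE_TOPIC = "compliance-input";
    static final String FRAUD_TOPIC = "fraud-input";
    static final String OUTPUT_TOPIC = "processed-output";

    // Picks the next topic from the flags; a message with no flag left set
    // is considered fully processed.
    static String targetTopic(FlaggedMessage message) {
        if (message.compliance_services) {
            return COMPLIANCE_TOPIC;
        }
        if (message.fraud_check) {
            return FRAUD_TOPIC;
        }
        return OUTPUT_TOPIC;
    }

    // Models the dispatcher: each message lands in exactly one topic.
    static Map<String, List<FlaggedMessage>> dispatch(List<FlaggedMessage> input) {
        Map<String, List<FlaggedMessage>> topics = new LinkedHashMap<>();
        for (FlaggedMessage message : input) {
            topics.computeIfAbsent(targetTopic(message), t -> new ArrayList<>())
                  .add(message);
        }
        return topics;
    }

    public static void main(String[] args) {
        Map<String, List<FlaggedMessage>> topics = dispatch(List.of(
                new FlaggedMessage(true, false),
                new FlaggedMessage(false, true),
                new FlaggedMessage(false, false)));
        topics.forEach((topic, messages) ->
                System.out.println(topic + ": " + messages.size()));
    }
}
```

Each service could reuse the same targetTopic decision after its own transformation, so a message moves from one service's input topic to the next without any service reading messages that are not meant for it.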
