
Kafka-Streams: Aggregate over the result of a (KTable, KTable) join

I have three topics, each with a key and a payload. I am trying to join two of them, aggregate the result, and finally join that aggregate with the third topic, but it does not work as expected.

Let me illustrate the situation by providing a simple example:

Topic 1 "Company": 
- Key:1234, {"id":"1234", ...}
...

Topic 2 "Mapping":
- Key:5678, {"id":"5678", "company_id":"1234", "category_id":"9876"}
- Key:5679, {"id":"5679", "company_id":"1234", "category_id":"9877"}
...

Topic 3 "Categories":
- Key:9876, {"id":"9876", "name":"foo"}
- Key:9877, {"id":"9877", "name":"bar"}
...

I want every company to have a list of all its associated categories. I tried joining "Mapping" with "Categories" and aggregating "name" over the result. This fails with the following error:

org.apache.kafka.streams.errors.StreamsException: failed to initialize processor KTABLE-FK-JOIN-OUTPUT-0000000018

and

Processor KTABLE-FK-JOIN-OUTPUT-0000000018 has no access to StateStore KTABLE-FK-JOIN-OUTPUT-STATE-STORE-0000000019 as the store is not connected to the processor.

I tried:

    var joined = mappedTable
                    .leftJoin(
                            categoriesTable,
                            mappedForeignKey -> String.valueOf(mappedForeignKey.getCategoryId()),
                            (mapping, categories) -> new CategoriesMapping(mapping.getCompanyId(), categories.getName()),
                            Materialized.with(Serdes.String(), mappedSerde)
                    )
                    .groupBy((key, mapping) -> new KeyValue<>(String.valueOf(mapping.getCompanyId()), mapping), Grouped.with(Serdes.String(), mappedSerde))
                    .aggregate(
                            // ...
                    );

(I omitted the part where the joined table is finally joined with the "Company" table.)

The aggregation function builds a list like [{mappedValue1},{mappedValue2}], and it works when the input table is not the result of a join.
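Because the aggregation runs on a KTable (groupBy on a KTable yields a KGroupedTable), KGroupedTable.aggregate() needs an initializer, an adder and a subtractor: when an upstream mapping row changes, the old value is subtracted from its group and the new value is added. Below is a plain-Java sketch of such a pair for a list of category names, written without Kafka so it runs standalone. The class and method names are hypothetical, and the update of mapping 5679 to a category named "baz" is invented for illustration; in Kafka Streams these three methods would be passed as lambdas to aggregate().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper modelling the initializer/adder/subtractor of a list aggregation.
public class CategoryListAggregator {

    // Initializer: every company starts with an empty list of category names.
    public static List<String> init() {
        return new ArrayList<>();
    }

    // Adder: called with the new value of a row that belongs to this company.
    public static List<String> add(String companyId, String categoryName, List<String> agg) {
        List<String> result = new ArrayList<>(agg);
        result.add(categoryName);
        return result;
    }

    // Subtractor: called with the old value; removes a single occurrence only,
    // so the same name coming from another mapping row is kept.
    public static List<String> subtract(String companyId, String categoryName, List<String> agg) {
        List<String> result = new ArrayList<>(agg);
        result.remove(categoryName);
        return result;
    }

    public static void main(String[] args) {
        List<String> agg = init();
        agg = add("1234", "foo", agg);
        agg = add("1234", "bar", agg);
        // Mapping 5679 is updated: the old value is subtracted, then the new one is added.
        agg = subtract("1234", "bar", agg);
        agg = add("1234", "baz", agg);
        System.out.println(agg); // prints [foo, baz]
    }
}
```

Without a correct subtractor, an updated mapping would leave its old category name in the company's list.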

Is there a way to make this join followed by an aggregation work? And is it possible to produce output like this:

key, value:{"id":..., ..., "name":[{foo},{bar}, ...]}
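The intended result can be modelled in memory, with no Kafka involved, using the sample records above: join each "Mapping" row to its category by category_id, then re-key by company_id and collect the names. This is only a sketch of the semantics. The Mapping class is a hypothetical stand-in for the real payload, mappings without a matching category are simply skipped, and the final join with "Company" would be a lookup by company id on top of the resulting map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of "Mapping JOIN Categories, then group by company_id".
public class CompanyCategoriesModel {

    // Hypothetical minimal "Mapping" payload: only the two foreign keys.
    static final class Mapping {
        final String companyId;
        final String categoryId;

        Mapping(String companyId, String categoryId) {
            this.companyId = companyId;
            this.categoryId = categoryId;
        }
    }

    public static Map<String, List<String>> namesByCompany(
            Map<String, Mapping> mappings, Map<String, String> categoryNames) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Mapping m : mappings.values()) {
            // Step 1: foreign-key lookup of the category name by category_id.
            String name = categoryNames.get(m.categoryId);
            if (name == null) {
                continue; // no matching category: skipped in this sketch
            }
            // Step 2: re-key by company_id and collect the names.
            result.computeIfAbsent(m.companyId, k -> new ArrayList<>()).add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Mapping> mappings = new LinkedHashMap<>();
        mappings.put("5678", new Mapping("1234", "9876"));
        mappings.put("5679", new Mapping("1234", "9877"));

        Map<String, String> categoryNames = new LinkedHashMap<>();
        categoryNames.put("9876", "foo");
        categoryNames.put("9877", "bar");

        System.out.println(namesByCompany(mappings, categoryNames)); // prints {1234=[foo, bar]}
    }
}
```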

Full stack trace:

Exception in thread "company_details-16eef466-408a-4271-94ec-adad071b4d24-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: failed to initialize processor KTABLE-FK-JOIN-OUTPUT-0000000018
    at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:97)
    at org.apache.kafka.streams.processor.internals.StreamTask.initTopology(StreamTask.java:608)
    at org.apache.kafka.streams.processor.internals.StreamTask.initializeTopology(StreamTask.java:336)
    at org.apache.kafka.streams.processor.internals.AssignedTasks.transitionToRunning(AssignedTasks.java:118)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.updateRestored(AssignedStreamsTasks.java:349)
    at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:390)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: Processor KTABLE-FK-JOIN-OUTPUT-0000000018 has no access to StateStore KTABLE-FK-JOIN-OUTPUT-STATE-STORE-0000000019 as the store is not connected to the processor. If you add stores manually via '.addStateStore()' make sure to connect the added store to the processor by providing the processor name to '.addStateStore()' or connect them via '.connectProcessorAndStateStores()'. DSL users need to provide the store name to '.process()', '.transform()', or '.transformValues()' to connect the store to the corresponding operator. If you do not add stores manually, please file a bug report at https://issues.apache.org/jira/projects/KAFKA.
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:104)
    at org.apache.kafka.streams.kstream.internals.KTableSource$KTableSourceProcessor.init(KTableSource.java:84)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:93)

and

java.lang.IllegalStateException: Expected postgres_company_categories-STATE-STORE-0000000000 to have been initialized
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:284) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:177) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:680) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:788) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.closeTask(AssignedStreamsTasks.java:80) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.closeTask(AssignedStreamsTasks.java:36) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedTasks.shutdown(AssignedTasks.java:256) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.shutdown(AssignedStreamsTasks.java:534) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:292) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1115) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:683) ~[kafka-streams-2.4.0.jar:na]

What you are encountering is a bug: https://issues.apache.org/jira/browse/KAFKA-9517

The bug is fixed in the upcoming 2.4.1 and 2.5.0 releases.

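As a workaround, you can materialize the join result explicitly by passing Materialized.as("some-name") into leftJoin() instead of Materialized.with(Serdes.String(), mappedSerde), chaining .withKeySerde() and .withValueSerde() to keep your custom serdes.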
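A minimal sketch of that workaround, assuming kafka-streams 2.4 or later on the classpath and simplified String values instead of the real JSON payloads: the "mapping" topic value is assumed to be "companyId,categoryId" and the "categories" value just the category name (names must not contain commas). The topic names "mapping", "categories" and "company-categories" are hypothetical; "some-name" is the store name from the answer. The only essential change compared to the question's code is the named Materialized passed to leftJoin().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class FkJoinWorkaround {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        KTable<String, String> mappingTable =
                builder.table("mapping", Consumed.with(Serdes.String(), Serdes.String()));
        KTable<String, String> categoriesTable =
                builder.table("categories", Consumed.with(Serdes.String(), Serdes.String()));

        // Foreign-key left join; the explicit store name is the KAFKA-9517 workaround.
        KTable<String, String> joined = mappingTable.leftJoin(
                categoriesTable,
                mapping -> mapping.split(",")[1],          // foreign key = categoryId
                (mapping, categoryName) ->                 // categoryName is null if unmatched
                        mapping.split(",")[0] + "," + (categoryName == null ? "" : categoryName),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("some-name")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String()));

        // Re-key by companyId and keep a comma-separated list of category names.
        joined.groupBy(
                        (mappingId, value) -> {
                            String[] parts = value.split(",", 2);
                            return KeyValue.pair(parts[0], parts[1]);
                        },
                        Grouped.with(Serdes.String(), Serdes.String()))
                .aggregate(
                        () -> "",
                        (companyId, name, agg) ->          // adder; empty names are ignored
                                name.isEmpty() ? agg : (agg.isEmpty() ? name : agg + "," + name),
                        (companyId, name, agg) ->          // subtractor
                                name.isEmpty() ? agg : removeOnce(agg, name),
                        Materialized.with(Serdes.String(), Serdes.String()))
                .toStream()
                .to("company-categories", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }

    // Removes a single occurrence of name from a comma-separated list.
    static String removeOnce(String agg, String name) {
        List<String> names = new ArrayList<>(Arrays.asList(agg.split(",")));
        names.remove(name);
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // The topology description should list the store "some-name".
        System.out.println(build().describe());
    }
}
```

On 2.4.1 or 2.5.0 and later the explicit name should no longer be required, but naming the store is harmless and also makes it queryable.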
