Kafka-Streams: Aggregate over the result of a (KTable, KTable) join

I have three topics. Every topic has a key and a payload. I am trying to join the first two topics, aggregate the result, and finally join that result with the third topic. But it does not work as expected.

Let me illustrate the situation by providing a simple example:

Topic 1 "Company": 
- Key:1234, {"id":"1234", ...}
...

Topic 2 "Mapping":
- Key:5678, {"id":"5678", "company_id":"1234", "category_id":"9876"}
- Key:5679, {"id":"5679", "company_id":"1234", "category_id":"9877"}
...

Topic 3 "Categories":
- Key:9876, {"id":"9876", "name":"foo"}
- Key:9877, {"id":"9877", "name":"bar"}
...

I want every company to have a list of all associated categories. I tried joining "Mapping" with "Categories" and aggregating "name" over the result. This fails with the following error:

org.apache.kafka.streams.errors.StreamsException: failed to initialize processor KTABLE-FK-JOIN-OUTPUT-0000000018

and

Processor KTABLE-FK-JOIN-OUTPUT-0000000018 has no access to StateStore KTABLE-FK-JOIN-OUTPUT-STATE-STORE-0000000019 as the store is not connected to the processor.

I tried:

    var joined = mappedTable
                    .leftJoin(
                            categoriesTable,
                            mappedForeignKey -> String.valueOf(mappedForeignKey.getCategoryId()),
                            (mapping, categories) -> new CategoriesMapping(mapping.getCompanyId(), categories.getName()),
                            Materialized.with(Serdes.String(), mappedSerde)
                    )
                    .groupBy((key, mapping) -> new KeyValue<>(String.valueOf(mapping.getCompanyId()), mapping), Grouped.with(Serdes.String(), mappedSerde))
                    .aggregate(
                            // ...
                    );

(I skipped the part where the joined table is finally joined with the "Company" table; a rough sketch of that step follows.)
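
That final join would be roughly of this shape, assuming the aggregate above produces a List<CategoriesMapping> per company; companyTable is the KTable built from the "Company" topic, CompanyWithCategories and companyWithCategoriesSerde are placeholder names, and CategoriesMapping is assumed to expose getName():

    // Sketch only: primary-key join of the aggregated result ("joined" above) onto the company table.
    var companyDetails = companyTable
            .leftJoin(
                    joined,
                    (company, mappings) -> {
                        // Collect the category names; mappings can be null in a left join.
                        List<String> names = mappings == null
                                ? new ArrayList<>()
                                : mappings.stream().map(CategoriesMapping::getName).collect(Collectors.toList());
                        return new CompanyWithCategories(company, names);
                    },
                    Materialized.with(Serdes.String(), companyWithCategoriesSerde));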

The aggregation function produces something like this: [{mappedValue1},{mappedValue2}], and it works as long as the aggregation runs on a table without the preceding join.
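
For reference, that aggregation is roughly of this shape when applied directly to a table; mappingTable stands in for that source table and mappingListSerde for a serde of the list type (both placeholder names):

    // Sketch only: group by company id and collect all mappings into a list per company.
    KTable<String, List<CategoriesMapping>> categoriesPerCompany = mappingTable
            .groupBy(
                    (key, mapping) -> new KeyValue<>(String.valueOf(mapping.getCompanyId()), mapping),
                    Grouped.with(Serdes.String(), mappedSerde))
            .aggregate(
                    ArrayList::new,                                                        // start with an empty list
                    (companyId, mapping, list) -> { list.add(mapping); return list; },     // adder
                    (companyId, mapping, list) -> { list.remove(mapping); return list; },  // subtractor (retracts old values)
                    Materialized.with(Serdes.String(), mappingListSerde));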

Is there a way to make this join-plus-aggregation work? And is it possible to get an output like this:

key, value:{"id":..., ..., "name":[{foo},{bar}, ...]}

Full Stack Trace:

Exception in thread "company_details-16eef466-408a-4271-94ec-adad071b4d24-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: failed to initialize processor KTABLE-FK-JOIN-OUTPUT-0000000018
    at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:97)
    at org.apache.kafka.streams.processor.internals.StreamTask.initTopology(StreamTask.java:608)
    at org.apache.kafka.streams.processor.internals.StreamTask.initializeTopology(StreamTask.java:336)
    at org.apache.kafka.streams.processor.internals.AssignedTasks.transitionToRunning(AssignedTasks.java:118)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.updateRestored(AssignedStreamsTasks.java:349)
    at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:390)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: Processor KTABLE-FK-JOIN-OUTPUT-0000000018 has no access to StateStore KTABLE-FK-JOIN-OUTPUT-STATE-STORE-0000000019 as the store is not connected to the processor. If you add stores manually via '.addStateStore()' make sure to connect the added store to the processor by providing the processor name to '.addStateStore()' or connect them via '.connectProcessorAndStateStores()'. DSL users need to provide the store name to '.process()', '.transform()', or '.transformValues()' to connect the store to the corresponding operator. If you do not add stores manually, please file a bug report at https://issues.apache.org/jira/projects/KAFKA.
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:104)
    at org.apache.kafka.streams.kstream.internals.KTableSource$KTableSourceProcessor.init(KTableSource.java:84)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:93)

and

java.lang.IllegalStateException: Expected postgres_company_categories-STATE-STORE-0000000000 to have been initialized
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:284) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:177) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:680) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:788) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.closeTask(AssignedStreamsTasks.java:80) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.closeTask(AssignedStreamsTasks.java:36) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedTasks.shutdown(AssignedTasks.java:256) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.shutdown(AssignedStreamsTasks.java:534) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:292) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1115) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:683) ~[kafka-streams-2.4.0.jar:na]

What you encounter is a bug: https://issues.apache.org/jira/browse/KAFKA-9517

The bug is fixed for the upcoming 2.4.1 and 2.5.0 releases.

As a workaround, you can materialize the join result explicitly by passing Materialized.as("some-name") into leftJoin().
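
Applied to the snippet from the question, the workaround would look roughly like this; the store name "fk-join-result" is arbitrary, and Bytes / KeyValueStore come from org.apache.kafka.common.utils and org.apache.kafka.streams.state:

    // Workaround sketch: name and materialize the FK-join result explicitly.
    var joined = mappedTable
            .leftJoin(
                    categoriesTable,
                    mappedForeignKey -> String.valueOf(mappedForeignKey.getCategoryId()),
                    (mapping, categories) -> new CategoriesMapping(mapping.getCompanyId(), categories.getName()),
                    Materialized.<String, CategoriesMapping, KeyValueStore<Bytes, byte[]>>as("fk-join-result")
                            .withKeySerde(Serdes.String())
                            .withValueSerde(mappedSerde)
            )
            .groupBy((key, mapping) -> new KeyValue<>(String.valueOf(mapping.getCompanyId()), mapping), Grouped.with(Serdes.String(), mappedSerde))
            .aggregate(
                    // ... unchanged ...
            );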
