
Kafka-Streams: Aggregate over the result of a (KTable, KTable) join

I have three topics. Each topic has a key and a payload. I am trying to join the first two topics, aggregate the result, and finally join that result with the third topic. But it does not work as expected.

Let me illustrate the situation with a simple example:

Topic 1 "Company": 
- Key:1234, {"id":"1234", ...}
...

Topic 2 "Mapping":
- Key:5678, {"id":"5678", "company_id":"1234", "category_id":"9876"}
- Key:5679, {"id":"5679", "company_id":"1234", "category_id":"9877"}
...

Topic 3 "Categories":
- Key:9876, {"id":"9876", "name":"foo"}
- Key:9877, {"id":"9877", "name":"bar"}
...

I want every company to end up with a list of all its related categories. I tried joining "Mapping" with "Categories" and aggregating the "name" field over the result. This fails with the following error:

org.apache.kafka.streams.errors.StreamsException: failed to initialize processor KTABLE-FK-JOIN-OUTPUT-0000000018

Processor KTABLE-FK-JOIN-OUTPUT-0000000018 has no access to StateStore KTABLE-FK-JOIN-OUTPUT-STATE-STORE-0000000019 as the store is not connected to the processor.

This is what I tried:

    var joined = mappedTable
                    .leftJoin(
                            categoriesTable,
                            mappedForeignKey -> String.valueOf(mappedForeignKey.getCategoryId()),
                            (mapping, categories) -> new CategoriesMapping(mapping.getCompanyId(), categories.getName()),
                            Materialized.with(Serdes.String(), mappedSerde)
                    )
                    .groupBy((key, mapping) -> new KeyValue<>(String.valueOf(mapping.getCompanyId()), mapping), Grouped.with(Serdes.String(), mappedSerde))
                    .aggregate(
                            // ...
                    );

(I skipped the part where the joined table is finally joined with the "Company" table.)

The aggregation function produces something like [{mappedValue1},{mappedValue2}], and it works fine without the table join.

Is there a way to make this join-then-aggregate work? Is it possible to get an output like this:

key, value:{"id":..., ..., "name":[{foo},{bar}, ...]}
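For illustration, the shape of the desired aggregation can be sketched with plain Java collections, outside of Kafka Streams. The `Mapping` record and the sample data below are stand-ins mirroring the topic payloads from the question; this only demonstrates the grouping logic the FK join + `groupBy` + `aggregate` pipeline is meant to produce:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CategoryAggregationSketch {
    // Simplified stand-in for the "Mapping" topic payload.
    record Mapping(String id, String companyId, String categoryId) {}

    public static void main(String[] args) {
        // Stand-in for the "Categories" topic: category id -> name.
        Map<String, String> categoryNames = Map.of("9876", "foo", "9877", "bar");

        List<Mapping> mappings = List.of(
                new Mapping("5678", "1234", "9876"),
                new Mapping("5679", "1234", "9877"));

        // "Join" each mapping to its category name, then group the names
        // by company id -- the same shape the streams topology should yield.
        Map<String, List<String>> namesPerCompany = mappings.stream()
                .collect(Collectors.groupingBy(
                        Mapping::companyId,
                        Collectors.mapping(
                                m -> categoryNames.get(m.categoryId()),
                                Collectors.toList())));

        System.out.println(namesPerCompany); // {1234=[foo, bar]}
    }
}
```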

Full stack trace:

Exception in thread "company_details-16eef466-408a-4271-94ec-adad071b4d24-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: failed to initialize processor KTABLE-FK-JOIN-OUTPUT-0000000018
    at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:97)
    at org.apache.kafka.streams.processor.internals.StreamTask.initTopology(StreamTask.java:608)
    at org.apache.kafka.streams.processor.internals.StreamTask.initializeTopology(StreamTask.java:336)
    at org.apache.kafka.streams.processor.internals.AssignedTasks.transitionToRunning(AssignedTasks.java:118)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.updateRestored(AssignedStreamsTasks.java:349)
    at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:390)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: Processor KTABLE-FK-JOIN-OUTPUT-0000000018 has no access to StateStore KTABLE-FK-JOIN-OUTPUT-STATE-STORE-0000000019 as the store is not connected to the processor. If you add stores manually via '.addStateStore()' make sure to connect the added store to the processor by providing the processor name to '.addStateStore()' or connect them via '.connectProcessorAndStateStores()'. DSL users need to provide the store name to '.process()', '.transform()', or '.transformValues()' to connect the store to the corresponding operator. If you do not add stores manually, please file a bug report at https://issues.apache.org/jira/projects/KAFKA.
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:104)
    at org.apache.kafka.streams.kstream.internals.KTableSource$KTableSourceProcessor.init(KTableSource.java:84)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:93)

and

java.lang.IllegalStateException: Expected postgres_company_categories-STATE-STORE-0000000000 to have been initialized
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:284) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:177) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.suspend(StreamTask.java:680) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:788) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.closeTask(AssignedStreamsTasks.java:80) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.closeTask(AssignedStreamsTasks.java:36) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedTasks.shutdown(AssignedTasks.java:256) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.shutdown(AssignedStreamsTasks.java:534) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:292) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1115) ~[kafka-streams-2.4.0.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:683) ~[kafka-streams-2.4.0.jar:na]

You are hitting a bug: https://issues.apache.org/jira/browse/KAFKA-9517

The bug is fixed for the upcoming 2.4.1 and 2.5.0 releases.

As a workaround, you can explicitly materialize the join result by passing `Materialized.as("some-name")` into `leftJoin()`.
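A minimal sketch of that workaround, reusing the tables and serde from the question (`mappedTable`, `categoriesTable`, `mappedSerde` are assumed to exist as in the original code; the store name `"fk-join-result"` is arbitrary). Note that with a `leftJoin` the right-hand value can be `null`, so the joiner guards against that as well:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

// Workaround for KAFKA-9517 on Kafka Streams 2.4.0: name the state store
// of the foreign-key join result instead of using Materialized.with().
KTable<String, CategoriesMapping> joined = mappedTable
        .leftJoin(
                categoriesTable,
                mapping -> String.valueOf(mapping.getCategoryId()),
                (mapping, category) -> new CategoriesMapping(
                        mapping.getCompanyId(),
                        // category may be null on a left join
                        category == null ? null : category.getName()),
                Materialized.<String, CategoriesMapping, KeyValueStore<Bytes, byte[]>>as("fk-join-result")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(mappedSerde));
```

After this, the `groupBy(...).aggregate(...)` part of the topology can stay as it was.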

