
Kafka Streams: Invalid topology: StateStore is not added yet

Our goal is to achieve the following architecture. Most importantly, we need to be able to read all the data of topic T1 (from all partitions).

[Architecture diagram]

The problem we are facing is that we cannot join two nodes that are created from different builders (there are two different builders in every instance). In every instance we create two builders (B1, B2). B1 creates a source processor that reads data from all partitions of topic T1; to make that possible, every instance has a unique application ID. B2 reads data from one partition of T2. Later, when we do the join, we get this error: Invalid topology: StateStore aggregated-stream-store is not added yet, because B1 and B2 have different APP_IDs.

This is our code:

Class StrmApp:

public class StrmApp extends StrmProc {
    private StreamsBuilder myBuilder;
    private Validator<String, Data> dataValidator;
    private Properties ownBuilderProps;
    private KafkaStreams ownStreams;

    public StrmApp(ValidDataService dataService, ProcessConfig config, ProcessListener listener) {
        super(dataService, config, listener);
        myBuilder = new StreamsBuilder();
        dataValidator = getValidDataService().getValidator(String.class, Data.class);
        ownBuilderProps = new Properties();
        ownBuilderProps.putAll(getProperties());
        // Unique ID for each instance (different consumer group).
        // application.id must be a String, so convert the UUID explicitly.
        ownBuilderProps.put(StreamsConfig.APPLICATION_ID_CONFIG, UUID.randomUUID().toString());
    }

    private KTable<String, TheDataList> globalStream() {

        // KStream of records from T1 topic using String and TheDataSerde deserializers
        KStream<String, Data> trashStream = getOwnBuilder().stream("T1", Consumed.with(Serdes.String(), SerDes.TheDataSerde));

        // Apply an aggregation operation on the original KStream records using an intermediate representation of a KStream (KGroupedStream)
        KGroupedStream<String, Data> kGroupedStream = trashStream.groupByKey();

        // Describe how a StateStore should be materialized (as a KTable).
        // In our case we are using the default RocksDB back-end by providing "aggregated-stream-store" as the state store name
        Materialized<String, TheDataList, KeyValueStore<Bytes, byte[]>> materialized = Materialized.as("aggregated-stream-store");
        materialized = materialized.withValueSerde(SerDes.TheDataListSerde);

        // Return a KTable
        return kGroupedStream.aggregate(() -> new TheDataList(), (key, value, aggregate) -> {
            if (!value.getValideData())
                aggregate.getList().removeIf((t) -> t.getTimestamp() <= value.getTimestamp());
            else
                aggregate.getList().add(value);
            return aggregate;
        }, materialized);
    }

    private Data tombstone(String vid) {
        Data d = new Data();
        d.setVid(vid);
        d.setValideData(false);
        d.setTimestamp(System.currentTimeMillis());
        return d;
    }

    @Override
    public void run() {
        /* read from topic 2 (T2) - we want to only read one partition */
        KStream<String, Data> inStream = getBuilder()
                .stream(getProcessConfig().getTopicConfig().getTopicIn(), Consumed.with(Serdes.String(), SerDes.TheDataSerde))
                .filter(getValidDataService().getValidator(String.class, Data.class));

        /* Read all partitions from topic 1 (T1) - we want to read from all partitions (P1, P2 and P3) */
        KTable<String, TheDataList> ft = globalStream();

        // ERROR: Invalid topology: StateStore aggregated-stream-store is not added yet.
        // The join below raises this error,
        // I think because the two builders have different APP_IDs
        logger.warn("##JOIN:");
        /* join between data coming from T1 and data coming from T2 */
        KStream<String, TheDataList> validated = inStream.join(ft,
                new ValueJoiner<Data, TheDataList, TheDataList>() {
                    @Override
                    public TheDataList apply(Data valid, TheDataList ivalids) {
                        ivalids.getList().forEach((c) -> {
                            dataValidator.validate(c, valid);
                        });
                        return ivalids;
                    }
                });

        // ...... some code

        ownStreams = StreamTools.startKStreams(getOwnBuilder(), getOwnBuilderProps(), this, this);
        super.startStreams();
    }

    private Properties getOwnBuilderProps() {
        return ownBuilderProps;
    }

    private StreamsBuilder getOwnBuilder() {
        // return getBuilder();
        return myBuilder;
    }

    // .......
}

Class StrmProc:

public abstract class StrmProc extends AProcess {
    private final StreamsBuilder builder;

    public StrmProc(ValidDataService dataService, ProcessConfig config, ProcessListener listener) {
        super(dataService, config, listener);
        this.builder = new StreamsBuilder();
    }

    protected final StreamsBuilder getBuilder() {
        return builder;
    }

    protected final KafkaStreams startStreams() {
        return StreamTools.startKStreams(getBuilder(), getProperties(), this, this);
    }

    // ........

}

Class AProcess:

public abstract class AProcess implements Process {
    private final Properties propertie;
    private final ProcessConfig config;
    private final ValidDataService dataService;
    private final ProcessListener listener;

    protected AProcess(ValidDataService dataService, ProcessConfig config, ProcessListener listener) {
        super();
        this.dataService = dataService;
        this.propertie = getProperties(config);
        this.config = config;
        this.listener = listener;
    }

    private Properties getProperties(ProcessConfig config) {
        Properties kafkaProperties = new Properties();
        kafkaProperties.put(StreamsConfig.APPLICATION_ID_CONFIG, config.getApp());
        kafkaProperties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, config.getBootstrapServerUrl());
        kafkaProperties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return kafkaProperties;
    }

    protected Properties getProperties() {
        return propertie;
    }

    protected ProcessConfig getProcessConfig() {
        return config;
    }

    protected ValidDataService getValidDataService() {
        return dataService;
    }

    // .......

}

How can we achieve this with Kafka Streams?

To use a join in Kafka Streams, both sides of the join must be created from a single StreamsBuilder instance, not two (in your case, inStream and ft are built from two different builders).
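A minimal sketch of the single-builder approach, assuming the org.apache.kafka:kafka-streams library is on the classpath. To keep it self-contained, plain String values stand in for Data and TheDataList, and the class name SingleBuilderJoin and output topic T3 are hypothetical. Both the aggregated KTable from T1 and the KStream from T2 come from the same StreamsBuilder, so the join can find "aggregated-stream-store" when the topology is built.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class SingleBuilderJoin {

    static final String STORE = "aggregated-stream-store";

    static Topology buildTopology() {
        // One builder for BOTH sides of the join.
        StreamsBuilder builder = new StreamsBuilder();

        // Table side: aggregate T1 per key into a comma-separated String
        // (a hypothetical stand-in for TheDataList), materialized as STORE.
        KTable<String, String> table = builder
                .stream("T1", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .aggregate(
                        () -> "",
                        (key, value, agg) -> agg.isEmpty() ? value : agg + "," + value,
                        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(STORE)
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.String()));

        // Stream side: T2, read through the SAME builder.
        KStream<String, String> inStream =
                builder.stream("T2", Consumed.with(Serdes.String(), Serdes.String()));

        // The stream-table join can use STORE because it was registered in this builder.
        inStream.join(table, (streamValue, tableValue) -> streamValue + " -> [" + tableValue + "]")
                .to("T3", Produced.with(Serdes.String(), Serdes.String())); // hypothetical output topic

        return builder.build();
    }

    public static void main(String[] args) {
        // Describing the topology needs no broker; it lists which processors use STORE.
        System.out.println(buildTopology().describe());
    }
}
```

Because everything lives in one topology, only one KafkaStreams instance (one application.id) is started for it, instead of one per builder.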

Kafka Streams usually throws TopologyException: Invalid topology: StateStore ... is not added yet when a processor is connected to a state store (for example a KeyValueStore) that was never added to that StreamsBuilder instance, e.g. with streamsBuilder.addStateStore(storeBuilder).
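The same rule can be seen directly at the Topology (Processor API) level, where Topology.addStateStore is the counterpart of StreamsBuilder.addStateStore. This is a sketch assuming Kafka Streams 3.x with the org.apache.kafka.streams.processor.api package; the class name StoreRegistrationDemo and the no-op "validator" processor are hypothetical. Connecting the processor to a store that this topology never added reproduces the error; adding the store to the same topology first fixes it.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.errors.TopologyException;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class StoreRegistrationDemo {

    static final String STORE = "aggregated-stream-store";

    // Hypothetical no-op processor; a real one would fetch STORE in init().
    // get() must return a NEW instance on each call, hence the anonymous class.
    static final ProcessorSupplier<String, String, Void, Void> NO_OP =
            () -> new Processor<String, String, Void, Void>() {
                @Override
                public void process(Record<String, String> record) {
                    // intentionally empty
                }
            };

    // Source T2 -> "validator" processor, with no state store registered yet.
    static Topology baseTopology() {
        Topology topology = new Topology();
        topology.addSource("source", "T2");
        topology.addProcessor("validator", NO_OP, "source");
        return topology;
    }

    static StoreBuilder<KeyValueStore<String, String>> storeBuilder() {
        return Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore(STORE), Serdes.String(), Serdes.String());
    }

    // Connecting to a store that this topology never added fails eagerly.
    static String connectWithoutStore() {
        try {
            baseTopology().connectProcessorAndStateStores("validator", STORE);
            return "no error";
        } catch (TopologyException e) {
            return e.getMessage();
        }
    }

    // Registering the store in the same topology first makes the connection valid.
    static Topology connectWithStore() {
        Topology topology = baseTopology();
        topology.addStateStore(storeBuilder());
        topology.connectProcessorAndStateStores("validator", STORE);
        return topology;
    }

    public static void main(String[] args) {
        System.out.println(connectWithoutStore());
        System.out.println(connectWithStore().describe());
    }
}
```

In the DSL code from the question, Materialized.as("aggregated-stream-store") registers the store on the builder that owns the aggregation, which is why the join must be created from that same builder.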
