Kafka Streams local state stores

I have a simple Streams application that takes one topic as an input stream and transforms its KeyValues into another, like:

StoreBuilder<KeyValueStore<Long, CategoryDto>> builder =
    Stores.keyValueStoreBuilder(
        Stores.inMemoryKeyValueStore(CategoryTransformer.STORE_NAME),
        Serdes.Long(), CATEGORY_JSON_SERDE);

streamsBuilder.addStateStore(builder)
    .stream(categoryTopic, Consumed.with(Serdes.Long(), CATEGORY_JSON_SERDE))
    .transform(CategoryTransformer::new, CategoryTransformer.STORE_NAME);

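static class CategoryTransformer implements Transformer<Long, CategoryDto, KeyValue<Long, CategoryDto>> {

    static final String STORE_NAME = "test-store";

    private KeyValueStore<Long, CategoryDto> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // Look up the store registered via streamsBuilder.addStateStore(builder)
        store = (KeyValueStore<Long, CategoryDto>) context.getStateStore(STORE_NAME);
    }

    @Override
    public KeyValue<Long, CategoryDto> transform(Long key, CategoryDto value) {
        store.put(key, value);
        return KeyValue.pair(key, value);
    }

    // Required by the Kafka Streams 1.x Transformer interface (deprecated since 1.0);
    // delete this method on 2.0+, where punctuate() was removed from Transformer.
    @Override
    public KeyValue<Long, CategoryDto> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() {
        // Nothing to release: Kafka Streams manages the store's lifecycle
    }
}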

Here I had to use a Transformer because I need to fetch the store and update the relevant value.

My question is: what is the difference between using local state stores and just putting values into a simple HashMap inside a ForeachAction?

What is the advantage of using local state stores in this case?

Although it is not shown in your code, I'm assuming you somehow read and use the stored state.

Storing your state in a simple in-memory HashMap makes it completely non-persistent: your state will be lost whenever any of the following happens (these events are nothing out of the ordinary; assume they will happen quite often):

  • your stream processor/application stops,
  • crashes, or
  • is partially migrated elsewhere (to another JVM) due to rebalancing.

The problem with non-persistent state is that when any of the above happens, Kafka Streams restarts processing from the last committed offset. Records processed before that offset are not reprocessed, so when processing restarts your HashMap will be empty and all state derived from those records is gone. This is certainly not what you want.
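To make the restart behaviour concrete, here is a minimal stdlib-only sketch. It is a conceptual model, not the Kafka Streams API: the list "log" stands in for an input topic partition, "committedOffset" for the offset Kafka Streams committed before the crash, and HashMapRestartDemo is a hypothetical name. A fresh instance resumes at the committed offset with an empty HashMap, so everything derived from earlier records is missing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only (NOT the Kafka Streams API): "log" stands in for an input
// topic partition, and fromOffset for the last offset committed before a crash.
class HashMapRestartDemo {

    // Processes records from fromOffset onwards into a brand-new HashMap,
    // just as a restarted JVM would start with an empty map.
    static Map<Long, String> process(List<Map.Entry<Long, String>> log, int fromOffset) {
        Map<Long, String> state = new HashMap<>();
        for (int offset = fromOffset; offset < log.size(); offset++) {
            Map.Entry<Long, String> record = log.get(offset);
            state.put(record.getKey(), record.getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> log = List.of(
            Map.entry(1L, "books"), Map.entry(2L, "music"),
            Map.entry(3L, "games"), Map.entry(4L, "films"));

        // Offsets 0 and 1 were processed and committed before the crash.
        int committedOffset = 2;

        // Processing resumes at offset 2 with an empty map: keys 1 and 2 are gone.
        Map<Long, String> afterRestart = process(log, committedOffset);
        System.out.println(afterRestart); // {3=games, 4=films}
    }
}
```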

On the other hand, if you use one of the provided state stores, Kafka Streams ensures that, once processing restarts after any of the interruptions listed above, the state is available as if processing never stopped, without reprocessing any of the previously processed records. By default it does this by backing each store with a changelog topic in Kafka and restoring the store from that changelog before processing resumes.
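The restore-from-changelog idea can be modelled in a few lines of plain Java. This is a sketch under stated assumptions, not how Kafka Streams is implemented: ChangelogBackedStore is a hypothetical class, and the shared list plays the role of the durable changelog topic. Every write goes to both the local map and the changelog; a new instance replays the changelog to rebuild the map, and later updates to the same key win, much as they do in a compacted topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only (NOT the Kafka Streams API): "changelog" stands in for the
// changelog topic that backs a state store when logging is enabled.
class ChangelogBackedStore {
    private final Map<Long, String> local = new HashMap<>();  // lost when the instance dies
    private final List<Map.Entry<Long, String>> changelog;     // durable, like a Kafka topic

    ChangelogBackedStore(List<Map.Entry<Long, String>> changelog) {
        this.changelog = changelog;
        // Restoration: replay the changelog so the store matches its pre-crash contents.
        for (Map.Entry<Long, String> e : changelog) {
            local.put(e.getKey(), e.getValue());
        }
    }

    void put(Long key, String value) {
        local.put(key, value);
        changelog.add(Map.entry(key, value)); // every write is also recorded durably
    }

    String get(Long key) {
        return local.get(key);
    }
}

class ChangelogDemo {
    public static void main(String[] args) {
        List<Map.Entry<Long, String>> changelog = new ArrayList<>(); // outlives any instance

        ChangelogBackedStore before = new ChangelogBackedStore(changelog);
        before.put(1L, "books");
        before.put(2L, "music");
        before.put(1L, "comics"); // the later update to key 1 wins on replay

        // Simulated crash: discard the instance and start a new one over the same changelog.
        ChangelogBackedStore after = new ChangelogBackedStore(changelog);
        System.out.println(after.get(1L) + " " + after.get(2L)); // comics music
    }
}
```

Unlike the plain HashMap, nothing here depends on the input records being read again: the new instance gets its state from the changelog alone.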


If your input topics are not partitioned (i.e. they have a single partition) and you run a single instance of your Streams application, the local state API does not add much value. In such cases, sure: you can use a HashMap in your processors, or some persistent HashMap if you want your state to survive restarts.
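For that single-instance case, a "persistent HashMap" can be as simple as snapshotting the map to a local file after each write and reloading it on startup. The sketch below assumes one JVM, one input partition, and Java serialization; FileBackedMap is a hypothetical helper, not part of Kafka Streams.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

// Hypothetical helper for a single, unpartitioned instance: the map is
// snapshotted to a local file on every write and reloaded on startup.
class FileBackedMap {
    private final Path file;
    private final HashMap<Long, String> map;

    FileBackedMap(Path file) throws IOException, ClassNotFoundException {
        this.file = file;
        this.map = Files.exists(file) ? load(file) : new HashMap<>();
    }

    @SuppressWarnings("unchecked")
    private static HashMap<Long, String> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<Long, String>) in.readObject();
        }
    }

    void put(Long key, String value) throws IOException {
        map.put(key, value);
        // Rewrites the whole snapshot on every put: simple, but O(n) per write.
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(map);
        }
    }

    String get(Long key) {
        return map.get(key);
    }
}

class FileBackedMapDemo {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("category-state", ".bin");
        Files.delete(file); // start without a snapshot

        FileBackedMap first = new FileBackedMap(file);
        first.put(1L, "books");

        // "Restart": a new instance reloads the snapshot from disk.
        FileBackedMap second = new FileBackedMap(file);
        System.out.println(second.get(1L)); // books
    }
}
```

This survives a clean restart but nothing more: a crash in the middle of a write can corrupt the file, and the state stays on one machine, so it cannot follow a partition to another instance.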

The value of local storage becomes clear when your topics are partitioned, and clearer still when you run multiple instances of your Streams application. In such cases, the state for a specific partition must live with the processor that is processing that partition, and that state must be able to move with the processor if it migrates to a different Streams instance. In such cases (AKA at scale), the local storage facility is both necessary and invaluable. Imagine having to orchestrate this yourself at scale, versus having this facility as part of the core platform (the local state API).
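The "state moves with the partition" point can also be modelled in plain Java. This is a conceptual sketch, not the Kafka Streams API: PartitionedStateDemo, Instance, assign and revoke are hypothetical names, each partition gets its own local store plus its own durable changelog, and keys are routed with a simple floorMod (Kafka's default partitioner hashes the serialized key with murmur2 instead). When a partition is reassigned during a rebalance, the new owner rebuilds that partition's store from its changelog rather than starting empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only (NOT the Kafka Streams API): one local store per owned
// partition, plus one durable changelog per partition shared by all instances.
class PartitionedStateDemo {
    static final int PARTITIONS = 2;

    // Hypothetical routing; Kafka's default partitioner uses murmur2 on the key bytes.
    static int partitionFor(long key) {
        return (int) Math.floorMod(key, (long) PARTITIONS);
    }

    static class Instance {
        private final Map<Integer, List<Map.Entry<Long, String>>> changelogs; // shared, durable
        private final Map<Integer, Map<Long, String>> stores = new HashMap<>(); // local, per partition

        Instance(Map<Integer, List<Map.Entry<Long, String>>> changelogs) {
            this.changelogs = changelogs;
        }

        // Take ownership of a partition: restore its store from the changelog first.
        void assign(int partition) {
            Map<Long, String> store = new HashMap<>();
            for (Map.Entry<Long, String> e : changelogs.getOrDefault(partition, List.of())) {
                store.put(e.getKey(), e.getValue());
            }
            stores.put(partition, store);
        }

        // Give up a partition: the local copy is dropped, the changelog keeps the data.
        void revoke(int partition) {
            stores.remove(partition);
        }

        void process(long key, String value) {
            int p = partitionFor(key);
            Map<Long, String> store = stores.get(p);
            if (store == null) {
                throw new IllegalStateException("partition " + p + " is not owned by this instance");
            }
            store.put(key, value);
            changelogs.computeIfAbsent(p, k -> new ArrayList<>()).add(Map.entry(key, value));
        }

        String get(long key) {
            Map<Long, String> store = stores.get(partitionFor(key));
            return store == null ? null : store.get(key);
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Map.Entry<Long, String>>> changelogs = new HashMap<>();
        Instance a = new Instance(changelogs);
        Instance b = new Instance(changelogs);

        a.assign(0);
        a.assign(1);
        a.process(1L, "music"); // key 1 -> partition 1
        a.process(2L, "books"); // key 2 -> partition 0

        // Rebalance: partition 1 moves from instance a to instance b.
        a.revoke(1);
        b.assign(1);
        System.out.println(b.get(1L) + " " + a.get(2L)); // music books
    }
}
```

Kafka Streams does this bookkeeping (task assignment, changelog partitions, restoration) for you; with plain HashMaps you would have to build all of it yourself.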
