
Kafka Streams and writing to the state store

I am working on a Kafka Streams application built with Spring Cloud Stream. In this application I need to:

  1. Consume a continuous stream of messages that can be retrieved at a later time.
  2. Persist a list of the message IDs matching some criteria.
  3. In a separate thread, run a scheduler which reads out the message IDs at a regular interval, retrieves the messages that match those IDs, and performs an action with those messages.
  4. Remove the processed message IDs from the list so that work is not duplicated.

I have considered implementing this as follows:

  1. Consume the incoming stream of messages as a materialized KTable so that I can look up and retrieve messages by key at a later time.
  2. Materialize the list of message IDs in another state store.
  3. Use Spring's scheduling mechanism to run a separate thread which reads from the state store via the InteractiveQueryService bean.

The problem I hit is that the InteractiveQueryService provides read-only access to the state store, so I cannot remove entries from the other thread. I have decided not to use Kafka Streams' punctuate capability since the semantics are different; my scheduling thread must always run at a regular interval, irrespective of the processing of the inbound messages.

Another alternative might be to use the low-level Processor API and pass a reference to the writable state store to my scheduler thread. I would need to synchronize on write operations. But I'm not sure whether this is doable, or whether there are other constraints when accessing the state store like this from a separate thread.

Any input or advice would be appreciated!

"my scheduling thread must always run at a regular interval, irrespective of the processing of the inbound messages"

Well, punctuation based on WALL_CLOCK_TIME does exactly what you described above.

"The problem I hit is that the InteractiveQueryService provides read-only access to the state store"

Using the Processor API with punctuation lets you obtain the state stores in init() via ProcessorContext#getStateStore() and remove entries from them in the punctuator you register with ProcessorContext#schedule(). The advantage of this solution is that the processor and the punctuator run in the same thread, so you don't need any synchronisation between them.
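To make the control flow concrete, here is a minimal plain-Java model of that arrangement. It is not Kafka Streams code: `PendingMessageSweeper`, its two in-memory collections, and the `criteria` predicate are hypothetical stand-ins for the processor, the two state stores, and the matching rule from the question. It only illustrates that one object can both record matching IDs and sweep them, with no locking, as long as both methods are called from the same thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Plain-Java model of a processor and its punctuator sharing two stores.
 * Assumption: the collections below stand in for Kafka Streams state stores;
 * nothing in this class is Kafka Streams API.
 */
public class PendingMessageSweeper {
    // Stand-in for the store holding every message, keyed by message ID.
    private final Map<String, String> messages = new HashMap<>();
    // Stand-in for the store holding the IDs that matched the criteria.
    private final Set<String> pendingIds = new LinkedHashSet<>();
    // Hypothetical matching rule; the question does not specify one.
    private final Predicate<String> criteria;

    public PendingMessageSweeper(Predicate<String> criteria) {
        this.criteria = criteria;
    }

    /** Models the per-record callback: store the message, remember matching IDs. */
    public void process(String id, String payload) {
        messages.put(id, payload);
        if (criteria.test(payload)) {
            pendingIds.add(id);
        }
    }

    /**
     * Models the periodic callback that would be registered through
     * ProcessorContext#schedule() with wall-clock punctuation.
     * Returns the payloads it acted on and clears their IDs.
     */
    public List<String> punctuate() {
        List<String> handled = new ArrayList<>();
        Iterator<String> ids = pendingIds.iterator();
        while (ids.hasNext()) {
            String payload = messages.get(ids.next());
            if (payload != null) {
                handled.add(payload); // the "action" here is just collecting
            }
            ids.remove(); // drop the ID so the next run does not repeat the work
        }
        return handled;
    }

    /** Number of IDs still waiting for the next sweep. */
    public int pendingCount() {
        return pendingIds.size();
    }
}
```

In a real topology the two collections would be state stores fetched in init() with ProcessorContext#getStateStore(), and punctuate() would be the callback passed to ProcessorContext#schedule() with PunctuationType.WALL_CLOCK_TIME. Because Kafka Streams invokes both the processor and its punctuator on the same stream thread, the unsynchronized access shown here carries over.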
