Using Kafka Streams just as a state store in a Kafka Consumer App

Original post: 2019-02-20 22:35:39. Tags: java, spring, spring-boot, apache-kafka, apache-kafka-streams

I am currently working on a Spring Boot application that uses the Spring Kafka consumer API.

Each message that arrives on a topic needs to be converted into a new object type, with additional properties coming from other topics. Those other topics have not been developed yet, so we are processing requests with a mocked, in-memory version of the data.

For example, when a new "shopping order" message arrives, I use a mocked "Customer" object and a mocked "Item" object to process the order. The plan is to move to a real Customer topic and a real Item topic.

Also, the application currently consists only of Spring Kafka listeners that receive new orders. The listeners invoke a Spring bean method that processes the order and, using the mocks mentioned above, creates a new object to be written to an output topic named customer-order.

We are currently thinking about evolving the architecture of this application. I have been reading up on Kafka Streams. The documentation I have read online only covers simple examples such as word count and joins. With my limited knowledge of Streams, I don't envision using functionality such as calculating totals.

I have thought of some options for the architecture:

1. Retain the consumer API, i.e. keep the Spring listener implementation for receiving new order messages, and add the Streams dependency only to create state stores that will eventually replace the mocked data. The idea is that the data now mocked will eventually come from other topics. In this approach, the "streams" part of Kafka is used only for creating state stores, not for processing incoming records.
2. Use purely the Kafka consumer API and make API calls to fetch data external to my topic. This is a less preferred option, as I don't want to make an external API call for each new order.
3. Use Kafka Streams both for reading new incoming orders and for gathering and storing state, and make use of joins and merges to process the data.

What do you suggest: 1, 2, or 3? Is it a good idea to use Streams for this kind of solution? Is there any benefit in moving this implementation to Kafka Streams? Or am I better off staying with 2?

1 Answer

Number 1 sounds strange to me. You can keep a KafkaStreams application that exposes state stores via Interactive Queries, but that would look more like a flavour of 2. You would also have to take into account how you deploy your instances, and ensure co-partitioning between the Spring part and the KafkaStreams part.
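To illustrate why co-partitioning matters for option 1, here is a minimal sketch in plain Java. It assumes a simple key-hash partitioner (`Arrays.hashCode` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key) and two hypothetical topics, one for orders and one for customers, both keyed by customer ID. It only covers partition numbers; which instance ends up owning which partition is a separate concern, since the Spring listener and the KafkaStreams application may be assigned partitions independently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CoPartitioningSketch {

    // Stand-in for a key-hash partitioner. Kafka's default partitioner hashes
    // the serialized key with murmur2; Arrays.hashCode is used here only to
    // keep the sketch free of external dependencies (an assumption).
    public static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // True when a record with this key lands on the same partition number in
    // both (hypothetical) topics, which a local state store lookup relies on.
    public static boolean samePartition(String key, int ordersPartitions, int customersPartitions) {
        return partitionFor(key, ordersPartitions) == partitionFor(key, customersPartitions);
    }

    public static void main(String[] args) {
        String[] customerIds = {"k0", "k1", "k2", "k3"};
        for (String id : customerIds) {
            System.out.println(id
                + " equal counts (6/6): " + samePartition(id, 6, 6)
                + ", unequal counts (6/4): " + samePartition(id, 6, 4));
        }
    }
}
```

With equal partition counts and the same partitioner, a given key always maps to the same partition number in both topics. With unequal counts that guarantee is lost, so an order could arrive on an instance whose local store does not hold the matching customer.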

I don't see any problem in doing it fully in Kafka Streams, unless you have some very complex logic that you cannot implement with the current API, and I'd be surprised to learn you couldn't. Actually, what you described sounds like a typical application for it (with the caveat that I don't know your other requirements, such as timing, expected volumes, etc.).
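The enrichment described in the question amounts to a lookup join: reference data is kept as the latest value per key, and each incoming order is joined against it. In Kafka Streams this role is played by a KStream joined against a KTable or GlobalKTable. The sketch below is not Kafka Streams code; it is a plain-Java model of those semantics, with `HashMap`s standing in for state stores. The `Order` and `CustomerOrder` shapes and all field names are hypothetical, since the question does not give the real schemas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OrderEnrichmentSketch {

    // Hypothetical message shape for an incoming order.
    public static final class Order {
        public final String orderId, customerId, itemId;
        public Order(String orderId, String customerId, String itemId) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.itemId = itemId;
        }
    }

    // Hypothetical enriched result, as would be written to customer-order.
    public static final class CustomerOrder {
        public final String orderId, customerName, itemName;
        public CustomerOrder(String orderId, String customerName, String itemName) {
            this.orderId = orderId;
            this.customerName = customerName;
            this.itemName = itemName;
        }
    }

    // Stand-ins for state stores: each map keeps the latest value per key,
    // as a table materialized from a topic would.
    private final Map<String, String> customerNamesById = new HashMap<>();
    private final Map<String, String> itemNamesById = new HashMap<>();

    // Called for every record on the (future) Customer topic.
    public void onCustomer(String customerId, String name) {
        customerNamesById.put(customerId, name);
    }

    // Called for every record on the (future) Item topic.
    public void onItem(String itemId, String name) {
        itemNamesById.put(itemId, name);
    }

    // Inner-join semantics: the order is enriched only when both lookups hit.
    public Optional<CustomerOrder> onOrder(Order order) {
        String customerName = customerNamesById.get(order.customerId);
        String itemName = itemNamesById.get(order.itemId);
        if (customerName == null || itemName == null) {
            return Optional.empty();
        }
        return Optional.of(new CustomerOrder(order.orderId, customerName, itemName));
    }

    public static void main(String[] args) {
        OrderEnrichmentSketch app = new OrderEnrichmentSketch();
        app.onCustomer("c1", "Alice");
        app.onItem("i1", "Book");
        app.onOrder(new Order("o1", "c1", "i1"))
           .ifPresent(co -> System.out.println(
               co.orderId + ": " + co.customerName + " ordered " + co.itemName));
    }
}
```

In this model the mocked data from the question corresponds to pre-filling the two maps; moving to real topics means feeding `onCustomer` and `onItem` from those topics instead, which is the bookkeeping a stream processing library would otherwise handle.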

Benefits:

- It creates an abstraction layer over consumption and production. For example, something like the Order-Customer enrichment sounds like a good use of it, using the join as you mentioned.
- It takes away complexity in deploying applications: it relies on the same partition assignment and rebalancing mechanism as Kafka consumer groups, so you can add or remove processing instances seamlessly.
- It is simpler than other stream processing libraries, but in most cases it is enough (and you also have the Processor API, apart from the DSL, if you need more DIY stuff).
- Speed of development. Once you have a basic knowledge of it (which is not that hard to get), you can begin writing applications quite quickly, because you focus on the logic.
- The documentation is well maintained.

Cons:

- It is a JVM library, but it seems you're already using Java.
- You have to learn a new paradigm, though it is actually quite simple: quite similar to, and definitely simpler than, other stream processing libraries.
- It is tied to (actually a part of) Kafka. If you move your infrastructure away from Kafka, you'll probably have to use a different stream processor.
- Depending on your use case, and especially on its complexity, you may find other streaming platforms more beneficial (e.g. Spark or Flink, to name just two).
- It is quite mature, but probably less so than e.g. Spark. It is getting better, though, and you have the Confluent guys working on it.