
Kafka Streams: how to write to a topic?

In Kafka Streams, what's the canonical way of producing/writing a stream? In Spark, there is the custom receiver, which works as a long-running adapter from an arbitrary data source. What is the equivalent in Kafka Streams?

To be specific, I'm not asking how to do transforms from one topic to another. The documentation is very clear on that. I want to understand how to write my workers that will be doing the first write in a series of transforms into Kafka.

I expect to be able to do

builder1.<something>(<some intake worker like a spark receiver>)
       .to(topic1)
       .start()

builder2.from(topic1)
        .transform(<some transformation function>)
        .to(topic2)
        .start()

But none of the existing documentation shows this. Am I missing something?

Depends on whether you are using the Kafka Streams DSL or Processor API:

  • Kafka Streams DSL: You can use KStream#to() to materialize the KStream to a topic. This is the canonical way to materialize data to a topic. Alternatively, you can use KStream#through(). This will also materialize data to a topic, but it also returns the resulting KStream for further use. The only difference between #to() and #through(), then, is that the latter saves you a KStreamBuilder#stream() call if you want the resulting materialized partition as a KStream (a short sketch follows below).

  • Processor API: You materialize data to a partition by forwarding the data to a sink processor.

Either way, a crucial thing to note is that data is not materialized to a topic until you write to a partition using one of the methods mentioned above. map(), filter(), etc. do not materialize data. The data remains in the processor task/thread/memory until it is materialized by one of the methods above.
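For illustration, here is a minimal sketch of the DSL variant, written against the older KStreamBuilder API that this answer refers to (newer releases use StreamsBuilder/Topology instead). The application id, broker address, and topic names are made up for the example, and the intermediate topic passed to through() is assumed to exist already; the Processor API equivalent would be to register a sink with TopologyBuilder#addSink().

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MaterializeSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "materialize-sketch");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> source = builder.stream("topic1");

        // through() materializes the intermediate result to a topic AND returns a
        // KStream, so you can keep chaining without a second KStreamBuilder#stream() call
        KStream<String, String> upperCased = source
                .filter((key, value) -> value != null)          // drop null values before transforming
                .mapValues(value -> value.toUpperCase())
                .through("upper-cased");                        // hypothetical intermediate topic

        // to() materializes to a topic and ends this branch of the topology
        upperCased.to("topic2");

        new KafkaStreams(builder, props).start();
    }
}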


To produce data into Kafka (as input for Kafka Streams):

// BOOTSTRAP_SERVERS_CONFIG etc. are statically imported from org.apache.kafka.clients.producer.ProducerConfig
Properties producerConfig = new Properties();
producerConfig.put(BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // Kafka broker address (9092), not ZooKeeper (2181)
producerConfig.put(ACKS_CONFIG, "all");
producerConfig.put(RETRIES_CONFIG, 0);
Producer<Integer, Integer> producer =
        new KafkaProducer<>(producerConfig, new IntegerSerializer(), new IntegerSerializer());

and then:

Arrays.asList(1, 2, 3, 4).forEach(integer -> producer.send(new ProducerRecord<>("integers", integer, integer)));
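KafkaProducer#send() is asynchronous, so before the process exits you will usually also want to flush and close the producer; a minimal addition:

producer.flush();   // block until all buffered records have been sent
producer.close();   // release the producer's network and memory resources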

You will need:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>${version.kafka}</version>
</dependency>

I want to understand how to write my workers that will be doing the first write in a series of transforms into Kafka.

The initial write (= input data) should not be done via Kafka Streams. Kafka Streams assumes that the input data is already in Kafka.

So this expected workflow of yours is not applicable:

builder1.<something>(<some intake worker like a spark receiver>)
   .to(topic1)
   .start()

Rather, you'd use something like Kafka Connect to get data into Kafka (e.g. from a database into a Kafka topic), or use the "normal" Kafka producer clients (Java, C/C++, Python, ...) to write the input data into Kafka.

There's no "hook" yet available in Kafka Streams to bootstrap the input data. Kafka Streams中没有“钩子”可用于引导输入数据。 We're looking at a better integration of Kafka Connect and Kafka Streams, so this situation may well improve in the near future. 我们正在考虑更好地整合Kafka Connect和Kafka Streams,因此在不久的将来这种情况可能会有所改善。

You can try the following command for Linux:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topicName --property "parse.key=true" --property "key.separator=:"
  1. parse.key, when set to true, allows accepting input as key and value pairs from the console.

  2. key.separator is set to the separator character between the key and the value (see the example input below).
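For example, with key.separator set to ":" as above, each line typed into the console producer is split into a key and a value at that character (hypothetical input):

user1:hello
user1:world
user2:another message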
