What is the simplest way to write to Kafka from a Spark stream?
I would like to write data from a Spark stream to Kafka. I know that I can use KafkaUtils to read from Kafka, but KafkaUtils doesn't provide an API for writing to Kafka.

I checked a past question and sample code.

Is the above sample code the simplest way to write to Kafka? If I adopt an approach like that sample, I have to create many classes...

Do you know a simpler way, or a library that helps with writing to Kafka?
Basically, this blog post summarises your possibilities, which are written up in different variations in the link you provided.

If we look at your task straightforwardly, we can make several assumptions:

Given those assumptions, your set of solutions is pretty limited: you either have to create a new Kafka producer for each partition and use it to send all the records of that partition, or you can wrap this logic in some sort of factory/sink. But the essential operation remains the same: you'll still request a producer object for each partition and use it to send that partition's records.
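That pattern can be sketched roughly as below. This is a hypothetical minimal example, not code from the linked post: `KafkaWriter`, the topic name, and the broker string are illustrative, and it assumes the `kafka-clients` and `spark-streaming` dependencies are on the classpath. The key point is that the `KafkaProducer` is created inside `foreachPartition`, on the executor, because producers are not serializable and cannot be shipped from the driver.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: one producer per partition, created on the executor.
object KafkaWriter {

  def writeToKafka(stream: DStream[String], topic: String, brokers: String): Unit = {
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Created once per partition, inside the executor task.
        val props = new Properties()
        props.put("bootstrap.servers", brokers)
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        try {
          records.foreach { record =>
            producer.send(new ProducerRecord[String, String](topic, record))
          }
        } finally {
          producer.close() // flushes any pending sends before the task ends
        }
      }
    }
  }
}
```

A factory or connection pool would only change where the `new KafkaProducer(...)` call lives (e.g. caching one producer per JVM instead of per partition); the send loop itself stays the same.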
I'd suggest you continue with one of the examples in the provided link. The code is pretty short, and any library you find would most probably do the exact same thing behind the scenes.