
Spark : Best way to Broadcast KafkaProducer to Spark streaming

To broadcast a KafkaProducer to Spark executors, I have created a wrapper like the one below:

public class KafkaSink implements Serializable {
    private static KafkaProducer<String, String> producer = null;

    public KafkaProducer<String, String> getInstance(final Properties properties) {
        if(producer == null) {
            producer = new KafkaProducer<>(properties);
        }
        return producer;
    }

    public void close() {
        producer.close();
    }
}

and I am using it like below:

 JavaSparkContext jsc = new JavaSparkContext(sc);
 Broadcast<KafkaSink> kafkaSinkBroadcast = jsc.broadcast(new KafkaSink());
 dataset.toJavaRDD().foreach(row ->
     kafkaSinkBroadcast.getValue()
         .getInstance(kafkaProducerProps())
         .send(new ProducerRecord<String, String>(topic, row.mkString(", "))));

I just want to know whether this is the right way to do it, or what the best way would be.

I can really recommend this blog post. In short, you should create a serializable sink for each partition by passing a 'recipe' (a lazily evaluated factory) for creating the Kafka producer, so that the producer itself is only instantiated on the executors and never serialized from the driver. A sketch of that pattern is shown below.
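A minimal sketch of that idea, assuming a helper like kafkaProducerProps() from the question supplies the producer configuration (the class name LazyKafkaSink and the PropertiesSupplier interface are just illustrative, not from any library):

 import java.io.Serializable;
 import java.util.Properties;

 import org.apache.kafka.clients.producer.KafkaProducer;
 import org.apache.kafka.clients.producer.ProducerRecord;

 public class LazyKafkaSink implements Serializable {

     // A Supplier that is itself Serializable, so the config lambda can be shipped to executors.
     public interface PropertiesSupplier extends Serializable {
         Properties get();
     }

     private final PropertiesSupplier config;                  // the serializable "recipe"
     private transient KafkaProducer<String, String> producer; // created lazily, once per executor JVM

     public LazyKafkaSink(PropertiesSupplier config) {
         this.config = config;
     }

     public void send(String topic, String value) {
         if (producer == null) {                               // first call on this executor: build the producer
             producer = new KafkaProducer<>(config.get());
         }
         producer.send(new ProducerRecord<>(topic, value));
     }
 }

With that in place the broadcast part of your code stays the same, but only the recipe travels over the wire, for example:

 Broadcast<LazyKafkaSink> sinkBroadcast = jsc.broadcast(new LazyKafkaSink(() -> kafkaProducerProps()));
 dataset.toJavaRDD().foreachPartition(rows ->
     rows.forEachRemaining(row ->
         sinkBroadcast.getValue().send(topic, row.mkString(", "))));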
