
Spark batch write to Kafka topic from multi-column DataFrame

After the Spark batch ETL, I need to write the resulting DataFrame, which contains multiple different columns, to a Kafka topic.

According to the Spark documentation (https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html), the DataFrame being written to Kafka must have the following mandatory column in its schema:

value (required) string or binary

As I mentioned previously, I have many more columns with values, so my question is: how do I properly send the whole DataFrame row as a single message to a Kafka topic from my Spark application? Do I need to join the values from all columns into a new DataFrame with a single value column (containing the joined value), or is there a more proper way to achieve this?

The proper way to do this is already hinted at by the docs, and it doesn't really differ from what you'd do with any Kafka client - you have to serialize the payload before sending it to Kafka.

How you do that (to_json, to_csv, Apache Avro) depends on your business requirements - nobody can answer this but you (or your team).
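As a minimal sketch of the JSON route, assuming a hypothetical ETL result `df` with arbitrary columns, a local broker at `localhost:9092`, and a topic named `etl-results` (all placeholders): pack every column of the row into one JSON string with `to_json(struct(...))` and expose it as the mandatory `value` column before the batch write.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

object WriteDataFrameToKafka {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-write-to-kafka")
      .getOrCreate()

    import spark.implicits._

    // Hypothetical result of the batch ETL: any number of columns.
    val df = Seq((1, "alice", 42.0), (2, "bob", 13.5))
      .toDF("id", "name", "score")

    // Serialize the whole row to JSON and alias it as the required `value` column.
    df.select(to_json(struct(df.columns.map(col): _*)).alias("value"))
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumption: local broker
      .option("topic", "etl-results")                      // hypothetical topic name
      .save()

    spark.stop()
  }
}
```

The same pattern works with `to_csv` or an Avro encoder in place of `to_json`; the only requirement from Spark's side is that the final projection yields a string or binary `value` column (plus optional `key` and `topic` columns).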

