Spark streaming data from a Kafka topic and writing it into text files in an external path
I want to read data from a Kafka topic, group it by key values, and write it into text files.
public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession
        .builder()
        .appName("Sparkconsumer")
        .master("local[*]")
        .getOrCreate();
    SQLContext sqlContext = spark.sqlContext();
    SparkContext context = spark.sparkContext();
    Dataset<Row> lines = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "test-topic")
        .load();
    Dataset<Row> r = lines.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
    r.printSchema();
    r.createOrReplaceTempView("basicView");
    sqlContext.sql("select * from basicView")
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
        .writeStream()
        .outputMode("append")
        .format("console")
        .option("path", "usr//path")
        .start()
        .awaitTermination();
}
The following points are misleading in your code:
- To read from Kafka and write into a file, you do not need SparkContext or SQLContext.
- You are casting key and value to a string twice.
- The format of your output query should not be console if you want to store the data in a file.
An example can be looked up in the Spark Structured Streaming + Kafka Integration Guide and the Spark Structured Streaming Programming Guide:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkConsumer {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession
            .builder()
            .appName("Sparkconsumer")
            .master("local[*]")
            .getOrCreate();

        // Subscribe to the topic as a streaming Dataset
        Dataset<Row> lines = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "test-topic")
            .load();

        // Kafka delivers key and value as binary, so cast them once
        Dataset<Row> r = lines
            .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
            // do some more processing such as 'groupBy'
            ;

        // In Java, writeStream() is a method call and needs parentheses
        r.writeStream()
            .format("parquet") // can be "orc", "json", "csv", etc.
            .outputMode("append")
            .option("path", "path/to/destination/dir")
            .option("checkpointLocation", "/path/to/checkpoint/dir")
            .start()
            .awaitTermination();
    }
}
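Since the question asks for text files grouped by key, here is a minimal sketch of one possible way to get there. It is not a true groupBy aggregation: it swaps in partitionBy("key") on the file sink, so that records sharing a key land under the same key=... subdirectory. The class name KafkaToTextFiles is hypothetical, the paths are the same placeholders as above, and the sketch assumes the spark-sql-kafka-0-10 connector is on the classpath and that the number of distinct keys is small enough for one directory per key to be acceptable.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used for illustration only
public class KafkaToTextFiles {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession
            .builder()
            .appName("KafkaToTextFiles")
            .master("local[*]")
            .getOrCreate();

        // Read the topic and cast key and value to strings exactly once
        Dataset<Row> records = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "test-topic")
            .load()
            .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value");

        // The "text" format writes a single string column; "key" is moved
        // into the directory layout by partitionBy, leaving only "value"
        records.writeStream()
            .format("text")
            .outputMode("append")
            .partitionBy("key")
            .option("path", "path/to/destination/dir")          // placeholder path
            .option("checkpointLocation", "/path/to/checkpoint/dir") // placeholder path
            .start()
            .awaitTermination();
    }
}
```

If an actual aggregation per key is needed (for example a count), the file sink's append mode would most likely require a watermark on an event-time column, which this sketch does not cover.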