How to convert Spark Streaming output into a DataFrame or store it in a table

My code is:

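import org.apache.spark.streaming.kafka.KafkaUtils // receiver-based API from spark-streaming-kafka-0-8

// ZooKeeper quorum, consumer group, and topic "hello" consumed by 5 receiver threads
val lines = KafkaUtils.createStream(ssc, "localhost:2181", "spark-streaming-consumer-group", Map("hello" -> 5))
val data = lines.map(_._2) // keep only the message value from each (key, value) pair
data.print()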

My output has 50 different values in the following format:

{"id:st04","data:26-02-2018 20:30:40","temp:30", "press:20"}

Can anyone help me store this data in table form, like this:

| id |date               |temp|press|   
|st01|26-02-2018 20:30:40| 30 |20   |  
|st01|26-02-2018 20:30:45| 80 |70   |  

I would really appreciate it.
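The sample message above is not actually valid JSON: each field is a single quoted "key:value" string rather than a "key": value pair, so a JSON reader such as spark.read.json would not split it into columns. If the messages really arrive in this shape, they have to be parsed by hand first. The following is a minimal plain-Scala sketch under that assumption. Reading and ReadingParser are hypothetical names, the timestamp is taken from the "data" key exactly as it appears in the sample, and temp and press are assumed to be integers.

```scala
import scala.util.Try

// Hypothetical row type matching the desired table: | id | date | temp | press |
final case class Reading(id: String, date: String, temp: Int, press: Int)

object ReadingParser {
  // Matches one quoted field such as "temp:30", capturing the key and the value.
  // The key stops at the first ':', so a value like "26-02-2018 20:30:40" stays intact.
  private val Field = "\"([^\":]+):([^\"]*)\"".r

  // Parses a message such as {"id:st04","data:26-02-2018 20:30:40","temp:30", "press:20"}.
  // Returns None if a key is missing or temp/press is not an integer.
  def parseReading(msg: String): Option[Reading] = {
    val fields: Map[String, String] =
      Field.findAllMatchIn(msg).map(m => m.group(1).trim -> m.group(2).trim).toMap
    for {
      id    <- fields.get("id")
      date  <- fields.get("data") // the sample uses the key "data" for the timestamp
      temp  <- fields.get("temp").flatMap(s => Try(s.toInt).toOption)
      press <- fields.get("press").flatMap(s => Try(s.toInt).toOption)
    } yield Reading(id, date, temp, press)
  }
}
```

Inside the foreachRDD answer below, something like rdd.flatMap(line => ReadingParser.parseReading(line).toList).toDF() (with import spark.implicits._) should then give a DataFrame whose columns match the table above, with malformed messages silently dropped.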

You can use the foreachRDD function together with the regular Dataset API:

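data.foreachRDD { rdd =>
    // rdd is an RDD[String]. The body of foreachRDD runs on the driver, so SparkSession can be used here;
    // `spark` is a SparkSession, for Spark 1.x use SQLContext instead
    if (!rdd.isEmpty()) { // empty micro-batches have no schema to infer, so skip them
        val df = spark.read.json(rdd) // or sqlContext.read.json(rdd)
        df.show()
        // the default save mode (ErrorIfExists) fails once the table exists, so append each batch
        df.write.mode("append").saveAsTable("sensor_readings") // example table name
    }
}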

However, if you are on Spark 2.x, I would suggest using Structured Streaming instead:

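import org.apache.spark.sql.functions._
val stream = spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
            .option("subscribe", "hello")                        // topic from the question
            .load()
val data = stream
            .selectExpr("cast(value as string) as value")
            .select(from_json(col("value"), schema).as("json"))
            .select("json.*")                                    // one column per JSON field
data.writeStream.format("console").start().awaitTermination()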

You must specify the schema manually, but that is quite simple :) Also, import org.apache.spark.sql.functions._ before any processing. The Kafka source additionally requires the spark-sql-kafka-0-10 package on the classpath.
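To make the Structured Streaming version concrete, here is a sketch of a schema plus one way to land the stream in a table. It assumes the Kafka messages are valid JSON such as {"id":"st04","date":"26-02-2018 20:30:40","temp":30,"press":20} (the sample in the question is not), that the parsed JSON has already been flattened with select("json.*"), and Spark 2.4 or later for foreachBatch. A streaming DataFrame cannot be written with df.write or saveAsTable directly, so foreachBatch hands each micro-batch to the ordinary batch writer. SensorTable, sensor_readings and the checkpoint path are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, to_timestamp}
import org.apache.spark.sql.streaming.StreamingQuery
import org.apache.spark.sql.types._

object SensorTable {
  // Schema to pass to from_json, assuming each Kafka message is valid JSON such as
  // {"id":"st04","date":"26-02-2018 20:30:40","temp":30,"press":20}
  val schema: StructType = StructType(Seq(
    StructField("id", StringType),
    StructField("date", StringType),
    StructField("temp", IntegerType),
    StructField("press", IntegerType)
  ))

  // Writes one micro-batch with the ordinary batch writer, appending to a
  // metastore table ("sensor_readings" is a hypothetical name).
  val saveBatch: (DataFrame, Long) => Unit = (batchDF, _) =>
    batchDF
      .withColumn("date", to_timestamp(col("date"), "dd-MM-yyyy HH:mm:ss")) // string -> timestamp
      .write.mode("append").saveAsTable("sensor_readings")

  // `parsed` is the streaming DataFrame after from_json(...) and select("json.*").
  // foreachBatch needs Spark 2.4+; the checkpoint path is a hypothetical example.
  def storeAsTable(parsed: DataFrame): StreamingQuery =
    parsed.writeStream
      .foreachBatch(saveBatch)
      .option("checkpointLocation", "/tmp/checkpoints/sensor_readings")
      .start()
}
```

Calling SensorTable.storeAsTable(data).awaitTermination() on the driver would then keep appending batches to the table. Passing saveBatch as a typed function value, rather than an inline lambda, sidesteps the overload ambiguity some Scala 2.12 builds report for foreachBatch. Note that foreachBatch only gives at-least-once writes by default, so a batch may be appended twice after a failure unless the batchId is used to deduplicate.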
