简体   繁体   中英

Structured streaming debugging input

Is there a way for me to print out the incoming data? For eg I have a readStream on a folder looking for JSON files, however there seems to be an issue as I am seeing 'nulls' in the aggregation output.

val schema = StructType(
      StructField("id", LongType, false) ::
      StructField("sid", IntegerType, true) ::
      StructField("data", ArrayType(IntegerType, false), true) :: Nil)

val lines = spark.
      readStream.
      schema(schema).
      json("in/*.json")

val top1 = lines.groupBy("id").count()

val query = top1.writeStream
      .outputMode("complete")
      .format("console")
      .option("truncate", "false")
      .start()

To print the data you can add queryName in the write stream, by using that queryName you can print.

In your Example

val query = top1.writeStream
      .outputMode("complete")
      .queryName("xyz")
      .format("console")
      .option("truncate", "false")
      .start()

run this and you can display data by using SQL query

%sql select * from xyz 

or you can Create Dataframe

val df = spark.sql("select * from xyz")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM