Structured streaming debugging input
Is there a way for me to print out the incoming data? For example, I have a readStream on a folder looking for JSON files, but something seems to be wrong because I am seeing nulls in the aggregation output.
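import org.apache.spark.sql.types._

// Explicit schema for the incoming JSON files
val schema = StructType(
  StructField("id", LongType, false) ::
  StructField("sid", IntegerType, true) ::
  StructField("data", ArrayType(IntegerType, false), true) :: Nil)

// Stream every JSON file that appears under in/
val lines = spark.
  readStream.
  schema(schema).
  json("in/*.json")

// Count rows per id
val top1 = lines.groupBy("id").count()

// Print the full aggregation result to the console on every trigger
val query = top1.writeStream
  .outputMode("complete")
  .format("console")
  .option("truncate", "false")
  .start()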
To print the data, add a queryName to the write stream and use the memory sink instead of the console sink. The results are then kept in an in-memory table named after the query, and you can query that table. (With the console sink, no table is registered under the query name, so the select below would fail.)
In your example:
val query = top1.writeStream
  .outputMode("complete")
  .queryName("xyz")
  .format("memory")
  .start()
Run this, and you can then display the data with a SQL query (the %sql prefix is notebook cell syntax):
%sql select * from xyz
Or you can create a DataFrame from it:
val df = spark.sql("select * from xyz")
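To look at the rows before they are aggregated, one option is to attach a console sink directly to the un-aggregated stream. The following is only a sketch under a few assumptions: it is written in spark-shell style, the session settings and the query name raw_input are made up for illustration, and the schema and the in/*.json path are copied from the question. Any column that prints as null here was already null when the file was parsed, which points at the input or the schema rather than at the groupBy.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Assumption: a local session. In spark-shell or a notebook, `spark` already exists.
val spark = SparkSession.builder()
  .appName("debug-stream-input") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Same schema as in the question
val schema = StructType(
  StructField("id", LongType, false) ::
  StructField("sid", IntegerType, true) ::
  StructField("data", ArrayType(IntegerType, false), true) :: Nil)

val lines = spark.readStream.schema(schema).json("in/*.json")

// No aggregation here, so append mode is used: each parsed row is printed once.
val rawQuery = lines.writeStream
  .outputMode("append")
  .format("console")
  .option("truncate", "false")
  .queryName("raw_input") // hypothetical query name
  .start()

// Block until the files currently available have been processed, then stop the query.
rawQuery.processAllAvailable()
rawQuery.stop()
```

In a long-running debugging session you would probably leave out the last two calls and let the query keep printing new files as they arrive.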