
Structured streaming debugging input

Is there a way to print out the incoming data? For example, I have a readStream on a folder looking for JSON files, but something seems to be wrong: I am seeing nulls in the aggregation output.

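import org.apache.spark.sql.types._

// Schema of the incoming JSON records
val schema = StructType(
      StructField("id", LongType, false) ::
      StructField("sid", IntegerType, true) ::
      StructField("data", ArrayType(IntegerType, false), true) :: Nil)

// Watch the folder for new JSON files
val lines = spark
      .readStream
      .schema(schema)
      .json("in/*.json")

// Count records per id
val top1 = lines.groupBy("id").count()

// Print the full aggregation result to the console after every trigger
val query = top1.writeStream
      .outputMode("complete")
      .format("console")
      .option("truncate", "false")
      .start()

query.awaitTermination() // keep the driver alive when run as an application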
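To see what the file source actually parsed, before groupBy hides the details, one option is to run a second query on the raw stream with the console sink in append mode. The sketch below assumes a local SparkSession and JSON Lines input (one object per line). `RawInputDebug` and `printRawInput` are hypothetical names, and the extra `_corrupt_record` column is an addition to the question's schema: in PERMISSIVE mode Spark's JSON reader stores the text of any line it cannot parse against the schema in that column and leaves the other fields null, which is one common source of unexpected nulls.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery
import org.apache.spark.sql.types._

object RawInputDebug {
  // The question's schema (all fields nullable) plus a string column that
  // PERMISSIVE mode fills with the raw text of lines it could not parse.
  val debugSchema: StructType = StructType(
    StructField("id", LongType, true) ::
    StructField("sid", IntegerType, true) ::
    StructField("data", ArrayType(IntegerType, false), true) ::
    StructField("_corrupt_record", StringType, true) :: Nil)

  /** Prints every row parsed from the JSON files in `dir` to the console,
    * processes the files present right now, then stops (hypothetical helper). */
  def printRawInput(spark: SparkSession, dir: String): StreamingQuery = {
    val raw = spark.readStream
      .schema(debugSchema)
      .option("mode", "PERMISSIVE") // the default: malformed lines become rows
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json(s"$dir/*.json")

    val query = raw.writeStream
      .outputMode("append") // no aggregation, so append mode is enough
      .format("console")
      .option("truncate", "false")
      .start()

    query.processAllAvailable() // block until the current files are printed
    query.stop()
    query
  }
}
```

In spark-shell you can skip the helper and simply start `lines.writeStream.format("console").start()`; `processAllAvailable()` and `stop()` are only there so the sketch terminates.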

To print the data, give the query a name with queryName and write it to the memory sink. The memory sink keeps the output in an in-memory table named after the query, which you can then query with SQL.

In your example:

val query = top1.writeStream
      .outputMode("complete")
      .queryName("xyz")   // name of the in-memory table
      .format("memory")   // the memory sink registers the table; the console sink does not
      .start()

Run this, and you can then display the data with a SQL query (the %sql prefix is notebook syntax):

%sql select * from xyz 

Or you can create a DataFrame:

val df = spark.sql("select * from xyz")
df.show(false) // print the rows without truncating columns
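The answer's steps can be combined into one self-contained sketch. Spark registers a queryable in-memory table named after `queryName` only for the memory sink; with the console sink the name just labels the query. `MemorySinkDebug`, `countsById`, the input directory and the local SparkSession are assumptions for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object MemorySinkDebug {
  // Schema from the question, with every field nullable.
  val schema: StructType = StructType(
    StructField("id", LongType, true) ::
    StructField("sid", IntegerType, true) ::
    StructField("data", ArrayType(IntegerType, false), true) :: Nil)

  /** Runs groupBy("id").count() over the JSON files currently in `dir`,
    * exposes the result as the in-memory table "xyz", prints it and
    * returns its rows (hypothetical helper). */
  def countsById(spark: SparkSession, dir: String): Array[Row] = {
    val query = spark.readStream
      .schema(schema)
      .json(s"$dir/*.json")
      .groupBy("id")
      .count()
      .writeStream
      .outputMode("complete")
      .format("memory") // memory sink: results are kept in an in-memory table...
      .queryName("xyz") // ...registered under the query name
      .start()

    query.processAllAvailable() // let the files present now be processed first
    val df = spark.sql("select * from xyz")
    df.show(false)              // print the aggregation without truncation
    val rows = df.collect()
    query.stop()
    rows
  }
}
```

The memory sink holds its whole result in driver memory, so it suits debugging on small inputs rather than production output.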
