How to display results of intermediate transformations of streaming query?

Question

I am implementing a use case to try out the Spark Structured Streaming API. The source data is read from a Kafka topic and, after some transformations are applied, the results are written to the console.

I want to print the intermediate output along with the final results of the structured streaming query.

Here is the code snippet:

    val trips = getTaxiTripDataframe() // consumes the Kafka topic and deserializes the byte arrays into a DataFrame with the required columns

    val filteredTrips = trips.filter(col("taxiCompany").isNotNull && col("pickUpArea").isNotNull)

    val output = filteredTrips
      .groupBy("taxiCompany","pickupArea")
      .agg(Map("pickupArea" -> "count"))

    val query = output.writeStream.format("console")
      .option("numRows","50")
      .option("truncate","false")
      .outputMode("update").start()

    query.awaitTermination()

I want to print the 'filteredTrips' DataFrame to the console. I tried the DataFrame's .show() method, but because the DataFrame is built on streaming data, it throws the following exception:

org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;

Is there any workaround?
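A minimal reproduction of the error, as a sketch: any DataFrame built with `readStream` reports `isStreaming = true`, and a batch action such as `show()` on it is rejected at analysis time with the exception above. Assumptions, not from the original post: Spark's built-in `rate` source stands in for the Kafka-backed `getTaxiTripDataframe()`, `StreamingShowSketch` and `tryShow` are hypothetical names, and spark-sql 2.4.x is on the classpath.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Sketch: reproduce "Queries with streaming sources must be executed with writeStream.start()".
// Assumption: the built-in "rate" source replaces the Kafka topic from the question.
object StreamingShowSketch {

  /** Returns (isStreaming, message of the exception thrown by show(), if any). */
  def tryShow(spark: SparkSession): (Boolean, Option[String]) = {
    val trips = spark.readStream.format("rate").load() // unbounded, streaming DataFrame
    val error =
      try { trips.show(); None } // batch action on a streaming plan
      catch { case e: AnalysisException => Some(e.getMessage) }
    (trips.isStreaming, error)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("show-on-stream").getOrCreate()
    try {
      val (streaming, error) = tryShow(spark)
      println(s"isStreaming = $streaming")
      error.foreach(println)
    } finally spark.stop()
  }
}
```

Checking `isStreaming` before calling batch actions tells you whether a DataFrame has to go through `writeStream ... start()` instead.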

Answer 1

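Yes, you can start two streaming queries from the same intermediate DataFrame (I am using Spark 2.4.3). Each query runs independently and reads the Kafka topic on its own.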

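    val filteredTrips = trips.filter(col("taxiCompany").isNotNull && col("pickUpArea").isNotNull)

    // Query 1: print the intermediate (filtered) rows
    val query1 = filteredTrips
      .writeStream                 // a streaming DataFrame must be written via writeStream
      .format("console")
      .option("numRows","50")
      .option("truncate","false")
      .outputMode("append")        // no aggregation here; "update" behaves the same
      .start()

    // Query 2: print the final aggregation
    val query2 = filteredTrips
      .groupBy("taxiCompany","pickupArea")
      .agg(Map("pickupArea" -> "count"))
      .writeStream
      .format("console")
      .option("numRows","50")
      .option("truncate","false")
      .outputMode("update")
      .start()

    // start() does not block, so both queries are already running.
    // Block until either query stops or fails, instead of waiting on query1 alone.
    filteredTrips.sparkSession.streams.awaitAnyTermination()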
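A self-contained sketch of the same two-query pattern that runs without Kafka. Assumptions, not from the original answer: the built-in `rate` source stands in for `getTaxiTripDataframe()`, the `taxiCompany`/`pickupArea` values are synthetic, `IntermediateResultsSketch`, `syntheticTrips` and `run` are hypothetical names, and spark-sql 2.4.x is on the classpath. As a variation, the intermediate stream goes to Spark's `memory` sink (intended for debugging), which registers an in-memory table named by `queryName`; that table is an ordinary batch table, so `show()` works on it.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, when}

// Sketch: one intermediate streaming DataFrame feeding two independent queries.
// Assumption: the "rate" source and the synthetic columns replace the Kafka input.
object IntermediateResultsSketch {

  /** Hypothetical stand-in for getTaxiTripDataframe(). */
  def syntheticTrips(spark: SparkSession): DataFrame =
    spark.readStream
      .format("rate") // emits (timestamp, value) rows
      .option("rowsPerSecond", "10")
      .load()
      .withColumn("taxiCompany", concat(lit("company-"), (col("value") % 3).cast("string")))
      // Every 4th row gets a null pickupArea so the filter has something to drop.
      .withColumn("pickupArea",
        when(col("value") % 4 === 0, lit(null).cast("string"))
          .otherwise(concat(lit("area-"), (col("value") % 2).cast("string"))))

  /** Runs both queries for up to waitMs milliseconds; returns the captured filtered rows. */
  def run(spark: SparkSession, waitMs: Long): Array[Row] = {
    val filteredTrips = syntheticTrips(spark)
      .filter(col("taxiCompany").isNotNull && col("pickupArea").isNotNull)

    // Query 1: the intermediate result, kept in the in-memory table "filtered_trips".
    val query1 = filteredTrips
      .select("taxiCompany", "pickupArea")
      .writeStream
      .format("memory")
      .queryName("filtered_trips")
      .outputMode("append")
      .start()

    // Query 2: the final aggregation, printed to the console as in the answer.
    val query2 = filteredTrips
      .groupBy("taxiCompany", "pickupArea")
      .count()
      .writeStream
      .format("console")
      .option("truncate", "false")
      .outputMode("complete")
      .start()

    // Both queries run concurrently; wait until one ends or the timeout elapses.
    spark.streams.awaitAnyTermination(waitMs)
    query1.stop()
    query2.stop()

    // The memory table is a batch table, so ordinary show()/collect() work on it.
    val snapshot = spark.table("filtered_trips")
    snapshot.show(20, truncate = false)
    snapshot.collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("intermediate-results-sketch")
      .config("spark.sql.shuffle.partitions", "2") // keep the local aggregation cheap
      .getOrCreate()
    try {
      val rows = run(spark, 10000L)
      println(s"captured ${rows.length} filtered rows")
    } finally spark.stop()
  }
}
```

The `memory` sink keeps all of its rows on the driver, so it only suits small debugging runs; for the real job, the console sink from the answer is the closer fit.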

1 ACCEPTED 2019-07-31 10:44:19