简体   繁体   中英

How to see a particular metric in Spark Structured Streaming with Python

I'm very new to Spark and Python. I'm trying to see any metric in Spark Structured Streaming (for example, "processedRowsPerSecond"), but I don't know how to do it. I've read in "Structured Streaming Programming Guide" that with "print(query.lastProgress)" you can directly get the current status and metrics of an active query, but if I write it I only obtain "None" once. The last part of my code is the following:

query = windowedCountsDF\
    .writeStream\
    .outputMode('update')\
    .option("truncate", "false") \
    .format('console') \
    .queryName("numbers") \
    .start()

print(query.lastProgress)

query.awaitTermination()

Any idea on how to do it will be highly appreciated.

it really depends on what do you want to do with that metric. Your problem is that you're calling query.awaitTermination() , and it blocks any other activity. If you want to collect metrics, then instead of calling query.awaitTermination() , you need to implement your own wait loop, something like this:

query = ...

while not query.exception():
  if query.lastProgress:
    print(query.lastProgress) # do something with your data
  time.sleep(10) # wait 10 seconds..

try with:

while query.isActive:
    print("\n")
    print(query.status)
    print(query.lastProgress)
    time.sleep(30)

query.awaitTermination() blocks query.lastProgress.

I hope it works fine for you query.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM