
Why is the total uptime in the Spark UI not equal to the sum of all job durations?

I ran a Spark job and am trying to tune it to run faster. It is strange that the total uptime is 1.1 hours, yet when I add up all the job durations I only get about 25 minutes. Why is the total uptime in the Spark UI not equal to the sum of all job durations?

This is the Spark UI information. The total uptime is 1.1 hours.

[Screenshot: Total uptime]

But the sum of all the job durations is around 25 minutes.

[Screenshot: All jobs' duration]

Thank you very much.

Total uptime is the time since the Spark application (driver) started. Job duration is the time spent processing tasks on RDDs/DataFrames.

All statements executed by the driver program contribute to the total uptime, but not necessarily to any job's duration. For example:

import org.apache.spark.rdd.RDD

val rdd: RDD[String] = ???   // some existing RDD
(0 to 100).foreach(println)  // contributes to total uptime, not to job duration
Thread.sleep(10000)          // contributes to total uptime, not to job duration
rdd.count                    // contributes to total uptime as well as to job duration
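
To see this in the Jobs tab yourself, here is a minimal self-contained sketch of the same idea; the application name and sample data are made up for illustration:

import org.apache.spark.sql.SparkSession

object UptimeVsJobDuration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("uptime-vs-job-duration")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))

    // Driver-only work: adds to total uptime, creates no entry in the Jobs tab.
    Thread.sleep(60000)

    // An action: launches a job, so it counts toward both total uptime and job duration.
    rdd.count()

    // Keep the application alive so the UI at http://localhost:4040 can be inspected.
    Thread.sleep(60000)
    spark.stop()
  }
}

While this runs, the Jobs tab shows a single short job for the count, while the total uptime keeps growing through both sleeps.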

Another example is how the spark-redshift connector works. Every query (DAG) execution that reads from or writes to Redshift issues an UNLOAD / COPY command to move the data from or to S3.

During this operation the executors are not doing any work and the driver program is blocked until the data transfer to S3 completes. This time adds to the total uptime but does not show up in any job duration. Further actions on the DataFrame (which now internally reads the staged files from S3) will add to the job duration.
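
As a rough sketch of that pattern (assuming the com.databricks.spark.redshift data source; the JDBC URL, table name and tempdir below are placeholders, not values from the question):

// The data source name and all option values are illustrative placeholders.
val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<pass>")
  .option("dbtable", "my_table")              // hypothetical table
  .option("tempdir", "s3a://my-bucket/tmp/")  // hypothetical S3 staging path
  .load()

// Per the explanation above, the UNLOAD from Redshift into the S3 tempdir is
// driver-side waiting: it adds to total uptime but produces no job.
// This action then reads the staged files from S3 on the executors,
// so it does show up as a job and adds to job duration.
df.count()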
