I'm running a Spark job and trying to tune it to run faster. Oddly, the total uptime shown is 1.1 hours, but when I add up all the job durations I only get about 25 minutes. Why is the total uptime in the Spark UI not equal to the sum of all job durations?

This is the Spark UI information: total uptime is 1.1 hours, but the sum of all the job durations is only around 25 minutes.

Thank you very much.
Total uptime is the time since the Spark application (driver) started. Job duration is the time spent processing tasks on RDDs/DataFrames.
All statements executed by the driver program contribute to the total uptime, but not necessarily to a job's duration. For example:
val rdd: RDD[String] = ???
(0 to 100).foreach(println) // contributes to total uptime, not to job duration
Thread.sleep(10000)         // contributes to total uptime, not to job duration
rdd.count                   // contributes to both total uptime and job duration
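The split can be illustrated without a cluster. This plain-Scala sketch (no Spark needed; `UptimeAccounting` and the sleep durations are illustrative, not anything from Spark's API) times driver-only work and "job" work separately, mirroring how the UI accounts for them:

```scala
// Plain-Scala illustration: total uptime is wall-clock time since the
// application started, while job duration only covers the spans where an
// action is actually running.
object UptimeAccounting {
  // Returns (uptimeMs, totalJobMs) for one simulated driver program.
  def run(): (Long, Long) = {
    val appStart = System.nanoTime()
    var jobNanos = 0L

    // Driver-only statements (like the foreach/sleep above):
    // they advance uptime while no job is running.
    Thread.sleep(200)

    // An action (like rdd.count): its span counts toward both.
    val jobStart = System.nanoTime()
    Thread.sleep(50)
    jobNanos += System.nanoTime() - jobStart

    val uptimeMs = (System.nanoTime() - appStart) / 1000000
    (uptimeMs, jobNanos / 1000000)
  }
}
```

Running it, the uptime comes out well above the summed job time, which is exactly the gap observed in the question.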
Another example is how the spark-redshift connector works. Every query (DAG) execution that reads from or writes to Redshift issues a COPY/UNLOAD command to transfer the data to/from s3. During this operation the executors are not doing any work, and the driver program is blocked until the data transfer to s3 is complete. This time adds to the total uptime but won't show up in any job duration. Further actions on the DataFrame (which now internally reads the files from s3) will add to the job duration.
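Assuming the two-phase behaviour described above, the same accounting can be sketched for the connector pattern: a driver-side wait for the COPY/UNLOAD transfer, followed by an action that scans the staged files. The names and durations here (`RedshiftStylePattern`, `transferMs`, `scanMs`) are hypothetical, not the connector's API:

```scala
// Simulates the spark-redshift read pattern: phase 1 blocks the driver
// (UNLOAD to s3, no job recorded), phase 2 is the Spark job that scans
// the staged files (recorded as job duration).
object RedshiftStylePattern {
  // Returns (uptimeMs, jobMs) for one simulated read.
  def run(transferMs: Long, scanMs: Long): (Long, Long) = {
    val start = System.nanoTime()

    // Phase 1: driver blocked while data is unloaded to s3.
    // Executors are idle, so no job duration accrues here.
    Thread.sleep(transferMs)

    // Phase 2: an action on the DataFrame reads the s3 files;
    // only this span shows up as job duration in the UI.
    val jobStart = System.nanoTime()
    Thread.sleep(scanMs)
    val jobMs = (System.nanoTime() - jobStart) / 1000000

    ((System.nanoTime() - start) / 1000000, jobMs)
  }
}
```

The longer the transfer phase is relative to the scan, the larger the gap between uptime and the job durations the UI reports.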