
How to get the time cost of reading data from HDFS in Spark

Spark's Timeline contains:

  1. Scheduler Delay
  2. Task Deserialization Time
  3. Shuffle Read Time
  4. Executor Computing Time
  5. Shuffle Write Time
  6. Result Serialization Time
  7. Getting Result Time

It seems that the time cost of reading data from a source such as HDFS is included in Executor Computing Time, but I am not sure.

If it is included in Executor Computing Time, how can I measure the read time separately from the computation time?

Thanks.

It's hard to properly isolate how long a read operation takes, because Spark processes the data as it is being read.

A simple best bet is to apply a trivial action (say, count) that has very little overhead. If your file is sizable, the read will vastly dominate the trivial operation, especially one like count that can be done without shuffling data between nodes (aside from the single-value result). See the sketch below.
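For example, here is a minimal sketch in Scala of timing a count over an HDFS file. The application name and HDFS path are placeholders, not from the question; substitute your own.

```scala
import org.apache.spark.sql.SparkSession

object HdfsReadTiming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-read-timing") // hypothetical app name
      .getOrCreate()

    // Hypothetical path; substitute the file you want to measure.
    val path = "hdfs:///data/some-large-file.txt"

    // count() forces Spark to read every record but does almost no
    // per-record work, so for a sizable file the elapsed wall-clock
    // time is dominated by the HDFS read itself.
    val start = System.nanoTime()
    val lines = spark.read.textFile(path).count()
    val elapsedMs = (System.nanoTime() - start) / 1e6

    println(f"Read $lines%d lines in $elapsedMs%.1f ms")

    spark.stop()
  }
}
```

Note that a second run over the same file may be faster because the blocks can be served from the OS page cache on the datanodes, so measure on a cold cache (or fresh executors) if you want a representative HDFS read time.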
