Performance benchmarking between Hive (on Tez) and Spark for my particular use case

I'm playing around with some data on a cluster and want to do some aggregations. Nothing too complicated, but more than a plain sum: there are a few joins and count-distincts. I have implemented this aggregation in Hive and in Spark with Scala, and I want to compare the execution times.

When I submit the scripts from the gateway, the Linux `time` command gives me a real time smaller than the sys time, which I expected. But I'm not sure which one to pick for a proper comparison. Maybe just use sys time and run both queries several times? Is that acceptable, or am I completely off base here?

Real time. From a performance-benchmarking perspective, you only care about how long it takes (in human time) before your query completes and you can look at the results, not how many processes the application spins up internally.
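As a minimal sketch of that approach, you can record the wall-clock (real) seconds of several runs and compare distributions rather than a single measurement. Here `sleep 1` is a placeholder for the actual submission command (`hive -f ...` or `spark-submit ...`), which is an assumption of this sketch:

```shell
# Record the wall-clock (real) seconds of several runs of the same query.
# `sleep 1` is a placeholder for the real submission command.
results=""
for i in 1 2 3; do
  start=$(date +%s)
  sleep 1                          # replace with: spark-submit ... / hive -f ...
  end=$(date +%s)
  results="$results run$i=$((end - start))s"
done
echo "wall-clock times:$results"
```

Repeating the run like this and comparing the real times (not sys times) across runs smooths out cluster noise such as container startup and scheduling delays.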

Note: I would be very careful with performance benchmarking, as both Spark and Hive have plenty of tunable configuration knobs that greatly affect performance. See here for a few examples of altering Hive performance with vectorization, data format choices, data bucketing, and data sorting.
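For concreteness, a couple of the Hive knobs alluded to above (vectorized execution and stage parallelism) can be flipped per session. The values below are illustrative starting points to experiment with, not tuned recommendations:

```sql
-- Illustrative Hive session settings; measure before and after, don't copy blindly.
SET hive.vectorized.execution.enabled = true;          -- vectorized query execution
SET hive.vectorized.execution.reduce.enabled = true;   -- vectorization on the reduce side
SET hive.exec.parallel = true;                         -- run independent stages concurrently
```

Because a handful of settings like these can swing runtimes significantly, make sure both engines are reasonably configured before drawing conclusions from a benchmark.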

The "general consensus" is that Spark is faster than Hive on Tez, but that Hive handles huge data sets that don't fit in memory better. (I'm not going to cite a source since I'm lazy; do some googling.)
