
Why is Spark SQL CPU utilization higher than Hive's?

I am running the same query in both Hive and Spark SQL. We know that Spark is faster than Hive, so I got the expected response times.

But when we look at the CPU utilization:

  • the Spark process takes above 300%,
  • while Hive stays near 150% for the same query.

Is this the real nature of Spark and Hive?

  • What other metrics need to be considered?
  • How can both be evaluated in the right way?

The big picture

Spark has no superpowers. The source of its advantage over MapReduce is a preference for fast in-memory access over slower out-of-core processing that depends on distributed storage. So what it does at its core is cut down IO wait time.
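As a rough illustration, the sketch below uses the standard Spark SQL API to cache a DataFrame so that repeated actions scan memory instead of distributed storage. The input path `/data/events` and the column `event_type` are placeholders, not something from the original question.

```scala
// Minimal sketch of Spark's in-memory preference (e.g. in spark-shell).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cache-demo")
  .getOrCreate()

// Hypothetical input path on distributed storage.
val df = spark.read.parquet("/data/events")

// The first action materializes the data and pins it in memory;
// this pass is IO-bound, so average CPU utilization is low.
df.cache()
df.count()

// Subsequent actions scan the cached in-memory copy instead of disk,
// so the same work finishes faster at higher average CPU utilization.
df.groupBy("event_type").count().show()  // "event_type" is a placeholder column
```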

Conclusion

Higher average CPU utilization is expected. Let's say you want to compute the sum of N numbers. Independent of the implementation, the asymptotic number of operations will be the same. However, if the data is in memory, you can expect lower total time and higher average CPU usage, while if the data is on disk, you can expect higher total time and lower average CPU usage (higher IO wait).
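To make that concrete, here is a minimal plain-Scala sketch of the same sum computed once from memory and once from disk. The file `/tmp/numbers.txt` is hypothetical (one number per line), and the timings only illustrate the trend: the number of additions is identical, but the disk case spends most of its wall time waiting on IO, which is exactly what drags average CPU utilization down.

```scala
import scala.io.Source

object SumDemo {
  def main(args: Array[String]): Unit = {
    val n = 10000000

    // Case 1: data already in memory -- the CPU is busy nearly the whole time.
    val inMemory = Array.tabulate(n)(_.toLong)
    val t0 = System.nanoTime()
    val s1 = inMemory.sum                          // ~N additions, no IO wait
    val t1 = System.nanoTime()

    // Case 2: the same N numbers read from disk -- the same ~N additions,
    // but wall time is dominated by IO, so average CPU usage drops.
    val t2 = System.nanoTime()
    val src = Source.fromFile("/tmp/numbers.txt")  // hypothetical input file
    val s2 = src.getLines().map(_.toLong).sum
    src.close()
    val t3 = System.nanoTime()

    println(f"in-memory: sum=$s1, ${(t1 - t0) / 1e6}%.1f ms")
    println(f"from-disk: sum=$s2, ${(t3 - t2) / 1e6}%.1f ms")
  }
}
```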

Some remarks:

  • Spark and Hive are not designed with the same goals in mind. Spark is more of an ETL / streaming ETL tool, while Hive is a database / data warehouse. This implies different optimizations under the hood, and performance can differ significantly depending on the workload.

    Comparing resource usage without context doesn't make much sense.

  • In general, Spark is less conservative and more resource hungry. This reflects both the design goals and hardware evolution: Spark is a few years younger, and that is enough time to see a significant drop in hardware cost.
