Why use Hive on Spark instead of Spark-SQL?

Original post 2015-05-12 16:13:04 4 2 apache-spark/ hive/ bigdata/ apache-spark-sql

I'm new to the Data Science field and I don't understand why someone would want to connect Hive to Spark instead of just using Spark-SQL.

What benefits are there for using Hive on Spark rather than Spark-SQL (other than being able to use Hive code already in production)?

Thanks

2 answers

That answer above is not correct. The one component that is common between Hive and SparkSQL is SemanticAnalyzer. Hive has significantly better SQL support and a more sophisticated cost-based optimizer. My recommendation is to use Hive on Tez as opposed to Hive on Spark or SparkSQL, as it is production ready, more stable, and more scalable.

Hmm, it seems the only answer here advises using Tez...

Back to the original question: IMHO, the benefits of using Hive on Spark are mainly better Hive feature support rather than HiveQL language support. Hive on Spark has much better support for hiveserver2 and security features.

In SparkSQL these features are really buggy. There is a hiveserver2 implementation in SparkSQL, but in the latest release version (1.6.x), hiveserver2 in SparkSQL no longer works with the hivevar and hiveconf arguments, and the username for login via JDBC doesn't work either. See https://issues.apache.org/jira/browse/SPARK-13983
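For readers unfamiliar with the hivevar and hiveconf arguments mentioned above, here is a minimal sketch of how a client usually passes them to a hiveserver2 endpoint. It only builds the JDBC URL and a Beeline command line and does not connect to anything. The host, port, username, and variable names are hypothetical placeholders. The URL layout follows the documented HiveServer2 form `jdbc:hive2://host:port/db;sess_var_list?hive_conf_list#hive_var_list`. Whether a given server honors these values is exactly the behavior in question here.

```python
def _join_pairs(pairs):
    """Render a dict as semicolon-separated key=value pairs."""
    return ";".join(f"{key}={value}" for key, value in pairs.items())


def build_hs2_jdbc_url(host, port, database="default",
                       session_vars=None, hive_conf=None, hive_vars=None):
    """Build a HiveServer2 JDBC URL:
    jdbc:hive2://host:port/db;sess_var_list?hive_conf_list#hive_var_list
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if session_vars:
        url += ";" + _join_pairs(session_vars)
    if hive_conf:
        url += "?" + _join_pairs(hive_conf)
    if hive_vars:
        url += "#" + _join_pairs(hive_vars)
    return url


def build_beeline_args(url, username, hive_conf=None, hive_vars=None):
    """Build a Beeline argument list using its -u, -n, --hiveconf and --hivevar options."""
    args = ["beeline", "-u", url, "-n", username]
    for key, value in (hive_conf or {}).items():
        args += ["--hiveconf", f"{key}={value}"]
    for key, value in (hive_vars or {}).items():
        args += ["--hivevar", f"{key}={value}"]
    return args


# Hypothetical values, for illustration only.
example_url = build_hs2_jdbc_url(
    "localhost", 10000,
    hive_conf={"hive.execution.engine": "spark"},
    hive_vars={"run_date": "2016-05-27"},
)
example_args = build_beeline_args(
    "jdbc:hive2://localhost:10000/default", "alice",
    hive_vars={"run_date": "2016-05-27"},
)
print(example_url)
# jdbc:hive2://localhost:10000/default?hive.execution.engine=spark#run_date=2016-05-27
print(" ".join(example_args))
# beeline -u jdbc:hive2://localhost:10000/default -n alice --hivevar run_date=2016-05-27
```

A query would then reference the variable as `${hivevar:run_date}`. The sketch shows both the URL form and the command-line form because the answer doesn't say which one breaks. The linked JIRA issue has the details.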

Our requirement is to use Spark with hiveserver2 in a secure way (with authentication and authorization). Currently SparkSQL alone cannot provide this. We do not need other Hadoop components like HDFS or YARN, since we are using Spark standalone, so for our requirement we are using Ranger/Sentry + Hive on Spark.


The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1
5 ACCEPTED 2015-05-30 02:21:26

solution2
0 2016-05-27 16:05:19
