
Spark-SQL plug in on HIVE

Original 2021-07-30 18:02:20 5 1 apache-spark/ hive/ apache-spark-sql

HIVE has a metastore, and HiveServer2 listens for SQL requests; with the help of the metastore, the query is executed and the result is passed back. HiveServer2 is actually a customised Thrift service. In this way, HIVE acts as a service, and we can use HIVE as a database from a programming language.

The relationship between Spark-SQL and HIVE is that:

Spark-SQL just utilises the HIVE setup (HDFS file system, HIVE Metastore, HiveServer2). When we invoke sbin/start-thriftserver.sh (present in the Spark installation), we are supposed to give the HiveServer2 port number and the hostname. Then, via Spark's beeline, we can create, drop and manipulate tables in HIVE. The API can be either Spark-SQL or HIVE QL. If we create or drop a table, the change is clearly visible if we log in to HIVE and check (say, via HIVE beeline or the HIVE CLI). In other words, changes made via Spark can be seen in HIVE tables.
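For readers following along, here is a minimal Python sketch of the two commands this paragraph describes: starting Spark's Thrift server and connecting to it with beeline. It only builds and prints the command lines. The install path /opt/spark, the port 10000, the host localhost and the helper names are assumptions for illustration. In the Spark distributions I am aware of the script ships as sbin/start-thriftserver.sh, and the hive.server2.thrift.* options set where Spark's own Thrift server listens.

```python
import shlex


def thriftserver_command(spark_home, port=10000, bind_host="localhost"):
    """Build the argv that starts Spark's Thrift JDBC/ODBC server."""
    return [
        f"{spark_home}/sbin/start-thriftserver.sh",
        "--hiveconf", f"hive.server2.thrift.port={port}",
        "--hiveconf", f"hive.server2.thrift.bind.host={bind_host}",
    ]


def beeline_command(spark_home, host="localhost", port=10000, query="SHOW TABLES;"):
    """Build the argv that runs one query through beeline over JDBC."""
    return [
        f"{spark_home}/bin/beeline",
        "-u", f"jdbc:hive2://{host}:{port}",
        "-e", query,
    ]


if __name__ == "__main__":
    spark_home = "/opt/spark"  # hypothetical install location
    print(shlex.join(thriftserver_command(spark_home)))
    print(shlex.join(beeline_command(spark_home)))
```

The printed lines can be pasted into a shell on a machine where Spark is installed. Adjust the port and host to match your own setup.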

My understanding is that Spark does not have its own metastore setup like HIVE. Spark just utilises the HIVE setup, and only the SQL execution happens via the Spark SQL API.

Is my understanding correct here?

Then I am a little confused about the usage of bin/spark-sql (which is also present in the Spark installation). The documentation says that via this SQL shell we can create tables as we do above (via Thrift Server/Beeline). Now my question is: how is the metadata information maintained by Spark then?

Or, like the first approach, can we make the spark-sql CLI communicate with HIVE (to be specific, HiveServer2 of HIVE)? If yes, how can we do that?

Thanks in advance!

1 answer

"My understanding is that Spark does not have its own metastore setup like HIVE"

If no Hive metastore is configured, Spark creates its own local metastore, backed by an embedded Derby database (a metastore_db directory in the current working directory).

"can we make the spark-sql CLI communicate with HIVE"

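Start an external metastore process, then either add a hive-site.xml file that sets hive.metastore.uris to $SPARK_CONF_DIR, or set the same property with SET SQL statements. Note that this connects spark-sql to the Hive metastore service, not to HiveServer2.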
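As a rough illustration of the hive-site.xml approach, the Python sketch below generates a minimal file containing only hive.metastore.uris. The helper name hive_site_xml and the URI thrift://metastore-host:9083 are assumptions made for the example; substitute the address of your own metastore service.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(metastore_uri):
    """Return a minimal hive-site.xml document as a string."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    ET.SubElement(prop, "value").text = metastore_uri
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(configuration)
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical metastore address; replace with your own host and port.
    print(hive_site_xml("thrift://metastore-host:9083"))
```

The printed XML would be saved as hive-site.xml under $SPARK_CONF_DIR before launching spark-sql.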

The spark-sql CLI should then be able to query Hive tables. From code, you need to call enableHiveSupport() on the SparkSession builder.
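A minimal PySpark sketch of the "from code" route might look like the following. It assumes pyspark is installed and that a metastore is reachable at the URI you pass in. The helper names and the app name are hypothetical, and passing hive.metastore.uris through the builder is one common way to point the session at a metastore when no hive-site.xml is on the classpath.

```python
def hive_session_options(metastore_uri):
    """Settings that point a Spark session at an external Hive metastore."""
    return {"hive.metastore.uris": metastore_uri}


def build_hive_session(metastore_uri, app_name="hive-metastore-demo"):
    """Create (or reuse) a SparkSession with Hive support enabled."""
    # Imported lazily so the options helper above works without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in hive_session_options(metastore_uri).items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()
```

With a running metastore, build_hive_session("thrift://metastore-host:9083").sql("SHOW TABLES").show() should list the tables registered in Hive; the host name here is a placeholder.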

Related questions

Spark-sql read hive table failed

Spark-sql can not find the data in Hive?

Connecting to Hive using Spark-SQL

How to delete a hive database with spark-sql?

Why use Hive on Spark instead of Spark-SQL?

Spark - Hive UDF is working with Spark-SQL but not with DataFrame

What is the preferred way to avoid SQL injections in Spark-SQL (on Hive)

Why spark-sql cpu utilization is higher than hive?

How to read Hive Table with Spark-Sql efficiently

Spark-Sql returns 0 records without repairing hive table


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address.



solution1 1 ACCEPTED 2022-03-11 13:53:58
