简体   繁体   中英

Spark-SQL plug in on HIVE

HIVE has a metastore and HIVESERVER2 listens for SQL requests; with the help of metastore, the query is executed and the result is passed back. The Thrift framework is actually customised as HIVESERVER2. In this way, HIVE is acting as a service. Via programming language, we can use HIVE as a database.

The relationship between Spark-SQL and HIVE is that:

Spark-SQL just utilises the HIVE setup (HDFS file system, HIVE Metastore, Hiveserver2). When we invoke /sbin/start-thriftserver2.sh (present in spark installation), we are supposed to give hiveserver2 port number, and the hostname. Then via spark's beeline, we can actually create, drop and manipulate tables in HIVE. The API can be either Spark-SQL or HIVE QL. If we create a table / drop a table, it will be clearly visible if we login into HIVE and check(say via HIVE beeline or HIVE CLI). To put in other words, changes made via Spark can be seen in HIVE tables.

My understanding is that Spark does not have its own meta store setup like HIVE. Spark just utilises the HIVE setup and simply the SQL execution happens via Spark SQL API.

Is my understanding correct here?

Then I am little confused about the usage of bin/spark-sql.sh (which is also present in Spark installation). Documentation says that via this SQL shell, we can create tables like we do above (via Thrift Server/Beeline). Now my question is: How the metadata information is maintained by spark then?

Or like the first approach, can we make spark-sql CLI to communicate to HIVE (to be specific: hiveserver2 of HIVE)? If yes, how can we do that?

Thanks in advance!

My understanding is that Spark does not have its own meta store setup like HIVE

Spark will start a Derby server on its own, if a Hive metastore is not provided

can we make spark-sql CLI to communicate to HIVE

Start an external metastore process, add a hive-site.xml file to $SPARK_CONF_DIR with hive.metastore.uris , or use SET SQL statements for the same.

Then spark-sql CLI should be able to query Hive tables. From code, you need to use enableHiveSupport() method on the SparkSession.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM