
Can someone explain this: "Spark SQL supports a different use case than Hive."

I am referring to the following link: Hive Support for Spark

It says:

"Spark SQL supports a different use case than Hive."

I am not sure why that would be the case. Does this mean that, as a Hive user, I cannot use the Spark execution engine through Spark SQL?

Some Questions:

  • Spark SQL uses the Hive query parser, so ideally it should support all of Hive's functionality. Is that correct?
  • Will it use the Hive metastore?
  • Will Hive use the Spark optimizer, or will it build its own?
  • Will Hive translate MR jobs into Spark jobs, or use some other paradigm?

Spark SQL is intended to allow the use of SQL expressions on top of Spark's machine learning libraries. It lets you use SQL as one tool among others for building advanced analytic (e.g., ML) applications. It is not a drop-in replacement for Hive, which is really best at batch processing/ETL.
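
To make that concrete, here is a minimal sketch (not from the original answer) of the kind of workflow described: SQL handles the relational part of a pipeline and MLlib handles the modeling, all inside one Spark application. The table name `events`, its columns, and the `local[*]` master are hypothetical, and this uses the modern `SparkSession` entry point rather than the older `HiveContext`:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object SqlPlusMl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-plus-ml")
      .master("local[*]") // hypothetical; set to your cluster manager
      .getOrCreate()
    import spark.implicits._

    // Build a small DataFrame and expose it to SQL under a temp view name.
    val events = Seq((1.0, 2.0, 5.0), (2.0, 1.0, 4.0), (3.0, 3.0, 9.0))
      .toDF("x1", "x2", "label")
    events.createOrReplaceTempView("events")

    // Use SQL for the relational slice of the pipeline...
    val training = spark.sql("SELECT x1, x2, label FROM events WHERE label > 0")

    // ...then hand the result to MLlib for the modeling slice.
    val features = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
      .transform(training)

    val model = new LinearRegression().fit(features)
    println(s"coefficients: ${model.coefficients}")

    spark.stop()
  }
}
```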

However, there is also work ongoing upstream to allow Spark to serve as a general data processing backend for Hive. That work is what would allow you to take full advantage of Spark for Hive use cases specifically.
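
As a side note on the metastore question above: Spark SQL can already read tables registered in an existing Hive metastore. A hedged sketch, assuming Hive support is on the classpath and a pre-existing (hypothetical) Hive table named `web_logs`:

```scala
import org.apache.spark.sql.SparkSession

object HiveMetastoreDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-metastore-demo")
      .enableHiveSupport() // resolve table metadata via the Hive metastore
      .getOrCreate()

    // The query executes on Spark's engine, not MapReduce.
    spark.sql("SELECT COUNT(*) FROM web_logs").show()

    spark.stop()
  }
}
```

This is distinct from "Hive on Spark" (the upstream work mentioned above), which is configured on the Hive side, e.g. by setting hive.execution.engine=spark, so that Hive itself submits Spark jobs instead of MapReduce jobs.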
