
What is the difference between 'Hive on Spark mode' and 'Spark SQL'? Will 'Hive on Spark mode' use the Catalyst Optimizer?

  • Hive on Spark mode vs MR mode
  • Spark SQL
  • Catalyst Optimizer/RDD/Tungsten

Hive on Spark is different from running Hive queries with Spark SQL via HiveContext. It doesn't translate queries into Spark SQL plans; instead, it translates them into MapReduce-style primitives (Hive's own map-side and reduce-side work) and executes those on Spark. Its main purpose is to use Spark as the execution engine without impacting existing code in Hive.

Internally, it translates Hive's logical operators into Spark tasks, which are mostly RDD transformations and actions. According to the official documentation, it does not use DataFrames yet, so it does not benefit from Tungsten or the Catalyst Optimizer.
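To make this execution model more concrete, here is a toy, pure-Python sketch (not Hive's actual code) of how a query like `SELECT dept, COUNT(*) FROM emp GROUP BY dept` could run as MapReduce-style work chained through RDD-like steps: a per-partition map function, a hash shuffle that groups values by key, and a per-partition reduce function. Lists of lists stand in for RDD partitions, and every name here (`map_partitions`, `shuffle_by_key`, `hive_map_side`, `hive_reduce_side`, `run_group_by_count`, the `emp` data) is hypothetical. The point it illustrates is that the stages run exactly as planned: no Catalyst-style optimizer rewrites the plan in between.

```python
from collections import defaultdict

# Hypothetical stand-in for an RDD: a list of partitions, each a list of records.

def map_partitions(parts, fn):
    """Apply fn to each partition independently (analogous to RDD.mapPartitions)."""
    return [list(fn(iter(p))) for p in parts]

def shuffle_by_key(parts, num_reducers):
    """Hash-partition (key, value) pairs and group values per key (a shuffle + group-by-key)."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for part in parts:
        for key, value in part:
            buckets[hash(key) % num_reducers][key].append(value)
    return [list(bucket.items()) for bucket in buckets]

def hive_map_side(rows):
    """Map-side work of the hypothetical GROUP BY: emit (dept, 1) for every (name, dept) row."""
    for _name, dept in rows:
        yield (dept, 1)

def hive_reduce_side(groups):
    """Reduce-side work: count the grouped values for each key."""
    for dept, ones in groups:
        yield (dept, sum(ones))

def run_group_by_count(parts, num_reducers=2):
    # The stages run in a fixed order exactly as planned; nothing rewrites the plan.
    mapped = map_partitions(parts, hive_map_side)
    shuffled = shuffle_by_key(mapped, num_reducers)
    reduced = map_partitions(shuffled, hive_reduce_side)
    # Flatten the reducer outputs and sort for a deterministic result.
    return sorted(rec for part in reduced for rec in part)

if __name__ == "__main__":
    emp = [[("ann", "eng"), ("bob", "ops")], [("cy", "eng")]]
    print(run_group_by_count(emp))  # [('eng', 2), ('ops', 1)]
```

By contrast, a query submitted through Spark SQL is first turned into a logical plan that Catalyst can rewrite before any physical execution happens, which is the step this sketch deliberately lacks.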

The document below, from the official documentation, describes all the design considerations for Hive on Spark:

Hive on Spark Mode Design
