Which engine is used when creating a Hive table with a join using Spark SQL?

Question

The documentation does not make it clear to me: when I create a Hive table using HiveContext from Spark, does it use the Spark engine or a standard Hive MapReduce job to perform the task?

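import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Master and app name are expected to be supplied by spark-submit.
val sc = new SparkContext()
val hc = new HiveContext(sc)

// CREATE TABLE ... AS SELECT over a join of two existing Hive tables.
hc.sql("""
    CREATE TABLE db.new_table
    STORED AS PARQUET
    AS SELECT
        field1,
        field2,
        field3
    FROM db.src1
        JOIN db.src2
        ON (x = y)
""")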

Answer 1

Spark 1.6

Spark SQL supports Apache Hive through HiveContext. It uses the Spark SQL execution engine to work with data stored in Hive, so the statement in the question runs as Spark jobs rather than as Hive MapReduce jobs.
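One way to check this for yourself is to look at the plan Spark builds for the join. The following is a minimal sketch for Spark 1.6, assuming the tables and columns from the question (db.src1, db.src2, field1..field3, x, y) exist in the metastore; the object name ExplainJoinPlan is only illustrative.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object ExplainJoinPlan {
  def main(args: Array[String]): Unit = {
    // Master and app name are assumed to be supplied by spark-submit.
    val sc = new SparkContext()
    val hc = new HiveContext(sc)

    // The SELECT part of the question's statement; table and column
    // names are taken from the question and assumed to exist.
    val joined = hc.sql(
      """
        |SELECT field1, field2, field3
        |FROM db.src1
        |JOIN db.src2 ON (x = y)
      """.stripMargin)

    // Prints the logical and physical plans that Spark SQL generated.
    joined.explain(true)

    sc.stop()
  }
}
```

If the query is planned by Spark SQL, the physical plan printed here should consist of Spark operators, and the work should show up as jobs in the Spark UI rather than as MapReduce jobs in YARN.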

Spark 2.x and above

val spark = SparkSession
  .builder()
  .appName("SparkSessionExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

Answer 2

When you do this now, Spark uses the Spark APIs and not MapReduce. HiveContext is deprecated and no longer needs to be referenced explicitly, even in spark-submit / program mode.
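For reference, the statement from the question could be written without HiveContext roughly as follows. This is a sketch for Spark 2.x, assuming Hive support is available on the classpath and that the tables from the question exist; the object and application name CtasWithSparkSession is made up for the example.

```scala
import org.apache.spark.sql.SparkSession

object CtasWithSparkSession {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() connects the session to the Hive metastore,
    // which replaces the explicit HiveContext of Spark 1.x.
    val spark = SparkSession
      .builder()
      .appName("CtasWithSparkSession")
      .enableHiveSupport()
      .getOrCreate()

    // Same CREATE TABLE ... AS SELECT as in the question; table and
    // column names are taken from the question and assumed to exist.
    spark.sql(
      """
        |CREATE TABLE db.new_table
        |STORED AS PARQUET
        |AS SELECT field1, field2, field3
        |FROM db.src1
        |JOIN db.src2 ON (x = y)
      """.stripMargin)

    spark.stop()
  }
}
```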


Answer 1
1 2018-07-13 10:11:48

Answer 2
0 2018-07-14 07:47:41
