Engine used when creating Hive table with joins using Spark SQL
I am not sure from the documentation whether, when creating a Hive table using HiveContext from Spark, it will use the Spark engine or a standard Hive MapReduce job to perform the task:
val sc = new SparkContext()
val hc = new HiveContext(sc)
hc.sql("""
CREATE TABLE db.new_table
STORED AS PARQUET
AS SELECT
field1,
field2,
field3
FROM db.src1
JOIN db.src2
ON (x = y)
"""
)
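For comparison, the same CTAS can be written with the DataFrame API, which unambiguously executes on the Spark engine. This is only a sketch: it assumes the tables `db.src1` and `db.src2` and the join columns `x` and `y` from the question, and reuses the `hc` HiveContext defined above.

```scala
// Sketch assuming Spark 1.6 with the HiveContext `hc` from the question.
val src1 = hc.table("db.src1")
val src2 = hc.table("db.src2")

// Equivalent of: SELECT field1, field2, field3 FROM db.src1 JOIN db.src2 ON (x = y)
val result = src1
  .join(src2, src1("x") === src2("y"))
  .select("field1", "field2", "field3")

// Equivalent of: CREATE TABLE db.new_table STORED AS PARQUET AS SELECT ...
result.write.format("parquet").saveAsTable("db.new_table")
```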
Spark 1.6
Spark SQL supports Apache Hive using HiveContext. It uses the Spark SQL execution engine to work with data stored in Hive.
Spark 2.x and above
val spark = SparkSession
  .builder()
  .appName("SparkSessionExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
When doing this now, Spark will use the Spark APIs and not MapReduce. HiveContext need not be explicitly referenced, as it is deprecated, even in spark-submit / program mode.
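One way to confirm which engine runs the statement is to inspect the physical plan: a Spark SQL plan shows Spark operators (e.g. `SortMergeJoin`, `FileScan`) rather than launching Hive MapReduce jobs. A minimal check, assuming the `spark` session built above and the tables from the question:

```scala
// Print the logical and physical plans for the join; the operators listed
// are Spark SQL operators, confirming Spark (not MR) executes the query.
spark.sql("""
  SELECT field1, field2, field3
  FROM db.src1
  JOIN db.src2 ON (x = y)
""").explain(true)
```

The running job also appears under the SQL tab of the Spark UI, which would not happen if Hive launched its own MapReduce job.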