Table not found error while loading DataFrame into a Hive partition
I am trying to insert data into a Hive table like this:
val partfile = sc.textFile("partfile")
val partdata = partfile.map(p => p.split(","))
val partSchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("salary", IntegerType, true),
  StructField("dept", StringType, true),
  StructField("location", StringType, true)))
val partRDD = partdata.map(p => Row(p(0).toInt,p(1),p(2).toInt,p(3),p(4)))
val partDF = sqlContext.createDataFrame(partRDD, partSchema)
Packages I imported:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType,StructField,StringType,IntegerType}
import org.apache.spark.sql.types._
This is how I tried to insert the DataFrame into the Hive partition:
partDF.write.mode(SaveMode.Append).partitionBy("location").insertInto("parttab")
I'm getting the below error even though the Hive table exists:
org.apache.spark.sql.AnalysisException: Table not found: parttab;
Could anyone tell me what mistake I am making here and how I can correct it?
To write data to the Hive warehouse, you need to initialize a HiveContext instance. Upon doing that, it will pick up the configuration from hive-site.xml (on the classpath) and connect to the underlying Hive warehouse. HiveContext is an extension of SQLContext that adds support for connecting to Hive.
To do so, try this:
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
Then perform your append query on this instance:
partDF.registerTempTable("temp")
hc.sql(".... <normal sql query to pick data from table `temp`; and insert in to Hive table > ....")
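For concreteness, a hedged sketch of what such a query could look like for the question's column layout; `temp` is the temp table registered above, and with dynamic partitioning the partition column (`location`) conventionally goes last in the SELECT list. Adjust names to your own table.

```scala
// Sketch of an insert query for the question's schema (assumed, not from the original post).
// With dynamic partitioning, the partition column (location) goes last in the SELECT.
val insertSql =
  """INSERT INTO TABLE parttab PARTITION (location)
    |SELECT id, name, salary, dept, location FROM temp""".stripMargin
// hc.sql(insertSql)  // run against the HiveContext created earlier
```

Depending on your Hive configuration, dynamic partition inserts may also require setting `hive.exec.dynamic.partition.mode=nonstrict`.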
Please make sure that the table parttab is under the default database.
If the table is under another database, the table name should be specified as <db-name>.parttab.
If you need to save the DataFrame directly into Hive, use this:
df.write.saveAsTable("<db-name>.parttab")
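To illustrate the qualified name, it is simply `<database>.<table>` joined by a dot; a small sketch, where `mydb` is a hypothetical database name:

```scala
// Hypothetical database name, used for illustration only
val db = "mydb"
val qualifiedName = s"$db.parttab"
// df.write.saveAsTable(qualifiedName)  // needs a live HiveContext and metastore
```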