Table not found error while loading DataFrame into a Hive partition
I am trying to insert data into a Hive table like this:
val partfile = sc.textFile("partfile")
val partdata = partfile.map(p => p.split(","))
val partSchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("salary", IntegerType, true),
  StructField("dept", StringType, true),
  StructField("location", StringType, true)))
val partRDD = partdata.map(p => Row(p(0).toInt,p(1),p(2).toInt,p(3),p(4)))
val partDF = sqlContext.createDataFrame(partRDD, partSchema)
Packages I imported:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType,StructField,StringType,IntegerType}
import org.apache.spark.sql.types._
This is how I tried to insert the DataFrame into the Hive partition:
partDF.write.mode(SaveMode.Append).partitionBy("location").insertInto("parttab")
I'm getting the below error even though the Hive table exists:
org.apache.spark.sql.AnalysisException: Table not found: parttab;
Could anyone tell me what mistake I am making here and how I can correct it?
To write data to the Hive warehouse, you need to initialize a HiveContext instance. Upon doing that, it will pick up the configuration from hive-site.xml (on the classpath) and connect to the underlying Hive warehouse. HiveContext is an extension of SQLContext that adds support for connecting to Hive.
To do so, try this:
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
Then perform your append query on this instance:
partDF.registerTempTable("temp")
hc.sql(".... <normal sql query to pick data from table `temp`; and insert in to Hive table > ....")
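For concreteness, a hedged sketch of what such a query could look like for the question's column layout; `temp` is the temp table registered above, and with dynamic partitioning the partition column (`location`) conventionally goes last in the SELECT list. Adjust names to your own table.

```scala
// Sketch of an insert query for the question's schema (assumed, not from the original post).
// With dynamic partitioning, the partition column (location) goes last in the SELECT.
val insertSql =
  """INSERT INTO TABLE parttab PARTITION (location)
    |SELECT id, name, salary, dept, location FROM temp""".stripMargin
// hc.sql(insertSql)  // run against the HiveContext created earlier
```

Depending on your Hive configuration, dynamic partition inserts may also require setting `hive.exec.dynamic.partition.mode=nonstrict`.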
Please make sure that the table parttab is under the default database.
If the table is under another database, the table name should be specified as <db-name>.parttab.
If you need to save the DataFrame directly into Hive, use this:
df.write.saveAsTable("<db-name>.parttab")
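To illustrate the qualified name, it is simply `<database>.<table>` joined by a dot; a small sketch, where `mydb` is a hypothetical database name:

```scala
// Hypothetical database name, used for illustration only
val db = "mydb"
val qualifiedName = s"$db.parttab"
// df.write.saveAsTable(qualifiedName)  // needs a live HiveContext and metastore
```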