
Table not found error while loading DataFrame into a Hive partition

I am trying to insert data into a Hive table like this:

val partfile = sc.textFile("partfile")
val partdata = partfile.map(p => p.split(","))
val partSchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("salary", IntegerType, true),
  StructField("dept", StringType, true),
  StructField("location", StringType, true)))
val partRDD = partdata.map(p => Row(p(0).toInt,p(1),p(2).toInt,p(3),p(4)))
val partDF = sqlContext.createDataFrame(partRDD, partSchema)

The packages I imported:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
import org.apache.spark.sql.types._

This is how I tried to insert the DataFrame into the Hive partition:

partDF.write.mode(SaveMode.Append).partitionBy("location").insertInto("parttab")

I'm getting the error below even though the Hive table exists:

org.apache.spark.sql.AnalysisException: Table not found: parttab;

Could anyone tell me what mistake I am making here and how I can correct it?

To write data to the Hive warehouse, you need to initialize a HiveContext instance.

Upon doing that, it will pick up the configuration from hive-site.xml (on the classpath) and connect to the underlying Hive warehouse.

HiveContext is an extension of SQLContext that adds support for connecting to Hive.

To do so, try this:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

Then perform your append query on this instance:

partDF.registerTempTable("temp")

hc.sql(".... <normal sql query to pick data from table `temp`; and insert in to Hive table > ....")
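As a concrete illustration of that query, here is a minimal sketch, assuming Spark 1.x with dynamic partitioning enabled and the column names from the question's schema:

```scala
// Sketch only: insert the registered temp table into the partitioned Hive
// table using dynamic partitioning. The partition column must come last
// in the SELECT list so it maps onto PARTITION (location).
hc.sql("SET hive.exec.dynamic.partition=true")
hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
hc.sql(
  """INSERT INTO TABLE parttab PARTITION (location)
    |SELECT id, name, salary, dept, location FROM temp""".stripMargin)
```

Running this requires a live SparkContext with a Hive metastore, so it cannot be executed standalone.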

Please make sure that the table parttab is under the default database.

If the table is under another database, the table name should be specified as: <db-name>.parttab

If you need to save the DataFrame directly into Hive, use this:

df.write.saveAsTable("<db-name>.parttab")
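For completeness, a hedged sketch of the fuller DataFrameWriter call (Spark 1.4+), appending into a partitioned table and reusing the names from the question:

```scala
import org.apache.spark.sql.SaveMode

// Sketch (Spark 1.4+ DataFrameWriter API): appends to a metastore-managed
// table, creating it if absent. Use "<db-name>.parttab" to target a
// non-default database.
partDF.write
  .mode(SaveMode.Append)
  .partitionBy("location")
  .saveAsTable("parttab")
```

Note that saveAsTable manages the table through the metastore, whereas insertInto writes into an existing table's partitions; this snippet also needs a running Spark + Hive environment to execute.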

