Data not loaded into a partitioned table in Hive

Question

I am trying to create a partition on my table in order to update a value.

This is my sample data:

1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A

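I want to update Janet's department to B.

To do that, I created a table partitioned by Department:

create external table trail (EmployeeID Int,FirstName String,Designation String,Salary Int) PARTITIONED BY (Department String) row format delimited fields terminated by "," location '/user/sreeveni/HIVE';

However, after executing the command above, no data is inserted into the trail table.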

hive>select * from trail;                               
OK
Time taken: 0.193 seconds

hive>desc trail;                                        
OK
employeeid              int                     None                
firstname               string                  None                
designation             string                  None                
salary                  int                     None                
department              string                  None                

# Partition Information      
# col_name              data_type               comment             

department              string                  None

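Am I doing something wrong?

UPDATE

As suggested, I tried to insert the data into my table:

load data inpath '/user/aibladmin/HIVE' overwrite into table trail Partition(Department);

But it shows:

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

Running set hive.exec.dynamic.partition.mode=nonstrict did not help either.

Is there anything else I need to do?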

Answer 1

Try setting the following two properties:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

When writing an insert statement for a partitioned table, make sure the partition columns come last in the select clause.
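A minimal sketch of what such an insert could look like for the table in the question. The staging table name trail_staging and its location are hypothetical and do not come from the original post. The sketch assumes the raw five-column CSV files have been placed in that location.

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical unpartitioned staging table over the raw five-column CSV files.
CREATE EXTERNAL TABLE trail_staging (
  EmployeeID  INT,
  FirstName   STRING,
  Designation STRING,
  Salary      INT,
  Department  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/sreeveni/HIVE_STAGING';  -- assumed path, not from the question

-- The partition column (Department) is listed last in the SELECT,
-- so Hive creates one partition per distinct Department value.
INSERT OVERWRITE TABLE trail PARTITION (Department)
SELECT EmployeeID, FirstName, Designation, Salary, Department
FROM trail_staging;
```

Hive matches dynamic partition columns by position rather than by name, which is why the order of the select list matters here.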

Answer 2

Try the following.

First, create the table:

create external table test23 (EmployeeID Int,FirstName String,Designation String,Salary Int) PARTITIONED BY (Department String) row format delimited fields terminated by "," location '/user/rocky/HIVE';

Create a directory in HDFS named after the partition:

$ hadoop fs -mkdir /user/rocky/HIVE/department=50000

Create a local file abc.txt by filtering the records whose department equals 50000:

$ cat abc.txt 
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B

Put it into HDFS:

$ hadoop fs -put /home/yarn/abc.txt /user/rocky/HIVE/department=50000

Now alter the table:

ALTER TABLE test23 ADD PARTITION(department=50000);

And check the result:

select * from test23 ;

Answer 3

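You cannot load data (HDFS files) directly into a partitioned Hive table. You first need to create a normal table and then insert that table's data into the partitioned table.

set hive.exec.dynamic.partition.mode=strict means that whenever you populate a Hive table, the statement must specify at least one static partition column.

With set hive.exec.dynamic.partition.mode=nonstrict, you do not need any static partition column.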
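For illustration, a static-partition insert that should be accepted under strict mode might look like the sketch below. It assumes a hypothetical unpartitioned table trail_staging (not part of the original post) that holds all five columns, including Department.

```sql
-- Strict mode: at least one partition column needs a fixed (static) value.
SET hive.exec.dynamic.partition.mode = strict;

-- Department is given explicitly in the PARTITION clause, so the SELECT
-- list contains only the four non-partition columns.
INSERT OVERWRITE TABLE trail PARTITION (Department = 'A')
SELECT EmployeeID, FirstName, Designation, Salary
FROM trail_staging          -- hypothetical staging table
WHERE Department = 'A';
```

Because the table has a single partition column, a statement that satisfies strict mode here is fully static. Leaving the value out, as in PARTITION (Department), makes the column dynamic and requires nonstrict mode.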

Answer 4

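I ran into the same problem, and yes, both properties are required. In my case, however, I was using the JDBC driver from Scala to set these properties before executing the Hive statements. The problem was that I was sending a whole batch of properties (SET statements) in a single execute call, like this:

    conn = DriverManager.getConnection(conf.get[String]("hive.jdbc.url"))
    // Did not work: several SET statements sent in one execute() call
    conn.createStatement().execute(
      """SET spark.executor.memory = 2G;
        |SET hive.exec.dynamic.partition.mode = nonstrict;
        |SET hive.other.statements = blabla;""".stripMargin)

For some reason, the driver could not interpret these as separate statements, so I had to execute each of them individually:

    conn = DriverManager.getConnection(conf.get[String]("hive.jdbc.url"))
    // Worked: one SET statement per execute() call
    conn.createStatement().execute("SET spark.executor.memory = 2G;")
    conn.createStatement().execute("SET hive.exec.dynamic.partition.mode=nonstrict;")
    conn.createStatement().execute("SET hive.other.statements = blabla;")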

Answer 5

Could you try running MSCK REPAIR TABLE table_name?

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
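Applied to the table from the question, that might look like the sketch below. It only helps if the files already sit in directories named department=<value> under the table's location; a flat directory of CSV files gives Hive nothing to register.

```sql
-- Scan the table's location for department=<value> directories that are
-- missing from the metastore and register them as partitions.
MSCK REPAIR TABLE trail;

-- List the partitions the metastore knows about afterwards.
SHOW PARTITIONS trail;
```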

Answer 6

Just set these two properties on the Spark session builder before calling getOrCreate():

SparkSession
    .builder
    .config(new SparkConf())
    .appName(appName)
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition","true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .getOrCreate()

Answer 1: score 20, 2015-04-15 09:18:39
Answer 2: score 2, 2014-09-18 10:57:27
Answer 3: score 2, 2017-03-09 07:17:34
Answer 4: score 0, 2019-11-15 13:38:34
Answer 5: score 0, 2020-01-03 08:33:24
Answer 6: score 0, 2021-11-03 15:55:59
