Error while exchanging partition in Hive tables
I am trying to merge incremental data into an existing Hive table.
For testing, I created a dummy table from the base table as below:
create table base.dummytable like base.fact_table
The table base.fact_table is partitioned on dbsource (string). When I checked the dummy table's DDL, I could see that the partition column is correctly defined:
PARTITIONED BY (
  `dbsource` string)
Then I tried to exchange one of the partitions from the dummy table by dropping it first:
spark.sql("alter table base.dummy drop partition(dbsource='NEO4J')")
The partition NEO4J dropped successfully, and I ran the exchange statement as below:
spark.sql("ALTER TABLE base.dummy EXCHANGE PARTITION (dbsource = 'NEO4J') WITH TABLE stg.inc_labels_neo4jdata")
The exchange statement gives an error:
Error: Error while compiling statement: FAILED: ValidationFailureSemanticException table is not partitioned but partition spec exists: {dbsource=NEO4J}
The table I am trying to push the incremental data into is partitioned by dbsource, and I have dropped that partition successfully. I am running this from Spark code, and the config is given below:
val conf = new SparkConf().setAppName("MERGER").set("spark.executor.heartbeatInterval", "120s")
.set("spark.network.timeout", "12000s")
.set("spark.sql.inMemoryColumnarStorage.compressed", "true")
.set("spark.shuffle.compress", "true")
.set("spark.shuffle.spill.compress", "true")
.set("spark.sql.orc.filterPushdown", "true")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryoserializer.buffer.max", "512m")
.set("spark.streaming.stopGracefullyOnShutdown", "true")
.set("spark.dynamicAllocation.enabled", "false")
.set("spark.shuffle.service.enabled", "true")
.set("spark.executor.instances", "4")
.set("spark.executor.memory", "4g")
.set("spark.executor.cores", "5")
.set("hive.merge.sparkfiles","true")
.set("hive.merge.mapfiles","true")
.set("hive.merge.mapredfiles","true")
show create table base.dummy:
CREATE TABLE `base`.`dummy`(
`dff_id` bigint,
`dff_context_id` bigint,
`descriptive_flexfield_name` string,
`model_table_name` string)
PARTITIONED BY (`dbsource` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/apps/hive/warehouse/base.db/dummy'
TBLPROPERTIES (
'orc.compress'='ZLIB')
show create table stg.inc_labels_neo4jdata:
CREATE TABLE `stg`.`inc_labels_neo4jdata`(
`dff_id` bigint,
`dff_context_id` bigint,
`descriptive_flexfield_name` string,
`model_table_name` string,
`dbsource` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/apps/hive/warehouse/stg.db/inc_labels_neo4jdata'
TBLPROPERTIES (
'orc.compress'='ZLIB')
Could anyone let me know what mistake I am making here, and what I should change in order to successfully exchange the partition?
My take on this error is that table stg.inc_labels_neo4jdata is not partitioned like base.dummy, and therefore there is no partition to move.
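A quick way to confirm this diagnosis (a sketch to run against your own metastore): for a partitioned table, DESCRIBE FORMATTED prints a "# Partition Information" section, and SHOW PARTITIONS succeeds; for the staging table, the section is absent because dbsource is an ordinary column.

```sql
-- Check partition metadata on both sides of the exchange.
DESCRIBE FORMATTED base.dummy;               -- shows "# Partition Information" with dbsource
DESCRIBE FORMATTED stg.inc_labels_neo4jdata; -- no partition section: dbsource is a plain column

-- Equivalently, this succeeds for base.dummy but fails for the staging table:
SHOW PARTITIONS base.dummy;
```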
From the Hive documentation:
This statement lets you move the data in a partition from a table to another table that has the same schema and does not already have that partition.
You can check the Hive DDL Manual for EXCHANGE PARTITION,
and the JIRA where this feature was added to Hive, where you can read:
This only works if the source and destination tables have the same field schemas and the same partition-by parameters. If they do not, the command will throw an exception.
You basically need to have exactly the same schema on both source_table and destination_table.
Per your last edit, this is not the case: in stg.inc_labels_neo4jdata, dbsource is a regular column, not a partition column.
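A minimal sketch of a fix, assuming you can rebuild the staging table: recreate stg.inc_labels_neo4jdata with the same columns and the same PARTITIONED BY clause as base.dummy, repopulate the NEO4J partition, and then run the exchange. Table and column names follow the question; the INSERT source table is hypothetical, standing in for however your incremental data is produced.

```sql
-- Recreate the staging table so dbsource is a partition column,
-- matching base.dummy's schema exactly.
DROP TABLE IF EXISTS stg.inc_labels_neo4jdata;
CREATE TABLE stg.inc_labels_neo4jdata(
  `dff_id` bigint,
  `dff_context_id` bigint,
  `descriptive_flexfield_name` string,
  `model_table_name` string)
PARTITIONED BY (`dbsource` string)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');

-- Load the incremental data into the NEO4J partition.
INSERT OVERWRITE TABLE stg.inc_labels_neo4jdata
PARTITION (dbsource = 'NEO4J')
SELECT dff_id, dff_context_id, descriptive_flexfield_name, model_table_name
FROM stg.neo4j_incremental;  -- hypothetical source of the incremental data

-- Both tables now have identical schemas and partitioning, so the
-- exchange can move the partition's data into base.dummy.
ALTER TABLE base.dummy
EXCHANGE PARTITION (dbsource = 'NEO4J')
WITH TABLE stg.inc_labels_neo4jdata;
```

Note that EXCHANGE PARTITION is a metadata-level move: the partition directory is relocated from the source table to the destination, which must not already contain that partition (hence the DROP PARTITION step in the question).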