
Error while exchanging partition in Hive tables

I am trying to merge incremental data into an existing Hive table.

For testing, I created a dummy table from the base table as below:

create table base.dummytable like base.fact_table

The table base.fact_table is partitioned on dbsource (string). When I checked the dummy table's DDL, I could see that the partition column is correctly defined.

PARTITIONED BY (
  `dbsource` string)

Then I tried to exchange one of the partitions of the dummy table by dropping it first.

spark.sql("alter table base.dummy drop partition(dbsource='NEO4J')")

The partition NEO4J was dropped successfully, and I ran the exchange statement as below:

spark.sql("ALTER TABLE base.dummy EXCHANGE PARTITION (dbsource = 'NEO4J') WITH TABLE stg.inc_labels_neo4jdata")

The exchange statement is giving an error:

Error: Error while compiling statement: FAILED: ValidationFailureSemanticException table is not partitioned but partition spec exists: {dbsource=NEO4J}

The table I am trying to push the incremental data into is partitioned by dbsource, and I have dropped that partition successfully. I am running this from Spark code, and the config is given below:

  val conf = new SparkConf().setAppName("MERGER").set("spark.executor.heartbeatInterval", "120s")
      .set("spark.network.timeout", "12000s")
      .set("spark.sql.inMemoryColumnarStorage.compressed", "true")
      .set("spark.shuffle.compress", "true")
      .set("spark.shuffle.spill.compress", "true")
      .set("spark.sql.orc.filterPushdown", "true")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.max", "512m")
      .set("spark.serializer", classOf[org.apache.spark.serializer.KryoSerializer].getName)
      .set("spark.streaming.stopGracefullyOnShutdown", "true")
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.executor.instances", "4")
      .set("spark.executor.memory", "4g")
      .set("spark.executor.cores", "5")
      .set("hive.merge.sparkfiles","true")
      .set("hive.merge.mapfiles","true")
      .set("hive.merge.mapredfiles","true")

show create table base.dummy:

CREATE TABLE `base`.`dummy`(
`dff_id` bigint, 
`dff_context_id` bigint,  
`descriptive_flexfield_name` string,  
`model_table_name` string)
 PARTITIONED BY (`dbsource` string)
  ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
 STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
 OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION  
'/apps/hive/warehouse/base.db/dummy'
 TBLPROPERTIES ( 
'orc.compress'='ZLIB')

show create table stg.inc_labels_neo4jdata:

CREATE TABLE `stg`.`inc_labels_neo4jdata`(
`dff_id` bigint, 
`dff_context_id` bigint,  
`descriptive_flexfield_name` string,  
`model_table_name` string,
`dbsource` string)
  ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
 STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
 OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION  
'/apps/hive/warehouse/stg.db/inc_labels_neo4jdata'
 TBLPROPERTIES ( 
'orc.compress'='ZLIB')

Could anyone let me know what mistake I am making here, and what I should change in order to successfully exchange the partition?

My take on this error is that the table stg.inc_labels_neo4jdata is not partitioned like base.dummy, and therefore there's no partition to move.

From the Hive documentation:

This statement lets you move the data in a partition from a table to another table that has the same schema and does not already have that partition.

You can check the Hive DDL Manual for EXCHANGE PARTITION.
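As an illustration only, a minimal sketch of that documented pattern could look like the following (t1 and t2 are made-up example tables, not from the question): both tables have the same columns and the same PARTITIONED BY clause, the source table (the one named after WITH TABLE) owns the partition, and the destination table does not have it yet.

-- both tables share the same columns and the same partition column
CREATE TABLE t1 (id BIGINT, name STRING) PARTITIONED BY (dbsource STRING) STORED AS ORC;
CREATE TABLE t2 (id BIGINT, name STRING) PARTITIONED BY (dbsource STRING) STORED AS ORC;

-- t2 holds the partition to be moved; t1 must not already contain it
INSERT INTO t2 PARTITION (dbsource = 'NEO4J') VALUES (1, 'a');

-- moves the data and partition metadata for dbsource='NEO4J' from t2 into t1
ALTER TABLE t1 EXCHANGE PARTITION (dbsource = 'NEO4J') WITH TABLE t2;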

And in the JIRA where this feature was added to Hive, you can read:

This only works if the source and destination tables have the same field schemas and the same PARTITIONED BY parameters. If they do not, the command will throw an exception.

You basically need to have exactly the same schema on both source_table and destination_table.

Per your last edit, this is not the case.
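Here is a minimal sketch of one way to fix it, assuming the rows in stg.inc_labels_neo4jdata can be reloaded into a staging table that is partitioned like base.dummy (the table name stg.inc_labels_neo4jdata_part is hypothetical, introduced only for this example):

-- recreate the staging table with the same schema and partitioning as the destination
CREATE TABLE stg.inc_labels_neo4jdata_part LIKE base.dummy;

-- load the incremental rows into its NEO4J partition
INSERT OVERWRITE TABLE stg.inc_labels_neo4jdata_part PARTITION (dbsource = 'NEO4J')
SELECT dff_id, dff_context_id, descriptive_flexfield_name, model_table_name
FROM stg.inc_labels_neo4jdata
WHERE dbsource = 'NEO4J';

-- now both tables have identical schemas and partition columns, so the exchange can run
ALTER TABLE base.dummy EXCHANGE PARTITION (dbsource = 'NEO4J')
WITH TABLE stg.inc_labels_neo4jdata_part;

Alternatively, you could drop and recreate stg.inc_labels_neo4jdata itself with a PARTITIONED BY (`dbsource` string) clause before reloading it; the key point is that source and destination must share both the column list and the partition columns.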
