Error while exchanging partition in Hive tables
I am trying to merge incremental data into an existing Hive table.
For testing, I created a dummy table from the base table as below:
create table base.dummytable like base.fact_table
The table base.fact_table is partitioned on dbsource (string). When I checked the dummy table's DDL, I could see that the partition column is correctly defined:
PARTITIONED BY (
  `dbsource` string)
Then I tried to exchange one of the partitions from the dummy table by dropping it first:
spark.sql("alter table base.dummy drop partition(dbsource='NEO4J')")
The partition NEO4J dropped successfully, and I ran the exchange statement as below:
spark.sql("ALTER TABLE base.dummy EXCHANGE PARTITION (dbsource = 'NEO4J') WITH TABLE stg.inc_labels_neo4jdata")
The exchange statement gives an error:
Error: Error while compiling statement: FAILED: ValidationFailureSemanticException table is not partitioned but partition spec exists: {dbsource=NEO4J}
The table I am trying to push the incremental data into is partitioned by dbsource, and I have dropped that partition successfully. I am running this from Spark code, and the config is given below:
val conf = new SparkConf().setAppName("MERGER").set("spark.executor.heartbeatInterval", "120s")
.set("spark.network.timeout", "12000s")
.set("spark.sql.inMemoryColumnarStorage.compressed", "true")
.set("spark.shuffle.compress", "true")
.set("spark.shuffle.spill.compress", "true")
.set("spark.sql.orc.filterPushdown", "true")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryoserializer.buffer.max", "512m")
.set("spark.streaming.stopGracefullyOnShutdown", "true")
.set("spark.dynamicAllocation.enabled", "false")
.set("spark.shuffle.service.enabled", "true")
.set("spark.executor.instances", "4")
.set("spark.executor.memory", "4g")
.set("spark.executor.cores", "5")
.set("hive.merge.sparkfiles","true")
.set("hive.merge.mapfiles","true")
.set("hive.merge.mapredfiles","true")
show create table base.dummy:
CREATE TABLE `base`.`dummy`(
`dff_id` bigint,
`dff_context_id` bigint,
`descriptive_flexfield_name` string,
`model_table_name` string)
PARTITIONED BY (`dbsource` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/apps/hive/warehouse/base.db/dummy'
TBLPROPERTIES (
'orc.compress'='ZLIB')
show create table stg.inc_labels_neo4jdata:
CREATE TABLE `stg`.`inc_labels_neo4jdata`(
`dff_id` bigint,
`dff_context_id` bigint,
`descriptive_flexfield_name` string,
`model_table_name` string,
`dbsource` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/apps/hive/warehouse/stg.db/inc_labels_neo4jdata'
TBLPROPERTIES (
'orc.compress'='ZLIB')
Could anyone let me know what mistake I am making here, and what I should change in order to successfully exchange the partition?
My take on this error is that table stg.inc_labels_neo4jdata is not partitioned like base.dummy, and therefore there is no partition to move.
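A quick way to confirm this diagnosis (a sketch to run against your own metastore): for a partitioned table, DESCRIBE FORMATTED prints a "# Partition Information" section, and SHOW PARTITIONS succeeds; for the staging table, the section is absent because dbsource is an ordinary column.

```sql
-- Check partition metadata on both sides of the exchange.
DESCRIBE FORMATTED base.dummy;               -- shows "# Partition Information" with dbsource
DESCRIBE FORMATTED stg.inc_labels_neo4jdata; -- no partition section: dbsource is a plain column

-- Equivalently, this succeeds for base.dummy but fails for the staging table:
SHOW PARTITIONS base.dummy;
```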
From the Hive documentation:
This statement lets you move the data in a partition from a table to another table that has the same schema and does not already have that partition.
You can check the Hive DDL Manual for EXCHANGE PARTITION,
and the JIRA where this feature was added to Hive, where you can read:
This only works if the source and destination tables have the same field schemas and the same partition-by parameters. If they do not, the command will throw an exception.
You basically need to have exactly the same schema on both source_table and destination_table.
Per your last edit, this is not the case: in stg.inc_labels_neo4jdata, dbsource is a regular column, not a partition column.
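A minimal sketch of a fix, assuming you can rebuild the staging table: recreate stg.inc_labels_neo4jdata with the same columns and the same PARTITIONED BY clause as base.dummy, repopulate the NEO4J partition, and then run the exchange. Table and column names follow the question; the INSERT source table is hypothetical, standing in for however your incremental data is produced.

```sql
-- Recreate the staging table so dbsource is a partition column,
-- matching base.dummy's schema exactly.
DROP TABLE IF EXISTS stg.inc_labels_neo4jdata;
CREATE TABLE stg.inc_labels_neo4jdata(
  `dff_id` bigint,
  `dff_context_id` bigint,
  `descriptive_flexfield_name` string,
  `model_table_name` string)
PARTITIONED BY (`dbsource` string)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');

-- Load the incremental data into the NEO4J partition.
INSERT OVERWRITE TABLE stg.inc_labels_neo4jdata
PARTITION (dbsource = 'NEO4J')
SELECT dff_id, dff_context_id, descriptive_flexfield_name, model_table_name
FROM stg.neo4j_incremental;  -- hypothetical source of the incremental data

-- Both tables now have identical schemas and partitioning, so the
-- exchange can move the partition's data into base.dummy.
ALTER TABLE base.dummy
EXCHANGE PARTITION (dbsource = 'NEO4J')
WITH TABLE stg.inc_labels_neo4jdata;
```

Note that EXCHANGE PARTITION is a metadata-level move: the partition directory is relocated from the source table to the destination, which must not already contain that partition (hence the DROP PARTITION step in the question).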