I have a dataframe created from a partitioned table.
I need to insert this dataframe into an already created partitioned Hive table without overwriting the previous data.
I use partitionBy("columnname").insertInto("hivetable"),
but it gives me an error saying partitionBy and insertInto can't be used at the same time.
You can't use partitionBy with the insertInto operator. partitionBy splits the data being written into multiple Hive partitions, while insertInto inserts data into an already-defined partition layout.
Therefore, you can do something like this:
import spark.implicits._  // needed for the 'id column syntax

spark.range(10)
  .withColumn("p1", 'id % 2)
  .write
  .mode("overwrite")
  .partitionBy("p1")
  .saveAsTable("partitioned_table")
val insertIntoQ = sql("INSERT INTO TABLE partitioned_table PARTITION (p1 = 4) VALUES (41), (42)")
If you require partitions to be added dynamically, you need to set the hive.exec.dynamic.partition properties:
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
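As a sketch of what a dynamic-partition insert then looks like (assuming the partitioned_table created above and a SparkSession named spark): with these properties set, no static PARTITION clause and no partitionBy is needed, because the value of the partition column in the data itself decides where each row lands.

```scala
// Assumes the partitioned_table created above and a SparkSession `spark`.
import spark.implicits._

spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

// No PARTITION clause and no partitionBy: the last column (p1)
// determines the target partition of each row at write time.
spark.range(20, 24)
  .withColumn("p1", $"id" % 2)
  .write
  .mode("append")            // append, so existing partitions are kept
  .insertInto("partitioned_table")
```

With mode("append"), rows are added to the matching partitions without touching the data already there.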
I faced a similar problem during data ingestion and did something like this:
df.write().mode(SaveMode.Append).partitionBy("colname").saveAsTable("Table")
When you use insertInto there is no need to add partitionBy or bucketBy in the code. The partitioning should be defined in the table creation statement.
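As a sketch of that approach (the table name events is hypothetical, and a SparkSession named spark is assumed): the partition column is declared once in the DDL, after which a plain insertInto in append mode is enough.

```scala
// Hypothetical table `events`; assumes a SparkSession `spark`.
// The partitioning lives in the DDL, not on the writer.
spark.sql("""
  CREATE TABLE IF NOT EXISTS events (id BIGINT, p1 BIGINT)
  USING parquet
  PARTITIONED BY (p1)
""")

import spark.implicits._
spark.range(4)
  .withColumn("p1", $"id" % 2)
  .write
  .mode("append")        // keeps previously inserted data
  .insertInto("events")  // no partitionBy / bucketBy here
```

Note that insertInto matches columns by position, not by name, so the dataframe's column order must match the table definition.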