
Insert a Spark DataFrame into a partitioned Hive table without overwriting the data

I have a DataFrame created from a partitioned table.

I need to insert this DataFrame into an already created partitioned Hive table without overwriting the previous data.

I used partitionBy("columnname").insertInto("hivetable"), but it gives me an error saying that partitionBy and insertInto can't be used at the same time.

You can't use partitionBy with the insertInto operator. partitionBy splits the data being written into multiple Hive partitions, while insertInto inserts data into a table whose partitioning is already defined.

Therefore, you can do something like this:

spark.range(10)
  .withColumn("p1", 'id % 2)
  .write
  .mode("overwrite")
  .partitionBy("p1")
  .saveAsTable("partitioned_table")

val insertIntoQ = sql("INSERT INTO TABLE partitioned_table PARTITION (p1 = 4) VALUES (41), (42)")

If you require partitions to be added dynamically, you need to set hive.exec.dynamic.partition:

hiveContext.setConf("hive.exec.dynamic.partition", "true")

hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

I faced a similar problem during data ingestion; I did something like:

df.write.mode(SaveMode.Append).partitionBy("colname").saveAsTable("Table")

When you use insertInto, there is no need to add partitionBy or bucketBy in the code. The partitioning and bucketing should already be defined in the table's creation statement.
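To tie the pieces together, here is a minimal sketch of the append-without-overwrite approach described above, combining the dynamic-partition settings with insertInto. The table name `hivetable` and the source table `source_table` are hypothetical placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// A sketch, assuming a Hive-enabled Spark session and an existing
// partitioned table named `hivetable` (both names are hypothetical).
val spark = SparkSession.builder()
  .appName("append-to-partitioned-hive-table")
  .enableHiveSupport()
  .getOrCreate()

// Allow partitions to be created on the fly during the insert.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

// Hypothetical source of the rows to append.
val df = spark.table("source_table")

// Note: no partitionBy here. insertInto takes the partitioning from
// the table definition and matches columns by position, not by name,
// so df's column order must match the target table's schema, with
// partition columns last.
df.write
  .mode(SaveMode.Append)
  .insertInto("hivetable")
```

Because insertInto matches columns positionally, reordering the DataFrame's columns with select before writing is a common safeguard.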
