
Data insertion into delta table with changing schema

How to insert data into a Delta table with a changing schema in Databricks.

In Databricks (Scala), I'm exploding a Map column and loading it into a Delta table. I have a predefined schema for the Delta table.

Let's say the schema has 4 columns: A, B, C, D.

So, on day 1, I'm loading my dataframe with 4 columns into the Delta table using the code below.

loadfinaldf.write.format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .insertInto("table")

The columns in the dataframe change every day. For instance, on day 2, two new columns E and F are added and there is no C column. Now I have 5 columns A, B, D, E, F in the dataframe. When I load this data into the Delta table, columns E and F should be dynamically created in the table schema, the corresponding data should load into these two columns, and column C should be populated with NULL. I was assuming that spark.conf.set("spark.databricks.delta.schema.autoMerge", "true") would do the job, but I'm unable to achieve this.
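For context, a minimal sketch of an append that does evolve the schema, assuming this Delta/DBR version supports mergeSchema on append and that the write goes through saveAsTable (insertInto resolves columns by position and, as far as I can tell, does not pick up mergeSchema); the config name with the .enabled suffix is the spelling I believe Delta expects:

// Sketch only, using the names from the question (loadfinaldf, table "table").
// Option 1: per-write schema merge -- new columns (E, F) are added to the table,
// and appended rows get NULL for table columns missing from the dataframe (C).
loadfinaldf.write.format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .saveAsTable("table")

// Option 2: session-wide automatic schema merging for Delta writes and MERGE.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")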

My approach: I was thinking of listing the predefined Delta table schema and the dataframe schema and comparing the two before loading the data into the Delta table, as sketched below.
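A minimal sketch of that comparison idea in Scala, assuming the target is registered as "table" and that adding columns via ALTER TABLE is acceptable; the step names and helpers here are illustrative, not a definitive solution:

import org.apache.spark.sql.functions.lit

// 1) Columns present in the dataframe but not yet in the table: add them to the table.
val tableSchema = spark.table("table").schema
val newCols = loadfinaldf.schema.filterNot(f => tableSchema.fieldNames.contains(f.name))
newCols.foreach { f =>
  spark.sql(s"ALTER TABLE table ADD COLUMNS (${f.name} ${f.dataType.sql})")
}

// 2) Columns present in the table but missing from the dataframe: fill with NULL,
//    cast to the table's data type.
val refreshedSchema = spark.table("table").schema
val aligned = refreshedSchema.foldLeft(loadfinaldf) { (df, f) =>
  if (df.columns.contains(f.name)) df
  else df.withColumn(f.name, lit(null).cast(f.dataType))
}

// 3) Write with the columns in the table's order, since insertInto is positional.
aligned
  .select(refreshedSchema.fieldNames.map(aligned.col): _*)
  .write.format("delta").mode("append").insertInto("table")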

Can you use some Python logic?

import pandas as pd
result = pd.concat([df1, df2], axis=1, join="inner")  # column-wise inner join on the index

Then, push your dataframe into a dynamically created SQL table?

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
