
Data insertion into delta table with changing schema

How to insert data into a Delta table with a changing schema in Databricks.

In Databricks (Scala), I'm exploding a Map column and loading it into a Delta table. I have a predefined schema for the Delta table.

Let's say the schema has 4 columns: A, B, C, D.

So, on day 1, I'm loading my dataframe with 4 columns into the Delta table using the code below.

loadfinaldf.write.format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .insertInto("table")

The columns in the dataframe change every day. For instance, on day 2, two new columns E and F are added and there is no C column. Now I have 5 columns A, B, D, E, F in the dataframe. When I load this data into the Delta table, columns E and F should be dynamically created in the table schema, the corresponding data should load into these two columns, and column C should be populated with NULL. I was assuming that spark.conf.set("spark.databricks.delta.schema.autoMerge", "true") would do the job, but I'm unable to achieve this.
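For context, a minimal sketch of an append that does evolve the schema, assuming this Delta/DBR version supports mergeSchema on append and that the write goes through saveAsTable (insertInto resolves columns by position and, as far as I can tell, does not pick up mergeSchema); the config name with the .enabled suffix is the spelling I believe Delta expects:

// Sketch only, using the names from the question (loadfinaldf, table "table").
// Option 1: per-write schema merge -- new columns (E, F) are added to the table,
// and appended rows get NULL for table columns missing from the dataframe (C).
loadfinaldf.write.format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .saveAsTable("table")

// Option 2: session-wide automatic schema merging for Delta writes and MERGE.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")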

My approach: I was thinking of listing the predefined Delta table schema and the dataframe schema and comparing the two before loading the data into the Delta table, as sketched below.
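A minimal sketch of that comparison idea in Scala, assuming the target is registered as "table" and that adding columns via ALTER TABLE is acceptable; the step names and helpers here are illustrative, not a definitive solution:

import org.apache.spark.sql.functions.lit

// 1) Columns present in the dataframe but not yet in the table: add them to the table.
val tableSchema = spark.table("table").schema
val newCols = loadfinaldf.schema.filterNot(f => tableSchema.fieldNames.contains(f.name))
newCols.foreach { f =>
  spark.sql(s"ALTER TABLE table ADD COLUMNS (${f.name} ${f.dataType.sql})")
}

// 2) Columns present in the table but missing from the dataframe: fill with NULL,
//    cast to the table's data type.
val refreshedSchema = spark.table("table").schema
val aligned = refreshedSchema.foldLeft(loadfinaldf) { (df, f) =>
  if (df.columns.contains(f.name)) df
  else df.withColumn(f.name, lit(null).cast(f.dataType))
}

// 3) Write with the columns in the table's order, since insertInto is positional.
aligned
  .select(refreshedSchema.fieldNames.map(aligned.col): _*)
  .write.format("delta").mode("append").insertInto("table")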

Can you use some Python logic?

import pandas as pd
result = pd.concat([df1, df2], axis=1, join="inner")  # column-wise inner join on the index

Then, push your dataframe into a dynamically created SQL table?

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
