How to insert data into a Delta table with a changing schema in Databricks
In Databricks Scala, I'm exploding a Map column and loading it into a delta table. I have a predefined schema of the delta table.
Let's say the schema has four columns: A, B, C, D.
So, on day 1 I'm loading my dataframe with four columns into the Delta table using the code below.
loadfinaldf.write.format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .insertInto("table")
The columns in the dataframe change every day. For instance, on day 2 two new columns, E and F, are added and there is no C column, so the dataframe now has five columns: A, B, D, E, F. When I load this data into the Delta table, columns E and F should be created dynamically in the table schema, the corresponding data should load into them, and column C should be populated with NULL. I assumed that spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true") would do the job, but I'm unable to achieve this.
My approach: list the columns of the predefined Delta table schema and the columns of the dataframe, and compare the two before loading into the Delta table.
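The comparison step of that approach is just set logic over column names. A minimal sketch in plain Python, assuming the two schemas are available as lists of column-name strings (in Databricks they could come from, e.g., the table's and the dataframe's `columns` attributes; `diff_schemas` is a hypothetical helper name):

```python
def diff_schemas(table_cols, df_cols):
    """Compare a table's column list against an incoming dataframe's column list.

    Returns (new_cols, missing_cols):
      new_cols     - columns present in the dataframe but not the table
                     (these would need to be added to the table schema)
      missing_cols - columns present in the table but not the dataframe
                     (these would need to be filled with NULL on load)
    """
    table_set, df_set = set(table_cols), set(df_cols)
    new_cols = [c for c in df_cols if c not in table_set]
    missing_cols = [c for c in table_cols if c not in df_set]
    return new_cols, missing_cols

# Day-2 scenario from the question: table has A, B, C, D;
# the dataframe now has A, B, D, E, F.
new_cols, missing_cols = diff_schemas(["A", "B", "C", "D"], ["A", "B", "D", "E", "F"])
print(new_cols)      # ['E', 'F']
print(missing_cols)  # ['C']
```

With the two lists in hand, the missing columns can be added to the dataframe as NULL literals and the new columns appended to the table schema before the write.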
Can you use some Python logic?
result = pd.concat([df1, df2], axis=1, join="inner")
Then, push your dataframe into a dynamically created SQL table?
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
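As a hedged illustration of the two suggestions combined (`pd.concat` to align the frames column-wise, then `DataFrame.to_sql` to create the SQL table on the fly), here is a self-contained sketch against an in-memory SQLite connection; the table name `day2_load` and the sample data are made up for the example:

```python
import sqlite3
import pandas as pd

# Two frames sharing an index: concatenate along columns, keeping only
# rows whose index appears in both frames (join="inner"), as in the
# snippet above.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"E": [5, 6], "F": [7, 8]}, index=[0, 1])
result = pd.concat([df1, df2], axis=1, join="inner")

# to_sql creates the table and its schema dynamically if it doesn't
# already exist; "day2_load" is just an example table name.
with sqlite3.connect(":memory:") as conn:
    result.to_sql("day2_load", conn, index=False, if_exists="replace")
    back = pd.read_sql("SELECT * FROM day2_load", conn)

print(list(back.columns))  # ['A', 'B', 'E', 'F']
```

Note that this is pandas-side logic; for large Databricks tables the same idea would normally be expressed with Spark dataframes and a Delta write rather than pandas and SQLite.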