Data insertion into delta table with changing schema

How do I insert data into a Delta table with a changing schema in Databricks?

In Databricks Scala, I'm exploding a Map column and loading it into a Delta table. The Delta table has a predefined schema.

Let's say the schema has 4 columns: A, B, C, D.

So, on day 1 I load my dataframe with 4 columns into the Delta table using the code below.

loadfinaldf.write.format("delta").option("mergeSchema", "true")
       .mode("append").insertInto("table")

The columns in the dataframe change every day. For instance, on day 2, two new columns E and F are added and there is no C column, so the dataframe now has 5 columns: A, B, D, E, F. When I load this data into the Delta table, columns E and F should be created dynamically in the table schema and their data loaded into them, and column C should be populated with NULL. I assumed that spark.conf.set("spark.databricks.delta.schema.autoMerge","true") would do the job, but I'm unable to achieve this.
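One possible reason the write above does not evolve the schema is that `insertInto` resolves columns by position and ignores column names, as the Spark `DataFrameWriter` documentation notes. Below is a minimal sketch of an alternative, written in Python (the same writer methods exist in the Scala API). It assumes the target is a Delta table named `table` and that appending by column name through `saveAsTable` is acceptable. `append_with_schema_merge` is a hypothetical helper name, not part of any library. Separately, the session-level setting documented by Delta Lake is `spark.databricks.delta.schema.autoMerge.enabled`, which differs from the key quoted above.

```python
def append_with_schema_merge(df, table_name):
    """Append df to a Delta table, letting Delta add columns it has not seen.

    Assumption: df is a Spark DataFrame and table_name names a Delta table.
    saveAsTable matches columns by name, whereas insertInto matches by position.
    """
    (
        df.write.format("delta")
        .option("mergeSchema", "true")  # allow new columns such as E and F
        .mode("append")
        .saveAsTable(table_name)
    )
```

With this kind of append, Delta Lake's schema evolution is documented to add columns that exist only in the dataframe and to store NULL for table columns the dataframe lacks, which matches the behaviour described for E, F, and C. Whether it fits depends on the table's constraints, so treat it as a starting point and check it against your own table.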

My approach: I was thinking of listing the predefined Delta table schema and the dataframe schema and comparing the two before loading the data into the Delta table.
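The comparison described in this approach can be sketched on plain lists of column names. This is an illustration only: `compare_columns` is a hypothetical helper, and the two lists are hard-coded from the example in the question. In Spark they would presumably come from `spark.table("table").columns` and the dataframe's `columns` attribute.

```python
def compare_columns(table_columns, frame_columns):
    """Return (columns only in the dataframe, columns only in the table).

    The comparison is case-sensitive; Spark resolves column names
    case-insensitively by default, so normalise case first if that matters.
    """
    table_set = set(table_columns)
    frame_set = set(frame_columns)
    new_in_frame = [c for c in frame_columns if c not in table_set]
    missing_from_frame = [c for c in table_columns if c not in frame_set]
    return new_in_frame, missing_from_frame


# Day 1 table schema and day 2 dataframe columns from the question.
table_columns = ["A", "B", "C", "D"]
frame_columns = ["A", "B", "D", "E", "F"]

new_in_frame, missing_from_frame = compare_columns(table_columns, frame_columns)
print(new_in_frame)        # ['E', 'F']
print(missing_from_frame)  # ['C']
```

The first list names the columns the table would need to gain, and the second names the columns that would have to be filled with NULL (for example with `withColumn` and a typed `lit(None)`) if the load is done manually instead of relying on schema merging.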

Can you use some Python logic?

result = pd.concat([df1, df2], axis=1, join="inner")

Then, push your dataframe into a dynamically created SQL table?

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
