Comparing the schemas of two Delta tables

Question

I need suggestions on how to compare the schemas of two Delta tables, X and Y. Table Y has N more columns than X. How do I dynamically identify the extra columns and add them to table X?

I'm using Databricks with Python. Thanks in advance.

Answer 1

You can achieve it with something like this:

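from pyspark.sql import functions as F

def add_missing_columns(df1, df2):
  # Build a typed null column for every field of df2 that df1 lacks
  additional_cols = [F.lit(None).cast(field.dataType).alias(field.name)
                     for field in df2.schema.fields if field.name not in df1.columns]
  # Append the new columns after all of df1's existing columns
  return df1.select("*", *additional_cols)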

Usage:

df1 = spark.createDataFrame([('1',), ('2',)], ["col1"])
df2 = spark.createDataFrame([('1', 1, 0.5)], ["col1", "col2", "col3"])
add_missing_columns(df1, df2).show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|null|null|
|   2|null|null|
+----+----+----+

How it works: it iterates over the fields of the second DataFrame's schema and checks whether each one is already present in the first DataFrame. If it is not, it creates a new column of null values, cast to the correct data type.
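The function above adds the columns to a DataFrame, not to the stored table. If the goal is to alter table X itself, one possible approach (not part of the original answer) is to generate an ALTER TABLE ... ADD COLUMNS statement from the schema difference. The sketch below is pure Python so it can be checked without a cluster. The helper names `missing_columns` and `build_add_columns_sql`, the table name `X`, and the sample schemas are hypothetical, and the case-insensitive name comparison is an assumption meant to mirror Spark's default behavior.

```python
def missing_columns(target_schema, source_schema):
    """Return (name, type) pairs found in source_schema but not in target_schema.

    Both arguments are lists of (column_name, type_string) tuples.
    Names are compared case-insensitively (an assumption, see lead-in).
    """
    existing = {name.lower() for name, _ in target_schema}
    return [(name, dtype) for name, dtype in source_schema
            if name.lower() not in existing]


def build_add_columns_sql(table_name, columns):
    """Build an ALTER TABLE ... ADD COLUMNS statement, or None if nothing is missing."""
    if not columns:
        return None
    # Backticks guard against column names containing special characters
    cols = ", ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return f"ALTER TABLE {table_name} ADD COLUMNS ({cols})"


# Hypothetical schemas standing in for tables X and Y
schema_x = [("col1", "string")]
schema_y = [("col1", "string"), ("col2", "bigint"), ("col3", "double")]

extra = missing_columns(schema_x, schema_y)
stmt = build_add_columns_sql("X", extra)
print(stmt)
```

On Databricks, the (name, type) pairs could presumably be taken from spark.table("Y").schema.fields using field.name and field.dataType.simpleString(), and the resulting statement run with spark.sql(stmt). It is worth trying this on a test table first, since existing rows will show null in the newly added columns.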

Answer 1 posted 2021-07-16 11:33:57
