How to append 2 SQL tables with different number of columns?

I have a table (TableA) in a SQL database stored on a server, accessible via Microsoft SQL Server Management Studio.

Then I have a Databricks notebook which creates a table (TableB), which is then appended to the one stored on the server (TableA).

To append TableB to TableA I use Spark:

df_tableB.write.format("jdbc") \
    .mode('append') \
    .option("url", db_jdbc_url) \
    .option("driver", driver) \
    .option("dbtable", table_name) \
    .option("user", db_user) \
    .option("password", db_password) \
    .save()

This works perfectly if the schemas of TableA and TableB are the same. However, I have found that TableB can now have a slightly different schema; in particular, it can have additional columns.

Therefore I wonder if there is a way to append the tables so that all columns in common are appended as they are right now, and the new ones are appended as well, perhaps showing "None" where there is no value. Would you be able to propose a smart and elegant way to achieve my goal?

Read the schema of TableA and select only those columns from TableB:

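from pyspark.sql import functions as F

# Load TableA to get its schema (assumes the same connection settings as the write below)
df_tableA = spark.read.format("jdbc").options(
    url=db_jdbc_url, driver=driver, dbtable=table_name,
    user=db_user, password=db_password).load()
# Keep TableA's column order; columns missing from TableB become NULLs cast to TableA's type
columns = [
    F.col(f.name) if f.name in df_tableB.schema.names
    else F.lit(None).cast(f.dataType).alias(f.name)
    for f in df_tableA.schema.fields
]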
df_tableB.select(columns).write.format("jdbc") \
    .mode('append') \
    .option("url", db_jdbc_url) \
    .option("driver", driver) \
    .option("dbtable", table_name) \
    .option("user", db_user) \
    .option("password", db_password) \
    .save()

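This way only columns that are present in TableA are selected, and they are written in TableA's column order. Any TableA column that TableB lacks is filled with NULL, and any extra column in TableB is left out of the write.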
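Since extra columns are left out without any error, it can help to check what the alignment will do before writing. Here is a minimal sketch in plain Python, not part of the original answer: `plan_alignment` is a hypothetical helper, the column names are made up for illustration, and the comparison is case-sensitive, which may not match how your database treats column names.

```python
def plan_alignment(target_columns, source_columns):
    """Split column names into shared, missing-from-source, and extra-in-source.

    target_columns: column names of the table being appended to (TableA).
    source_columns: column names of the DataFrame being written (TableB).
    """
    source_set = set(source_columns)
    target_set = set(target_columns)
    # Columns written as they are, in the target's order
    shared = [c for c in target_columns if c in source_set]
    # Target columns the source lacks; these would be filled with NULL
    missing = [c for c in target_columns if c not in source_set]
    # Source columns the target lacks; these would be dropped by the select
    extra = [c for c in source_columns if c not in target_set]
    return shared, missing, extra


# Hypothetical column names, for illustration only
table_a_columns = ["id", "name", "amount"]
table_b_columns = ["id", "amount", "discount"]

shared, missing, extra = plan_alignment(table_a_columns, table_b_columns)
print(shared)   # ['id', 'amount']
print(missing)  # ['name']
print(extra)    # ['discount']
```

In the notebook you would pass df_tableA.schema.names and df_tableB.schema.names instead of the hard-coded lists. If extra is not empty, those columns will not reach TableA unless you first add them to the table on the server, which this sketch does not do.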
