
Append pandas dataframe to existing table in databricks

I want to append a pandas dataframe (8 columns) to an existing table in databricks (12 columns), and fill the other 4 columns that can't be matched with None values. Here is what I've tried:

spark_df = spark.createDataFrame(df)
spark_df.write.mode("append").insertInto("my_table")

It threw the error:

ParseException: "\nmismatched input ':' expecting (line 1, pos 4)\n\n== SQL ==\n my_table

It looks like Spark can't handle this operation with unmatched columns. Is there any way to achieve what I want?

I think that the most natural course of action would be a select() transformation to add the missing columns to the 8-column dataframe, followed by a unionAll() transformation to merge the two.

from pyspark.sql import Row
from pyspark.sql.functions import lit

# A two-column "big" dataframe and a one-column "small" one
bigrow = Row(a='foo', b='bar')
bigdf = spark.createDataFrame([bigrow])
smallrow = Row(a='foobar')
smalldf = spark.createDataFrame([smallrow])

# Add the missing column as NULL so the schemas line up
fitdf = smalldf.select(smalldf.a, lit(None).alias('b'))

# unionAll merges by column position, which now matches
uniondf = bigdf.unionAll(fitdf)
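
Applied back to the original question, the same pattern would look roughly like this. This is a minimal sketch rather than a tested answer: reading the target schema with spark.table() and appending with insertInto() are my assumptions, and since insertInto matches columns by position, the final select puts them in the table's column order.

from pyspark.sql.functions import lit

spark_df = spark.createDataFrame(df)          # df is the 8-column pandas dataframe
table_cols = spark.table('my_table').columns  # the 12 column names, in table order

# Add each column the dataframe is missing as a NULL column
for c in set(table_cols) - set(spark_df.columns):
    spark_df = spark_df.withColumn(c, lit(None))

# insertInto resolves columns by position, so select in the table's order first
spark_df.select(*table_cols).write.mode('append').insertInto('my_table')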

Can you try this?

from pyspark.sql import functions as F

df = spark.createDataFrame(pandas_df)

# Empty dataframe carrying the target table's 12-column schema
df_table_struct = spark.sql('select * from my_table limit 0')

# Add each column the pandas dataframe is missing as a NULL column
for col in set(df_table_struct.columns) - set(df.columns):
    df = df.withColumn(col, F.lit(None))

# unionByName matches columns by name, so ordering doesn't matter here
df_table_struct = df_table_struct.unionByName(df)

df_table_struct.write.saveAsTable('my_table', mode='append')
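
One design note on this answer: unionByName matches columns by name rather than by position, so union-ing into the empty df_table_struct fixes both the column set and the column order before the append. Also, F.lit(None) produces untyped NULL columns; if the write complains about incompatible types, casting each filler column to the corresponding table column's type (for example F.lit(None).cast('string')) should resolve it.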
