Append pandas dataframe to existing table in databricks
I want to append a pandas dataframe (8 columns) to an existing table in Databricks (12 columns), and fill the other 4 columns that can't be matched with None values. Here is what I've tried:
spark_df = spark.createDataFrame(df)
spark_df.write.mode("append").insertInto("my_table")
It throws the error:
ParseException: "\nmismatched input ':' expecting (line 1, pos 4)\n\n== SQL ==\n my_table
It looks like Spark can't handle this operation with unmatched columns. Is there any way to achieve what I want?
I think the most natural course of action would be a select() transformation to add the missing columns to the 8-column dataframe, followed by a unionAll() transformation to merge the two:
from pyspark.sql import Row
from pyspark.sql.functions import lit

bigrow = Row(a='foo', b='bar')
bigdf = spark.createDataFrame([bigrow])

smallrow = Row(a='foobar')
smalldf = spark.createDataFrame([smallrow])

# add the missing column as a null literal so both schemas line up
fitdf = smalldf.select(smalldf.a, lit(None).alias('b'))
uniondf = bigdf.unionAll(fitdf)
Can you try this:
from pyspark.sql import functions as F

df = spark.createDataFrame(pandas_df)

# empty frame that carries the target table's 12-column schema
df_table_struct = sqlContext.sql('select * from my_table limit 0')

# add every table column missing from the dataframe as a null literal
for col in set(df_table_struct.columns) - set(df.columns):
    df = df.withColumn(col, F.lit(None))

df_table_struct = df_table_struct.unionByName(df)
df_table_struct.write.saveAsTable('my_table', mode='append')
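An alternative is to pad the pandas DataFrame itself before handing it to Spark: pandas' reindex adds the unmatched columns filled with NaN, which Spark converts to null. A sketch with hypothetical column names (c0..c11 stand in for the table's 12 columns):

```python
import pandas as pd

# Hypothetical schemas: the target table has 12 columns c0..c11,
# while the incoming frame only has the first 8.
table_cols = [f"c{i}" for i in range(12)]
df = pd.DataFrame({f"c{i}": [1, 2] for i in range(8)})

# reindex keeps the table's column order and null-fills the 4 missing columns
padded = df.reindex(columns=table_cols)

print(list(padded.columns))        # the full 12-column layout, in table order
print(padded["c11"].isna().all())  # True: padded columns contain only missing values
```

After `spark.createDataFrame(padded)` the frame already matches the table's 12 columns, so a plain append should no longer hit the schema mismatch.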