Writing PySpark Dataframe to Snowflake Table

I am creating jobs using AWS Glue to write Spark data frames to a Snowflake table. The results are inconsistent. Basically, if I clone an existing successful Glue job, then change the inputs so the job writes to a new table, it succeeds and I have a new table in Snowflake. However, if I try to run the same job again (because we are in development), it fails with this message:

Error occurred while loading files to Snowflake: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error: syntax error line 1 at position 44 unexpected ')'.

This is the code that is causing the issue:

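from pyspark.sql.functions import col

# Standard data source name for the Snowflake Spark connector
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Enable query pushdown to Snowflake for the current Spark session
spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(spark._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())

# Connection options, read from the Glue job arguments
sfOptions = {
    "sfURL": args['URL'],
    "sfUser": args['USERNAME'],
    "sfPassword": args['PASSWORD'],
    "sfDatabase": args['DB'],
    "sfSchema": args['SCHEMA'],
    "sfWarehouse": args['WAREHOUSE'],
    "truncate_table": "off"
}

# Convert the Glue DynamicFrame to a Spark DataFrame and cast every column to string
df = select_columns.toDF()
df = df.select([col(c).cast("string") for c in df.columns])

# Write to Snowflake, replacing the target table
df.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("dbtable", snowflake_table).mode("overwrite").save()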

snowflake_table is a variable.

As you can see, I am trying to write in "overwrite" mode, which should just drop the existing table and replace it with the Spark data frame. There is some weird configuration issue going on between Glue, Spark, and Snowflake, but it doesn't make any sense because, like I said, I can get this to run if I start fresh from a new Glue job; it is only when I run the job again that it fails.

It appears that you have to reset the job bookmark if it is enabled, or disable the bookmark altogether, to run the job under these conditions.
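For anyone who wants to script this rather than click through the console, here is a minimal sketch of both options. It assumes a boto3 Glue client is passed in; the job name in the usage comment is a placeholder, and the helper names are my own, not part of any library.

```python
from typing import Any


def reset_bookmark(glue_client: Any, job_name: str) -> None:
    """Clear the stored bookmark state so the next run reprocesses all input."""
    glue_client.reset_job_bookmark(JobName=job_name)


def start_run_without_bookmark(glue_client: Any, job_name: str) -> str:
    """Start a single run with job bookmarks disabled for that run."""
    response = glue_client.start_job_run(
        JobName=job_name,
        # Special Glue job parameter that controls bookmark behaviour
        Arguments={"--job-bookmark-option": "job-bookmark-disable"},
    )
    return response["JobRunId"]


# Usage (requires AWS credentials; "my-snowflake-job" is a hypothetical name):
#   import boto3
#   glue = boto3.client("glue")
#   reset_bookmark(glue, "my-snowflake-job")
#   run_id = start_run_without_bookmark(glue, "my-snowflake-job")
```

Resetting keeps bookmarks enabled for later runs, while the job argument only affects the run it is passed to, which is probably the more convenient choice during development.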

If someone smarter could explain why you have to do this, that would be great.
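One plausible explanation, which I have not confirmed: with the bookmark enabled, a rerun finds no new input, so the frame comes back empty and possibly without any columns. The connector may then generate a CREATE TABLE statement with an empty column list, which would fit the "unexpected ')'" error. Under that assumption, a guard like the following sketch skips the write when there is nothing to load. The helper name write_if_not_empty is hypothetical, and `write` is any callable that performs the actual save.

```python
from typing import Any, Callable


def write_if_not_empty(df: Any, write: Callable[[Any], None]) -> bool:
    """Call write(df) only if df has at least one column and one row.

    Returns True when the write was performed, False when it was skipped.
    """
    # A frame built from an empty input can have no schema at all
    if not df.columns:
        return False
    # head(1) returns a list of rows; an empty list means no data
    if not df.head(1):
        return False
    write(df)
    return True


# Usage with the job's own names (assumed to be defined as in the question):
#   write_if_not_empty(
#       df,
#       lambda d: d.write.format(SNOWFLAKE_SOURCE_NAME)
#                  .options(**sfOptions)
#                  .option("dbtable", snowflake_table)
#                  .mode("overwrite")
#                  .save(),
#   )
```

Note that this also skips the write for a frame that has columns but no rows, so the existing table is left untouched instead of being replaced by an empty one.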
