Problem writing an enum to PostgreSQL using a PySpark DataFrame with JDBC write

I am moving data from a MySQL (5.7) database to a PostgreSQL (12.7) database using PySpark (Spark 3.0.1, Scala 2.12). A table in the destination model has a column of an Enum type.

CREATE TYPE ORDER_STATUS AS ENUM (
'SHIPPED','PAID','REFUNDED','PARTIALLY_REFUNDED','PROCESSING');

When inserting:

df_orders.select(df_orders.columns).write.format('jdbc').options(**postgres_write_opts_table).mode('append').save()

I get the following exception:

Caused by: org.postgresql.util.PSQLException: ERROR: column "status" is of type order_status but expression is of type character varying
Hint: You will need to rewrite or cast the expression.

Basically, I need to cast the column status to ORDER_STATUS. I have tried using a UserDefinedType (PySpark does not have SQLUserDefinedType), but without really knowing what I am doing, because the documentation is not very clear.

class StatusUDT(UserDefinedType):
    @classmethod
    def sqlType(self):
        return NullType()

    @classmethod
    def module(cls):
        return cls.__module__

    def serialize(self, obj):
        return f"{obj.value}::order_status_type"

    def deserialize(self, datum):
        return {x.value: x for x in Some}[datum]

Then I try the cast:

df_orders = df_orders.withColumn("status", col("status").cast(StatusUDT()))

That produces the following error:

AnalysisException: cannot resolve 'CAST(`status` AS NULL)' due to data type mismatch: cannot cast string to null;;

Is there any way to cast this Enum?

I was finally able to overcome this issue. I temporarily removed the Enum so I could keep running tests, and then I hit a similar issue with a JSON type. Searching for it, I found this post: How to save String as JSONB type in postgres when using AWS Glue. I fixed it by setting the property:

'stringtype':"unspecified"

as the answer to that post suggests.

Then I put the Enum back into the table, and this property worked for it as well. I was able to run the insertions with no further issues.
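For reference, here is a minimal sketch of how the write options might look with that property included. The URL, table name, and credentials are placeholders, and the helper names build_postgres_write_opts and append_to_postgres are hypothetical; only the stringtype entry comes from the fix described above. As far as I understand, Spark's JDBC data source forwards options it does not recognize to the driver as connection properties, and with stringtype set to unspecified the PostgreSQL driver sends string parameters untyped, so the server can infer the enum (or JSON) type from the target column.

```python
def build_postgres_write_opts(url, table, user, password):
    """Build the options dict for a Spark JDBC write.

    Every argument value is a placeholder supplied by the caller.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # PostgreSQL JDBC connection property: send strings as untyped
        # parameters so the server infers the column type (enum, json, ...)
        "stringtype": "unspecified",
    }


def append_to_postgres(df, opts):
    """Append a DataFrame to the target table using the options above."""
    df.write.format("jdbc").options(**opts).mode("append").save()


# Placeholder connection details, for illustration only
postgres_write_opts_table = build_postgres_write_opts(
    url="jdbc:postgresql://localhost:5432/shop",
    table="orders",
    user="spark",
    password="secret",
)
```

With a live Spark session, append_to_postgres(df_orders, postgres_write_opts_table) would then perform the same append as in the question, with the status column left as a plain string.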
