
PYSPARK: How to convert a SQL query with multiple case statements to Pyspark/Pyspark-SQL?

I have two sets of queries with multiple case statements. I need to achieve the same logic in pyspark. I tried, but I'm having difficulty with multiple when clauses. Any help would be appreciated.

FIRST QUERY

case
when appointment_date is null
then 0
when resolution_desc in (
'CSTXCL - OK BY PHONE'
)
or resolution_desc ilike '%NO VAN ROLL%'
then 0
when status in ('PENDING','CANCELLED')
then 0
when ticket_type = 'install'
and appointment_required is true
end as truck_roll

SECOND QUERY

case when status = 'COMPLETED'  and resolution not in ('CANCELLING ORDER','CANCEL ORDER')
then 1 else 0 end as completed, 
case when status = 'CANCELLED'  or ( status in ('COMPLETED','PENDING' ) and resolution_desc in ('CANCELLING ORDER','CANCEL ORDER') ) then 1 else 0 end as cancelled.

I tried the code below for the second query, but it is not working:

sparkdf.withColumn('completed', f.when((sparkdf.ticket_status =='COMPLETED') & (~sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))\
.withColumn('cancelled', f.when((sparkdf.ticket_status == 'CANCELLED') | (sparkdf.ticket_status.isin('COMPLETED','PENDING')) & (sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))

You can use the "expr" function to run SQL code (here with triple quotes, because the expression spans multiple lines):

from pyspark.sql.functions import expr

sparkdf.withColumn(
    'completed',
    expr('''
           CASE WHEN status = 'COMPLETED' 
                  AND resolution NOT IN ('CANCELLING ORDER',
                                         'CANCEL ORDER') THEN 1 
                ELSE                                          0 
           END
         '''
        )
)

Of course, you would do the same for the "cancelled" column.
