
Spark-sql with multiple case when statements

I have created a temporary table from my dataframe in Spark SQL using mydf.createOrReplaceTempView("combine_table"). All the fields' datatypes show up as string. In this temp table I have four columns, procuredValue, minMargin, maxMargin, and Price, along with some other columns; all four hold values like 373.58, etc. Now I need to select data based on some conditions and display it as a new column, "Final Price". I am trying to do this using a CASE statement but I get the error below. Can anyone suggest how I should do this?

mismatched input '1st_case' expecting EOF(line 3, pos 5)

    val d1=spark.sql(""" SELECT cast(PV as  FloatType),cast(mxM as FloatType),
    cast(mnM as FloatType ) , cast(procuredValue+ mxM as FloatType) as 1st_case, 
    cast(PV+ mnM as FloatType) as 2nd_case,
    case 
    WHEN 1st_case < price THEN 1st_case
    WHEN 2ndcse < price THEN 2ndcse 
    WHEN PV <price && saleevent = 'Sp' THEN 'price'
    WHEN price < 'PV'  && saleevent = 'Sp' && sclass = 'VH' THEN 0.9* PV
    ELSE PV 
    END AS Final_price 
    FROM combine_table""")

What happens to your query?

SELECT *, 
       CASE 
              WHEN Sum(i.procuredvalue + i.maxmargin) < min_val_seller.q THEN Sum(i.procuredvalue + i.maxmargin)
              WHEN Sum(i.procuredvalue + i.maxmargin) < min_val_seller.q THEN min_val_seller.q 
              WHEN Sum(i.procuredvalue < min_val_seller.q) and e.saleevent = 'Special' THEN min_val_seller.q
              WHEN min_val_seller.q < i.procuredvalue and e.saleevent = 'Special' and Min(min_val_seller.q) and s.netvalue = 'VeryHigh' THEN 0.9*i.procuredvalue
              ELSE i.procuredvalue 
       END AS final_price 
  FROM ecom_competitor_data e, 
       internal_product_data i, 
       min_val_seller, 
       seller_data s 
 WHERE e.productid = i.productid 
   AND s.sellerid = i.sellerid

So many issues...

  1. You cannot surround your query with double quotes and also use double quotes to qualify strings within it (without proper escaping).
    The easier and cleaner solution is to do something like this:


val myquery = """
select ...
from ...
where ...
"""

// Note: Scala identifiers cannot start with a digit, so a name like 1st_case is invalid
val df = spark.sql(myquery)

P.S. Get used to using single quotes for SQL string literals. Unlike double quotes, they work in all SQL dialects.
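In standard SQL (including Spark SQL), single quotes delimit string literals, while double quotes are conventionally reserved for identifiers. For instance, using a table and column from the question:

```sql
-- 'Sp' is a string literal compared against the saleevent column;
-- "Sp" would instead be parsed as a quoted identifier in many dialects
SELECT * FROM combine_table WHERE saleevent = 'Sp';
```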

  1. 'min_val_seller.Q' is a string literal, not a column reference.

  2. The logical AND in Spark is and , not &&

  3. The CASE statement starts with two identical conditions ( Sum(i.procuredvalue + i.maxmargin) < min_val_seller.q ), so the 2nd condition will never be chosen.

    (Please make sure you understand how CASE works: branches are evaluated top-down and the first true condition wins.)

  4. ISO JOINs were introduced in the 90's. There is no reason to use WHERE conditions instead of proper JOIN syntax.
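Applied to the query above, the implicit comma join with WHERE conditions could be rewritten with explicit JOIN syntax along these lines (a sketch only, keeping the original table aliases; the original gives no join condition for min_val_seller, so it is left as a cross join):

```sql
SELECT *,
       CASE ... END AS final_price        -- same CASE expression as in the query above
  FROM ecom_competitor_data e
  JOIN internal_product_data i ON e.productid = i.productid
  JOIN seller_data s           ON s.sellerid  = i.sellerid
 CROSS JOIN min_val_seller                -- no join condition was specified for this table
```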

    val d1 = spark.sql(""" SELECT price, PV,
    case
    WHEN cast(PV + mxM as Float) < cast(price as Float) THEN PV + mxM
    WHEN cast(PV + mnM as Float) < cast(price as Float) THEN PV + mnM
    WHEN cast(PV as Float) < cast(price as Float) AND saleevent = 'Sp' THEN price
    WHEN cast(price as Float) < cast(PV as Float) AND saleevent = 'Sp' AND sclass = 'VH' THEN 0.9*PV
    ELSE PV
    END AS price
    FROM combine_table""")
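As point 3 above notes, a CASE expression evaluates its WHEN branches top to bottom and returns the first one whose condition is true, which is why two identical conditions leave the second branch unreachable. A minimal standalone illustration (values are made up):

```sql
SELECT CASE
         WHEN 5 < 10 THEN 'first'     -- true, so this value is returned
         WHEN 5 < 10 THEN 'second'    -- identical condition: never reached
         ELSE 'other'
       END AS which_branch;
-- yields 'first'
```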

Thanks @David, the above query worked for me.
