
Spark Scala case when with multiple conditions

I'm trying to do a case on a DF I have, but I'm getting an error. I want to implement this with the built-in Spark functions withColumn, when, and otherwise:

CASE WHEN vehicle = "BMW"
  AND model IN ("2020", "2019", "2018", "2017")
  AND value > 100000 THEN 1
ELSE 0 END AS NEW_COLUMN

Currently I have this:

DF.withColumn(NEW_COLUMN, when(col(vehicle) === "BMW" 
and col(model) isin(listOfYears:_*) 
and col(value) > 100000, 1).otherwise(0))

But I'm getting an error due to a data type mismatch (boolean and string). I understand my condition returns booleans and strings, which is causing the error. What's the correct syntax for executing a case like that one? Also, I was using && instead of and, but the third && gave me a "cannot resolve symbol &&".

Thanks for the help!

I think && is correct: with the built-in Spark functions, all of the expressions are of type Column, and checking the API, && should work fine. Could it be as simple as an order-of-operations issue, where you need parentheses around each of the boolean conditions? The function / "operator" isin has lower precedence than &&, which might trip things up.
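As a sketch, the fix might look like the snippet below, assuming `vehicle`, `model`, and `value` are the actual column names (passed as strings to `col`) and `listOfYears` holds the year strings. The key change is wrapping each boolean condition in parentheses so `&&` combines whole `Column` expressions rather than binding tighter than `isin`:

```
import org.apache.spark.sql.functions.{col, when}

val listOfYears = Seq("2020", "2019", "2018", "2017")

val result = df.withColumn(
  "NEW_COLUMN",
  when(
    (col("vehicle") === "BMW") &&
    (col("model").isin(listOfYears: _*)) &&
    (col("value") > 100000),
    1
  ).otherwise(0)
)
```

Using dot-notation method calls (`.isin(...)`) instead of infix style also sidesteps most of these precedence surprises.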

