how do I do a sum over partition in spark sql?
spark sql is just different enough from the engines I use that it breaks all my code.
this statement
case when sum(flag = 'Y') over (partition by id) > 0
then 'Y' else 'N' end as flag
is supposed to return Y if any flag field of a given id is Y, and it doesn't work because the sum function in Spark can only take numeric types. is there a workaround?
Your code is not valid SQL -- it happens to work in MySQL but not in most databases.
The standard SQL approach will work, using a CASE expression:
(case when sum(case when flag = 'Y' then 1 else 0 end) over (partition by id) > 0
then 'Y' else 'N'
end) as flag
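Outside of Spark, the same standard-SQL expression can be sanity-checked with any engine that supports window functions. A minimal sketch using Python's stdlib sqlite3 module (SQLite 3.25+ supports this window syntax; the table name `t` and the sample rows are made up for the demo):

```python
# Verify the CASE + SUM ... OVER (PARTITION BY ...) expression with SQLite.
# The table name and sample data are invented for this demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, flag TEXT);
    INSERT INTO t VALUES (1, 'N'), (1, 'Y'), (2, 'N'), (2, 'N');
""")
rows = conn.execute("""
    SELECT id,
           CASE WHEN SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY id) > 0
                THEN 'Y' ELSE 'N'
           END AS flag
    FROM t
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N')]
```

Every row of id 1 comes back `'Y'` because one row in that partition has the flag set; both rows of id 2 stay `'N'`.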
Or, assuming that flag only takes on the values 'Y' and 'N', you can simplify the logic to:
max(flag) over (partition by id) as flag
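Because 'N' sorts before 'Y', max over the partition yields 'Y' as soon as one row has it, whereas min would yield 'Y' only if every row did. A quick sqlite3 sketch (sample table invented for the demo) shows the difference side by side:

```python
# With 'N' < 'Y': MAX flags a partition containing any 'Y',
# MIN requires all rows in the partition to be 'Y'.
# Table and data are made up for this demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, flag TEXT);
    INSERT INTO t VALUES (1, 'N'), (1, 'Y'), (2, 'Y'), (2, 'Y');
""")
rows = conn.execute("""
    SELECT DISTINCT id,
           MAX(flag) OVER (PARTITION BY id) AS any_y,
           MIN(flag) OVER (PARTITION BY id) AS all_y
    FROM t
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Y', 'N'), (2, 'Y', 'Y')]
```

For the "any flag is Y" semantics of the question, max is the one you want.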
You can cast the Boolean flag = 'Y' to an integer in order to sum it up:
case when sum(int(flag = 'Y')) over (partition by id) > 0
then 'Y' else 'N' end as flag
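In Spark SQL, int(...) casts the boolean comparison to 0/1 so that SUM can aggregate it. The same idea written with a standard CAST can be checked in sqlite3 (demo table invented here; SQLite spells the cast CAST(... AS INTEGER) rather than int(...)):

```python
# Cast the boolean comparison to an integer before summing over the
# partition; table and data are made up for this demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, flag TEXT);
    INSERT INTO t VALUES (1, 'N'), (1, 'Y'), (2, 'N'), (2, 'N');
""")
rows = conn.execute("""
    SELECT id,
           CASE WHEN SUM(CAST(flag = 'Y' AS INTEGER))
                     OVER (PARTITION BY id) > 0
                THEN 'Y' ELSE 'N'
           END AS flag
    FROM t
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Y'), (1, 'Y'), (2, 'N'), (2, 'N')]
```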