Efficient way to sum one column where count(value_1) / count(value_2) of another column is greater than x

Question

I have a table of the following structure:

| id | bool | amt |
-------------------
| 1  | 0    | 4   |
| 1  | 1    | 3   |
| 1  | 1    | 5   |
| 2  | 0    | 8   |
| 2  | 1    | 4   |
| 2  | 0    | 4   |

I want to get the sum of amt, but only for ids where the ratio of bool = 1 to bool = 0 is greater than 0.6.

I have successfully done this with the following query:

SELECT SUM(amt) AS total_amt
FROM table
WHERE id IN (
    SELECT id 
    FROM table 
    GROUP BY id 
    HAVING CAST(SUM(bool) AS DOUBLE) / CAST(COUNT(bool) AS DOUBLE) > 0.6
)
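To make the filter concrete, here is a minimal Python sketch of what the query's HAVING clause computes on the toy table. Note that SUM(bool) / COUNT(bool) is the share of rows with bool = 1 per id, which is how the query reads the ratio. The variable names (`rows`, `share`, `total_amt`) are illustrative only.

```python
from collections import defaultdict

# Toy rows copied from the table above: (id, bool, amt)
rows = [(1, 0, 4), (1, 1, 3), (1, 1, 5), (2, 0, 8), (2, 1, 4), (2, 0, 4)]

ones = defaultdict(int)     # rows with bool = 1, per id
counts = defaultdict(int)   # all rows, per id
amounts = defaultdict(int)  # sum of amt, per id
for id_, flag, amt in rows:
    ones[id_] += flag
    counts[id_] += 1
    amounts[id_] += amt

# Share of bool = 1 rows per id, i.e. SUM(bool) / COUNT(bool)
share = {i: ones[i] / counts[i] for i in counts}

# Keep only ids whose share exceeds 0.6, then add up their amounts
total_amt = sum(amounts[i] for i in counts if share[i] > 0.6)
print(share)
print(total_amt)
```

On this data id 1 has a share of 2/3 and id 2 has 1/3, so only id 1 passes the 0.6 threshold and the total is 3 + 5 + 4 = 12.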

However, this is a toy version of my actual tables, and the real data set is very large. When I run this query on all my data, I get errors saying either that the cluster's memory limit has been reached or that the execution time limit has been reached. If I remove the WHERE clause that finds the ids satisfying the ratio, the query runs without errors.

Before resorting to having these limits increased, is there any way I can achieve this more efficiently, in terms of memory, execution time, or both?

Answer 1

You can use two levels of aggregation:

select sum(id_amount)
from (select id, sum(amt) as id_amount,
             -- share of rows with bool = 1 for this id
             avg(case when bool = 1 then 1.0 else 0 end) as ratio
      from t
      group by id
     ) t
where ratio > 0.6;
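As a sanity check, the sketch below runs both the two-level aggregation and the original IN-subquery form on the toy table and compares the results. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption made purely for convenience: the question concerns Presto, so this only checks that the two queries agree on small data and says nothing about memory or run time on a cluster. The table name `t` follows the answer, and the alias `per_id` is illustrative.

```python
import sqlite3

# Toy rows from the question: (id, bool, amt)
rows = [(1, 0, 4), (1, 1, 3), (1, 1, 5), (2, 0, 8), (2, 1, 4), (2, 0, 4)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Presto
conn.execute("CREATE TABLE t (id INTEGER, bool INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Two-level aggregation: one pass per id, then filter and sum the per-id totals
two_level = """
SELECT SUM(id_amount)
FROM (SELECT id, SUM(amt) AS id_amount,
             AVG(CASE WHEN bool = 1 THEN 1.0 ELSE 0 END) AS ratio
      FROM t
      GROUP BY id) AS per_id
WHERE ratio > 0.6
"""

# Original form: the table is read once for the subquery and once for the sum
in_subquery = """
SELECT SUM(amt)
FROM t
WHERE id IN (SELECT id FROM t GROUP BY id
             HAVING CAST(SUM(bool) AS REAL) / COUNT(bool) > 0.6)
"""

total_two_level = conn.execute(two_level).fetchone()[0]
total_in_subquery = conn.execute(in_subquery).fetchone()[0]
print(total_two_level, total_in_subquery)
```

Both queries return 12 here. The two-level form aggregates the table once per id and filters afterwards, rather than matching every row against a list of ids.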

Note: I don't have much experience with Presto. I think you can use:

avg(bool)

or:

avg(bool::int)

instead of the case expression above.

Answer 1 (2020-03-31 12:15:07)
