Efficient way to sum on one column where count(value_1) / count(value_2) of another column is greater than x

I have a table with the following structure:

| id | bool | amt |
-------------------
| 1  | 0    | 4   |
| 1  | 1    | 3   |
| 1  | 1    | 5   |
| 2  | 0    | 8   |
| 2  | 1    | 4   |
| 2  | 0    | 4   |

I want to get the sum of amt, but only for ids where the ratio of bool = 1 rows to bool = 0 rows is greater than 0.6.

I have done this successfully with the following query:

SELECT SUM(amt) AS total_amt
FROM table
WHERE id IN (
    SELECT id 
    FROM table 
    GROUP BY id 
    HAVING CAST(SUM(bool) AS DOUBLE) / CAST(COUNT(bool) AS DOUBLE) > 0.6
)
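To make the filter concrete, here is a minimal sketch in plain Python over the six toy rows. It assumes the definition the query actually uses, SUM(bool) / COUNT(bool), which is the share of rows with bool = 1 per id rather than the count of 1s divided by the count of 0s. On this toy data both definitions select only id 1. The names `ROWS`, `THRESHOLD`, and `total_for_qualifying_ids` are illustrative and not part of the original question.

```python
from collections import defaultdict

# Toy rows from the table above: (id, bool, amt)
ROWS = [(1, 0, 4), (1, 1, 3), (1, 1, 5), (2, 0, 8), (2, 1, 4), (2, 0, 4)]
THRESHOLD = 0.6


def total_for_qualifying_ids(rows, threshold):
    """Sum amt over ids whose share of bool = 1 rows exceeds threshold."""
    ones = defaultdict(int)    # number of rows with bool = 1, per id
    counts = defaultdict(int)  # number of rows, per id
    amts = defaultdict(int)    # sum of amt, per id
    for id_, flag, amt in rows:
        ones[id_] += flag
        counts[id_] += 1
        amts[id_] += amt
    return sum(amts[i] for i in counts if ones[i] / counts[i] > threshold)


total = total_for_qualifying_ids(ROWS, THRESHOLD)
print(total)  # id 1 qualifies (2 of 3 rows), id 2 does not (1 of 3 rows)
```

Here id 1 contributes 4 + 3 + 5 = 12, and id 2 is excluded, so the expected total for the toy table is 12.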

However, this is a toy version of my actual tables, and the real data set is very large. When I run this query on all my data, I get errors saying either that the cluster's memory limit has been reached or that the execution time limit has been reached. If I remove the WHERE clause that finds the ids satisfying the ratio, the query runs without errors.

Before resorting to having these limits increased, is there any way I can achieve this more efficiently in terms of memory, execution time, or both?

You can use two levels of aggregation:

select sum(id_amount)
from (select id, sum(amt) as id_amount,
             avg(case when bool = 1 then 1.0 else 0 end) as ratio
      from t
      group by id
     ) t
where ratio > 0.6;
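As a sanity check, the sketch below runs both the IN-subquery form and the two-level aggregation against the toy rows and compares the totals. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption made only so the example is runnable. SQLite is not Presto, so this confirms only that the two queries agree on the result and says nothing about memory use or run time on a cluster. The table name `t`, the `REAL` cast, and the helper `run` are illustrative.

```python
import sqlite3

# Toy rows from the question: (id, bool, amt)
ROWS = [(1, 0, 4), (1, 1, 3), (1, 1, 5), (2, 0, 8), (2, 1, 4), (2, 0, 4)]

# Original approach: filter with an IN subquery (the table is read twice).
IN_SUBQUERY = """
SELECT SUM(amt)
FROM t
WHERE id IN (
    SELECT id
    FROM t
    GROUP BY id
    HAVING CAST(SUM(bool) AS REAL) / COUNT(bool) > 0.6
)
"""

# Suggested approach: aggregate per id once, then sum the qualifying ids.
TWO_LEVEL = """
SELECT SUM(id_amount)
FROM (SELECT id, SUM(amt) AS id_amount,
             AVG(CASE WHEN bool = 1 THEN 1.0 ELSE 0 END) AS ratio
      FROM t
      GROUP BY id
     ) AS per_id
WHERE ratio > 0.6
"""


def run(query):
    """Load the toy rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, bool INTEGER, amt INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    (total,) = conn.execute(query).fetchone()
    conn.close()
    return total


in_total = run(IN_SUBQUERY)
two_level_total = run(TWO_LEVEL)
print(in_total, two_level_total)
```

Both queries return 12 on the toy table. The two-level form scans the table once and carries the per-id sum alongside the ratio, instead of scanning once to find the ids and again to sum their amounts.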

Note: I don't have much experience with Presto. I think you can use:

avg(bool)

or:

avg(cast(bool as integer))

instead of the above expression.
