How to calculate a cumulative product in ClickHouse
To calculate a cumulative product in Python, I can use numpy.cumprod:
>>> import numpy
>>> a = [2, 3, 4, 5]
>>> numpy.cumprod(a)
array([  2,   6,  24, 120])
# this is the result I want: [2, 3, 4, 5] => [2, 2*3, 2*3*4, 2*3*4*5] => [2, 6, 24, 120]
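For reference, the same running product can be computed in plain Python without numpy. This is a small sketch using the standard library's itertools.accumulate with operator.mul (the function name cumprod is just an illustrative choice); it uses exact integer arithmetic, so it gives a baseline for judging the floating-point results of the log-based answers below.

```python
from itertools import accumulate
import operator

def cumprod(values):
    """Running product [a0, a0*a1, a0*a1*a2, ...] with exact integer arithmetic."""
    return list(accumulate(values, operator.mul))

print(cumprod([2, 3, 4, 5]))  # [2, 6, 24, 120]
```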
But I don't know how to write this in ClickHouse SQL.
Table A:
row rate
1 2
2 3
3 4
4 5
The rate column below is the result I want. How can I achieve this with a ClickHouse SQL statement?
row rate
1 2
2 6
3 24
4 120
There is no easy way. I would add (implement) a new function, arrayCumProd, in CH.
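SELECT
    i, pow(2, arraySum(z -> log2(z), n))  -- product of prefix n, rebuilt as 2^(sum of log2)
FROM
(
    SELECT
        ig,
        -- for every position i, take the prefix ng[1..i] of the collected values
        arrayMap(i -> arraySlice(ng, 1, i), arrayEnumerate(groupArray(x) AS ng) AS ig) xx
    FROM ( SELECT arrayJoin([2, 3, 4, 5]) AS x )
)
ARRAY JOIN
    ig AS i,
    xx AS n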
┌─i─┬─pow(2, arraySum(lambda(tuple(z), log2(z)), n))─┐
│ 1 │ 2 │
│ 2 │ 6 │
│ 3 │ 24 │
│ 4 │ 119.99999999999994 │
└───┴────────────────────────────────────────────────┘
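The query above rebuilds each prefix product from scratch: for every position i it slices the first i values and turns the sum of their base-2 logarithms back into a product with pow(2, ...). Below is a rough Python analogue of that idea, offered as a sketch; the function name is hypothetical, and Python's floats may not reproduce ClickHouse's exact trailing digits such as 119.99999999999994.

```python
import math

def cumprod_prefix_slices(values):
    """Rough analogue of the first query: for i = 1..n take the prefix values[:i]
    and rebuild its product as 2 ** sum(log2(z))."""
    result = []
    for i in range(1, len(values) + 1):               # like arrayEnumerate
        prefix = values[:i]                           # like arraySlice(ng, 1, i)
        log_sum = sum(math.log2(z) for z in prefix)   # like arraySum(z -> log2(z), n)
        result.append(2 ** log_sum)                   # like pow(2, ...)
    return result

print(cumprod_prefix_slices([2, 3, 4, 5]))
```

Each prefix is summed again from its start, so the work grows quadratically with the number of rows, and the result is only approximately integral because of the float round-trip through log2.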
Hmm, it seems I overcomplicated it.
SELECT x FROM
(
    SELECT
        -- running sum of log2(x), then 2^(each partial sum) = running product
        arrayMap(i -> pow(2, i), arrayCumSum(groupArray(log2(x)))) z
    FROM ( SELECT arrayJoin([2, 3, 4, 5]) AS x )
)
ARRAY JOIN z AS x
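The simplified query needs only one pass: arrayCumSum produces the running sum of log2(x), and mapping pow(2, i) over it turns each partial sum back into a running product. A minimal Python sketch of the same pipeline, assuming itertools.accumulate as the stand-in for arrayCumSum (the function name is illustrative):

```python
import math
from itertools import accumulate

def cumprod_via_log_cumsum(values):
    """Analogue of arrayCumSum(groupArray(log2(x))) followed by arrayMap(i -> pow(2, i), ...)."""
    log_prefix_sums = accumulate(math.log2(x) for x in values)  # like arrayCumSum
    return [2 ** s for s in log_prefix_sums]                     # like pow(2, i)

result = cumprod_via_log_cumsum([2, 3, 4, 5])
print([round(v, 9) for v in result])  # [2.0, 6.0, 24.0, 120.0]
```

Rounding is needed to read the values back as integers, because the detour through logarithms leaves tiny floating-point errors, just as in the ClickHouse output.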
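I'm just extending the answer by @Denis Zhuravlev.
CH has no dedicated function for cumulative multiplication (or for any arbitrary mathematical operator other than addition). Moreover, because of the "cumulative" nature of such a calculation, the existing functions cannot be applied directly to get the required result.
So logarithms have to be used to turn multiplication into addition: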
$\log_a(x \cdot y) = \log_a x + \log_a y$
$x \cdot y = a^{\log_a x + \log_a y}$
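As a worked instance of this identity with $a = 10$ (the base the query below uses through log10 and exp10), take the first three rates 2, 3 and 4; the logarithms are rounded to five decimals:

$$\log_{10}2 + \log_{10}3 + \log_{10}4 \approx 0.30103 + 0.47712 + 0.60206 = 1.38021, \qquad 10^{1.38021} \approx 24 = 2 \cdot 3 \cdot 4$$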
SELECT r.1.1 row, r.1.2 rate, r.2 value, round(r.2, 2) rounded_value
FROM (
    SELECT
        groupArray((row, rate, rate_log)) data,
        -- running sum of log10(rate), turned back into a running product by exp10
        arrayMap(log -> exp10(log), arrayCumSum(data_item -> data_item.3, data)) rate_cumulative_values,
        arrayJoin(arrayZip(data, rate_cumulative_values)) r
    FROM (
        SELECT row, rate, log10(rate) AS rate_log
        FROM (
            /* emulate the original dataset */
            SELECT data.1 row, data.2 rate
            FROM (SELECT arrayJoin([
                (1, 2), (2, 3), (3, 4), (4, 5),
                (5, 1), (6, 0), (7, -1)]) AS data))
        ORDER BY row));
/*
┌─row─┬─rate─┬──────────────value─┬─rounded_value─┐
│ 1 │ 2 │ 2 │ 2 │
│ 2 │ 3 │ 6 │ 6 │
│ 3 │ 4 │ 23.999999999999993 │ 24 │
│ 4 │ 5 │ 119.99999999999996 │ 120 │
│ 5 │ 1 │ 119.99999999999996 │ 120 │
│ 6 │ 0 │ 0 │ 0 │
│ 7 │ -1 │ nan │ nan │
└─────┴──────┴────────────────────┴───────────────┘
*/
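Rows 6 and 7 come from how the logarithm treats non-positive inputs: judging by the output above, log10(0) yields -inf (so exp10 of the running sum gives 0) and log10(-1) yields nan, which then propagates to every later row. Python's math.log10 raises ValueError for these inputs instead, so this sketch uses a hypothetical helper, ch_log10, that maps them explicitly to mimic the behaviour seen in the table; it is an assumption-based reproduction, not ClickHouse's implementation.

```python
import math
from itertools import accumulate

def ch_log10(x):
    """Hypothetical helper mimicking the edge cases visible in the output:
    0 -> -inf, negative -> nan (math.log10 would raise ValueError instead)."""
    if x == 0:
        return -math.inf
    if x < 0:
        return math.nan
    return math.log10(x)

def cumprod_via_log10(rates):
    # running sum of log10 values (like arrayCumSum), then 10 ** s (like exp10)
    return [10 ** s for s in accumulate(ch_log10(r) for r in rates)]

values = cumprod_via_log10([2, 3, 4, 5, 1, 0, -1])
print([round(v, 2) for v in values])  # [2.0, 6.0, 24.0, 120.0, 120.0, 0.0, nan]
```

After a 0 the prefix sum is -inf, so the product stays 0 as long as the following rates are positive, which is mathematically correct. A negative rate, however, turns that row and every later one into nan even though the true product exists, so the log-based trick as written only suits non-negative rates.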
The same logic can be applied to calculate a cumulative division:
$x / y = a^{\log_a x - \log_a y}$
Cumulative division:
SELECT r.1.1 row, r.1.2 rate, r.2 value, round(r.2, 2) rounded_value
FROM (
    SELECT
        groupArray((row, rate, rate_log)) data,
        -- the first log is added and every following log is subtracted: x1 / x2 / x3 / ...
        arrayMap(log -> exp10(log), arrayCumSum((data_item, index) -> index = 1 ? data_item.3 : - data_item.3, data, arrayEnumerate(data))) rate_cumulative_values,
        arrayJoin(arrayZip(data, rate_cumulative_values)) r
    FROM (
        SELECT row, rate, log10(rate) AS rate_log
        FROM (
            /* emulate the original dataset */
            SELECT data.1 row, data.2 rate
            FROM (SELECT arrayJoin([
                (1, 100), (2, 2), (3, 10), (4, 2)]) AS data))
        ORDER BY row));
/*
┌─row─┬─rate─┬──────────────value─┬─rounded_value─┐
│ 1 │ 100 │ 100 │ 100 │
│ 2 │ 2 │ 49.99999999999999 │ 50 │
│ 3 │ 10 │ 4.999999999999999 │ 5 │
│ 4 │ 2 │ 2.4999999999999996 │ 2.5 │
└─────┴──────┴────────────────────┴───────────────┘
*/
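The only change from the multiplication query is the sign inside arrayCumSum: the first row's log is added and every later one is subtracted, which the 1-based index from arrayEnumerate makes possible. A Python sketch of the same idea, assuming strictly positive inputs (math.log10 raises ValueError otherwise); the function name is illustrative:

```python
import math
from itertools import accumulate

def cumdiv_via_log10(values):
    """Running quotient [x1, x1/x2, x1/x2/x3, ...] via logarithms, mirroring
    arrayCumSum((item, index) -> index = 1 ? item : -item, ...) followed by exp10."""
    signed_logs = (
        math.log10(x) if i == 1 else -math.log10(x)  # first term added, the rest subtracted
        for i, x in enumerate(values, start=1)        # 1-based, like arrayEnumerate
    )
    return [10 ** s for s in accumulate(signed_logs)]

print([round(v, 2) for v in cumdiv_via_log10([100, 2, 10, 2])])  # [100.0, 50.0, 5.0, 2.5]
```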