Multiple arrays in ClickHouse

Problem: Count the distinct values in an array, filtered by another array on the same row, and then aggregate the result across rows.

Explanation: Using this data: [image] For size D70, there are 5 pcs available (hqsize), but the shops request 15. Using the column accumulatedNeed, the first 5 stores in the column shops should receive items (since every store requests 1 pc). That is [4098,4101,4109,4076,4080].

It could also be that the values in accumulatedNeed are [1,4,5,5,5,...,15], where shop 1 requests 1 pc, shop 2 requests 3 pcs, etc. Then only 3 stores would receive items.

For size E75 there is enough stock, so every shop receives items (10 shops):

[image]

Now I want the distinct list of shops from D70 and E75. Wanted result: [4098,4101,4109,4076,4080,4062,4063,4067,4072,4075,4056,4058,4059,4061] (14 unique stores; 4109 is counted only once).

I'm open to structuring the data differently if that works better. The result can't be precalculated because it depends on which shops are filtered on.

I used this query to create a table with data similar to your screenshot:

CREATE TABLE t
(
    Size String,
    hqsize Int,
    accumulatedNeed Array(Int),
    shops Array(Int)
) engine = Memory;

INSERT INTO t VALUES ('D70', 5, [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], [4098,4101,4109,4076,4080,4083,4062,4063,4067,4072,4075,4056,4057,4058,4059]),('E75', 43, [1,2,3,4,5,6,7,8,9,10], [4109,4062,4063,4067,4072,4075,4056,4058,4059,4061]);

Find which shops can receive enough items:

SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask FROM t;
┌─mask────────────────────────────┐
│ [1,1,1,1,1,0,0,0,0,0,0,0,0,0,0] │
│ [1,1,1,1,1,1,1,1,1,1]           │
└─────────────────────────────────┘

Filter out the unfulfilled shops according to this mask. Note that shops and accumulatedNeed must have equal lengths.

SELECT arrayFilter((x,y) -> y, shops, mask) as fulfilled_shops, arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask FROM t;
┌─fulfilled_shops─────────────────────────────────────┬─mask────────────────────────────┐
│ [4098,4101,4109,4076,4080]                          │ [1,1,1,1,1,0,0,0,0,0,0,0,0,0,0] │
│ [4109,4062,4063,4067,4072,4075,4056,4058,4059,4061] │ [1,1,1,1,1,1,1,1,1,1]           │
└─────────────────────────────────────────────────────┴─────────────────────────────────┘
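The separate mask column is not strictly required: arrayFilter accepts a lambda over several arrays of equal length, so the comparison can go directly into the lambda. A minimal sketch against the table t created above (the lambda parameter names shop and need are only illustrative):

```sql
-- Keep each shop whose accumulated need still fits into the available stock.
-- shops and accumulatedNeed must have the same length on every row.
SELECT
    Size,
    arrayFilter((shop, need) -> need <= hqsize, shops, accumulatedNeed) AS fulfilled_shops
FROM t;
```

With the sample rows this should produce the same fulfilled_shops arrays as the mask-based query.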

Then you can produce a table of all distinct shops:

SELECT DISTINCT arrayJoin(fulfilled_shops) as shops FROM (
    SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask, arrayFilter((x,y) -> y, shops, mask) as fulfilled_shops FROM t
);
┌─shops─┐
│  4098 │
│  4101 │
│  4109 │
│  4076 │
│  4080 │
│  4062 │
│  4063 │
│  4067 │
│  4072 │
│  4075 │
│  4056 │
│  4058 │
│  4059 │
│  4061 │
└───────┘

14 rows in set. Elapsed: 0.049 sec.

Or, if you need a single array, group the values back:

SELECT groupArrayDistinct(arrayJoin(fulfilled_shops)) as shops FROM (
    SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask, arrayFilter((x,y) -> y, shops, mask) as fulfilled_shops FROM t
);
┌─shops───────────────────────────────────────────────────────────────────┐
│ [4080,4076,4101,4075,4056,4061,4062,4063,4109,4058,4067,4059,4072,4098] │
└─────────────────────────────────────────────────────────────────────────┘
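The array above comes back in no particular order, so groupArrayDistinct should not be relied on to preserve the order of the input. If a stable order or the number of stores is needed, one option is to wrap the aggregate in arraySort and take its length. This is a sketch that assumes ascending shop id is an acceptable order; distinct_shops and shop_count are illustrative aliases:

```sql
SELECT
    -- Sort the distinct shop ids so the result is deterministic.
    arraySort(groupArrayDistinct(arrayJoin(fulfilled_shops))) AS distinct_shops,
    -- Number of unique stores that receive items.
    length(distinct_shops) AS shop_count
FROM (
    SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) AS mask, arrayFilter((x,y) -> y, shops, mask) AS fulfilled_shops FROM t
);
```

For the sample data, shop_count should be 14, matching the wanted result.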

If you need data only from D70 and E75, filter out the other rows with a WHERE clause first.
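As a sketch of that filter, assuming the sizes of interest are exactly 'D70' and 'E75':

```sql
SELECT groupArrayDistinct(arrayJoin(fulfilled_shops)) AS shops
FROM (
    SELECT
        arrayMap(x -> (x <= hqsize), accumulatedNeed) AS mask,
        arrayFilter((x,y) -> y, shops, mask) AS fulfilled_shops
    FROM t
    WHERE Size IN ('D70', 'E75')  -- drop rows for other sizes before the arrays are expanded
);
```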
