Count distinct based on another column
I have the following table:
CREATE TABLE tbl (
id int NOT NULL
, date date NOT NULL
, cid int NOT NULL
, birth_place text NOT NULL
, location text NOT NULL
);
INSERT INTO tbl VALUES
(1 , '2022-01-01', 1, 'France' , 'Germany')
, (2 , '2022-01-30', 1, 'France' , 'France')
, (3 , '2022-01-25', 2, 'Spain' , 'Spain')
, (4 , '2022-01-12', 3, 'France' , 'France')
, (5 , '2022-02-01', 4, 'England', 'Italy')
, (6 , '2022-02-12', 1, 'France' , 'France')
, (7 , '2022-03-05', 5, 'Spain' , 'England')
, (8 , '2022-03-08', 2, 'Spain' , 'Spain')
, (9 , '2022-03-15', 2, 'Spain' , 'Spain')
, (10, '2022-03-30', 5, 'Spain' , 'Italy')
, (11, '2022-03-22', 4, 'England', 'England')
, (12, '2022-03-22', 3, 'France' , 'England');
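I need to count distinct customers (= cid) per month and location, with a special twist:
If a customer returned to their birth place (location = birth_place) in any given month, give priority to that location. Otherwise, pick one location per month and customer arbitrarily.
My desired output: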
date location count
2022-01-01 France 2
2022-01-01 Spain 1
2022-02-01 Italy 1
2022-02-01 France 1
2022-03-01 Spain 1
2022-03-01 England 3
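cid 1 in 2022-01-01 has a row with location = birth_place, and no other customer had Germany as location in that time period, so there is no Germany in my desired output.
This is my current query: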
with
t as (
select id, date_trunc('month', date)::date AS date, cid, birth_place, location
from tbl),
t1 as (
select date, cid, location
from t
where birth_place = location),
t2 as (
select date, cid, location, row_number() over (partition by date, cid order by date) as row
from t
where birth_place <> location),
t3 as (
select t.*,
case
when t1.location is not null then t1.location
else t2.location
end as new_loc
from t
left join t1
on t.cid = t1.cid and t.date = t1.date
left join t2
on t.cid = t2.cid and t.date = t2.date and t2.row = 1)
select date, new_loc, count(distinct cid)
from t3
group by 1, 2
order by 1, 2
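It works, but it seems inefficient for 100 million rows.
I am looking for a more efficient way.
Assuming this objective:
Truncate the date to the month.
Pick one location per (month, cid), with priority to the home location.
Then count rows per (month, location).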
SELECT date, location, count(*)
FROM (
SELECT DISTINCT ON (1, 2) -- choose **one** location per (month, cid)
date_trunc('month', date)::date AS date, cid, location
FROM tbl
ORDER BY 1, 2, birth_place = location DESC -- priority to home location, else **arbitrary**
) sub
GROUP BY 1, 2
ORDER BY 1, 2; -- optional
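Note that the arbitrary pick in the "not at home" case can produce varying results! You may want to define a stable (suitable) pick.
Depending on undisclosed details, there may be faster query variants.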
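To make the "stable pick" concrete, here is a minimal Python sketch of the three steps above, run against the sample rows from the question. It is meant as a cross-check for small data, not as a replacement for the query. The tie-breaker (earliest date, then lowest id) and the names ROWS, count_customers and RESULT are assumptions made for illustration; the question does not define a tie-breaker.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (id, date, cid, birth_place, location)
ROWS = [
    (1, date(2022, 1, 1), 1, "France", "Germany"),
    (2, date(2022, 1, 30), 1, "France", "France"),
    (3, date(2022, 1, 25), 2, "Spain", "Spain"),
    (4, date(2022, 1, 12), 3, "France", "France"),
    (5, date(2022, 2, 1), 4, "England", "Italy"),
    (6, date(2022, 2, 12), 1, "France", "France"),
    (7, date(2022, 3, 5), 5, "Spain", "England"),
    (8, date(2022, 3, 8), 2, "Spain", "Spain"),
    (9, date(2022, 3, 15), 2, "Spain", "Spain"),
    (10, date(2022, 3, 30), 5, "Spain", "Italy"),
    (11, date(2022, 3, 22), 4, "England", "England"),
    (12, date(2022, 3, 22), 3, "France", "England"),
]


def count_customers(rows):
    """Count customers per (month, location), one location per (month, cid)."""
    best = {}  # (month, cid) -> (sort key, chosen location)
    for row_id, day, cid, birth_place, location in rows:
        month = day.replace(day=1)  # like date_trunc('month', date)::date
        # False sorts before True, so home rows win.
        # Assumed tie-breaker: earliest date, then lowest id.
        key = (birth_place != location, day, row_id)
        slot = (month, cid)
        if slot not in best or key < best[slot][0]:
            best[slot] = (key, location)
    counts = Counter((month, loc) for (month, _cid), (_key, loc) in best.items())
    return sorted((m.isoformat(), loc, n) for (m, loc), n in counts.items())


RESULT = count_customers(ROWS)
for line in RESULT:
    print(*line)
```

With this tie-breaker, the sketch reproduces the desired output from the question. In the SQL query, the same effect should be achievable by appending tie-breaker expressions such as tbl.date, id to the ORDER BY list; the table qualification matters because a bare date there would resolve to the truncated output column.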
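About the sort order: in PostgreSQL, false sorts before true, so birth_place = location DESC puts the home location first.
If there can be NULL values: NULL sorts first in descending order, so append NULLS LAST to that ORDER BY expression.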
WITH q1 AS (
SELECT
EXTRACT( YEAR FROM t."date" ) AS "Year",
EXTRACT( MONTH FROM t."date" ) AS "Month",
t.cid,
t.birth_place,
t.location
FROM
theTable AS t
WHERE
t.location = t.birth_place
)
SELECT
"Year",
"Month",
"location",
COUNT( DISTINCT cId ) AS "COUNT( DISTINCT cId )",
COUNT( * ) AS "CountAll"
FROM
q1
GROUP BY
"Year",
"Month",
"location"
ORDER BY
"Year",
"Month",
"location"
My approach is to use case ... when inside the count. That way it works with or without the where filter, which leaves room for other aggregations in the same query later.
SELECT
date_trunc('month', date)::date AS date, t.location
, count(distinct (case when t.location=t.birth_place then t.cid else null end)) as "count"
FROM theTable AS t
WHERE t.location=t.birth_place
GROUP BY date_trunc('month', date)::date, t.location
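To see what the conditional count returns with and without the filter, here is a small Python sketch that mimics count(DISTINCT CASE ...) on the sample rows from the question. The names ROWS, count_home_customers and the prefilter flag are hypothetical and only serve this illustration.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (id, date, cid, birth_place, location)
ROWS = [
    (1, date(2022, 1, 1), 1, "France", "Germany"),
    (2, date(2022, 1, 30), 1, "France", "France"),
    (3, date(2022, 1, 25), 2, "Spain", "Spain"),
    (4, date(2022, 1, 12), 3, "France", "France"),
    (5, date(2022, 2, 1), 4, "England", "Italy"),
    (6, date(2022, 2, 12), 1, "France", "France"),
    (7, date(2022, 3, 5), 5, "Spain", "England"),
    (8, date(2022, 3, 8), 2, "Spain", "Spain"),
    (9, date(2022, 3, 15), 2, "Spain", "Spain"),
    (10, date(2022, 3, 30), 5, "Spain", "Italy"),
    (11, date(2022, 3, 22), 4, "England", "England"),
    (12, date(2022, 3, 22), 3, "France", "England"),
]


def count_home_customers(rows, prefilter):
    """Distinct at-home customers per (month, location)."""
    groups = defaultdict(set)
    for _row_id, day, cid, birth_place, location in rows:
        at_home = birth_place == location
        if prefilter and not at_home:
            continue  # plays the role of the WHERE clause
        group = groups[(day.replace(day=1).isoformat(), location)]
        if at_home:  # the CASE expression: other rows contribute NULL
            group.add(cid)
    return {key: len(cids) for key, cids in sorted(groups.items())}


WITH_FILTER = count_home_customers(ROWS, prefilter=True)
WITHOUT_FILTER = count_home_customers(ROWS, prefilter=False)
for key, n in WITHOUT_FILTER.items():
    print(*key, n)
```

On the sample data, both variants agree on every group that contains an at-home customer. Without the filter, groups such as (2022-01-01, Germany) additionally appear with a count of 0.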