
Count distinct based on another column

I have the following table:

CREATE TABLE tbl (
  id          int NOT NULL
, date        date NOT NULL
, cid         int NOT NULL
, birth_place text NOT NULL
, location    text NOT NULL
);
 
INSERT INTO tbl VALUES
  (1 , '2022-01-01', 1, 'France' , 'Germany')
, (2 , '2022-01-30', 1, 'France' , 'France')
, (3 , '2022-01-25', 2, 'Spain'  , 'Spain')
, (4 , '2022-01-12', 3, 'France' , 'France')
, (5 , '2022-02-01', 4, 'England', 'Italy')
, (6 , '2022-02-12', 1, 'France' , 'France')
, (7 , '2022-03-05', 5, 'Spain'  , 'England')
, (8 , '2022-03-08', 2, 'Spain'  , 'Spain')
, (9 , '2022-03-15', 2, 'Spain'  , 'Spain')
, (10, '2022-03-30', 5, 'Spain'  , 'Italy')
, (11, '2022-03-22', 4, 'England', 'England')
, (12, '2022-03-22', 3, 'France' , 'England');

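I need to count distinct customers (= cid) per month and location, with a special twist:
If a customer returned to their birth place (location = birth_place) in any given month, give priority to that location. Otherwise, pick one location per month and customer arbitrarily.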

My desired output:

date         location   count
2022-01-01   France     2
2022-01-01   Spain      1
2022-02-01   Italy      1
2022-02-01   France     1
2022-03-01   Spain      1
2022-03-01   England    3

cid 1 in 2022-01-01 has a row with location = birth_place, and no other customer had Germany as location in that period, so Germany does not appear in my desired output.

This is my current query:

with
  t as (
    select id, date_trunc('month', date)::date AS date, cid, birth_place, location
    from tbl),
  t1 as (
    select date, cid, location
    from t
    where birth_place = location),
  t2 as (
    select date, cid, location, row_number() over (partition by date, cid order by date) as row
    from t
    where birth_place <> location),
  t3 as (
    select t.*, 
        case
            when t1.location is not null then t1.location
            else t2.location
        end as new_loc
    from t
    left join t1
    on t.cid = t1.cid and t.date = t1.date
    left join t2
    on t.cid = t2.cid and t.date = t2.date and t2.row = 1)
select date, new_loc, count(distinct cid)
from t3
group by 1, 2
order by 1, 2

It works, but it seems inefficient for 100 million rows.
I am looking for a more efficient way.

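Assuming this objective:

Truncate the date to the month.
Pick one location per (month, cid), with the home location getting priority.
Then count rows per (month, location).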

SELECT date, location, count(*)
FROM  (
   SELECT DISTINCT ON (1, 2)  --  choose **one** location per (month, cid)
          date_trunc('month', date)::date AS date, cid, location
   FROM   tbl
   ORDER  BY 1, 2, birth_place = location DESC  -- priority to home location, else **arbitrary**
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;  -- optional

db<>fiddle here

Note that the arbitrary pick for "not home" cases makes for varying results! You may want to define a stable (suitable) pick.
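
A minimal sketch of one stable pick, assuming that among non-home rows the earliest visit in the month should win, with id as a final tie-breaker. Both rules are assumptions for illustration; the question does not define them. The CTE named tbl only stands in for the real table, using the March rows from the question:

```sql
-- Stand-in for the real table: the March rows from the question.
WITH tbl(id, date, cid, birth_place, location) AS (
   VALUES (7 , DATE '2022-03-05', 5, 'Spain'  , 'England')
        , (8 , DATE '2022-03-08', 2, 'Spain'  , 'Spain')
        , (9 , DATE '2022-03-15', 2, 'Spain'  , 'Spain')
        , (10, DATE '2022-03-30', 5, 'Spain'  , 'Italy')
        , (11, DATE '2022-03-22', 4, 'England', 'England')
        , (12, DATE '2022-03-22', 3, 'France' , 'England')
   )
SELECT date, location, count(*)
FROM  (
   SELECT DISTINCT ON (1, 2)
          date_trunc('month', date)::date AS date, cid, location
   FROM   tbl
   ORDER  BY 1, 2
           , birth_place = location DESC  -- home location first
           , tbl.date                     -- assumed rule: earliest visit (input column, not the alias)
           , id                           -- assumed final tie-breaker
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;

-- date       | location | count
-- 2022-03-01 | England  | 3
-- 2022-03-01 | Spain    | 1
```

With these tie-breakers cid 5 always resolves to England (visited on 2022-03-05, before Italy), so repeated runs return the same counts.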

Depending on undisclosed details, there may be faster query variants.
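
As an illustration only, here is one possible variant that replaces DISTINCT ON with plain aggregation. It is a sketch, not a claim about speed: whether it beats the query above depends on data distribution and indexes, so it would need to be benchmarked. It assumes that picking the alphabetically first location is an acceptable pick for "not home" cases. The CTE named tbl stands in for the real table, using the January and February rows from the question:

```sql
-- Stand-in for the real table: January and February rows from the question.
WITH tbl(id, date, cid, birth_place, location) AS (
   VALUES (1, DATE '2022-01-01', 1, 'France' , 'Germany')
        , (2, DATE '2022-01-30', 1, 'France' , 'France')
        , (3, DATE '2022-01-25', 2, 'Spain'  , 'Spain')
        , (4, DATE '2022-01-12', 3, 'France' , 'France')
        , (5, DATE '2022-02-01', 4, 'England', 'Italy')
        , (6, DATE '2022-02-12', 1, 'France' , 'France')
   )
SELECT date, location, count(*)
FROM  (
   SELECT date_trunc('month', date)::date AS date, cid
        , COALESCE(min(location) FILTER (WHERE location = birth_place)  -- home location, if visited
                 , min(location)) AS location  -- assumed rule: else alphabetically first location
   FROM   tbl
   GROUP  BY 1, 2  -- one row per (month, cid)
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;

-- date       | location | count
-- 2022-01-01 | France   | 2
-- 2022-01-01 | Spain    | 1
-- 2022-02-01 | France   | 1
-- 2022-02-01 | Italy    | 1
```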

About DISTINCT ON and performance:

About the sort order:

If there can be NULL values:
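
To sketch why NULL matters here: birth_place = location evaluates to NULL when either side is NULL, and in PostgreSQL a DESC sort places NULL first by default, so such a row would outrank a true home match. Adding NULLS LAST restores the intended priority. The example assumes nullable columns (both are NOT NULL in the question), and the two sample rows are made up for illustration:

```sql
-- Hypothetical rows: the same customer with one unknown location and one home visit.
WITH tbl(id, date, cid, birth_place, location) AS (
   VALUES (1, DATE '2022-01-01', 1, 'France', NULL)      -- location unknown
        , (2, DATE '2022-01-30', 1, 'France', 'France')  -- home
   )
SELECT DISTINCT ON (1, 2)
       date_trunc('month', date)::date AS date, cid, location
FROM   tbl
ORDER  BY 1, 2
        , (birth_place = location) DESC NULLS LAST;  -- true first, then false, NULL last

-- date       | cid | location
-- 2022-01-01 | 1   | France
```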

WITH q1 AS (

    SELECT
        EXTRACT( YEAR  FROM t."date" ) AS "Year",
        EXTRACT( MONTH FROM t."date" ) AS "Month",

        t.cid,
        t.birth_place,
        t.location
    FROM
        tbl AS t  -- the table from the question
    WHERE
        t.location = t.birth_place
)
SELECT
    "Year",
    "Month",
    "location",
    COUNT( DISTINCT cId ) AS "COUNT( DISTINCT cId )",
    COUNT( * ) AS "CountAll"
FROM
    q1
GROUP BY
    "Year",
    "Month",
    "location"
ORDER BY
    "Year",
    "Month",
    "location"

SqlFiddle example

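My approach is to use case...when inside the count. That way it works whether or not the where filter is used, which leaves room for other aggregations in the same query in the future.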

SELECT
    date_trunc('month', date)::date AS date, t.location
    , count(distinct (case when t.location=t.birth_place then t.cid else null end)) as "count"
FROM tbl AS t
WHERE t.location=t.birth_place
GROUP BY date_trunc('month', date)::date, t.location
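
A sketch of the same idea without the WHERE filter, written with PostgreSQL's aggregate FILTER clause in place of case...when. Dropping the filter is what lets a second aggregate run over all rows in the same query. The count_all column is a hypothetical addition for illustration, and the CTE named tbl stands in for the real table with the January rows from the question. Like the query above, this counts home visits only and does not apply the arbitrary pick for "not home" cases:

```sql
-- Stand-in for the real table: the January rows from the question.
WITH tbl(id, date, cid, birth_place, location) AS (
   VALUES (1, DATE '2022-01-01', 1, 'France', 'Germany')
        , (2, DATE '2022-01-30', 1, 'France', 'France')
        , (3, DATE '2022-01-25', 2, 'Spain' , 'Spain')
        , (4, DATE '2022-01-12', 3, 'France', 'France')
   )
SELECT date_trunc('month', date)::date AS date
     , location
     , count(DISTINCT cid) FILTER (WHERE location = birth_place) AS "count"  -- customers at home
     , count(DISTINCT cid) AS count_all  -- hypothetical extra: every customer seen at this location
FROM   tbl
GROUP  BY 1, 2
ORDER  BY 1, 2;

-- date       | location | count | count_all
-- 2022-01-01 | France   | 2     | 1 + 1 = 2
-- 2022-01-01 | Germany  | 0     | 1
-- 2022-01-01 | Spain    | 1     | 1
```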
