Count distinct based on another column

Question

I have the following table:

CREATE TABLE tbl (
  id          int NOT NULL
, date        date NOT NULL
, cid         int NOT NULL
, birth_place text NOT NULL
, location    text NOT NULL
);
 
INSERT INTO tbl VALUES
  (1 , '2022-01-01', 1, 'France' , 'Germany')
, (2 , '2022-01-30', 1, 'France' , 'France')
, (3 , '2022-01-25', 2, 'Spain'  , 'Spain')
, (4 , '2022-01-12', 3, 'France' , 'France')
, (5 , '2022-02-01', 4, 'England', 'Italy')
, (6 , '2022-02-12', 1, 'France' , 'France')
, (7 , '2022-03-05', 5, 'Spain'  , 'England')
, (8 , '2022-03-08', 2, 'Spain'  , 'Spain')
, (9 , '2022-03-15', 2, 'Spain'  , 'Spain')
, (10, '2022-03-30', 5, 'Spain'  , 'Italy')
, (11, '2022-03-22', 4, 'England', 'England')
, (12, '2022-03-22', 3, 'France' , 'England');

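I need to count distinct customers (= cid) per month and location, with a special twist:
If a customer returned to their birth place (location = birth_place) in any given month, that location takes priority. Otherwise, pick one location per month and customer arbitrarily.

My desired output: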

date         location   count
2022-01-01   France     2
2022-01-01   Spain      1
2022-02-01   Italy      1
2022-02-01   France     1
2022-03-01   Spain      1
2022-03-01   England    3

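cid 1 in 2022-01-01 has a row with location = birth_place, and no other customer had Germany as a location in that period, so Germany does not appear in my desired output.

This is my current query: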

with
  t as (
    select id, date_trunc('month', date)::date AS date, cid, birth_place, location
    from tbl),
  t1 as (
    select date, cid, location
    from t
    where birth_place = location),
  t2 as (
    select date, cid, location, row_number() over (partition by date, cid order by date) as row
    from t
    where birth_place <> location),
  t3 as (
    select t.*, 
        case
            when t1.location is not null then t1.location
            else t2.location
        end as new_loc
    from t
    left join t1
    on t.cid = t1.cid and t.date = t1.date
    left join t2
    on t.cid = t2.cid and t.date = t2.date and t2.row = 1)
select date, new_loc, count(distinct cid)
from t3
group by 1, 2
order by 1, 2

It works, but it seems inefficient for 100 million rows.
I am looking for a more efficient approach.

Answer 1

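Assuming this objective:

Truncate the date to the month.
Pick one location per (month, cid), with the home location taking priority.
Then count rows per (month, location).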

SELECT date, location, count(*)
FROM  (
   SELECT DISTINCT ON (1, 2)  --  choose **one** location per (month, cid)
          date_trunc('month', date)::date AS date, cid, location
   FROM   tbl
   ORDER  BY 1, 2, birth_place = location DESC  -- priority to home location, else **arbitrary**
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;  -- optional

db<>fiddle here

Note that the arbitrary pick in the "not at home" case can produce varying results! You may want to define a stable (suitable) pick.
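To make the point about a stable pick concrete, here is a minimal Python sketch (not part of the original answer) that can be used to cross-check a query on the sample data. The tie-break is an assumption: when a customer has no home row in a month, it takes the row with the earliest date, then the lowest id. The function name `count_customers` is hypothetical.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (id, date, cid, birth_place, location)
ROWS = [
    (1, date(2022, 1, 1), 1, "France", "Germany"),
    (2, date(2022, 1, 30), 1, "France", "France"),
    (3, date(2022, 1, 25), 2, "Spain", "Spain"),
    (4, date(2022, 1, 12), 3, "France", "France"),
    (5, date(2022, 2, 1), 4, "England", "Italy"),
    (6, date(2022, 2, 12), 1, "France", "France"),
    (7, date(2022, 3, 5), 5, "Spain", "England"),
    (8, date(2022, 3, 8), 2, "Spain", "Spain"),
    (9, date(2022, 3, 15), 2, "Spain", "Spain"),
    (10, date(2022, 3, 30), 5, "Spain", "Italy"),
    (11, date(2022, 3, 22), 4, "England", "England"),
    (12, date(2022, 3, 22), 3, "France", "England"),
]


def count_customers(rows):
    """Count customers per (month, location), one location per (month, cid)."""
    chosen = {}  # (month, cid) -> (sort_key, location)
    for row_id, day, cid, birth_place, location in rows:
        month = day.replace(day=1)  # truncate the date to the month
        # False sorts before True, so home rows win.
        # Assumed tie-break: earliest date, then lowest id.
        key = (location != birth_place, day, row_id)
        best = chosen.get((month, cid))
        if best is None or key < best[0]:
            chosen[(month, cid)] = (key, location)
    counts = Counter(
        (month, location) for (month, _cid), (_key, location) in chosen.items()
    )
    return sorted((m.isoformat(), loc, n) for (m, loc), n in counts.items())


for row in count_customers(ROWS):
    print(row)
```

In the query itself, the same idea would mean appending tie-breaker columns (for example date and id) to the ORDER BY of the DISTINCT ON subquery, so the pick no longer depends on physical row order.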

Depending on undisclosed details, there may be faster query variants.

About DISTINCT ON and performance:

Select first row in each GROUP BY group?

About the sort order:

SQL select query order by day and month

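If there can be NULL values (not the case here, since all columns are NOT NULL; in PostgreSQL, NULL sorts first in descending order, so the boolean sort expression would need DESC NULLS LAST):

Sort by column ASC, but NULL values first?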

Answer 2

WITH q1 AS (

    SELECT
        EXTRACT( YEAR  FROM t."date" ) AS "Year",
        EXTRACT( MONTH FROM t."date" ) AS "Month",

        t.cid,
        t.birth_place,
        t.location
    FROM
        tbl AS t
    WHERE
        t.location = t.birth_place
)
SELECT
    "Year",
    "Month",
    "location",
    COUNT( DISTINCT cId ) AS "COUNT( DISTINCT cId )",
    COUNT( * ) AS "CountAll"
FROM
    q1
GROUP BY
    "Year",
    "Month",
    "location"
ORDER BY
    "Year",
    "Month",
    "location"

SqlFiddle example.

Answer 3

My approach is to use case...when inside the count. That way it works with or without the where filter, which leaves room for other aggregations in the same query later.

SELECT
    date_trunc('month', date)::date AS date, t.location
    , count(distinct (case when t.location=t.birth_place then t.cid else null end)) as "count"
FROM tbl AS t
WHERE t.location=t.birth_place
GROUP BY date_trunc('month', date)::date, t.location
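
As a rough illustration of what the conditional count computes when the where filter is left out, here is a small Python sketch on the question's sample data. It is not part of the original answer, and the function name `count_home_customers` is hypothetical. Rows where location differs from birth_place contribute NULL (None here), which count(distinct ...) ignores.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (id, date, cid, birth_place, location)
ROWS = [
    (1, date(2022, 1, 1), 1, "France", "Germany"),
    (2, date(2022, 1, 30), 1, "France", "France"),
    (3, date(2022, 1, 25), 2, "Spain", "Spain"),
    (4, date(2022, 1, 12), 3, "France", "France"),
    (5, date(2022, 2, 1), 4, "England", "Italy"),
    (6, date(2022, 2, 12), 1, "France", "France"),
    (7, date(2022, 3, 5), 5, "Spain", "England"),
    (8, date(2022, 3, 8), 2, "Spain", "Spain"),
    (9, date(2022, 3, 15), 2, "Spain", "Spain"),
    (10, date(2022, 3, 30), 5, "Spain", "Italy"),
    (11, date(2022, 3, 22), 4, "England", "England"),
    (12, date(2022, 3, 22), 3, "France", "England"),
]


def count_home_customers(rows):
    """Per (month, location), count distinct cids seen at their birth place."""
    groups = defaultdict(set)
    for _row_id, day, cid, birth_place, location in rows:
        month = day.replace(day=1)  # truncate the date to the month
        # Mirrors: CASE WHEN location = birth_place THEN cid ELSE NULL END
        value = cid if location == birth_place else None
        groups[(month, location)].add(value)
    # COUNT(DISTINCT ...) skips NULLs, so None is dropped before counting.
    return sorted(
        (month.isoformat(), location, len(values - {None}))
        for (month, location), values in groups.items()
    )


for row in count_home_customers(ROWS):
    print(row)
```

Without the filter, locations visited only by customers away from home show up with a count of 0; with the filter, those groups disappear. Either way, only customers at their birth place are counted, so a customer who is away from home all month is not attributed to any location.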
