簡體   English   中英

在帶有/不帶時區的日期或時間戳的查詢中處理 generate_series()

[英]Handling of generate_series() in queries with date or timestamp with / without time zone

我有一個查詢來生成基於按dateemployee_id分組的日期系列的報告。 日期應基於特定時區,在本例中為“Asia/Kuala_Lumpur”。 但這可能會根據用戶所在時區的位置而改變。


SELECT 
     d::date AT TIME ZONE 'Asia/Kuala_Lumpur' AS created_date,  
     e.id,  
     e.name,
     e.division_id,
     ARRAY_AGG(
       a.id
     ) as rows,        
     MIN(a.created_at) FILTER (WHERE a.activity_type = 1) as min_time_in,
     MAX(a.created_at) FILTER (WHERE a.activity_type = 2) as max_time_out,
     ARRAY_AGG(
       CASE
           WHEN a.activity_type = 1
           THEN a.created_at
           ELSE NULL
       END
     ) as check_ins,
     ARRAY_AGG(
       CASE
           WHEN a.activity_type = 2
           THEN a.created_at
           ELSE NULL
       END
     ) as check_outs        
FROM    (SELECT MIN(created_at), MAX(created_at) FROM attendance) AS r(startdate,enddate)
  , generate_series(
        startdate::timestamp, 
        enddate::timestamp, 
        interval '1 day') g(d)
    CROSS JOIN  employee e
    LEFT JOIN   attendance a ON a.created_at::date = d::date AND e.id = a.employee_id
    where d::date = date '2020-11-20' and division_id = 1
GROUP BY 
    created_date
  , e.id
  , e.name
  , e.division_id
ORDER BY 
    created_date
  , e.id;

餐桌attendance的定義和樣本數據:

CREATE TABLE attendance (
    id int,
    employee_id int,
    activity_type int,
    created_at timestamp with time zone NOT NULL
);

INSERT INTO attendance VALUES
( 1, 1, 1,'2020-11-18 07:10:25 +00:00'),
( 2, 2, 1,'2020-11-18 07:30:25 +00:00'),
( 3, 3, 1,'2020-11-18 07:50:25 +00:00'),
( 4, 2, 2,'2020-11-18 19:10:25 +00:00'),
( 5, 3, 2,'2020-11-18 19:22:38 +00:00'),
( 6, 1, 2,'2020-11-18 20:01:05 +00:00'),
( 7, 1, 1,'2020-11-19 07:11:23 +00:00'),
( 8, 1, 2,'2020-11-19 16:21:53 +00:00'), <-- Asia/Kuala_Lumpur +8 should be in 20.11 (refer to the check_outs field in the results output)
( 9, 1, 1,'2020-11-19 19:11:23 +00:00'), <-- Asia/Kuala_Lumpur +8 should be in 20.11 (refer to the check_ins field in the results output)
(10, 1, 2,'2020-11-19 20:21:53 +00:00'), <-- Asia/Kuala_Lumpur +8 should be in 20.11 (refer to the check_outs field in the results output)
(11, 1, 1,'2020-11-20 07:41:38 +00:00'),
(12, 1, 2,'2020-11-20 08:52:01 +00:00');

這是一個要測試的小提琴

該查詢不包括時區 Asia/Kuala_Lumpur +8 的 output 中的第 8-10 行,盡管它應該。 結果顯示“行”字段11,12

如何修復查詢,以便它根據給定時區的日期生成報告? (意味着我可以將Asia/Kuala_Lumpur更改為America/New_York等)

我被告知要做這樣的事情:

where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
and   created_at <  timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'

但我不確定如何應用它。 這個小提琴中似乎無法正常工作。 它應該包括第 8、9、10、11、12 行,但只顯示第 8、9、10 行。

數據庫設計

考慮對您的設置進行一些修改:

CREATE TABLE employee (
  id           int PRIMARY KEY  -- !
, name         text             -- do NOT use char(n) !
, division_id  int
);

CREATE  TABLE attendance (
  id             int PRIMARY KEY  --!
, employee_id    int NOT NULL REFERENCES employee -- FK!
, activity_type  int
, created_at     timestamptz NOT NULL
);

定義 PK 可以更輕松地聚合行,因為 PK 覆蓋GROUP BY子句中的整行。 看:

我不會使用“名稱”作為列名。 它不是描述性的。 每隔一列可以命名為“名稱”。 考慮:

詢問

SELECT *
FROM  (        -- complete employee/date grid for division in range
   SELECT g.d::date AS the_date, id AS employee_id, name, division_id
   FROM  (
      SELECT generate_series(MIN(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
                           , MAX(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
                           , interval '1 day')
      FROM   attendance
      ) g(d)
   CROSS  JOIN employee e
   WHERE  e.division_id = 1
   ) de
LEFT   JOIN (  -- checkins & checkouts per employee/date for division in range
   SELECT employee_id, ts::date AS the_date
        , array_agg(id) as rows
        , min(ts)             FILTER (WHERE activity_type = 1) AS min_check_in
        , max(ts)             FILTER (WHERE activity_type = 2) AS max_check_out
        , array_agg(ts::time) FILTER (WHERE activity_type = 1) AS check_ins
        , array_agg(ts::time) FILTER (WHERE activity_type = 2) AS check_outs
   FROM  (
      SELECT a.id, a.employee_id, a.activity_type, a.created_at AT TIME ZONE 'Asia/Kuala_Lumpur' AS ts  -- convert to timestamp
      FROM   employee   e
      JOIN   attendance a ON a.employee_id = e.id
   -- WHERE  a.created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' -- "sargable" expressions
   -- AND    a.created_at <  timestamp '2020-11-21' AT TIME ZONE 'Asia/Kuala_Lumpur' -- exclusive upper bound (includes all of 2020-11-20);
      AND    e.division_id = 1
      ORDER  BY a.employee_id, a.created_at, a.activity_type  -- optional to guarantee sorted arrays
   ) sub
   GROUP  BY 1, 2
   ) a USING (the_date, employee_id)
ORDER  BY 1, 2;

db<> 在這里擺弄

請注意,我的查詢輸出 Asia/Kuala_Lumpur 的本地日期和時間:

test=> SELECT timestamptz '2020-11-20 08:52:01 +0' AT TIME ZONE 'Asia/Kuala_Lumpur' AS local_ts;
      local_ts       
---------------------
 2020-11-20 16:52:01

從哪兒開始? 需要了解時區的概念和 Postgres 數據類型timestamp with time zone ( timestamptz ) 與timestamp without time zone ( timestamp )。 否則,將是無休止的混亂。 從這里開始:

最值得注意的是, timestamptz存儲時區:

當簡單地將timestamptz轉換為datetimestamp ,假設 session 的當前時區設置。 不是你想要的。 使用AT TIME ZONE構造顯式提供時區以避免這種問題。 在你的小提琴中,你有兩個:

  ...
  , generate_series(
        startdate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur', 
        enddate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur', 
        interval '1 day') g(d)
   ...

沒有做你想做的事。 在(錯誤!)轉換為timestamp之后, AT TIME ZONE構造將值轉換回timestamptz

此外,您的查詢會生成所有用戶的完整笛卡爾積以及表attendance的最大天數,只是為了將其減少回一天:

    where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
    and   created_at <  timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'

WHERE子句最終完成了它應該做的事情。 但是首先生成完整的天數,然后丟棄大部分天數是沒有意義的。 (似乎你同時從我的另一個小提琴中復制了它?)

我注釋掉了WHERE子句,並在我的查詢中保留了您的generate_series()的優化版本作為概念證明。 進一步閱讀:

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM