Query for time-series-like counters in psql
I have the following append-only table in psql:
CREATE TABLE IF NOT EXISTS data (
id UUID DEFAULT gen_random_uuid () PRIMARY KEY,
test_id UUID NOT NULL,
user_id UUID NOT NULL,
completed BOOL NOT NULL DEFAULT False,
inserted_at TIMESTAMPTZ NOT NULL DEFAULT now() -- no trailing comma; timestamptz already stores an absolute instant, so AT TIME ZONE 'UTC' is not needed
);
CREATE INDEX some_idx ON data (user_id, test_id, inserted_at DESC);
CREATE INDEX some_idx2 ON data (test_id, inserted_at DESC);
For a given test_id, a single user_id may have multiple entries, but only one of them can be completed (the completed entry is also the last one).
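The invariant above can be written as a small check. This is a minimal Python sketch. It assumes the rows for one test_id are held in memory as hypothetical (user_id, completed, inserted_at) tuples, and the function name satisfies_invariant is illustrative only.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical row layout for a single test_id: (user_id, completed, inserted_at)
Row = Tuple[str, bool, datetime]


def satisfies_invariant(rows: List[Row]) -> bool:
    """True if every user has at most one completed row and that row is their latest."""
    by_user: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    for user_rows in by_user.values():
        completed = [r for r in user_rows if r[1]]
        if len(completed) > 1:
            return False  # completed more than once
        if completed and completed[0][2] != max(r[2] for r in user_rows):
            return False  # a newer row follows the completed one
    return True
```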
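I'm querying for a given test_id. What I need is time-series-like data for each day of the past week. For each day, I should have the following:
- total: the number of distinct users with entries where inserted_at < "day"
- completed: the number of completed entries where inserted_at < "day"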
Ultimately, total and completed behave like counters. I just want their values for each day of the past week. For example:
| date | total | completed |
|------------|-------|-----------|
| 2022.01.19 | 100 | 50 |
| 2022.01.18 | 90 | 45 |
| ... | | |
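To pin down the intended semantics, here is a naive reference implementation in Python. It is a sketch, not the author's code. For every day, it counts the distinct user_ids and the completed rows inserted strictly before midnight of that day, which mirrors the inserted_at < "day" condition. The tuple layout and the name daily_counters are assumptions, and timestamps are assumed to be naive UTC datetimes.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple

# Hypothetical row layout for a single test_id: (user_id, completed, inserted_at)
Row = Tuple[str, bool, datetime]


def daily_counters(rows: List[Row], first: date, last: date) -> List[Tuple[date, int, int]]:
    """Return (day, total, completed) for each day in [first, last].

    total     = distinct user_ids with inserted_at before midnight of `day`
    completed = completed rows with inserted_at before midnight of `day`
    Deliberately naive: all rows are rescanned for every day.
    """
    result = []
    day = first
    while day <= last:
        cutoff = datetime(day.year, day.month, day.day)  # midnight, naive UTC assumed
        before = [r for r in rows if r[2] < cutoff]
        total = len({r[0] for r in before})
        completed = sum(1 for r in before if r[1])
        result.append((day, total, completed))
        day += timedelta(days=1)
    return result
```

Because every day rescans every row, the cost grows as days × rows.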
What query would give an efficient query plan? I'm open to adding new indexes or modifying the existing ones.
PS: I have a working version here:
SELECT date, entered, completed
FROM (
SELECT d::date AS date
FROM generate_series('2023-01-12', now(),INTERVAL '1 day') AS d
) AS dates
cross join lateral (
SELECT COUNT(DISTINCT user_id) AS entered,
COUNT(1) FILTER (WHERE completed) AS completed -- no DISTINCT needed: each user completes at most once
FROM data
WHERE
test_id = 'someId' AND
inserted_at < dates.date
) AS vals
I don't think this is a good or performant solution, because it rescans the table on every iteration of the lateral join. Here is the query plan:
QUERY PLAN
Nested Loop  (cost=185.18..185218.25 rows=1000 width=28) (actual time=0.928..7.687 rows=8 loops=1)
  ->  Function Scan on generate_series d  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.009..0.012 rows=8 loops=1)
  ->  Aggregate  (cost=185.17..185.18 rows=1 width=16) (actual time=0.957..0.957 rows=1 loops=8)
        ->  Bitmap Heap Scan on data  (cost=12.01..183.36 rows=363 width=38) (actual time=0.074..0.197 rows=779 loops…
              Recheck Cond: ((test_id = 'someId'::uuid) AND (inserted_at < (d.d)::date))
              Heap Blocks: exact=629
              ->  Bitmap Index Scan on some_idx2  (cost=0.00..11.92 rows=363 width=0) (actual time=…
                    Index Cond: ((test_id = 'someId'::uuid) AND (inserted_at < (d.d)::date…
Planning Time: 0.261 ms
Execution Time: 7.733 ms
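The plan supports the concern. The Aggregate node shows loops=8, so the same index range is read again for every generated day.

A single-pass alternative buckets the rows by day once and then accumulates running totals. The Python sketch below uses hypothetical names and assumes naive UTC datetimes. It credits each user only on the day of their first row, so a running sum of those credits equals the distinct-user count without re-reading earlier rows.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical row layout for a single test_id: (user_id, completed, inserted_at)
Row = Tuple[str, bool, datetime]


def daily_counters_single_pass(
    rows: List[Row], first: date, last: date
) -> List[Tuple[date, int, int]]:
    """Same (day, total, completed) output as a per-day rescan, visiting each row once."""
    first_seen: Dict[str, date] = {}  # user_id -> day of that user's earliest row
    new_completed: Counter = Counter()  # day -> completed rows inserted that day
    for user_id, completed, inserted_at in rows:
        day = inserted_at.date()
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day
        if completed:
            new_completed[day] += 1
    new_users = Counter(first_seen.values())  # day -> users first seen that day

    # Rows from before the requested range still count toward the running totals.
    total = sum(n for d, n in new_users.items() if d < first)
    completed_total = sum(n for d, n in new_completed.items() if d < first)

    result = []
    day = first
    while day <= last:
        # The counters for `day` cover rows inserted strictly before midnight of `day`.
        result.append((day, total, completed_total))
        total += new_users[day]
        completed_total += new_completed[day]
        day += timedelta(days=1)
    return result
```

In SQL, the analogous idea would be MIN(inserted_at) per user_id, a count of users per first-seen day, and then SUM(...) OVER (ORDER BY day). Treat that translation as an untested suggestion.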
I'm sure I'm missing some handy feature that would help here. Any help is appreciated 🙏
After a chat, we found a solution:
SELECT
  date_trunc('day', inserted_at) AS adate,
  -- all rows so far minus completed rows so far (a completed row is an extra row for an already-counted user)
  COUNT(user_id) OVER (
    ORDER BY inserted_at ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) -
  SUM(CASE WHEN completed THEN 1 ELSE 0 END) OVER (
    ORDER BY inserted_at ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS user_cnt,
  SUM(CASE WHEN completed THEN 1 ELSE 0 END) OVER (
    ORDER BY inserted_at ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS completed
FROM data
WHERE test_id = 'someId'
ORDER BY inserted_at ASC
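Note that the same SUM window function appears twice: once on its own to give the completed column, and once inside user_cnt. Because both SUM calls have an identical window definition, PostgreSQL evaluates them in the same pass over the sorted rows. The user count relies on the shape of the data. Each user has one non-completed row plus, once they finish, one completed row. A completed row is therefore an extra row for a user who is already counted, and subtracting the running completed count from the running row count gives the number of users. If a user could have several non-completed rows, this subtraction would overcount. The query returns one row per data row; the values on the last row of each day are that day's end-of-day counters.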
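To see what the window query returns, here is a sketch that runs an adapted version through Python's sqlite3 module. SQLite is a stand-in for PostgreSQL (an assumption): date() replaces date_trunc('day', ...), completed is stored as 0/1, and window functions need SQLite 3.25 or newer.

Because both running counts never decrease, taking MAX() per day keeps each day's end-of-day value. These are inclusive end-of-day counts. The question's inserted_at < "day" value for a given day is therefore the previous day's end-of-day value, and days without rows are simply absent.

```python
import sqlite3
from typing import List, Tuple

# Assumption: SQLite stands in for PostgreSQL. date() replaces date_trunc('day', ...),
# completed is stored as 0/1, and window functions need SQLite 3.25 or newer.
RUNNING_PER_DAY = """
WITH running AS (
    SELECT
        date(inserted_at) AS adate,
        COUNT(user_id) OVER (
            ORDER BY inserted_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) - SUM(CASE WHEN completed THEN 1 ELSE 0 END) OVER (
            ORDER BY inserted_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS user_cnt,
        SUM(CASE WHEN completed THEN 1 ELSE 0 END) OVER (
            ORDER BY inserted_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS completed_cnt
    FROM data
    WHERE test_id = ?
)
-- Both running counts never decrease, so MAX() per day is the end-of-day value.
SELECT adate, MAX(user_cnt), MAX(completed_cnt)
FROM running
GROUP BY adate
ORDER BY adate
"""


def end_of_day_counters(conn: sqlite3.Connection, test_id: str) -> List[Tuple[str, int, int]]:
    return conn.execute(RUNNING_PER_DAY, (test_id,)).fetchall()


def demo() -> List[Tuple[str, int, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (test_id TEXT, user_id TEXT, completed INTEGER, inserted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?)",
        [
            ("t1", "u1", 0, "2023-01-12 09:00:00"),
            ("t1", "u1", 1, "2023-01-13 09:00:00"),  # u1 completes: extra row, same user
            ("t1", "u2", 0, "2023-01-13 10:00:00"),
            ("t2", "u3", 0, "2023-01-13 11:00:00"),  # another test, filtered out
        ],
    )
    rows = end_of_day_counters(conn, "t1")
    conn.close()
    return rows
```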
Previous answer below:
OK, looking at it again, you don't need a window function after all. All it takes is a CASE expression inside SUM() combined with GROUP BY:
SELECT date_trunc('day', inserted_at) AS adate,
COUNT(DISTINCT user_id) AS entered,
SUM(CASE WHEN completed THEN 1 ELSE 0 END) AS completed
FROM data
WHERE test_id = 'someId'
GROUP BY date_trunc('day', inserted_at)
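Here is a sketch of what per-day grouping with the SUM(CASE ...) trick produces. It again uses sqlite3 as a stand-in for PostgreSQL (assumption: date() in place of date_trunc). It also exposes a caveat: per-day COUNT(DISTINCT user_id) values cannot simply be added up across days, because a user with rows on two days is counted on both.

```python
import sqlite3
from typing import List, Tuple

# Assumption: SQLite stands in for PostgreSQL; date() plays the role of
# date_trunc('day', ...) and completed is stored as 0/1.
PER_DAY = """
SELECT date(inserted_at) AS adate,
       COUNT(DISTINCT user_id) AS entered,
       SUM(CASE WHEN completed THEN 1 ELSE 0 END) AS completed_cnt
FROM data
WHERE test_id = ?
GROUP BY date(inserted_at)
ORDER BY adate
"""


def per_day_counts(conn: sqlite3.Connection, test_id: str) -> List[Tuple[str, int, int]]:
    return conn.execute(PER_DAY, (test_id,)).fetchall()


def demo() -> List[Tuple[str, int, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (test_id TEXT, user_id TEXT, completed INTEGER, inserted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?)",
        [
            ("t1", "u1", 0, "2023-01-12 09:00:00"),
            ("t1", "u1", 1, "2023-01-13 09:00:00"),  # u1 appears on both days
            ("t1", "u2", 0, "2023-01-13 10:00:00"),
        ],
    )
    rows = per_day_counts(conn, "t1")
    conn.close()
    return rows
```

With this data, the per-day entered values are 1 and 2. Their sum, 3, overstates the two distinct users (u1 and u2), so cumulative distinct counts need more than a plain per-day GROUP BY.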
To get everything prior to a given date, something like this:
SELECT date_trunc('day', inserted_at) AS date,
DENSE_RANK()
OVER (PARTITION BY user_id ORDER BY inserted_at ASC) AS user_cnt, -- ranking functions ignore frame clauses
SUM(CASE WHEN completed THEN 1 ELSE 0 END)
OVER (ORDER BY inserted_at ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS completed
FROM data
WHERE test_id = 'someId'
ORDER BY inserted_at ASC