How to order count of distinct strings in SQL
So I have a table which looks like this:
ID timestamp USER_TYPE
osdfbouwefo 2021-03-04 15:58:02.271 unidentified
ieqiofbeoww 2021-03-04 19:58:53.125 GroupA
fbruwbfewef 2021-03-04 20:59:02.273 GroupA
oewfbewfuff 2021-03-05 04:34:12.512 GroupB
hmithneregn 2021-03-05 15:43:22.271 GroupA
v_eifb3r39f 2021-03-06 03:58:54.231 unidentified
ieqiofbeoww 2021-03-06 12:21:34.211 GroupA
fbruwbfewef 2021-03-06 18:56:44.121 GroupA
What I need to create is a running tally of unique users by USER_TYPE. That is, the first time a user appears in the table they are counted for that day, and never counted again. Additionally, it needs to be broken down by USER_TYPE: what I expect is a table grouped by DATE that accounts for each new user ID, with the count increasing as the dates increase for each USER_TYPE.
Final outcome:
DATE USER_TYPE USE_COUNT
2021-03-04 unidentified 1
2021-03-04 GroupA 2
2021-03-05 GroupB 1
2021-03-05 GroupA 3
2021-03-06 unidentified 2
So if you look at just one USER_TYPE, it increases based on the last count. There is one unidentified on 2021-03-04, so it is represented as 1 in USE_COUNT. The next time unidentified appears with a distinct ID is on 2021-03-06, making it the second time it has appeared, thus this is 2. The same goes for every USER_TYPE: each count always adds onto its own previous total.
Notice the final two entries in the original table are not included because those IDs had already occurred.
Here's what I tried, but this isn't exactly it. I hope this is possible!
SELECT
    DISTINCT DATE(TIMESTAMP) AS "DATE",
    USER_TYPE,
    COUNT(ID) OVER (
        PARTITION BY USER_TYPE
        ORDER BY
            DATE(TIMESTAMP) ASC
    ) AS USE_COUNT
FROM
    table
ORDER BY
    DATE(TIMESTAMP) ASC
I think this will work:
SELECT DATE(timestamp) DATE,USER_TYPE,
ROW_NUMBER() OVER (PARTITION BY USER_TYPE ORDER BY timestamp) USE_COUNT FROM
(SELECT ID,timestamp,user_type,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) rnum
FROM mytable) A
WHERE rnum=1
ORDER BY DATE(timestamp) ASC, USER_TYPE DESC;
The idea is first to assign ROW_NUMBER() partitioned by ID and ordered by timestamp, then turn that into a sub-query. In the outer query, apply another ROW_NUMBER(), but this time partition it by USER_TYPE with the same ordering as in the sub-query. Based on your sample data, the result should look like this:
DATE | USER_TYPE | USE_COUNT
---|---|---
2021-03-04 | unidentified | 1
2021-03-04 | GroupA | 1
2021-03-04 | GroupA | 2
2021-03-05 | GroupB | 1
2021-03-05 | GroupA | 3
2021-03-06 | unidentified | 2
And here's a fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3fe5824ea1010d33777a005041d31bda
The accepted answer isn't even close to your expected result set.
You need to find the earliest date per user, either
FROM
(
SELECT
ID
,timestamp
,user_type
,MIN(timestamp)
OVER (PARTITION BY id) AS min_ts
FROM mytable
) AS dt
WHERE timestamp = min_ts
or
FROM
(
SELECT
ID
,timestamp
,user_type
,ROW_NUMBER() -- min timestamp gets lowest rownum 1
OVER (PARTITION BY id
ORDER BY timestamp) AS rn
FROM mytable
) AS dt
WHERE rn=1
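To check that the two filters pick the same rows, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (the fiddle above uses MySQL 8.0), window functions need SQLite 3.25 or newer, and the outer SELECT list and ORDER BY are added so that the FROM fragments become complete statements.

```python
import sqlite3

# Sample rows copied from the question; the table name mytable follows the answers.
ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Variant 1: keep the rows whose timestamp equals the per-id minimum.
MIN_VARIANT = """
SELECT id, timestamp, user_type
FROM (
    SELECT id, timestamp, user_type,
           MIN(timestamp) OVER (PARTITION BY id) AS min_ts
    FROM mytable
) AS dt
WHERE timestamp = min_ts
ORDER BY timestamp
"""

# Variant 2: number each id's rows by timestamp and keep row 1.
ROWNUM_VARIANT = """
SELECT id, timestamp, user_type
FROM (
    SELECT id, timestamp, user_type,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) AS rn
    FROM mytable
) AS dt
WHERE rn = 1
ORDER BY timestamp
"""

first_by_min = conn.execute(MIN_VARIANT).fetchall()
first_by_rownum = conn.execute(ROWNUM_VARIANT).fetchall()

for row in first_by_rownum:
    print(row)
```

One difference worth knowing: if an ID had two rows sharing its earliest timestamp, the MIN() variant would keep both, while ROW_NUMBER() always keeps exactly one row per ID.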
Then you count the unique users per day and run a cumulative sum:
SELECT
CAST(timestamp AS DATE) AS DATE
,USER_TYPE
,SUM(COUNT(*)) -- cumulative sum over count
OVER (PARTITION BY USER_TYPE
ORDER BY CAST(timestamp AS DATE)) AS USE_COUNT
FROM
(
SELECT
ID
,timestamp
,user_type
,ROW_NUMBER()
OVER (PARTITION BY id
ORDER BY timestamp) AS rn
FROM mytable
) AS dt
WHERE rn=1
GROUP BY CAST(timestamp AS DATE), USER_TYPE
ORDER BY DATE, USER_TYPE
;
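As a sanity check, the query above can be run against the question's sample data. This is a sketch under a few assumptions: it uses an in-memory SQLite database via Python's sqlite3 module instead of the asker's actual database, it needs SQLite 3.25 or newer for window functions, it uses DATE(timestamp) in place of CAST(timestamp AS DATE) because SQLite has no DATE type to cast to, and the output column is named day to avoid clashing with the DATE() function.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Inner query: keep each id's first row. Outer query: count new users per
# day and user_type, then accumulate those counts within each user_type.
QUERY = """
SELECT DATE(timestamp) AS day,
       user_type,
       SUM(COUNT(*)) OVER (
           PARTITION BY user_type
           ORDER BY DATE(timestamp)
       ) AS use_count
FROM (
    SELECT id, timestamp, user_type,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) AS rn
    FROM mytable
) AS dt
WHERE rn = 1
GROUP BY DATE(timestamp), user_type
ORDER BY day, user_type
"""

running_totals = conn.execute(QUERY).fetchall()
for row in running_totals:
    print(row)
```

On this data the five rows carry the same counts as the asker's expected outcome (unidentified 1 then 2, GroupA 2 then 3, GroupB 1), only sorted by user type within each day.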
Hmmm... I think you want a cumulative sum window function with aggregation:
select date(timestamp), user_type,
sum(count(*)) over (partition by user_type order by date(timestamp)) as running_count
from t
group by date(timestamp), user_type;
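For comparison, here is what this query returns on the question's sample data. This is a sketch that assumes an in-memory SQLite database (3.25 or newer) driven from Python's sqlite3 module and a table named t as in the query; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Sample rows copied from the question, loaded into a table named t.
ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Cumulative sum over the per-day row counts, with no first-appearance filter.
QUERY = """
SELECT DATE(timestamp) AS day,
       user_type,
       SUM(COUNT(*)) OVER (
           PARTITION BY user_type
           ORDER BY DATE(timestamp)
       ) AS running_count
FROM t
GROUP BY DATE(timestamp), user_type
ORDER BY day, user_type
"""

all_rows_running = conn.execute(QUERY).fetchall()
for row in all_rows_running:
    print(row)
```

Because every row is counted, the two repeat GroupA IDs on 2021-03-06 add a sixth row with a running count of 5. To match the asker's expected outcome, the input would first have to be restricted to each ID's earliest row, for example with an rn = 1 sub-query.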