How to order a count of distinct strings in SQL

Question

So I have a table which looks like this:

ID           timestamp                  USER_TYPE
osdfbouwefo  2021-03-04 15:58:02.271    unidentified
ieqiofbeoww  2021-03-04 19:58:53.125    GroupA
fbruwbfewef  2021-03-04 20:59:02.273    GroupA
oewfbewfuff  2021-03-05 04:34:12.512    GroupB
hmithneregn  2021-03-05 15:43:22.271    GroupA
v_eifb3r39f  2021-03-06 03:58:54.231    unidentified
ieqiofbeoww  2021-03-06 12:21:34.211    GroupA
fbruwbfewef  2021-03-06 18:56:44.121    GroupA

What I need to create is a running tally of unique users by USER_TYPE. That is, the first time a user appears in the table they are counted for that day, and never counted again. Additionally, it needs to be broken down by USER_TYPE: what I expect is a table grouped by DATE that accounts for each new user ID, with the count increasing as the dates increase for each USER_TYPE.

Final outcome:

DATE           USER_TYPE USE_COUNT
2021-03-04  unidentified         1
2021-03-04        GroupA         2
2021-03-05        GroupB         1
2021-03-05        GroupA         3
2021-03-06  unidentified         2

So if you look at just one USER_TYPE, it increases based on the last count. There is one unidentified on 2021-03-04, so it is represented as 1 in USE_COUNT. The next time unidentified appears with a distinct ID is on 2021-03-06, making it the second time it has appeared, thus this is 2. The same goes for all USER_TYPEs: they are always adding onto themselves.

Notice the final two entries in the original table are not included because those IDs already occurred.
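To pin down the rule described above before turning to SQL, here is a minimal reference sketch in plain Python. The function name `running_unique_users` is made up for illustration, and it assumes a user's USER_TYPE is taken from the row where the ID first appears.

```python
from collections import defaultdict

# Sample rows from the question: (ID, timestamp, USER_TYPE)
ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]


def running_unique_users(rows):
    """Hypothetical helper: (date, user_type, running count of new IDs)."""
    seen_ids = set()
    new_per_day = defaultdict(int)  # (date, user_type) -> IDs first seen that day
    # ISO-formatted timestamps sort correctly as plain strings.
    for user_id, ts, user_type in sorted(rows, key=lambda r: r[1]):
        if user_id in seen_ids:
            continue  # only the first appearance of an ID counts
        seen_ids.add(user_id)
        new_per_day[(ts[:10], user_type)] += 1

    totals = defaultdict(int)  # user_type -> running total so far
    result = []
    for day, user_type in sorted(new_per_day):
        totals[user_type] += new_per_day[(day, user_type)]
        result.append((day, user_type, totals[user_type]))
    return result


result = running_unique_users(ROWS)
for row in result:
    print(row)
```

The rows come out ordered by date and then USER_TYPE, so the order within a day differs from the table above, but the five (DATE, USER_TYPE, USE_COUNT) combinations are the same.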

Here's what I tried, but this isn't exactly it. I hope this is possible!

    SELECT
        DISTINCT DATE(TIMESTAMP) AS "DATE",
        USER_TYPE,
        COUNT(ID) OVER (
            PARTITION BY USER_TYPE
            ORDER BY
                DATE(TIMESTAMP) ASC
        ) AS USE_COUNT
    FROM
        table
    ORDER BY
        DATE(TIMESTAMP) ASC

Answer 1

I think this will work:

SELECT DATE(timestamp) DATE,USER_TYPE, 
       ROW_NUMBER() OVER (PARTITION BY USER_TYPE ORDER BY timestamp) USE_COUNT FROM
 (SELECT ID,timestamp,user_type, 
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) rnum
 FROM mytable) A
 WHERE rnum=1
 ORDER BY DATE(timestamp) ASC, USER_TYPE DESC;

The idea is first to assign ROW_NUMBER() partitioned by ID and ordered by timestamp. Then turn it into a sub-query. In the outer query, do another ROW_NUMBER(), but this time partition it by USER_TYPE with the same ordering as in the sub-query. Based on your sample data, the result should look like this:

DATE        USER_TYPE     USE_COUNT
2021-03-04  unidentified  1
2021-03-04  GroupA        1
2021-03-04  GroupA        2
2021-03-05  GroupB        1
2021-03-05  GroupA        3
2021-03-06  unidentified  2

And here's a fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3fe5824ea1010d33777a005041d31bda
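For anyone who wants to try this without the MySQL fiddle, here is a rough sketch that runs the same query through Python's built-in sqlite3 module. Using SQLite as a stand-in for MySQL 8 is an assumption of this sketch, as are the lowercase aliases and the extra `use_count` tie-breaker in ORDER BY, which is only there to make the output order deterministic.

```python
import sqlite3

ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

ANSWER_1 = """
SELECT DATE(timestamp) AS day, user_type,
       ROW_NUMBER() OVER (PARTITION BY user_type ORDER BY timestamp) AS use_count
FROM (SELECT id, timestamp, user_type,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) AS rnum
      FROM mytable) AS a
WHERE rnum = 1          -- keep only each ID's first appearance
ORDER BY day ASC, user_type DESC, use_count ASC
"""

answer_1_rows = conn.execute(ANSWER_1).fetchall()
for row in answer_1_rows:
    print(row)
```

On the sample data this returns one row per first-seen ID, six rows in total, so 2021-03-04 / GroupA shows up twice (counts 1 and 2) rather than once with a count of 2 as in the question's expected output.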

Answer 2

The accepted answer isn't even close to your expected result set.

You need to find the earliest date per user, either

FROM
 (
   SELECT
      ID
     ,timestamp
     ,user_type
     ,MIN(timestamp)
      OVER (PARTITION BY id) AS min_ts
   FROM mytable
 ) AS dt
WHERE timestamp = min_ts

or

FROM
 (
   SELECT
      ID
     ,timestamp
     ,user_type
     ,ROW_NUMBER() -- min timestamp gets lowest rownum 1
      OVER (PARTITION BY id
            ORDER BY timestamp) AS rn
   FROM mytable
 ) AS dt
WHERE rn=1
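The two fragments above omit the SELECT list, so here is a small sketch that completes both and compares them on the question's sample data. It runs on Python's built-in sqlite3 module as a stand-in for the real database, and the `SELECT id, timestamp, user_type` list and the final `ORDER BY timestamp` are additions made for this illustration.

```python
import sqlite3

ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Variant 1: compare each row's timestamp with the minimum for its ID.
FIRST_BY_MIN = """
SELECT id, timestamp, user_type
FROM (SELECT id, timestamp, user_type,
             MIN(timestamp) OVER (PARTITION BY id) AS min_ts
      FROM mytable) AS dt
WHERE timestamp = min_ts
ORDER BY timestamp
"""

# Variant 2: number each ID's rows by timestamp and keep row 1.
FIRST_BY_RN = """
SELECT id, timestamp, user_type
FROM (SELECT id, timestamp, user_type,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) AS rn
      FROM mytable) AS dt
WHERE rn = 1
ORDER BY timestamp
"""

first_by_min = conn.execute(FIRST_BY_MIN).fetchall()
first_by_rn = conn.execute(FIRST_BY_RN).fetchall()
for row in first_by_rn:
    print(row)
```

Both variants keep the same six first-appearance rows here. They can differ if an ID has two rows with exactly the same earliest timestamp: the MIN comparison keeps every tied row, while ROW_NUMBER keeps exactly one.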

Then you count the unique users per day and run a cumulative sum:

SELECT
   CAST(timestamp AS DATE) AS DATE
  ,USER_TYPE
  ,SUM(COUNT(*)) -- cumulative sum over count
       OVER (PARTITION BY USER_TYPE
             ORDER BY CAST(timestamp AS DATE)) AS USE_COUNT 
FROM
 (
   SELECT
      ID
     ,timestamp
     ,user_type
     ,ROW_NUMBER()
      OVER (PARTITION BY id
            ORDER BY timestamp) AS rn
   FROM mytable
 ) AS dt
WHERE rn=1
GROUP BY CAST(timestamp AS DATE), USER_TYPE
ORDER BY DATE, USER_TYPE
;

See fiddle
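As a way to check the full query locally, the sketch below runs it against the question's sample data using Python's built-in sqlite3 module. SQLite is only a stand-in here, and one adaptation is needed: `CAST(timestamp AS DATE)` does not produce a date in SQLite, so this version uses `DATE(timestamp)` instead. The lowercase aliases are also choices made for this sketch.

```python
import sqlite3

ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

RUNNING_TALLY = """
SELECT DATE(timestamp) AS day,
       user_type,
       -- COUNT(*) = new users per (day, type); SUM(...) OVER accumulates it
       SUM(COUNT(*)) OVER (PARTITION BY user_type
                           ORDER BY DATE(timestamp)) AS use_count
FROM (SELECT id, timestamp, user_type,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) AS rn
      FROM mytable) AS dt
WHERE rn = 1            -- first appearance of each ID only
GROUP BY DATE(timestamp), user_type
ORDER BY day, user_type
"""

tally = conn.execute(RUNNING_TALLY).fetchall()
for row in tally:
    print(row)
```

This yields five rows with the same (DATE, USER_TYPE, USE_COUNT) combinations as the question's expected outcome, ordered by date and then USER_TYPE.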

Answer 3

Hmmm . . . I think you want a cumulative sum window function with aggregation:

select date(timestamp), user_type,
       sum(count(*)) over (partition by user_type order by date(timestamp)) as running_count
from t
group by date(timestamp), user_type;
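A quick way to see what this query does with the question's sample data is to run it through Python's built-in sqlite3 module. That is a sketch under assumptions: SQLite stands in for the real database, the table is called `t` as in the query, and the `day` alias and ORDER BY are added for readable output.

```python
import sqlite3

ROWS = [
    ("osdfbouwefo", "2021-03-04 15:58:02.271", "unidentified"),
    ("ieqiofbeoww", "2021-03-04 19:58:53.125", "GroupA"),
    ("fbruwbfewef", "2021-03-04 20:59:02.273", "GroupA"),
    ("oewfbewfuff", "2021-03-05 04:34:12.512", "GroupB"),
    ("hmithneregn", "2021-03-05 15:43:22.271", "GroupA"),
    ("v_eifb3r39f", "2021-03-06 03:58:54.231", "unidentified"),
    ("ieqiofbeoww", "2021-03-06 12:21:34.211", "GroupA"),
    ("fbruwbfewef", "2021-03-06 18:56:44.121", "GroupA"),
]

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, timestamp TEXT, user_type TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

ANSWER_3 = """
SELECT DATE(timestamp) AS day, user_type,
       SUM(COUNT(*)) OVER (PARTITION BY user_type
                           ORDER BY DATE(timestamp)) AS running_count
FROM t
GROUP BY DATE(timestamp), user_type
ORDER BY day, user_type
"""

answer_3_rows = conn.execute(ANSWER_3).fetchall()
for row in answer_3_rows:
    print(row)
```

Run directly on the raw table, the query counts every row, so the two repeat visits on 2021-03-06 add a sixth row, (2021-03-06, GroupA, 5), that the question's expected outcome does not contain. To match that outcome, the rows would first have to be reduced to each ID's earliest appearance.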

Answer 1: 2, 2021-03-23 01:42:33
Answer 2: 2, accepted, 2021-03-23 08:58:57
Answer 3: 1, 2021-03-22 21:07:30
