Getting count of distinct column with window functions in MySQL 8

I have an MVP DB fiddle: https://www.db-fiddle.com/f/cUn1Lo2xhbTAUwwV5q9wKV/2

I am trying to get the number of unique shift_id values in the table on any given date using window functions.

I tried to use COUNT(DISTINCT(shift_id)), but DISTINCT is not supported with window functions in MySQL 8 at the moment.

Just in case the fiddle goes down, here is the test schema:

CREATE TABLE `scores` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `shift_id` int unsigned NOT NULL,
  `employee_name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `score` double(8,2) unsigned NOT NULL,
  `created_at` timestamp NOT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO scores(shift_id, employee_name, score, created_at) VALUES
(1, "John", 6.72, "2020-04-01 00:00:00"),
(1, "Bob", 15.71, "2020-04-01 00:00:00"),
(1, "Bob", 54.02, "2020-04-01 00:00:00"),
(1, "John", 23.55, "2020-04-01 00:00:00"),

(2, "John", 9.13, "2020-04-02 00:00:00"),
(2, "Bob", 44.76, "2020-04-02 00:00:00"),
(2, "Bob", 33.40, "2020-04-02 00:00:00"),
(2, "James", 20, "2020-04-02 00:00:00"),

(3, "John", 20, "2020-04-02 00:00:00"),
(3, "Bob", 20, "2020-04-02 08:00:00"),
(3, "Bob", 30, "2020-04-02 08:00:00"),
(3, "James", 10, "2020-04-02 08:00:00");

And my query, which has two attempted methods using what I saw in this post: Count distinct in window functions

SELECT
    ANY_VALUE(employee_name) AS `employee_name`,
    DATE(created_at) AS `shift_date`,
    COUNT(*) OVER (PARTITION BY ANY_VALUE(created_at), ANY_VALUE(shift_id)) AS `shifts_on_day_1`,

    (
        dense_rank() over (partition by ANY_VALUE(created_at) order by ANY_VALUE(shift_id) asc) +
        dense_rank() over (partition by ANY_VALUE(created_at) order by ANY_VALUE(shift_id) desc) - 1
    ) as `shifts_on_day_2`

FROM scores
    GROUP BY employee_name, DATE(created_at);

The expected result is that any row dated 2020-04-01 has a shifts_on_day of 1, and rows dated 2020-04-02 have a shifts_on_day of 2.

I have considered using a correlated subquery, but that is a performance nightmare with millions of rows in the table and thousands being returned by the query.

Update: I think the necessity for window functions is that there is already a GROUP BY in the query. All the data is needed in one query, with the end goal being to get the average_score of each employee on a specific day. To get that total score for each employee I can just use COUNT(*). But then I need to divide that by the total shifts in the day to get the average.

Update

The end result is to be able to get the total number of rows per employee per date in the table, divided by the total number of shifts that occurred on that date, which gives the average row count per employee on that date.

The expected result is hence:

name  | shift_date | avrg
------+------------+-----
Bob   | 2020-04-01 | 2     2 / 1 = 2 ; two rows for Bob, one shift_id (1) that day
Bob   | 2020-04-02 | 2     4 / 2 = 2 ; four rows for Bob, two shift_ids (2,3) that day
James | 2020-04-02 | 1     2 / 2 = 1 ; two rows for James, two shift_ids (2,3) that day
John  | 2020-04-01 | 2     2 / 1 = 2 ; two rows for John, one shift_id (1) that day
John  | 2020-04-02 | 1     2 / 2 = 1 ; two rows for John, two shift_ids (2,3) that day

"All rows per date and employee" and "distinct count of IDs per date" are two completely different aggregations; you cannot do one aggregation and somehow retrieve the other from rows that have already been aggregated differently. This rules out window functions on the aggregation result.

You need two separate aggregations instead. For instance:

with empdays as
(
  select employee_name, date(created_at) as shift_date, count(*) as total
  from scores
  group by employee_name, date(created_at)
)
, days as 
(
  select date(created_at) as shift_date, count(distinct shift_id) as total
  from scores
  group by date(created_at)
)
select ed.employee_name, shift_date, ed.total / d.total as average
from empdays ed
join days d using (shift_date)
order by ed.employee_name, shift_date;

Demo: https://www.db-fiddle.com/f/qjqbibriXtos6Hsi5qcwi6/0
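
If a window-function variant is still wanted, one possible sketch (not the query from the answer above) is to apply the DENSE_RANK trick from the question to the raw rows, before any GROUP BY, and aggregate afterwards. The CTE name ranked is made up for illustration. The trick relies on shift_id being NOT NULL, as it is in the test schema, and it has not been measured against the two-aggregation query on a large table.

```sql
-- Sketch only: emulate COUNT(DISTINCT shift_id) OVER (PARTITION BY day)
-- on the raw rows, then aggregate per employee and day.
-- "ranked" is a hypothetical name chosen for this example.
with ranked as
(
  select
    employee_name,
    date(created_at) as shift_date,
    -- ascending rank + descending rank - 1 = number of distinct
    -- shift_id values in the day's partition (assumes no NULLs)
    dense_rank() over (partition by date(created_at) order by shift_id asc)
      + dense_rank() over (partition by date(created_at) order by shift_id desc)
      - 1 as shifts_on_day
  from scores
)
select
  employee_name,
  shift_date,
  -- shifts_on_day is constant within a day, so max() just picks that value
  count(*) / max(shifts_on_day) as average
from ranked
group by employee_name, shift_date
order by employee_name, shift_date;
```

On the test data this should produce the same five averages as the expected result table in the question. The window functions work here because they see every row of a day; in the question's attempt they ran after the GROUP BY and only saw one row per employee and date.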
