Calculate average for each month for a given date range
I have an employees table where each employee has a start_date, an end_date and a salary.
NOTE: the SQL code to import the structure and data is at the bottom.
+----+-------+------------+------------+---------+
| id | name  | start_date | end_date   | salary  |
+----+-------+------------+------------+---------+
|  1 | Mark  | 2017-05-01 | 2020-01-31 | 2000.00 |
|  2 | Tania | 2018-02-01 | 2019-08-31 | 5000.00 |
|  3 | Leo   | 2018-02-01 | 2018-09-30 | 3000.00 |
|  4 | Elsa  | 2018-12-01 | 2020-05-31 | 4000.00 |
+----+-------+------------+------------+---------+
For a given date range I want to extract the average salary for each month within that range.
UPDATE: I need a solution for MySQL 5.6, but a solution for MySQL 8+ would also be welcome (just for my own knowledge).
If the date range is 2018-08-01 to 2019-01-31, the SQL statement should iterate from August 2018 to January 2019 and calculate the average salary of the employees working in each month.
Below is the expected result for the date range 2018-08-01 to 2019-01-31:
+------+-------+------------+
| year | month | avg_salary |
+------+-------+------------+
| 2018 | 08    |    3333.33 |
| 2018 | 09    |    3333.33 |
| 2018 | 10    |    3500.00 |
| 2018 | 11    |    3500.00 |
| 2018 | 12    |    3666.67 |
| 2019 | 01    |    3666.67 |
+------+-------+------------+
NOTE: I solved this problem by mixing MySQL with PHP code, but for a large date range it has to execute too many queries (one per month). So I would like a solution that uses MySQL only.
CREATE TABLE `employees` (
`id` int(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`salary` decimal(10,2) DEFAULT NULL
);
INSERT INTO `employees` (`id`, `name`, `start_date`, `end_date`, `salary`) VALUES
(1, 'Mark', '2017-05-01', '2020-01-31', '2000.00'),
(2, 'Tania', '2018-02-01', '2019-08-31', '5000.00'),
(3, 'Leo', '2018-02-01', '2018-09-30', '3000.00'),
(4, 'Elsa', '2018-12-01', '2020-05-31', '4000.00');
Here's a MySQL 8.0 recursive CTE way of doing it. The CTE creates a list of all the year, month combinations between the minimum start_date and maximum end_date in the employees table, which is then LEFT JOINed to the employees table to get the average salary for all employees who were working in that particular year and month:
WITH RECURSIVE months (year, month) AS
(
SELECT YEAR(MIN(start_date)) AS year, MONTH(MIN(start_date)) AS month FROM employees
UNION ALL
SELECT year + (month = 12), (month % 12) + 1 FROM months
WHERE STR_TO_DATE(CONCAT_WS('-', year, month, '01'), '%Y-%m-%d') <= (SELECT MAX(end_date) FROM employees)
)
SELECT m.year, m.month, ROUND(AVG(e.salary), 2) AS avg_salary
FROM months m
LEFT JOIN employees e ON STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN e.start_date AND e.end_date
WHERE STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN '2018-08-01' AND '2019-01-31'
GROUP BY m.year, m.month
ORDER BY m.year, m.month  -- MySQL 8.0 no longer sorts GROUP BY results implicitly
Output:
year month avg_salary
2018 8 3333.33
2018 9 3333.33
2018 10 3500.00
2018 11 3500.00
2018 12 3666.67
2019 1 3666.67
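The stepping arithmetic in the recursive member, year + (month = 12) and (month % 12) + 1, can be traced outside the database. The following is only a minimal Python sketch of that logic; the function name month_series is made up for illustration, and the two dates passed in are the minimum start_date and maximum end_date from the sample data.

```python
from datetime import date


def month_series(first: date, last: date):
    """Yield (year, month) pairs the way the recursive CTE produces rows.

    The anchor row is the month of `first`. A new row is produced from the
    current row for as long as the current row's first day is <= `last`.
    """
    year, month = first.year, first.month
    yield year, month
    while date(year, month, 1) <= last:
        # Same arithmetic as the CTE: year + (month = 12), (month % 12) + 1
        year, month = year + (month == 12), (month % 12) + 1
        yield year, month


# MIN(start_date) and MAX(end_date) of the sample employees table
rows = list(month_series(date(2017, 5, 1), date(2020, 5, 31)))
print(rows[0], rows[-1], len(rows))
```

Because the condition is tested on the current row before the next one is generated, the series runs one month past the maximum end_date. That extra row is harmless here, since the outer WHERE clause restricts the result to the requested range.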
You can simply type out the desired months (or use PHP code to generate them) and join against them:
SELECT ym, AVG(salary)
FROM (
SELECT '2018-08-01' + INTERVAL 0 MONTH AS ym UNION ALL
SELECT '2018-08-01' + INTERVAL 1 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 2 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 3 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 4 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 5 MONTH
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
GROUP BY ym
If you have a table that contains the numbers 0, 1, ... then you can use that. You can even use any table that has a sufficient number of rows:
SELECT ym, AVG(salary)
FROM (
SELECT '2018-08-01' + INTERVAL @n := @n + 1 MONTH AS ym
FROM anytable, (SELECT @n := -1) x
LIMIT 100
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
WHERE ym <= '2019-01-01'
GROUP BY ym
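If the month list is generated in application code, as suggested above for PHP, the derived table can be built from the two range boundaries. This is a rough Python sketch rather than a drop-in implementation; the helper names month_count and yearmonths_sql are hypothetical, and it assumes the range should start on the first day of the starting month.

```python
from datetime import date


def month_count(first: date, last: date) -> int:
    """Number of calendar months touched by the range [first, last]."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


def yearmonths_sql(first: date, last: date) -> str:
    """Build the UNION ALL body of the yearmonths derived table."""
    start = first.replace(day=1).isoformat()
    selects = [
        # Only the first SELECT needs the column alias
        f"SELECT '{start}' + INTERVAL {n} MONTH" + (" AS ym" if n == 0 else "")
        for n in range(month_count(first, last))
    ]
    return " UNION ALL\n".join(selects)


sql = yearmonths_sql(date(2018, 8, 1), date(2019, 1, 31))
print(sql)
```

The values are formatted from date objects and integers only, so no user-supplied text ends up in the generated SQL.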
To get this done, you would need to generate a list of days from a date range. This is a frequently asked question on SO; I used the accepted solution from this post. It uses a simple arithmetic method and can generate long lists of dates (although performance may suffer).
Then, we just need to JOIN with the original table to compute the average salary at that point in time.
select
year(x.date),
month(x.date),
avg(coalesce(e.salary, 0)) avg_salary
from (
select a.date
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
where a.date between '2018-08-01' and '2019-01-31'
) x left join employees e ON x.date between e.start_date and e.end_date
group by year(x.date), month(x.date)
order by 1, 2
| year(x.date) | month(x.date) | avg_salary |
| ------------ | ------------- | ----------- |
| 2018 | 8 | 3333.333333 |
| 2018 | 9 | 3333.333333 |
| 2018 | 10 | 3500 |
| 2018 | 11 | 3500 |
| 2018 | 12 | 3666.666667 |
| 2019 | 1 | 3666.666667 |
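Since this query joins on days rather than on months, each monthly figure is an average over (day, employee) pairs. The Python sketch below replays that arithmetic on the sample data to show why the numbers still match the expected result; the names EMPLOYEES and daily_average are made up for the illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# (start_date, end_date, salary) copied from the sample employees table
EMPLOYEES = [
    (date(2017, 5, 1), date(2020, 1, 31), 2000.0),
    (date(2018, 2, 1), date(2019, 8, 31), 5000.0),
    (date(2018, 2, 1), date(2018, 9, 30), 3000.0),
    (date(2018, 12, 1), date(2020, 5, 31), 4000.0),
]


def daily_average(first: date, last: date) -> dict:
    """Average salary per (year, month), computed over one row per day and employee."""
    sums, counts = defaultdict(float), defaultdict(int)
    day = first
    while day <= last:
        key = (day.year, day.month)
        salaries = [s for start, end, s in EMPLOYEES if start <= day <= end]
        # LEFT JOIN + COALESCE: a day with no employee still yields one row with 0
        for salary in salaries or [0.0]:
            sums[key] += salary
            counts[key] += 1
        day += timedelta(days=1)
    return {key: round(sums[key] / counts[key], 2) for key in sums}


result = daily_average(date(2018, 8, 1), date(2019, 1, 31))
print(result)
```

With this data every employee works whole months, so the per-day and per-month averages coincide. An employee who started or left mid-month would be weighted by the number of days worked in that month.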
PS: another approach would be to create a calendar table that stores the list of days, and then just:
select
year(x.date),
month(x.date),
avg(coalesce(e.salary, 0)) avg_salary
from
mycalendar x
left join employees e ON x.date between e.start_date and e.end_date
where x.date between '2018-08-01' and '2019-01-31'
group by year(x.date), month(x.date)
order by 1, 2
A partial answer...
Here's an 'old school' solution, using a table of integers (0-9), but note that this kind of thing is redundant in newer versions of MySQL, where recursive CTEs are available...
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH x
FROM ints i1
, ints i2
WHERE '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH BETWEEN '2018-08-01' AND '2019-01-31';
+------------+
| x          |
+------------+
| 2018-08-01 |
| 2018-09-01 |
| 2018-10-01 |
| 2018-11-01 |
| 2018-12-01 |
| 2019-01-01 |
+------------+
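One possible way to finish this off is to join the generated months to employees and aggregate. The sketch below runs the idea with Python's built-in sqlite3 module so that it is self-contained; this is an assumption made for convenience, not the MySQL syntax. SQLite's date(..., '+N months') stands in for MySQL's '2018-08-01' + INTERVAL N MONTH, and the column types are simplified.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ints (i INTEGER PRIMARY KEY);
    INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
    CREATE TABLE employees (name TEXT, start_date TEXT, end_date TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Mark',  '2017-05-01', '2020-01-31', 2000),
        ('Tania', '2018-02-01', '2019-08-31', 5000),
        ('Leo',   '2018-02-01', '2018-09-30', 3000),
        ('Elsa',  '2018-12-01', '2020-05-31', 4000);
""")

rows = con.execute("""
    SELECT m.x, ROUND(AVG(e.salary), 2) AS avg_salary
    FROM (
        -- two digit tables give offsets 0..99, i.e. up to 100 months
        SELECT date('2018-08-01', '+' || (i2.i * 10 + i1.i) || ' months') AS x
        FROM ints i1, ints i2
    ) AS m
    LEFT JOIN employees e ON m.x BETWEEN e.start_date AND e.end_date
    WHERE m.x BETWEEN '2018-08-01' AND '2019-01-31'
    GROUP BY m.x
    ORDER BY m.x
""").fetchall()

for row in rows:
    print(row)
```

As with the other first-of-month joins in this thread, an employee counts for a month only if they were employed on its first day, which holds for the sample data.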
The following is a PostgreSQL way of doing it. It can be converted to a MySQL query by replacing generate_series() and extract() with their MySQL equivalents.
WITH cte1 AS
  (SELECT generate_series('2018-08-01', '2019-01-31', '1 month'::interval)::date AS date),
cte2 AS
  (SELECT id,
          name,
          salary,
          generate_series(start_date, end_date, '1 month'::interval)::date AS date
   FROM employees)
SELECT extract(YEAR FROM cte1.date),
       extract(MONTH FROM cte1.date),
       avg(salary)
FROM cte1
JOIN cte2 ON extract(MONTH FROM cte1.date) = extract(MONTH FROM cte2.date)
         AND extract(YEAR FROM cte1.date) = extract(YEAR FROM cte2.date)
GROUP BY extract(YEAR FROM cte1.date),
         extract(MONTH FROM cte1.date);