Calculate average for each month for a given date range

I have an employees table in which each employee has a start_date, an end_date and a salary.

NOTE: the SQL code to import the structure and data is at the bottom.

+----+-------+------------+------------+---------+
| id | name  | start_date | end_date   | salary  |
+----+-------+------------+------------+---------+
|  1 | Mark  | 2017-05-01 | 2020-01-31 | 2000.00 |
|  2 | Tania | 2018-02-01 | 2019-08-31 | 5000.00 |
|  3 | Leo   | 2018-02-01 | 2018-09-30 | 3000.00 |
|  4 | Elsa  | 2018-12-01 | 2020-05-31 | 4000.00 |
+----+-------+------------+------------+---------+

The problem

For a given date range, I want to compute the average salary for each month within that range.

UPDATE: I need a solution for MySQL 5.6, but a solution for MySQL 8+ would also be welcome (just for personal knowledge).

Example

If the date range is 2018-08-01 - 2019-01-31, the SQL statement should step through each month from August 2018 to January 2019 and calculate the average salary for that month:

  • in August 2018 the active employees are Mark, Tania and Leo (because August 2018 is between their start_date and end_date), so the average is 3333.33
  • in September 2018 the active employees are Mark, Tania and Leo (because September 2018 is between their start_date and end_date), so the average is 3333.33
  • in October 2018 the active employees are Mark and Tania, so the average is 3500.00
  • in November 2018 the active employees are Mark and Tania, so the average is 3500.00
  • in December 2018 the active employees are Mark, Tania and Elsa, so the average is 3666.67
  • in January 2019 the active employees are Mark, Tania and Elsa, so the average is 3666.67

Below is the expected result for the date range 2018-08-01 - 2019-01-31:

+------+-------+------------+
| year | month | avg_salary |
+------+-------+------------+
| 2018 | 08    | 3333.33    |
| 2018 | 09    | 3333.33    |
| 2018 | 10    | 3500.00    |
| 2018 | 11    | 3500.00    |
| 2018 | 12    | 3666.67    |
| 2019 | 01    | 3666.67    |
+------+-------+------------+

NOTE: I solved this problem by mixing MySQL with PHP code, but for a large date range it has to execute too many queries (one per month). So I would like a solution that uses MySQL only.

SQL to import structure and data

CREATE TABLE `employees` (
  `id` int(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `start_date` date NOT NULL,
  `end_date` date NOT NULL,
  `salary` decimal(10,2) DEFAULT NULL
);

INSERT INTO `employees` (`id`, `name`, `start_date`, `end_date`, `salary`) VALUES
(1, 'Mark', '2017-05-01', '2020-01-31', '2000.00'),
(2, 'Tania', '2018-02-01', '2019-08-31', '5000.00'),
(3, 'Leo', '2018-02-01', '2018-09-30', '3000.00'),
(4, 'Elsa', '2018-12-01', '2020-05-31', '4000.00');

Here's a MySQL 8.0 recursive CTE way of doing it. The CTE creates a list of all the year, month combinations between the minimum start_date and maximum end_date in the employees table. That list is then LEFT JOINed to the employees table to get the average salary of all employees who were working in that particular year and month:

WITH RECURSIVE months (year, month) AS
(
  SELECT YEAR(MIN(start_date)) AS year, MONTH(MIN(start_date)) AS month FROM employees
  UNION ALL
  SELECT year + (month = 12), (month % 12) + 1 FROM months
  WHERE STR_TO_DATE(CONCAT_WS('-', year, month, '01'), '%Y-%m-%d') <= (SELECT MAX(end_date) FROM employees)
)
SELECT m.year, m.month, ROUND(AVG(e.salary), 2) AS avg_salary
FROM months m
LEFT JOIN employees e ON STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN e.start_date AND e.end_date
WHERE STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN '2018-08-01' AND '2019-01-31'
GROUP BY m.year, m.month
ORDER BY m.year, m.month

Output:

year    month   avg_salary
2018    8       3333.33
2018    9       3333.33
2018    10      3500.00
2018    11      3500.00
2018    12      3666.67
2019    1       3666.67

Demo on dbfiddle
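
As a variation on the recursive CTE above, the month list can be built directly from the requested range instead of from the whole employees table. This is only a sketch for MySQL 8.0+: the column name month_start is made up for illustration, and it assumes an employee counts as active in a month if they were employed on at least one day of it. That is slightly broader than testing only the first day of the month and matters when a start_date falls mid-month.

```sql
-- Sketch (MySQL 8.0+): one row per month in the requested range.
-- month_start is a hypothetical column name, not from the original answer.
WITH RECURSIVE months (month_start) AS (
  SELECT DATE('2018-08-01')
  UNION ALL
  SELECT month_start + INTERVAL 1 MONTH
  FROM months
  WHERE month_start + INTERVAL 1 MONTH <= '2019-01-31'
)
SELECT YEAR(m.month_start)  AS year,
       MONTH(m.month_start) AS month,
       ROUND(AVG(e.salary), 2) AS avg_salary
FROM months m
-- Assumption: "active" means the employment period overlaps the month.
LEFT JOIN employees e
  ON e.start_date <= LAST_DAY(m.month_start)
 AND e.end_date   >= m.month_start
GROUP BY m.month_start
ORDER BY m.month_start;
```

With the sample data this should give the same six averages as the expected result, because every start_date in the sample falls on the first of a month.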

You can simply type out the desired months (or use PHP code to generate them) and join against that list:

SELECT ym, AVG(salary)
FROM (
    SELECT '2018-08-01' + INTERVAL 0 MONTH AS ym UNION ALL
    SELECT '2018-08-01' + INTERVAL 1 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 2 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 3 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 4 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 5 MONTH
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
GROUP BY ym

If you have a table that contains the numbers 0, 1, ... then you can use that. You can even use any table that has a sufficient number of rows. Note that assigning a user variable inside an expression, as @n := @n + 1 does below, is deprecated in MySQL 8.0:

SELECT ym, AVG(salary)
FROM (
    SELECT '2018-08-01' + INTERVAL @n := @n + 1 MONTH AS ym
    FROM anytable, (SELECT @n := -1) x
    LIMIT 100
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
WHERE ym <= '2019-01-01'
GROUP BY ym

To get this done, you need to generate a list of days from a date range. This is a frequently asked question on SO; I used the accepted solution from this post. It uses a simple arithmetic method and can generate long lists of dates (although performance may suffer).

Then we just need to JOIN with the original table to compute the average salary at each point in time.

select
  year(x.date), 
  month(x.date),
  avg(coalesce(e.salary, 0)) avg_salary
from (
  select a.date 
  from (
      select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date
      from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
      cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
      cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
      cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
  ) a
  where a.date between '2018-08-01' and '2019-01-31'
) x left join employees e ON x.date between e.start_date and e.end_date
group by year(x.date), month(x.date)
order by 1, 2

Demo on DB fiddle:

| year(x.date) | month(x.date) | avg_salary  |
| ------------ | ------------- | ----------- |
| 2018         | 8             | 3333.333333 |
| 2018         | 9             | 3333.333333 |
| 2018         | 10            | 3500        |
| 2018         | 11            | 3500        |
| 2018         | 12            | 3666.666667 |
| 2019         | 1             | 3666.666667 |

PS: another approach would be to create a calendar table that stores the list of days, and then simply:

select
  year(x.date), 
  month(x.date),
  avg(coalesce(e.salary, 0)) avg_salary
from 
  mycalendar x
  left join employees e ON x.date between e.start_date and e.end_date
where x.date between '2018-08-01' and '2019-01-31'
group by year(x.date), month(x.date)
order by 1, 2
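
The mycalendar table itself is not shown above. A minimal sketch of how it could be created and filled on MySQL 5.6 follows. The helper table digits, its column n, and the starting date 2017-01-01 are assumptions chosen for illustration (the start date just needs to precede the earliest month you want to report on).

```sql
-- Hypothetical helper table holding the digits 0..9.
CREATE TABLE digits (n TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO digits (n) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Calendar table with one row per day, as used by the query above.
CREATE TABLE mycalendar (`date` DATE NOT NULL PRIMARY KEY);

-- Four digit positions give offsets 0..9999, i.e. 10000 consecutive days
-- (roughly 27 years) starting at the assumed date 2017-01-01.
INSERT INTO mycalendar (`date`)
SELECT DATE('2017-01-01')
       + INTERVAL (d1.n + 10 * d2.n + 100 * d3.n + 1000 * d4.n) DAY
FROM digits d1
CROSS JOIN digits d2
CROSS JOIN digits d3
CROSS JOIN digits d4;
```

Filling the table once means the report query no longer has to rebuild the list of days on every run.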

A partial answer...

Here's an 'old school' solution, using a table of integers (0-9), but note that this kind of thing is redundant in newer versions of MySQL (8.0+), where recursive CTEs are available...

SELECT * FROM ints;
  +---+
  | i |
  +---+
  | 0 |
  | 1 |
  | 2 |
  | 3 |
  | 4 |
  | 5 |
  | 6 |
  | 7 |
  | 8 |
  | 9 |
  +---+

SELECT '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH x 
  FROM ints i1
     , ints i2 
 WHERE '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH BETWEEN '2018-08-01' AND '2019-01-31';

  +------------+
  | x          |
  +------------+
  | 2018-08-01 |
  | 2018-09-01 |
  | 2018-10-01 |
  | 2018-11-01 |
  | 2018-12-01 |
  | 2019-01-01 |
  +------------+
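
One way to finish this idea, as a sketch rather than a definitive answer, is to join the generated months to employees in the same way the other answers do. It assumes the ints table shown above exists; the aliases m and e and the DATE_FORMAT output columns are illustrative choices. Nothing here needs MySQL 8 features, so it should also suit MySQL 5.6.

```sql
-- Sketch: months generated from ints (0..99 months after the start date),
-- then joined to employees. Assumes the ints table shown above.
SELECT DATE_FORMAT(m.x, '%Y') AS year,
       DATE_FORMAT(m.x, '%m') AS month,
       ROUND(AVG(e.salary), 2) AS avg_salary
FROM (
  -- DATE(...) keeps x a DATE rather than a string.
  SELECT DATE('2018-08-01') + INTERVAL i2.i * 10 + i1.i MONTH AS x
  FROM ints i1
  CROSS JOIN ints i2
) AS m
LEFT JOIN employees e
  ON m.x BETWEEN e.start_date AND e.end_date
WHERE m.x BETWEEN '2018-08-01' AND '2019-01-31'
GROUP BY m.x
ORDER BY m.x;
```

Two digit positions cover at most 100 months; a third reference to ints (contributing i3.i * 100) would extend the range if needed.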

The following is a PostgreSQL way of doing it. It can be converted to a MySQL query by substituting the MySQL equivalents of generate_series() and extract().

WITH cte1 AS (
  SELECT generate_series('2018-08-01', '2019-01-31', '1 month'::interval)::date AS date
),
cte2 AS (
  SELECT id,
         name,
         salary,
         generate_series(start_date, end_date, '1 month'::interval)::date AS date
  FROM employees
)
SELECT extract(YEAR FROM cte1.date),
       extract(MONTH FROM cte1.date),
       avg(salary)
FROM cte1
JOIN cte2
  ON extract(MONTH FROM cte1.date) = extract(MONTH FROM cte2.date)
 AND extract(YEAR FROM cte1.date) = extract(YEAR FROM cte2.date)
GROUP BY extract(YEAR FROM cte1.date),
         extract(MONTH FROM cte1.date);
