Calculate average for each month for a given date range

I have an employees table in which each employee has a start_date, an end_date and a salary.

NOTE: the SQL code to import the structure and data is at the bottom of the question.

+----+-------+------------+------------+---------+
| id | name  | start_date | end_date   | salary  |
+----+-------+------------+------------+---------+
|  1 | Mark  | 2017-05-01 | 2020-01-31 | 2000.00 |
|  2 | Tania | 2018-02-01 | 2019-08-31 | 5000.00 |
|  3 | Leo   | 2018-02-01 | 2018-09-30 | 3000.00 |
|  4 | Elsa  | 2018-12-01 | 2020-05-31 | 4000.00 |
+----+-------+------------+------------+---------+

The problem

For a given date range, I want to compute the average salary for each month within that range.

UPDATE: I would like a solution for MySQL 5.6, but a solution for MySQL 8+ would also be welcome (just for personal knowledge).

Example

If the date range is 2018-08-01 - 2019-01-31, the SQL statement should loop from August 2018 to January 2019 and calculate the average salary for each month:

  • in August 2018 the active employees are Mark, Tania and Leo (because August 2018 is between their start_date and end_date), so the average is 3333.33
  • in September 2018 the active employees are Mark, Tania and Leo (because September 2018 is between their start_date and end_date), so the average is 3333.33
  • in October 2018 the active employees are Mark and Tania, so the average is 3500.00
  • in November 2018 the active employees are Mark and Tania, so the average is 3500.00
  • in December 2018 the active employees are Mark, Tania and Elsa, so the average is 3666.67
  • in January 2019 the active employees are Mark, Tania and Elsa, so the average is 3666.67
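
Each bullet above boils down to one query for one month. As a minimal sketch, the August 2018 figure can be checked as follows. The overlap test is an assumption about what "active" means here: an employee counts for a month if their employment period touches any day of that month.

```sql
-- Average salary of employees active at any point in August 2018.
-- An employment period overlaps the month when it starts on or before
-- the month's last day and ends on or after the month's first day.
SELECT ROUND(AVG(salary), 2) AS avg_salary
FROM employees
WHERE start_date <= LAST_DAY('2018-08-01')
  AND end_date   >= '2018-08-01';
```

With the sample data this matches Mark, Tania and Leo. Repeating it for every month costs one query per month, so a single-statement solution needs a way to generate the list of months inside the query.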

The expected result for the date range 2018-08-01 - 2019-01-31 is:

+------+-------+------------+
| year | month | avg_salary |
+------+-------+------------+
| 2018 | 08    | 3333.33    |
| 2018 | 09    | 3333.33    |
| 2018 | 10    | 3500.00    |
| 2018 | 11    | 3500.00    |
| 2018 | 12    | 3666.67    |
| 2019 | 01    | 3666.67    |
+------+-------+------------+

NOTE: I solved this problem by mixing MySQL with PHP code, but for a large date range it has to execute too many queries (one per month). So I would like a solution that uses MySQL only.

SQL to import structure and data

CREATE TABLE `employees` (
  `id` int(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `start_date` date NOT NULL,
  `end_date` date NOT NULL,
  `salary` decimal(10,2) DEFAULT NULL
);

INSERT INTO `employees` (`id`, `name`, `start_date`, `end_date`, `salary`) VALUES
(1, 'Mark', '2017-05-01', '2020-01-31', '2000.00'),
(2, 'Tania', '2018-02-01', '2019-08-31', '5000.00'),
(3, 'Leo', '2018-02-01', '2018-09-30', '3000.00'),
(4, 'Elsa', '2018-12-01', '2020-05-31', '4000.00');

Here's a MySQL 8.0 recursive CTE way of doing it. The CTE creates a list of all the year and month combinations between the minimum start_date and the maximum end_date in the employees table. That list is then LEFT JOINed to the employees table to get the average salary of all employees who were working on the first day of that particular year and month:

WITH RECURSIVE months (year, month) AS
(
  SELECT YEAR(MIN(start_date)) AS year, MONTH(MIN(start_date)) AS month FROM employees
  UNION ALL
  SELECT year + (month = 12), (month % 12) + 1 FROM months
  WHERE STR_TO_DATE(CONCAT_WS('-', year, month, '01'), '%Y-%m-%d') <= (SELECT MAX(end_date) FROM employees)
)
SELECT m.year, m.month, ROUND(AVG(e.salary), 2) AS avg_salary
FROM months m
LEFT JOIN employees e ON STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN e.start_date AND e.end_date
WHERE STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN '2018-08-01' AND '2019-01-31'
GROUP BY m.year, m.month
ORDER BY m.year, m.month

Output:

year    month   avg_salary
2018    8       3333.33
2018    9       3333.33
2018    10      3500.00
2018    11      3500.00
2018    12      3666.67
2019    1       3666.67

Demo on dbfiddle
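
The CTE above walks every month between the earliest start_date and the latest end_date and filters the requested range afterwards. A possible variant, sketched below, seeds the recursion from the requested range itself so that only the needed months are generated. The column name month_start and the overlap-based join condition are assumptions for illustration, not part of the answer above.

```sql
-- MySQL 8.0+: generate the first day of each month in the requested range.
WITH RECURSIVE months (month_start) AS
(
  SELECT DATE('2018-08-01')
  UNION ALL
  SELECT month_start + INTERVAL 1 MONTH
  FROM months
  -- Stop once the next month start would fall past the end of the range.
  WHERE month_start + INTERVAL 1 MONTH <= '2019-01-31'
)
SELECT YEAR(m.month_start)  AS year,
       MONTH(m.month_start) AS month,
       ROUND(AVG(e.salary), 2) AS avg_salary
FROM months m
-- An employee is counted if the employment period overlaps the month.
LEFT JOIN employees e
       ON e.start_date <= LAST_DAY(m.month_start)
      AND e.end_date   >= m.month_start
GROUP BY m.month_start
ORDER BY m.month_start;
```

Keeping a real DATE column in the CTE avoids rebuilding a date with STR_TO_DATE in every condition. With the sample data, the overlap test and the first-day-of-month test select the same employees, because every start_date falls on the first day of a month.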

You can simply list the desired months by hand (or generate them with PHP code) and join against that list:

SELECT ym, AVG(salary)
FROM (
    SELECT '2018-08-01' + INTERVAL 0 MONTH AS ym UNION ALL
    SELECT '2018-08-01' + INTERVAL 1 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 2 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 3 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 4 MONTH UNION ALL
    SELECT '2018-08-01' + INTERVAL 5 MONTH
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
GROUP BY ym

If you have a table that contains the numbers 0, 1, ... then you can use that. You can even use any table that has a sufficient number of rows (anytable below is a placeholder for such a table):

SELECT ym, AVG(salary)
FROM (
    SELECT '2018-08-01' + INTERVAL @n := @n + 1 MONTH AS ym
    FROM anytable, (SELECT @n := -1) x
    LIMIT 100
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
WHERE ym <= '2019-01-01'
GROUP BY ym
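
For the first option mentioned above, a table of numbers, a sketch might look like the following. The table name numbers and its column n are assumptions made for illustration; any existing integer sequence table would serve the same purpose.

```sql
-- Hypothetical helper table holding 0, 1, 2, ... (name and column are illustrative).
CREATE TABLE numbers (n INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO numbers (n) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11);

SELECT ym, AVG(salary) AS avg_salary
FROM (
    -- One row per month start, from the range start up to the month of the range end.
    SELECT DATE('2018-08-01') + INTERVAL n MONTH AS ym
    FROM numbers
    WHERE n <= TIMESTAMPDIFF(MONTH, '2018-08-01', '2019-01-31')
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
GROUP BY ym
ORDER BY ym;
```

This form does not rely on user variables, which matters on newer servers: assigning user variables inside expressions is deprecated in recent MySQL 8.0 releases. The numbers table must contain at least as many rows as there are months in the range.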

To get this done, you need to generate a list of days from a date range. This is a frequently asked question on SO; I used the accepted solution from this post. It uses simple arithmetic and can generate long lists of dates (although performance may suffer). Note that it counts back from CURDATE() by up to 9999 days, so the requested range has to fall inside that window.

Then, we just need to JOIN with the original table to compute the average salary at each point in time.

select
  year(x.date), 
  month(x.date),
  avg(coalesce(e.salary, 0)) avg_salary
from (
  select a.date 
  from (
      select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date
      from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
      cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
      cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
      cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
  ) a
  where a.date between '2018-08-01' and '2019-01-31'
) x left join employees e ON x.date between e.start_date and e.end_date
group by year(x.date), month(x.date)
order by 1, 2

Demo on DB Fiddle:

| year(x.date) | month(x.date) | avg_salary  |
| ------------ | ------------- | ----------- |
| 2018         | 8             | 3333.333333 |
| 2018         | 9             | 3333.333333 |
| 2018         | 10            | 3500        |
| 2018         | 11            | 3500        |
| 2018         | 12            | 3666.666667 |
| 2019         | 1             | 3666.666667 |

PS: another approach would be to create a calendar table that stores the list of days, and then simply:

select
  year(x.date), 
  month(x.date),
  avg(coalesce(e.salary, 0)) avg_salary
from 
  mycalendar x
  left join employees e ON x.date between e.start_date and e.end_date
where x.date between '2018-08-01' and '2019-01-31'
group by year(x.date), month(x.date)
order by 1, 2
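
The calendar table itself is not shown above. One way to build it on MySQL 5.6 is sketched below. The digits helper table and the calendar bounds (2017-01-01 to 2020-12-31, chosen to cover the sample data) are assumptions for illustration; only the table name mycalendar and its date column come from the query above.

```sql
-- Hypothetical helper table of digits 0-9, used only to generate row numbers.
CREATE TABLE digits (d TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO digits (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- The calendar table queried above: one row per day.
CREATE TABLE mycalendar (`date` DATE NOT NULL PRIMARY KEY);

-- Four digit positions give day offsets 0..9999; the WHERE clause keeps
-- only the days up to the chosen end of the calendar.
INSERT INTO mycalendar (`date`)
SELECT DATE('2017-01-01') + INTERVAL (d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d) DAY
FROM digits d1
CROSS JOIN digits d2
CROSS JOIN digits d3
CROSS JOIN digits d4
WHERE DATE('2017-01-01') + INTERVAL (d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d) DAY <= '2020-12-31';
```

Because the table is filled once and indexed on date, later queries avoid regenerating the day list on every execution.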

A partial answer...

Here's an 'old school' solution using a table of integers (0-9). Note that this kind of thing is redundant in newer versions of SQL...

SELECT * FROM ints;
  +---+
  | i |
  +---+
  | 0 |
  | 1 |
  | 2 |
  | 3 |
  | 4 |
  | 5 |
  | 6 |
  | 7 |
  | 8 |
  | 9 |
  +---+

SELECT '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH x 
  FROM ints i1
     , ints i2 
 WHERE '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH BETWEEN '2018-08-01' AND '2019-01-31';

  +------------+
  | x          |
  +------------+
  | 2018-08-01 |
  | 2018-09-01 |
  | 2018-10-01 |
  | 2018-11-01 |
  | 2018-12-01 |
  | 2019-01-01 |
  +------------+
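
To turn this partial answer into the requested result, the generated month list can be joined to the employees table. The following is a sketch of one way to do that, not the original author's code. It assumes the ints table shown above and treats an employee as active in a month when the first day of that month lies between start_date and end_date.

```sql
SELECT YEAR(m.x)  AS year,
       MONTH(m.x) AS month,
       ROUND(AVG(e.salary), 2) AS avg_salary
FROM (
    -- Month starts 0..99 months after the range start, limited to the range.
    SELECT DATE('2018-08-01') + INTERVAL (i2.i * 10 + i1.i) MONTH AS x
      FROM ints i1
         , ints i2
     WHERE DATE('2018-08-01') + INTERVAL (i2.i * 10 + i1.i) MONTH
           BETWEEN '2018-08-01' AND '2019-01-31'
) m
-- LEFT JOIN keeps months with no active employee (avg_salary is then NULL).
LEFT JOIN employees e ON m.x BETWEEN e.start_date AND e.end_date
GROUP BY YEAR(m.x), MONTH(m.x)
ORDER BY YEAR(m.x), MONTH(m.x);
```

With the sample data this reproduces the six rows of the expected result in the question. Two copies of ints cover ranges of up to 100 months; a third copy would extend that to 1000.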

The following is a PostgreSQL way of doing it. It can be converted to a MySQL query by replacing generate_series(), which MySQL does not provide, with an equivalent row generator; EXTRACT() is available in MySQL as well.

WITH cte1 AS (
  SELECT generate_series('2018-08-01', '2019-01-31', '1 month'::interval)::date AS date
),
cte2 AS (
  SELECT id,
         name,
         salary,
         generate_series(start_date, end_date, '1 month'::interval)::date AS date
  FROM employees
)
SELECT extract(YEAR FROM cte1.date),
       extract(MONTH FROM cte1.date),
       avg(salary)
FROM cte1
JOIN cte2 ON extract(MONTH FROM cte1.date) = extract(MONTH FROM cte2.date)
         AND extract(YEAR FROM cte1.date) = extract(YEAR FROM cte2.date)
GROUP BY extract(YEAR FROM cte1.date),
         extract(MONTH FROM cte1.date);
