I have an employees table where each employee has a start_date, an end_date and a salary.
NOTE: the SQL code to create the table and import the data is at the bottom of the question.
+----+-------+------------+------------+---------+
| id | name | start_date | end_date | salary |
+----+-------+------------+------------+---------+
| 1 | Mark | 2017-05-01 | 2020-01-31 | 2000.00 |
| 2 | Tania | 2018-02-01 | 2019-08-31 | 5000.00 |
| 3 | Leo | 2018-02-01 | 2018-09-30 | 3000.00 |
| 4 | Elsa | 2018-12-01 | 2020-05-31 | 4000.00 |
+----+-------+------------+------------+---------+
For a given date range I want to extract the average of the salaries for each month within the given date range.
UPDATE: I would like to have the solution for MySQL 5.6 but it would be great to have also the solution for MySQL 8+ (just for personal knowledge).
If the date range is 2018-08-01 to 2019-01-31, the SQL statement should loop from August 2018 to January 2019 and calculate the average salary for each month.
The expected result for the date range 2018-08-01 to 2019-01-31 is:
+------+-------+------------+
| year | month | avg_salary |
+------+-------+------------+
| 2018 | 08 | 3333.33 |
| 2018 | 09 | 3333.33 |
| 2018 | 10 | 3500.00 |
| 2018 | 11 | 3500.00 |
| 2018 | 12 | 3666.67 |
| 2019 | 01 | 3666.67 |
+------+-------+------------+
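To make the expected figures easy to check by hand, here is a minimal Python sketch (not the MySQL solution being asked for) that recomputes them from the sample rows. It assumes an employee counts towards a month when the first day of that month falls between start_date and end_date; the function names are made up for illustration.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Sample rows from the employees table: (name, start_date, end_date, salary).
EMPLOYEES = [
    ("Mark",  date(2017, 5, 1),  date(2020, 1, 31), Decimal("2000.00")),
    ("Tania", date(2018, 2, 1),  date(2019, 8, 31), Decimal("5000.00")),
    ("Leo",   date(2018, 2, 1),  date(2018, 9, 30), Decimal("3000.00")),
    ("Elsa",  date(2018, 12, 1), date(2020, 5, 31), Decimal("4000.00")),
]


def month_starts(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = year + int(month == 12), month % 12 + 1


def monthly_averages(first, last):
    """Return (year, month, avg_salary) rows for the given date range."""
    rows = []
    for day in month_starts(first, last):
        # Assumption: an employee counts if employed on the 1st of the month.
        salaries = [s for _, start, end, s in EMPLOYEES if start <= day <= end]
        avg = None  # no employees in that month
        if salaries:
            avg = (sum(salaries) / len(salaries)).quantize(
                Decimal("0.01"), rounding=ROUND_HALF_UP
            )
        rows.append((day.year, day.month, avg))
    return rows


for row in monthly_averages(date(2018, 8, 1), date(2019, 1, 31)):
    print(row)
```

For August 2018, for example, Mark, Tania and Leo are employed, so the average is (2000 + 5000 + 3000) / 3 = 3333.33; Elsa only joins the calculation from December 2018.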
NOTE: I solved this problem by mixing MySQL with PHP code, but for a big date range it has to execute too many queries (one per month). So I would like a solution that uses MySQL only.
CREATE TABLE `employees` (
`id` int(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`salary` decimal(10,2) DEFAULT NULL
);
INSERT INTO `employees` (`id`, `name`, `start_date`, `end_date`, `salary`) VALUES
(1, 'Mark', '2017-05-01', '2020-01-31', '2000.00'),
(2, 'Tania', '2018-02-01', '2019-08-31', '5000.00'),
(3, 'Leo', '2018-02-01', '2018-09-30', '3000.00'),
(4, 'Elsa', '2018-12-01', '2020-05-31', '4000.00');
Here's a MySQL 8.0 recursive CTE way of doing it. The CTE creates a list of all the year, month combinations between the minimum start_date and maximum end_date in the employees table, which is then LEFT JOINed to the employees table to get the average salary for all employees who were working in that particular year and month:
WITH RECURSIVE months (year, month) AS
(
SELECT YEAR(MIN(start_date)) AS year, MONTH(MIN(start_date)) AS month FROM employees
UNION ALL
SELECT year + (month = 12), (month % 12) + 1 FROM months
WHERE STR_TO_DATE(CONCAT_WS('-', year, month, '01'), '%Y-%m-%d') <= (SELECT MAX(end_date) FROM employees)
)
SELECT m.year, m.month, ROUND(AVG(e.salary), 2) AS avg_salary
FROM months m
LEFT JOIN employees e ON STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN e.start_date AND e.end_date
WHERE STR_TO_DATE(CONCAT_WS('-', m.year, m.month, '01'), '%Y-%m-%d') BETWEEN '2018-08-01' AND '2019-01-31'
GROUP BY m.year, m.month
ORDER BY m.year, m.month
Output:
year month avg_salary
2018 8 3333.33
2018 9 3333.33
2018 10 3500.00
2018 11 3500.00
2018 12 3666.67
2019 1 3666.67
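The month-stepping arithmetic in the recursive part of the CTE can be a bit cryptic, so here is a rough Python sketch of what it does. This only emulates the `months` CTE for illustration; the function names are hypothetical and the dates passed in are the MIN(start_date) and MAX(end_date) of the sample data.

```python
from datetime import date


def next_month(year, month):
    """Mirror the recursive step: year + (month = 12), (month % 12) + 1."""
    return year + int(month == 12), (month % 12) + 1


def month_list(min_start, max_end):
    """Emulate the months CTE: the seed row plus repeated recursive steps."""
    year, month = min_start.year, min_start.month
    rows = [(year, month)]
    # The WHERE clause tests the row that is already in the list, so the
    # month after MAX(end_date) is still produced before recursion stops.
    while date(year, month, 1) <= max_end:
        year, month = next_month(year, month)
        rows.append((year, month))
    return rows


months = month_list(date(2017, 5, 1), date(2020, 5, 31))
print(months[0], months[-1], len(months))
```

With the sample data the list runs from May 2017 to June 2020, one month past the latest end_date. That extra row does no harm, because the outer WHERE clause restricts the result to the requested range.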
You can simply type out the desired months (or use PHP code to generate them) and join against that list:
SELECT ym, AVG(salary)
FROM (
SELECT '2018-08-01' + INTERVAL 0 MONTH AS ym UNION ALL
SELECT '2018-08-01' + INTERVAL 1 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 2 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 3 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 4 MONTH UNION ALL
SELECT '2018-08-01' + INTERVAL 5 MONTH
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
GROUP BY ym
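The query above can be generated for any date range rather than typed by hand. The sketch below does this in Python instead of PHP, purely as an illustration; `build_query` and `months_between` are made-up helper names, and it assumes the range start is snapped to the first day of its month and that the inputs are date objects rather than raw user strings.

```python
from datetime import date


def months_between(first, last):
    """Count month steps from first's month to last's month (0 if same)."""
    return (last.year - first.year) * 12 + (last.month - first.month)


def build_query(range_start, range_end):
    """Build the UNION ALL query with one SELECT per month in the range."""
    first = range_start.replace(day=1)  # assumption: snap to the 1st
    count = months_between(first, range_end) + 1
    selects = [
        f"    SELECT '{first.isoformat()}' + INTERVAL {n} MONTH"
        + (" AS ym" if n == 0 else "")  # only the first SELECT names the column
        for n in range(count)
    ]
    return (
        "SELECT ym, AVG(salary)\n"
        "FROM (\n"
        + " UNION ALL\n".join(selects)
        + "\n) AS yearmonths\n"
        "JOIN employees ON ym BETWEEN start_date AND end_date\n"
        "GROUP BY ym"
    )


print(build_query(date(2018, 8, 1), date(2019, 1, 31)))
```

This still sends a single statement to MySQL regardless of how many months the range covers, which avoids the one-query-per-month problem described in the question.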
If you have a table that contains the numbers 0, 1, ... then you can use that. You can even use any table that has a sufficient number of rows:
SELECT ym, AVG(salary)
FROM (
SELECT '2018-08-01' + INTERVAL @n := @n + 1 MONTH AS ym
FROM anytable, (SELECT @n := -1) x
LIMIT 100
) AS yearmonths
JOIN employees ON ym BETWEEN start_date AND end_date
WHERE ym <= '2019-01-01'
GROUP BY ym
To get this done, you need to generate a list of days from a date range. This is a frequently asked question on SO; I used the accepted solution from this post. It uses simple arithmetic on four cross-joined digit tables, so it can generate up to 10,000 consecutive days counting back from the current date (although performance may suffer).
Then we just need to JOIN with the original table to compute the average salary at each point in time.
select
year(x.date),
month(x.date),
avg(coalesce(e.salary, 0)) avg_salary
from (
select a.date
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
where a.date between '2018-08-01' and '2019-01-31'
) x left join employees e ON x.date between e.start_date and e.end_date
group by year(x.date), month(x.date)
order by 1, 2
| year(x.date) | month(x.date) | avg_salary |
| ------------ | ------------- | ----------- |
| 2018 | 8 | 3333.333333 |
| 2018 | 9 | 3333.333333 |
| 2018 | 10 | 3500 |
| 2018 | 11 | 3500 |
| 2018 | 12 | 3666.666667 |
| 2019 | 1 | 3666.666667 |
PS: another approach would be to create a calendar table that stores the list of days, and then just:
select
year(x.date),
month(x.date),
avg(coalesce(e.salary, 0)) avg_salary
from
mycalendar x
left join employees e ON x.date between e.start_date and e.end_date
where x.date between '2018-08-01' and '2019-01-31'
group by year(x.date), month(x.date)
order by 1, 2
A partial answer...
Here's an 'old school' solution, using a table of integers (0-9), but note that this kind of thing is redundant in newer versions of MySQL, which support recursive CTEs...
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH x
FROM ints i1
, ints i2
WHERE '2018-08-01' + INTERVAL i2.i * 10 + i1.i MONTH BETWEEN '2018-08-01' AND '2019-01-31';
+------------+
| x |
+------------+
| 2018-08-01 |
| 2018-09-01 |
| 2018-10-01 |
| 2018-11-01 |
| 2018-12-01 |
| 2019-01-01 |
+------------+
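In case the `i2.i * 10 + i1.i` trick is not obvious: cross joining two copies of the ten-row table gives every offset from 0 to 99, and each offset is added to the start date as a number of months. A small Python sketch of the same idea follows; `add_months` and `months_in_range` are hypothetical helpers, and the start date is assumed to be the first of a month.

```python
from datetime import date
from itertools import product

INTS = range(10)  # stands in for the ten-row ints table


def add_months(day, n):
    """Add n months to a date on the 1st of a month (result is also the 1st)."""
    total = day.year * 12 + (day.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def months_in_range(start, end):
    """Return the month starts between start and end, like the query above."""
    # Cross join of two copies of ints: i2 is the tens digit, i1 the units.
    offsets = sorted(i2 * 10 + i1 for i1, i2 in product(INTS, INTS))
    candidates = (add_months(start, n) for n in offsets)
    return [d for d in candidates if start <= d <= end]


for d in months_in_range(date(2018, 8, 1), date(2019, 1, 31)):
    print(d)
```

Two copies of the table cover ranges of up to 100 months; joining a third copy (as the hundreds digit) would extend that to 1000 months.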
The following is a PostgreSQL way of doing it. It can be converted to a MySQL query by substituting the MySQL equivalents of generate_series() and extract():
WITH cte1 AS
(SELECT generate_series('2018-08-01', '2019-01-31', '1 month'::interval)::date AS date),
cte2 AS
(SELECT id,
name,
salary,
generate_series(start_date, end_date, '1 month'::interval)::date AS date
FROM employees)
SELECT extract(YEAR FROM cte1.date),
       extract(MONTH FROM cte1.date),
       avg(salary)
FROM cte1
JOIN cte2 ON extract(MONTH FROM cte1.date) = extract(MONTH FROM cte2.date)
         AND extract(YEAR FROM cte1.date) = extract(YEAR FROM cte2.date)
GROUP BY extract(YEAR FROM cte1.date),
         extract(MONTH FROM cte1.date);