Salutations,
I am quite new to MySQL, especially producing queries, and I was wondering if it is possible to make my query execute faster? I am using the employees db available here: https://github.com/datacharmer/test_db
Now the query I had to produce needed to answer the following: "For each department, list number of employees born in each decade and their average salaries"
This is what I came up with:
SELECT DISTINCT d.dept_name, count(e.emp_no), AVG(s.salary), ROUND(YEAR(e.birth_date), -1) AS birth_date
FROM employees e, departments d, salaries s, dept_emp de
WHERE de.emp_no = e.emp_no AND de.dept_no = d.dept_no
AND e.emp_no = s.emp_no
GROUP BY d.dept_name,
ROUND(YEAR(e.birth_date), -1);
It works, it produces the result the professor wanted, but it is quite slow, taking about 11 seconds to execute. Is there something in my query that makes it slow to execute?
Edit:
Tables described:
mysql> explain dept_emp_latest_date;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   |     | NULL    |       |
| from_date | date    | YES  |     | NULL    |       |
| to_date   | date    | YES  |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
3 rows in set (0.01 sec)
mysql> explain dept_manager
-> ;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> explain employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
mysql> explain salaries;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| salary    | int(11) | NO   |     | NULL    |       |
| from_date | date    | NO   | PRI | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> explain titles;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| emp_no    | int(11)     | NO   | PRI | NULL    |       |
| title     | varchar(50) | NO   | PRI | NULL    |       |
| from_date | date        | NO   | PRI | NULL    |       |
| to_date   | date        | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> explain departments;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept_no   | char(4)     | NO   | PRI | NULL    |       |
| dept_name | varchar(40) | NO   | UNI | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.01 sec)
mysql> explain current_dept_emp;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   |     | NULL    |       |
| dept_no   | char(4) | NO   |     | NULL    |       |
| from_date | date    | YES  |     | NULL    |       |
| to_date   | date    | YES  |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.02 sec)
mysql> explain dept_emp;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Here's your query refactored to use 21st-century JOIN syntax.
SELECT DISTINCT d.dept_name, count(e.emp_no), AVG(s.salary),
ROUND(YEAR(e.birth_date), -1) AS birth_date
FROM employees e
JOIN salaries s ON e.emp_no = s.emp_no
JOIN dept_emp de ON de.emp_no = e.emp_no
JOIN departments d ON de.dept_no = d.dept_no
GROUP BY d.dept_name, ROUND(YEAR(e.birth_date), -1);
Notice that DISTINCT is redundant in an aggregate (GROUP BY) query. Getting rid of it shaves a couple of seconds off the run time.
But notice that the salaries table contains historical salary data. Each row contains a from_date and a to_date. The from_date column is part of the primary key of that table, along with the employee number. So your query averages a whole bunch of salary data indiscriminately, and pulls in far too many records.
The following query takes 4.6 seconds or so (my machine is about the same speed as yours, taking 11 seconds for your first query). It also makes more sense with the data you're given, because it extracts only the salary records and department affiliation records in effect at a particular point in time, rather than processing the whole lot.
SELECT d.dept_name, COUNT(e.emp_no), AVG(s.salary),
ROUND(YEAR(e.birth_date), -1) AS birth_date
FROM employees e
JOIN salaries s ON e.emp_no = s.emp_no
JOIN dept_emp de ON de.emp_no = e.emp_no
JOIN departments d ON de.dept_no = d.dept_no
WHERE s.from_date <= '2014-01-01' AND s.to_date > '2014-01-01'
AND de.from_date <= '2014-01-01' AND de.to_date > '2014-01-01'
GROUP BY d.dept_name, ROUND(YEAR(e.birth_date), -1);
It's working on a quarter-million employee records, so it's handling 52 records per millisecond. Not bad for a laptop.
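One thing worth double-checking: ROUND(YEAR(e.birth_date), -1) rounds to the nearest ten, so a birth year of 1955 lands in the 1960 group. If "decade" is meant as 1950-1959, 1960-1969 and so on, an expression like FLOOR(YEAR(e.birth_date) / 10) * 10 may be closer to the intent. Here's a small self-contained sketch that shows the two side by side. The sample years and aliases are made up for illustration and don't come from the employees data.

```sql
-- Compare rounding to the nearest ten with flooring to the start of the decade.
-- The derived table holds a few hypothetical birth years; no real tables are needed.
SELECT y                  AS birth_year,
       ROUND(y, -1)       AS rounded_to_nearest_ten,
       FLOOR(y / 10) * 10 AS decade_start
FROM (
    SELECT 1954 AS y
    UNION ALL SELECT 1955
    UNION ALL SELECT 1959
    UNION ALL SELECT 1960
) AS sample_years;
```

If that's the bucketing you want, the same expression would replace ROUND(...) in both the SELECT list and the GROUP BY clause.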