Is it possible to make this query faster?

Question

Salutations,

I am quite new to MySQL, especially producing queries, and I was wondering if it is possible to make my query execute faster? I am using the employees db available here: https://github.com/datacharmer/test_db

Now the query I had to produce needed to answer the following: "• For each department, list number of employees born in each decade and their average salaries"

This is what I came up with:

SELECT DISTINCT d.dept_name, count(e.emp_no), AVG(s.salary), ROUND(YEAR(e.birth_date), -1) AS birth_date 
FROM employees e, departments d, salaries s, dept_emp de 
WHERE de.emp_no = e.emp_no AND de.dept_no = d.dept_no 
    AND e.emp_no = s.emp_no 
    GROUP BY d.dept_name, 
    ROUND(YEAR(e.birth_date), -1);

It works, it produces the result the professor wanted, but it is quite slow, taking about 11 seconds to execute. Is there something in my query that makes it slow to execute?

Edit:

Tables described:

mysql> explain dept_emp_latest_date;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   |     | NULL    |       |
| from_date | date    | YES  |     | NULL    |       |
| to_date   | date    | YES  |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
3 rows in set (0.01 sec)

mysql> explain dept_manager
    -> ;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> explain employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

mysql> explain salaries;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| salary    | int(11) | NO   |     | NULL    |       |
| from_date | date    | NO   | PRI | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> explain titles;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| emp_no    | int(11)     | NO   | PRI | NULL    |       |
| title     | varchar(50) | NO   | PRI | NULL    |       |
| from_date | date        | NO   | PRI | NULL    |       |
| to_date   | date        | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> explain departments;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept_no   | char(4)     | NO   | PRI | NULL    |       |
| dept_name | varchar(40) | NO   | UNI | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.01 sec)

mysql> explain current_dept_emp;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   |     | NULL    |       |
| dept_no   | char(4) | NO   |     | NULL    |       |
| from_date | date    | YES  |     | NULL    |       |
| to_date   | date    | YES  |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.02 sec)

mysql> explain dept_emp;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)

Answer 1

Here's your query refactored to use 21st-century JOIN syntax.

SELECT DISTINCT d.dept_name, count(e.emp_no), AVG(s.salary),
       ROUND(YEAR(e.birth_date), -1) AS birth_date 
  FROM employees e
  JOIN salaries s ON e.emp_no = s.emp_no
  JOIN dept_emp de  ON de.emp_no = e.emp_no 
  JOIN departments d ON de.dept_no = d.dept_no
 GROUP BY d.dept_name, ROUND(YEAR(e.birth_date), -1);

Notice that DISTINCT is redundant in an aggregate (GROUP BY) query. Getting rid of it shaves a couple of seconds.

But notice that the salaries table contains historical salary data. Each row contains a from_date and to_date . The from_date column is part of the primary key of that table along with the employee number. So your query averages a whole bunch of salary data indiscriminately, and pulls in far too many records.

This query takes 4.6 seconds or so (my machine is about the same speed as yours, taking 11 seconds for your first query). And it makes more sense with the data you're given, because it extracts the salary records and department affiliation records for a particular point in time, rather than processing the whole lot.

SELECT d.dept_name, COUNT(e.emp_no), AVG(s.salary),
       ROUND(YEAR(e.birth_date), -1) AS birth_date
  FROM employees e
  JOIN salaries s ON e.emp_no = s.emp_no
  JOIN dept_emp de ON de.emp_no = e.emp_no
  JOIN departments d ON de.dept_no = d.dept_no
 WHERE s.from_date<='2014-01-01' AND s.to_date >'2014-01-01'
   AND de.from_date<='2014-01-01' AND de.to_date >'2014-01-01'
 GROUP BY d.dept_name, ROUND(YEAR(e.birth_date), -1);

It's working on a quarter-million employee records, so it's handling 52 records per millisecond. Not bad for a laptop.

Is it possible to make this query faster?

Question

1 answers

solution1
1 ACCPTED 2017-02-15 02:22:38

Is it possible to make this query faster?

Question

1 answers

solution1 1 ACCPTED 2017-02-15 02:22:38

solution1
1 ACCPTED 2017-02-15 02:22:38