MySQL: Using COUNT(column_name) in the column list, and again in the HAVING clause. Does this cause the COUNT(column_name) operation to run twice?

I am curious about the performance of using COUNT(column_name) twice in a single query. Here is the query in question:

SELECT
    employee_name,
    COUNT(employee_name)
FROM
    employee
GROUP BY
    employee_name
HAVING
    COUNT(employee_name) > 1;

Will COUNT(employee_name) be executed twice? Furthermore, how can I check for myself the performance of what is going on under the covers when I have questions like this in the future?

Thanks!

You can use the optimizer trace to learn more about how the optimizer executes a query and why. For this particular case, the trace does not explicitly say how many times the count is computed, but it does give information about the temporary table that is used to perform the aggregation:


mysql> SET optimizer_trace='enabled=on';                                               
Query OK, 0 rows affected (0,00 sec)

mysql> SELECT c2, COUNT(c2) FROM temp GROUP BY c2 HAVING COUNT(c2) > 1;
+------+-----------+
| c2   | COUNT(c2) |
+------+-----------+
|    1 |         2 |
|    2 |         2 |
+------+-----------+
2 rows in set (0,00 sec)

mysql> SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'
    -> FROM information_schema.optimizer_trace;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"tmp_table_info": {"table": "intermediate_tmp_table", "location": "memory (heap)", "key_length": 5, "row_length": 23, "unique_constraint": false, "row_limit_estimate": 729444}}] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,01 sec)

mysql> SELECT c2, COUNT(c2) AS c FROM temp GROUP BY c2 HAVING c > 1;
+------+---+
| c2   | c |
+------+---+
|    1 | 2 |
|    2 | 2 |
+------+---+
2 rows in set (0,00 sec)

mysql> SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'
    -> FROM information_schema.optimizer_trace;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| trace->'$.steps[*].join_execution.steps[*].creating_tmp_table'                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"tmp_table_info": {"table": "intermediate_tmp_table", "location": "memory (heap)", "key_length": 5, "row_length": 14, "unique_constraint": false, "row_limit_estimate": 1198372}}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

In the output above, the row length of the temporary table is smaller (14 vs. 23 bytes, a difference of 9 bytes) when an alias is used instead of repeating the COUNT expression. This suggests that for your query the count is stored, and so computed, twice during aggregation.
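The same comparison can be repeated on the question's own table. The following is only a sketch: it assumes the `employee` table from the question exists, and the alias `name_count` is a name made up for illustration. By default only the trace of the most recent traced statement is kept, so each trace is read before the next variant runs.

```sql
-- Tracing is switched on for the current session only.
SET optimizer_trace = 'enabled=on';

-- Variant 1: COUNT repeated in HAVING (the query from the question).
SELECT employee_name, COUNT(employee_name)
FROM employee
GROUP BY employee_name
HAVING COUNT(employee_name) > 1;

-- Temporary-table details for variant 1 (note the row_length value).
SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table' AS tmp_table
FROM information_schema.optimizer_trace;

-- Variant 2: the count is given an alias (hypothetical name) and HAVING uses it.
SELECT employee_name, COUNT(employee_name) AS name_count
FROM employee
GROUP BY employee_name
HAVING name_count > 1;

-- Temporary-table details for variant 2, to compare with variant 1.
SELECT trace->'$.steps[*].join_execution.steps[*].creating_tmp_table' AS tmp_table
FROM information_schema.optimizer_trace;

-- The full trace of the last traced statement. A non-zero
-- missing_bytes_beyond_max_mem_size means the trace was truncated.
SELECT query, trace, missing_bytes_beyond_max_mem_size
FROM information_schema.optimizer_trace;

-- Switch tracing off again when done.
SET optimizer_trace = 'enabled=off';
```

If the trace turns out to be truncated, raising the `optimizer_trace_max_mem_size` session variable and rerunning the statement should give the complete document.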

Pick any handy table and do this:

mysql> SELECT RAND() AS r FROM canada HAVING r < 0.1 limit 11;
+-----------------------+
| r                     |
+-----------------------+
|    0.6982369559800596 |
|   0.33121224616767114 |
|    0.3811396559524719 |
|    0.4718028721136999 |

See also:

Using `rand()` with `having`

Is there Performance related difference in using aggregate function in ORDER BY clause and alias of aggregate function?

And I think there are other discussions involving non-RAND cases.

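The original question uses COUNT(employee_name), which delivers the same value in both places, so you can't really tell whether it was evaluated twice. With RAND(), it becomes clear that the expression is reevaluated: every value displayed above is greater than 0.1 even though HAVING kept only rows with r < 0.1, so the value tested in HAVING cannot be the value shown in the column list.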
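If no handy table is available, the RAND() experiment can be run against a throwaway one. This is a minimal sketch: the table name `rand_demo` and its contents are invented for illustration, and the outcome may depend on the MySQL version. If the expression is reevaluated, as in the output shown earlier, the displayed values will not be limited to those below 0.1.

```sql
-- Hypothetical scratch table; any table with enough rows would do.
CREATE TABLE rand_demo (n INT NOT NULL);
INSERT INTO rand_demo (n) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

-- 10 x 10 x 10 = 1000 candidate rows, so roughly 100 should pass r < 0.1.
SELECT RAND() AS r
FROM rand_demo AS a
CROSS JOIN rand_demo AS b
CROSS JOIN rand_demo AS c
HAVING r < 0.1
LIMIT 11;

-- Clean up the scratch table.
DROP TABLE rand_demo;
```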
