We have a 'users' table that holds information about our users. One of the fields in this table is called 'query'. I am trying to SELECT the user IDs of all users that have the same query. So my output should look like this:
user1_id user2_id common_query
43 2 "foo"
117 433 "bar"
1 119 "baz"
1 52 "qux"
Unfortunately, I can't get this query to finish in under an hour (the users table is pretty big). This is my current query:
SELECT u1.id,
       u2.id,
       u1.query
FROM users u1
INNER JOIN users u2
        ON u1.query = u2.query
       AND u1.id <> u2.id
My explain:
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
| 1 | SIMPLE | u1 | index | index_users_on_query | index_users_on_query | 768 | NULL | 10905267 | Using index |
| 1 | SIMPLE | u2 | ref | index_users_on_query | index_users_on_query | 768 | u1.query | 11 | Using where; Using index |
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
As you can see from the EXPLAIN, the users table is indexed on query, and the index appears to be used in my SELECT. I'm wondering why the 'rows' column for table u2 has a value of 11 and not 1. Is there anything I can do to speed this query up? Is my '<>' comparison within the join bad practice? Also, the id field is the primary key.
The main driver of the query is the equality on the query field, provided that field is indexed. The <> on id is probably not very selective: the access type shown for u2 is 'ref', which comes from the equality on query, while the <> is only applied afterwards as a filter ('Using where').
The following applies only if 'query' is not indexed.
If id is the primary key, you could just do this:
CREATE INDEX index_1 ON users (query);
Such an index is a covering index for this query (on InnoDB, every secondary index also stores the primary key, so id can be read from the index itself), and it should give the fastest execution for the query.
My biggest concern is the key_len, which indicates that MySQL must compare up to 768 bytes in order to look up each index entry.
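For this query, a hash index on query could perform much better (it would involve substantially shorter comparisons, at the cost of calculating hashes and being unable to sort records using that index):
ALTER TABLE users ADD INDEX (query) USING HASH
Note that only the MEMORY and NDB storage engines build a true hash index; InnoDB and MyISAM accept USING HASH but create a B-tree index instead.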
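Where the storage engine does not provide hash indexes, a similar effect can be approximated by hand with a short hash column. The following is only a sketch: the column name query_crc and the index name index_users_on_query_crc are made up for illustration, and CRC32 is just one possible hash function.

```sql
-- Hypothetical helper column holding a 4-byte hash of the long query string.
ALTER TABLE users
  ADD COLUMN query_crc INT UNSIGNED NULL,
  ADD INDEX index_users_on_query_crc (query_crc);

-- Backfill the hash for existing rows (new rows must set it as well).
UPDATE users SET query_crc = CRC32(query);

-- Join on the short hash first; the equality on query removes hash collisions.
SELECT u1.id, u2.id, u1.query
FROM users u1
INNER JOIN users u2
        ON u1.query_crc = u2.query_crc
       AND u1.query = u2.query
       AND u1.id <> u2.id;
```

The index lookup then compares 4 bytes rather than up to 768, but the query column has to be read from the row to rule out collisions, so the new index is no longer covering. It is worth measuring both variants on real data before committing to either.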
You might also consider making this a composite index on (query, id) so that MySQL need not read the record itself to test the <> criterion.
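A minimal sketch of such a composite index; the index name is hypothetical:

```sql
-- Hypothetical index name; the index holds both columns used by the join condition.
ALTER TABLE users ADD INDEX index_users_on_query_and_id (query, id);
```

Running EXPLAIN on the original SELECT afterwards shows whether the optimizer actually picks the new index.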
How many distinct queries do you have? You could add a table UsersInQueries:
id  queryId  userId
0   5        453
1   23       732
2   15       761
Then select from this table and group by queryId.
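As a rough sketch of that idea: the column types and the index name below are assumptions, since the answer only gives the column names.

```sql
-- Assumed layout: one row per (query, user) association.
CREATE TABLE UsersInQueries (
  id      INT UNSIGNED NOT NULL PRIMARY KEY,
  queryId INT UNSIGNED NOT NULL,
  userId  INT UNSIGNED NOT NULL,
  INDEX index_on_query_and_user (queryId, userId)
);

-- Queries shared by more than one user, with the number of users sharing each.
SELECT queryId, COUNT(*) AS user_count
FROM UsersInQueries
GROUP BY queryId
HAVING COUNT(*) > 1;
```

The point of the extra table is that grouping and joining compare small integers instead of strings of up to 768 bytes.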
If you only have up to two users per query, you could do this instead:
select query, min(id) as FirstID, max(id) as SecondId
from users
group by query
having count(*) > 1
If you have more than two users with the same query, can you explain why you would want all pairs of such users?
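If one row per query listing its users would serve as well as all pairs, the grouping above extends to any number of users. A sketch using MySQL's GROUP_CONCAT; the aliases user_count and user_ids are made up for illustration:

```sql
-- One row per shared query, with a comma-separated list of its user ids.
SELECT query,
       COUNT(*) AS user_count,
       GROUP_CONCAT(id ORDER BY id) AS user_ids
FROM users
GROUP BY query
HAVING COUNT(*) > 1;
```

Keep in mind that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so very popular queries may need a higher setting.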