How can I speed up this query that joins a table on itself?

We have a 'users' table that holds information about our users. One of the fields within this table is called 'query'. I am trying to SELECT the user IDs of all users that have the same query. So my output should look like this:

user1_id    user2_id    common_query
   43          2            "foo"
   117         433          "bar"
   1           119          "baz"
   1           52           "qux"

Unfortunately, I can't get this query to finish in under an hour (the users table is pretty big). This is my current query:

SELECT u1.id,
       u2.id,
       u1.query
FROM users u1
INNER JOIN users u2
        ON u1.query = u2.query
       AND u1.id <> u2.id

My EXPLAIN output:

+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
| id | select_type | table | type  | possible_keys        | key                  | key_len | ref                             | rows     | Extra                    |
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
|  1 | SIMPLE      | u1    | index | index_users_on_query | index_users_on_query | 768     | NULL                            | 10905267 | Using index              |
|  1 | SIMPLE      | u2    | ref   | index_users_on_query | index_users_on_query | 768     | u1.query                        |       11 | Using where; Using index |
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+

As you can see from the EXPLAIN output, the users table is indexed on query, and the index appears to be used in my SELECT. I'm wondering why the 'rows' column for table u2 has a value of 11, and not 1. Is there anything I can do to speed this query up? Is my '<>' comparison within the join bad practice? Also, the id field is the primary key.

The main driver of the query is the equality on the query field, provided it's indexed. The <> on id is probably not very selective, and that shows in the EXPLAIN output, where the access type for u2 is 'ref'.
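On the <> condition itself: because the join is symmetric, u1.id <> u2.id returns every pair twice, once as (a, b) and once as (b, a). The sample output in the question lists each pair only once, so, assuming mirrored rows are not needed, a minimal sketch using < instead could look like this (the column aliases are taken from the sample output header):

```sql
-- Sketch: same self-join as in the question, but each unordered pair appears once.
SELECT u1.id    AS user1_id,
       u2.id    AS user2_id,
       u1.query AS common_query
FROM users u1
INNER JOIN users u2
        ON u1.query = u2.query
       AND u1.id < u2.id;   -- '<' instead of '<>' drops the mirrored (b, a) rows
```

This should roughly halve the number of rows produced. Note that user1_id is then always the smaller of the two IDs, which differs from the ordering shown in the sample output.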

The following only applies if 'query' is not indexed.

If id is the primary key, you could just do this:

CREATE INDEX index_1 ON users (query);

Such an index acts as a covering index for the query: it holds the join column and, on InnoDB, the primary key id as well, so the query can be answered from the index without reading the table rows. That should give the fastest execution of this query.

My biggest concern is the key_len, which indicates that MySQL must compare up to 768 bytes in order to look up each index entry.

For this query, a hash index on query could perform much better, as it would involve substantially shorter comparisons, at the cost of calculating hashes and being unable to sort records using that index:

ALTER TABLE users ADD INDEX (query) USING HASH
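One caveat worth checking before relying on this: as far as I know, only the MEMORY and NDB storage engines actually build hash indexes, while InnoDB and MyISAM accept USING HASH but create a B-tree index anyway. If the table is on one of those engines, a similar effect might be approximated with an indexed hash column. The sketch below assumes MySQL 5.7.6 or later (for generated columns); the column name query_hash and the index name are hypothetical.

```sql
-- Assumption: MySQL 5.7.6+; query_hash and index_users_on_query_hash are made-up names.
ALTER TABLE users
  ADD COLUMN query_hash BINARY(16)
      GENERATED ALWAYS AS (UNHEX(MD5(query))) STORED,  -- 16-byte digest of query
  ADD INDEX index_users_on_query_hash (query_hash);

SELECT u1.id, u2.id, u1.query
FROM users u1
INNER JOIN users u2
        ON u1.query_hash = u2.query_hash  -- short comparison served by the new index
       AND u1.query = u2.query            -- guards against hash collisions
       AND u1.id <> u2.id;
```

The trade-off is that the new index no longer covers the query column, so MySQL has to read the rows to fetch and compare query. Whether this is a net win would need to be measured on the real data.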

You might also consider making this a composite index on (query, id) so that MySQL need not read the record itself to test the <> criterion.
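A minimal sketch of the composite index, with a hypothetical index name, followed by an EXPLAIN to check whether it is picked up:

```sql
-- index_users_on_query_and_id is a made-up name for illustration.
ALTER TABLE users ADD INDEX index_users_on_query_and_id (query, id);

-- Re-check the plan; "Using index" in the Extra column means the index covers the query.
EXPLAIN
SELECT u1.id, u2.id, u1.query
FROM users u1
INNER JOIN users u2
        ON u1.query = u2.query
       AND u1.id <> u2.id;
```

On InnoDB, secondary indexes already store the primary key, which is likely why the EXPLAIN in the question shows "Using index" for both tables. The explicit composite index probably matters more on engines such as MyISAM.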

How many queries do you have? You can add a table UsersInQueries:

id   queryId   userId
0      5         453   
1      23        732 
2      15        761

Then select from this table and group by queryId.
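As a rough sketch of that idea, assuming integer IDs (the column types and the index name are guesses, since the answer only gives the column names):

```sql
-- Hypothetical schema for the mapping table described above.
CREATE TABLE UsersInQueries (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  queryId INT UNSIGNED NOT NULL,
  userId  INT UNSIGNED NOT NULL,
  INDEX index_on_query_and_user (queryId, userId)
);

-- One row per query shared by more than one user, with the user IDs as a list.
-- Note: GROUP_CONCAT output is truncated at group_concat_max_len.
SELECT queryId,
       GROUP_CONCAT(userId ORDER BY userId) AS user_ids
FROM UsersInQueries
GROUP BY queryId
HAVING COUNT(*) > 1;
```

This returns one row per shared query rather than one row per pair of users, so the result stays small even when many users share the same query.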

If you only have up to two users per query, you could do this instead:

select query, min(id) as FirstID, max(id) as SecondId
from users
group by query
having count(*) > 1

If you have more than two users with the same query, can you explain why you would want all pairs of such users?
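The number of pairs matters because a query shared by n users produces n * (n - 1) rows under the original <> join. Taking the EXPLAIN estimates at face value (about 10.9 million rows in u1, about 11 matches each), the result could be on the order of 120 million rows, though those figures are only optimizer estimates. A sketch for counting the actual output size before running the full join:

```sql
-- Counts the rows the original self-join would return, without materializing them.
SELECT SUM(n * (n - 1)) AS ordered_pairs
FROM (
  SELECT COUNT(*) AS n
  FROM users
  GROUP BY query
  HAVING COUNT(*) > 1      -- queries used by a single user contribute no pairs
) AS per_query;
```

If this number turns out to be very large, most of the hour is probably spent producing and transferring rows rather than on index lookups.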
