MySQL query with JOIN and GROUP BY optimization. Is it possible?
I have two tables, gpnxuser and key_value:
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is a foreign key that references gpnxuser.id. I also have an index on gpnxuser.partner_id, which is a foreign key that references a table called "partner" (which, I think, does not matter much for this question).
For partner_id = 64, I have 500K rows in gpnxuser, which are related to approximately 6M rows in key_value.
I want a query that returns all distinct key_value.upkey values for users belonging to a given partner. I tried something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The EXPLAIN output for the query looks like this:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?
What you need to do is use an EXISTS subquery. For each upkey, this causes only a partial scan that stops as soon as a match is found, and no more:
select upkey from (select distinct upkey from key_value) upk
where EXISTS
(select 1 from gpnxuser u, key_value kv
where u.id=kv.user_id and partner_id=1 and kv.upkey = upk.upkey)
NB: In the original query, GROUP BY is misused; DISTINCT is a better fit there:
select DISTINCT upkey from gpnxuser join key_value on
gpnxuser.id=key_value.user_id where partner_id=1
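Neither rewrite changes how key_value is read: according to the EXPLAIN output in the question, each user's rows are found through the user_id index and upkey is then fetched from the table row. A minimal sketch of one thing that may be worth testing is a composite index covering both columns. The index name idx_kv_user_upkey is hypothetical, and the sketch assumes the full upkey column fits within the index key length limit for your storage engine and character set; whether it helps should be measured on the real data.

```sql
-- Hypothetical index name; the columns come from the DESCRIBE output above.
-- With (user_id, upkey), the join can locate a user's rows by user_id and
-- read upkey from the index itself, without visiting the table rows.
CREATE INDEX idx_kv_user_upkey ON key_value (user_id, upkey);

-- Re-check the plan afterwards. "Using index" in the Extra column for
-- key_value would indicate that the index alone satisfies the read.
EXPLAIN
SELECT DISTINCT kv.upkey
FROM gpnxuser u
JOIN key_value kv ON kv.user_id = u.id
WHERE u.partner_id = 64;
```

Building the index on a table with millions of rows takes time and disk space, so it is best tried on a copy first.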
I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
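As a rough sketch of what partitioning on user_id could look like, here is a hash-partitioned copy of the table. Several things here are assumptions rather than part of the original schema: the table name key_value_partitioned, the index name, and the partition count of 16 are made up for illustration. MySQL also requires every unique key, including the primary key, to contain the partitioning column, so the primary key becomes (id, user_id) and user_id has to be NOT NULL, unlike in the original table. Partitioned InnoDB tables do not support foreign keys, so the reference to gpnxuser.id is dropped.

```sql
-- Hypothetical partitioned copy of key_value; names and the partition
-- count are illustrative only.
CREATE TABLE key_value_partitioned (
  id             BIGINT       NOT NULL AUTO_INCREMENT,
  version        BIGINT       NOT NULL,
  date_created   DATETIME     NULL,
  last_updated   DATETIME     NULL,
  upkey          VARCHAR(255) NOT NULL,
  user_id        BIGINT       NOT NULL,  -- nullable in the original table
  security_level INT          NOT NULL,
  PRIMARY KEY (id, user_id),             -- must include the partitioning column
  KEY idx_kvp_user_id (user_id)
)
PARTITION BY HASH (user_id)
PARTITIONS 16;
```

Whether this helps the query in the question is not obvious, since that query filters on gpnxuser.partner_id rather than on a fixed set of user_id values, so it is worth benchmarking before giving up the foreign key.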