MySQL - Select Unique Common Columns between two tables - Most Efficient Query
I have two tables:
db_contacts
Phone | Name | Last_Name
--------------------
111 | Foo | Foo
222 | Bar | Bar
333 | John | Smith
444 | Tomy | Smith
users_contacts
User_ID | Phone
--------------------
1 | 123
1 | 111
2 | 222
2 | 333
3 | 111
3 | 333
4 | 444
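Notice from above that phone 123 is not in db_contacts, phones 111 and 333 are each shared by two users, and phones 222 and 444 each belong to a single user.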
I need to obtain these results with a MySQL query.
In other words: how can I select all the users that have a unique phone number, on the condition that this number exists in db_contacts?
I need my end result to be something like this:
User_ID | Phone | Name | Last_Name
------------------------------------
2 | 222 | Bar | Bar
4 | 444 | Tomy | Smith
PS: There is no foreign key between the Phone columns, as a user can have a phone that is not in db_contacts.
In real life, db_contacts contains about 1 million records and users_contacts about 5 million records.
What I tried, which failed and took a long time to execute:
SELECT *
FROM users_contacts
WHERE users_contacts.phone IN (
SELECT users_contacts.phone
FROM `users_contacts`
JOIN db_contacts ON db_contacts.phone = users_contacts.phone
GROUP BY users_contacts.phone
HAVING COUNT(users_contacts.phone) = 1
)
Thank you for your replies, I have provided my solution that fits my case perfectly.
I think you want:
select uc.*
from users_contacts uc
where not exists (select 1
from users_contacts uc2
where uc2.phone = uc.phone and uc2.user_id <> uc.user_id
);
For performance, you want an index on users_contacts(phone, user_id).
Another method is:
select max(user_id) as user_id, phone
from users_contacts
group by phone
having count(*) = 1;
The not exists version is probably going to be faster.
I would use a simple JOIN with a NOT EXISTS condition. This is usually the most efficient way to check that something has no duplicates; compared to your solution, this has the advantage of avoiding aggregation.
SELECT uc.User_ID, dc.*
FROM users_contacts uc
INNER JOIN db_contacts dc ON uc.Phone = dc.Phone
WHERE NOT EXISTS (
SELECT 1
FROM users_contacts uc1
WHERE uc1.Phone = dc.Phone AND uc1.User_ID != uc.User_ID
)
Hint: consider setting the following indexes:
users_contacts(Phone, User_ID)
db_contacts(Phone)
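As a quick sanity check, the JOIN with NOT EXISTS approach can be run against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a server), the index names are made up, and the inner comparison is written against the outer alias uc.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the query itself uses only standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_contacts (Phone INTEGER, Name TEXT, Last_Name TEXT);
CREATE TABLE users_contacts (User_ID INTEGER, Phone INTEGER);

-- Indexes suggested in the answer (index names are hypothetical)
CREATE INDEX idx_uc_phone_user ON users_contacts (Phone, User_ID);
CREATE INDEX idx_dc_phone ON db_contacts (Phone);

-- Sample rows from the question
INSERT INTO db_contacts VALUES
  (111, 'Foo', 'Foo'), (222, 'Bar', 'Bar'),
  (333, 'John', 'Smith'), (444, 'Tomy', 'Smith');
INSERT INTO users_contacts VALUES
  (1, 123), (1, 111), (2, 222), (2, 333), (3, 111), (3, 333), (4, 444);
""")

# Keep a (user, phone) pair only if the phone exists in db_contacts
# and no other user holds the same phone.
rows = conn.execute("""
SELECT uc.User_ID, dc.Phone, dc.Name, dc.Last_Name
FROM users_contacts uc
INNER JOIN db_contacts dc ON uc.Phone = dc.Phone
WHERE NOT EXISTS (
    SELECT 1
    FROM users_contacts uc1
    WHERE uc1.Phone = uc.Phone AND uc1.User_ID <> uc.User_ID
)
ORDER BY uc.User_ID
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this yields the two rows the question asks for (user 2 with phone 222 and user 4 with phone 444). It says nothing about timing on millions of rows, which depends on the MySQL version and the indexes actually in place.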
I would first like to thank everyone who posted solutions; they all worked.
But response time was critical for me, and the solutions provided took a long time to execute, a couple of seconds.
In case anyone has a similar problem: I ended up creating a new table called users_unique_contacts, and created an AFTER INSERT trigger on users_contacts that checks whether the newly created contact's phone already exists in users_unique_contacts. If it doesn't exist, the trigger adds it; otherwise it removes it, as that means the number is not unique anymore.
My trigger went like this:
BEGIN
IF EXISTS (SELECT 1 = 1 FROM users_unique_contacts WHERE phone = new.phone LIMIT 1) THEN
BEGIN
DELETE FROM users_unique_contacts WHERE phone = new.phone LIMIT 1;
END;
ELSE
BEGIN
INSERT INTO users_unique_contacts (user_id,phone) VALUES (new.user_id, new.phone);
END;
END IF;
END
Now every time I want the unique numbers of a user, I query users_unique_contacts and the execution time is milliseconds.
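For anyone adapting this idea, here is a hedged sketch of a variant of the trigger, not the exact one above. A pure add/remove toggle puts a phone back into users_unique_contacts when a third row with the same phone is inserted, so this variant decides by counting rows in users_contacts instead. It is written for SQLite through Python's sqlite3 module (an assumption made so it can be run as-is; MySQL trigger syntax differs), and the trigger and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_contacts (user_id INTEGER, phone INTEGER);
CREATE TABLE users_unique_contacts (user_id INTEGER, phone INTEGER);
CREATE INDEX idx_uc_phone ON users_contacts (phone);

-- Hypothetical trigger name; SQLite syntax, not MySQL.
CREATE TRIGGER trg_users_contacts_ai AFTER INSERT ON users_contacts
BEGIN
    -- Drop any existing entry for this phone ...
    DELETE FROM users_unique_contacts WHERE phone = NEW.phone;
    -- ... and add it back only if exactly one row now holds the phone.
    INSERT INTO users_unique_contacts (user_id, phone)
    SELECT NEW.user_id, NEW.phone
    WHERE (SELECT COUNT(*) FROM users_contacts WHERE phone = NEW.phone) = 1;
END;
""")

# Sample rows from the question, inserted one by one so the trigger fires per row.
sample = [(1, 123), (1, 111), (2, 222), (2, 333), (3, 111), (3, 333), (4, 444)]
conn.executemany("INSERT INTO users_contacts VALUES (?, ?)", sample)

# A third row for phone 111: a toggle would re-add it, the count check does not.
conn.execute("INSERT INTO users_contacts VALUES (?, ?)", (5, 111))

unique_rows = conn.execute(
    "SELECT user_id, phone FROM users_unique_contacts ORDER BY phone"
).fetchall()
print(unique_rows)
```

Like the original trigger, this sketch does not consult db_contacts, so phone 123 stays in the helper table; joining users_unique_contacts to db_contacts at read time would filter it out. Deletes and updates on users_contacts would need their own triggers to keep the helper table accurate.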