
MySQL - Select Unique Common Columns between two tables - Most Efficient Query

I have two tables:

db_contacts

Phone | Name | Last_Name
--------------------
111   | Foo  | Foo
222   | Bar  | Bar
333   | John | Smith
444   | Tomy | Smith

users_contacts

User_ID | Phone
--------------------
1       | 123
1       | 111
2       | 222
2       | 333
3       | 111
3       | 333
4       | 444

Notice from the above that:

  • The user with ID 2 is the only one that has the phone number 222
  • The user with ID 4 is the only one that has the phone number 444

I need to obtain these results with a MySQL query.

In other words: how can I select all the users that have a unique phone number (one that no other user has), on the condition that this number exists in db_contacts?

I need my end result to look something like this:

User_ID | Phone | Name | Last_Name
------------------------------------
2       | 222   | Bar  | Bar
4       | 444   | Tomy | Smith

PS: There is no foreign key between the Phone columns, because a user can have a phone that is not in db_contacts.

In real life, db_contacts contains about 1 million records and users_contacts about 5 million records.

What I tried, which failed because it takes far too long to execute:

SELECT * 
FROM users_contacts 
WHERE users_contacts.phone IN (
    SELECT users_contacts.phone 
    FROM `users_contacts`
    JOIN db_contacts ON db_contacts.phone = users_contacts.phone
    GROUP BY users_contacts.phone
    HAVING COUNT(users_contacts.phone) = 1
)

Update:

Thank you for your replies. I have posted my own solution, which fits my case perfectly.

I think you want:

select uc.*
from users_contacts uc
where not exists (select 1
                  from users_contacts uc2
                  where uc2.phone = uc.phone and uc2.user_id <> uc.user_id
                 );

For performance, you want an index on users_contacts(phone, user_id).

Another method is:

select max(user_id) as user_id, phone
from users_contacts
group by phone
having count(*) = 1;

The not exists version is probably going to be faster.
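To compare the two queries on the sample data from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and it uses the table name users_contacts from the question. It checks only that both queries return the same rows; it says nothing about speed on millions of records.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_contacts (user_id INTEGER, phone INTEGER)")
conn.executemany(
    "INSERT INTO users_contacts VALUES (?, ?)",
    [(1, 123), (1, 111), (2, 222), (2, 333), (3, 111), (3, 333), (4, 444)],
)

# Variant 1: keep a row only if no other user has the same phone.
not_exists_rows = sorted(conn.execute("""
    SELECT uc.user_id, uc.phone
    FROM users_contacts uc
    WHERE NOT EXISTS (
        SELECT 1
        FROM users_contacts uc2
        WHERE uc2.phone = uc.phone AND uc2.user_id <> uc.user_id
    )
""").fetchall())

# Variant 2: group by phone and keep groups with exactly one row.
group_by_rows = sorted(conn.execute("""
    SELECT MAX(user_id) AS user_id, phone
    FROM users_contacts
    GROUP BY phone
    HAVING COUNT(*) = 1
""").fetchall())

print(not_exists_rows)
print(group_by_rows)
```

Both variants also return the row (1, 123), because neither of them joins db_contacts; an inner join on db_contacts.phone would drop that row.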

I would use a simple JOIN with a NOT EXISTS condition. This is usually the most efficient way to check that something has no duplicates; compared to your solution, this has the advantage of avoiding aggregation.

SELECT uc.User_ID, dc.*
FROM users_contacts uc
INNER JOIN db_contacts dc ON uc.Phone = dc.Phone
WHERE NOT EXISTS (
    -- any other user holding the same phone disqualifies the row
    SELECT 1 
    FROM users_contacts uc1 
    WHERE uc1.Phone = dc.Phone AND uc1.User_ID != uc.User_ID
)

Hint: consider setting the following indexes:

  • users_contacts(Phone, User_ID)
  • db_contacts(Phone)
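As a quick sanity check, the JOIN plus NOT EXISTS approach can be run against the sample data from the question. The sketch below rests on several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, the index names are made up, and the inner comparison is written against the outer alias uc. It verifies the result set only, not the performance on the real tables.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_contacts (Phone INTEGER, Name TEXT, Last_Name TEXT);
CREATE TABLE users_contacts (User_ID INTEGER, Phone INTEGER);
-- Index names are hypothetical; the indexed columns follow the hint above.
CREATE INDEX idx_uc_phone_user ON users_contacts (Phone, User_ID);
CREATE INDEX idx_dc_phone ON db_contacts (Phone);
""")
conn.executemany(
    "INSERT INTO db_contacts VALUES (?, ?, ?)",
    [(111, "Foo", "Foo"), (222, "Bar", "Bar"),
     (333, "John", "Smith"), (444, "Tomy", "Smith")],
)
conn.executemany(
    "INSERT INTO users_contacts VALUES (?, ?)",
    [(1, 123), (1, 111), (2, 222), (2, 333), (3, 111), (3, 333), (4, 444)],
)

# Join to db_contacts, then keep rows whose phone belongs to no other user.
rows = conn.execute("""
    SELECT uc.User_ID, dc.Phone, dc.Name, dc.Last_Name
    FROM users_contacts uc
    INNER JOIN db_contacts dc ON uc.Phone = dc.Phone
    WHERE NOT EXISTS (
        SELECT 1
        FROM users_contacts uc1
        WHERE uc1.Phone = uc.Phone AND uc1.User_ID != uc.User_ID
    )
    ORDER BY uc.User_ID
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this yields the two rows the question asks for: user 2 with phone 222 and user 4 with phone 444.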

First, I would like to thank everyone who posted solutions; they all worked.

But response time was critical for me, and the solutions provided took too long to execute: a couple of seconds.

In case anyone has a similar problem: I ended up creating a new table called users_unique_contacts, together with an AFTER INSERT trigger on users_contacts. The trigger checks whether the newly inserted phone already exists in users_unique_contacts. If it does not exist, the trigger adds it; otherwise it removes it, because the number is no longer unique.

My trigger went like this:

BEGIN
    -- Phone already tracked as unique: a second user now has it, so drop it.
    IF EXISTS (SELECT 1 FROM users_unique_contacts WHERE phone = NEW.phone) THEN
        DELETE FROM users_unique_contacts WHERE phone = NEW.phone LIMIT 1;
    ELSE
        -- First time this phone is seen: record it as unique for this user.
        INSERT INTO users_unique_contacts (user_id, phone) VALUES (NEW.user_id, NEW.phone);
    END IF;
END

Now, every time I want the unique numbers of a user, I query users_unique_contacts, and the execution time is in milliseconds.
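The toggle idea behind the trigger can be tried out on the sample data. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, the trigger name trg_toggle_unique is made up, and because SQLite triggers have no IF/ELSE, the toggle is emulated by inserting the row and then deleting every row for that phone if it is now tracked twice.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_contacts (phone INTEGER, name TEXT, last_name TEXT);
CREATE TABLE users_contacts (user_id INTEGER, phone INTEGER);
CREATE TABLE users_unique_contacts (user_id INTEGER, phone INTEGER);

-- Hypothetical trigger name. Emulated toggle: add the new row, then
-- remove all rows for that phone if it is now present more than once.
CREATE TRIGGER trg_toggle_unique AFTER INSERT ON users_contacts
BEGIN
    INSERT INTO users_unique_contacts (user_id, phone)
    VALUES (NEW.user_id, NEW.phone);
    DELETE FROM users_unique_contacts
    WHERE phone = NEW.phone
      AND (SELECT COUNT(*) FROM users_unique_contacts
           WHERE phone = NEW.phone) > 1;
END;
""")
conn.executemany(
    "INSERT INTO db_contacts VALUES (?, ?, ?)",
    [(111, "Foo", "Foo"), (222, "Bar", "Bar"),
     (333, "John", "Smith"), (444, "Tomy", "Smith")],
)
# Each insert fires the trigger, which maintains users_unique_contacts.
conn.executemany(
    "INSERT INTO users_contacts VALUES (?, ?)",
    [(1, 123), (1, 111), (2, 222), (2, 333), (3, 111), (3, 333), (4, 444)],
)

# Reading the result is now a join against the small precomputed table.
rows = conn.execute("""
    SELECT u.user_id, d.phone, d.name, d.last_name
    FROM users_unique_contacts u
    JOIN db_contacts d ON d.phone = u.phone
    ORDER BY u.user_id
""").fetchall()

for row in rows:
    print(row)
```

One property of a toggle is worth keeping in mind: a third insert of the same phone would find no tracked row and add the phone back, so the approach relies on the data never containing a phone shared by three or more users, or on extra bookkeeping not shown here.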
