MySQL: Get data from another table without duplicates

I am a beginner in MySQL and I am having a bit of a problem. I have two tables. One of them is called core; it has 200,000 entries and contains the column messageid. The other is called recipients; it has 1,200,000 entries and the columns recipientid, messageid and personid.

I am creating a new column personid in the table core in order to import the personid entries where the messageid in both tables is the same. I know that in the table recipients there are multiple entries for several messageids, so I just want to return the first one. I am using the following SQL query:

UPDATE core A
SET personid =
(SELECT personid
FROM recipients B
WHERE B.messageid = A.messageid)

I do not understand why, but it does not work. It works when I want to import data from other tables, but not from this one. Ultimately, it just crashes my local PHP/MySQL server.

Would you have any idea on how to do that?

Update:

Here is the result of SHOW INDEX FROM recipients. The column name "recipientid" has a cardinality of 1356207 and the indextype is BTREE. The column name "messageid" has a cardinality of "NULL" and the indextype is BTREE.

When I run SHOW INDEX FROM core, the query runs successfully, but nothing is displayed. Does it mean there is a problem?

Part 0 - Before you do anything else...

... you need to be able to perform queries on core without them either timing out or crashing the server. Based on your updated question, you're probably going to have to create some indexes on core to help speed up queries on it (otherwise the database has to scan the entire table to make sure it's doing the right thing). At the very minimum, an index on messageid should help:

ALTER TABLE core ADD INDEX messageid_idx(messageid);

This index on messageid should help speed up any queries that operate upon that column, like the update query.
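
To check that the index exists and that MySQL actually uses it, something like the following may help. This is only a sketch: messageid_idx is the index name from the ALTER TABLE statement above, and the EXPLAIN query is an illustrative join, not the final update. ANALYZE TABLE may also refresh the NULL cardinality reported for recipients, though that is an assumption about your setup.

-- List the indexes on core; messageid_idx should now appear
SHOW INDEX FROM core;

-- Refresh index statistics on recipients (messageid cardinality was reported as NULL)
ANALYZE TABLE recipients;

-- Inspect the join plan: the "key" column should name an index
-- for the table that is looked up by messageid, rather than NULL
EXPLAIN
SELECT
    core.messageid,
    recipients.personid
FROM
    core
    INNER JOIN recipients ON recipients.messageid = core.messageid;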

The rest of my answer comes in two parts. The first answers your question as asked, assuming you are sure you want to ignore multiple personids per messageid in recipients.

The second part contains additional queries that I would run up front to analyze the multiple personids and ascertain whether or not they really matter. Perhaps you're sure that multiple personids per messageid are no big deal, but hopefully this will benefit others seeking answers to similar questions.

Part 1: Just any personid will do, thanks

For messageids in recipients that have more than one personid, any one of them will do. Wrapping personid in MIN() picks one deterministically and keeps the query valid when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5):

-- Update core personid with one recipients personid per matching messageid
UPDATE
    core
    INNER JOIN (
        SELECT
            messageid,
            MIN(personid) AS personid
        FROM
            recipients
        GROUP BY
            messageid
    ) AS one_person_per_message USING (messageid)
SET
    core.personid = one_person_per_message.personid;

Simple enough, right? Ok, let's move on.
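
If "any personid" is too loose and you want the personid from the first recipient row of each message, as the question asks, one possible sketch follows. It assumes recipientid is unique and grows with insertion order, which the question does not confirm. The names first_recipient_per_message and first_recipientid are made up for illustration.

-- Assumption: the lowest recipientid per messageid is the "first" entry
UPDATE
    core
    INNER JOIN (
        SELECT
            messageid,
            MIN(recipientid) AS first_recipientid
        FROM
            recipients
        GROUP BY
            messageid
    ) AS first_recipient_per_message USING (messageid)
    INNER JOIN recipients
        ON recipients.recipientid = first_recipient_per_message.first_recipientid
SET
    core.personid = recipients.personid;

-- Afterwards, count the core rows that found no match in recipients
SELECT
    COUNT(*) AS rows_without_person
FROM
    core
WHERE
    personid IS NULL;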

Part 2: Actually, I'm interested in those messages with multiple people. Which ones are they?

If you just want to know which messageids have multiple personids associated with them, you can query them with this:

-- Find messageids with more than one personid
SELECT
    messageid,
    COUNT(DISTINCT personid) AS num_people_in_message
FROM
    recipients
GROUP BY
    messageid
HAVING
    num_people_in_message > 1;

If you also want to see the individual personids associated with them, you can use the following query. Note that each messageid will appear more than once, with one row per distinct personid:

-- Show the messageid and personid of the messages with multiple people
SELECT
    messageid,
    personid
FROM
    recipients
WHERE
    messageid IN (
        SELECT
            messageid
        FROM
            (
                SELECT
                    messageid,
                    COUNT(DISTINCT personid) AS num_people_in_message
                FROM
                    recipients
                GROUP BY
                    messageid
                HAVING
                    num_people_in_message > 1
            ) AS messages_with_multiple_people
    )
GROUP BY
    messageid, personid;
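
As an alternative, if one row per messageid is easier to read, the two queries above could be folded into one with GROUP_CONCAT. This is a sketch rather than a drop-in replacement: the alias personids is made up here, and the concatenated list is cut off at group_concat_max_len (1024 bytes by default), so very large groups may look incomplete.

-- One row per messageid, with its distinct personids as a comma-separated list
SELECT
    messageid,
    COUNT(DISTINCT personid) AS num_people_in_message,
    GROUP_CONCAT(DISTINCT personid ORDER BY personid) AS personids
FROM
    recipients
GROUP BY
    messageid
HAVING
    num_people_in_message > 1;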

Did I understand your question correctly and provide the answer you needed? I hope so. Many thanks to sqlfiddle for helping me make sure those queries would actually work for you.

I don't think you can use SELECT and UPDATE in a single query in MySQL. My suggestion is to use a stored procedure to achieve the same result.
