简体   繁体   中英

How to select only the unique rows after joining two table

i have two tables:

table 'A'
| id | name  |
| 1  | Larry |
| 2  | Maria |
| 3  | Ponyo |
| 4  | Panda |


table 'B'
| m_id | items |
| 1    |  7    |
| 2    |  9    |

I just want to display the records from table 'A' that are NOT on table 'B'. So that would be

| 3  | Ponyo |
| 4  | Panda |

only.

An anti-join pattern is usually the most efficient approach, although there are several ways to get the same result.

SELECT a.id
     , a.name
  FROM table_a a
  LEFT
  JOIN table_b b
    ON b.id = a.id
 WHERE b.id IS NULL

Let's unpack that a bit.

The LEFT [OUTER] JOIN operation gets us all rows from a , along with matching rows from b . The "trick" is to filter out all the rows that had a match; to do that, we use a predicate in the WHERE clause, that checks for a NULL value from b that we know won't be NULL if a match was found.

In this case, if we found a match, we know b.id is not null, since b.id = a.id wouldn't return TRUE if b.id was NULL.

The anti-join won't create any "duplicate" rows from a (like a regular join can do). If you need to eliminate "duplicates" that already exist in a, adding a GROUP BY clause or adding the DISTINCT keyword before the select list is the way to go.


There are other approaches, like using a NOT EXISTS predicate with a correlated subquery, or a NOT IN with a subquery, but those forms are usually not as efficient.


FOLLOWUP

Actual performance of the queries is going to depend on several factors; having suitable indexes available is probably the biggest factor. The nullability of columns involved in predicates plays a role in the execution plan, as does cardinality, distribution of values, etc., MySQL version, and configuration of the server (eg innodb pool size)

As a test case:

SHOW VARIABLES LIKE 'version'
-- Variable_name  Value                        
-- -------------  -----------------------------
-- version        5.5.35-0ubuntu0.12.04.2-log  

CREATE TABLE `table_a` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name_` VARCHAR(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=INNODB;

CREATE TABLE `table_b` (
  `a_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  `item` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`a_id`,`item`)
) ENGINE=INNODB;

-- table_a  1,000,000 rows, id values   1 through 1000000
-- table_b    990,000 rows, a_id values 1 through 1000000 (less a_id MOD 100 = 0)

left-join-where

-- EXPLAIN
SELECT /*! SQL_NO_CACHE */ a.id
     , a.name_
  FROM table_a a
  LEFT
  JOIN table_b b
    ON b.a_id = a.id
 WHERE b.a_id IS NULL

not-in

-- EXPLAIN 
SELECT /*! SQL_NO_CACHE */ a.id
     , a.name_
  FROM table_a a
 WHERE a.id NOT IN (SELECT b.a_id FROM table_b b)

not-exists

-- EXPLAIN 
SELECT /*! SQL_NO_CACHE */ a.id
     , a.name_
  FROM table_a a
 WHERE NOT EXISTS 
       (SELECT 1
          FROM table_b b
         WHERE b.a_id = a.id)

Performance results (in seconds):

                  run 2  run 3  run 4  run 5  avg
                  -----  -----  -----  -----  -----
left-join-where   0.227  0.227  0.227  0.227  0.227 
not-in            0.233  0.233  0.234  0.233  0.233
not-exists        1.031  1.029  1.032  1.031  1.031

EXPLAIN output for the three queries:

left-join-where  
id  select_type        table type           possible_ key     key_len ref       rows Extra                                 
--  -----------        ----- -------------- --------- ------- ------- ------ ------- ------------------------------------
 1  SIMPLE             a     ALL                                             1000392                                       
 1  SIMPLE             b     ref            PRIMARY   PRIMARY 4       a.id         1 Using where; Using index; Not exists

not-in
id  select_type        table type           possible_ key     key_len ref       rows Extra        
--  ------------------ ----- -------------- --------- ------- ------- ------ ------- ------------------------------------
 1  PRIMARY            a     ALL                                             1000392 Using where  
 2  DEPENDENT SUBQUERY b     index_subquery PRIMARY   PRIMARY 4       func         1 Using index 

not-exists
id  select_type        table type           possible_ key     key_len ref       rows Extra        
--  ------------------ ----- ------         --------- ------- ------- ------ ------- ------------------------------------
 1  PRIMARY            a     ALL                                             1000392 Using where  
 2  DEPENDENT SUBQUERY b     ref            PRIMARY   PRIMARY 4       a.id         1 Using index

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM