简体   繁体   中英

finding distinct pairs in sql

I was trying to learn non-equi joins when I encountered this problem. I have a table pops:

+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| country    | varchar(100) | YES  |     | NULL    |       |
| continent  | varchar(100) | YES  |     | NULL    |       |
| population | bigint(20)   | YES  |     | NULL    |       |

I was trying to find countries with their population in the vicinity of say, 100.

   select distinct 
       p1.country, 
       p2.country, 
       p1.population, 
       p2.population 
   from pops p1 
       inner join pops p2 
           on p1.population between p2.population - 100 and p2.population + 100 
           and p1.country <> p2.country  
   where p2.country <> p1.country

output I got was:

+------------+------------+------------+------------+
| country    | country    | population | population |
+------------+------------+------------+------------+
| pakistan   | india      |      99988 |      99999 |
| china      | india      |      99990 |      99999 |
| bangladesh | japan      |        999 |        999 |
| india      | pakistan   |      99999 |      99988 |
| china      | pakistan   |      99990 |      99988 |
| japan      | bangladesh |        999 |        999 |
| india      | china      |      99999 |      99990 |
| pakistan   | china      |      99988 |      99990 |
+------------+------------+------------+------------+

as we can see, I am getting pairs of (india, pakistan) as well as (pakistan, india), which is data-wise the same thing. Is it possible to eliminate one of the records from the pair?

You could decide to always have the lexographically first (or last, for argument's sake) country on the p1 side - use < (or > ) instead of <> . Also, note that your where clause is redundant, since you already have this condition in the on clause of the join :

select     p1.country, 
           p2.country, 
           p1.population, 
           p2.population 
from       pops p1 
inner join pops p2 
           on p1.population between p2.population - 100 and p2.population + 100 and
               p1.country < p2.country  
-- Here ------------------^

just change the join condition of p1.country <> p2.country to p1.country < p2.country

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM