简体   繁体   中英

SQL: Find non-unique records in one column that are also non-unique in another column

I have a table of places and their regions, but sometimes there is more than one place with the same name in the same region. Real life examples: There are five Springfields in Wisconsin . Nova Scotia, where I live, has three Sandy Coves . I need to find these sorts of records and either disambiguate them (by adding their county or equivalent, for example) or just delete junk/overlapping ones. That will take some work, but first I'd like to see how much work.

PlaceName:    RegionName:
Summerville   Big State
Summerville   Bigger State (OK, different states, no problem...)
Summerville   Little State <- 
Summerville   Little State <- I need to deal with these

This query, based on an answer from another question , gets me all of the places with the same name:

SELECT * FROM Places WHERE PlaceName IN
  (SELECT PlaceName FROM Places GROUP BY PlaceName HAVING COUNT(*) > 1);

This is a good start, but I want to skip over the names that don't occur more than once in the same state and just get straight to the problem cases. To put it generically, I'd like to find non-unique records in one column and from there get the ones that are non-unique in another column.

(FWIW, I'm using MariaDB, which is mostly compatible with MySQL.)

Basically you want to

  • first count how often each (place/region) tuple exists
  • then filter only those, that appear more than once

So let's just do this

SELECT
  PlaceName, RegionName, Count(*) AS num
FROM Places
  GROUP BY CONCAT(PlaceName,':::',RegionName)
HAVING COUNT(*)>1

One way to do this is with a join to the aggregated list. You need to aggregate by both region and place to get the list you want:

SELECT p.*, rp.cnt
FROM Places p join
     (SELECT RegionName, PlaceName, COUNT(*) as cnt
      FROM Places
      GROUP BY  RegionName, PlaceName
      HAVING COUNT(*) > 1
     ) rp
     on p.RegionName = rp.RegionName and p.PlaceName = rp.PlaceName;

You don't mention the actual database you are using. There are other ways to phrase this, some dependent on the database.

I think you can just concatenate columns:

SELECT * FROM Places WHERE PlaceName + RegionName IN
(SELECT PlaceName + RegionName FROM Places GROUP BY PlaceName + RegionName HAVING COUNT(*) > 1);

If I'm wrong, I'm sure other StackOverflowers will let me know!! :D

I am not sure but it seems to be just simple group by two fields

 select PlaceName , RegionName
 from Places
 group by PlaceName , RegionName
 having count(*) >1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM