
T-SQL: compare 2 columns with unknown values

I'm joining a table to itself in order to find people in my table in the same family with different last names. The only issue is there are instances where one last name might be Jones, and for another record the column might be Jones Jr.

These are technically the same last name, so they don't fit my requirements. I need to eliminate Jones Jr. from my results.

The complicating factor is that it could also be something like Smith-Jones, so I'd need to remove that record too. Since I don't know where in the name the difference will be, I would like to add a condition to my query saying that no more than 4 (or some arbitrary number of) characters of each name can match.

Here's my query:

SELECT [fields] 
FROM [table] a 
INNER JOIN [table] b ON a.[family_id] = b.[family_id]
WHERE a.[last_name] <> b.[last_name]
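A small reproduction makes the problem concrete. This is a sketch under assumptions: the table variable @people and its columns person_id, family_id and last_name are hypothetical stand-ins for [table], [family_id] and [last_name] in the question, and the sample names come from the question's own examples.

```sql
-- Hypothetical sample data: @people and its columns are illustrative
-- stand-ins for [table], [family_id] and [last_name] in the question.
DECLARE @people TABLE (
    person_id int          NOT NULL PRIMARY KEY,
    family_id int          NOT NULL,
    last_name nvarchar(50) NOT NULL
);

INSERT INTO @people (person_id, family_id, last_name)
VALUES (1, 10, N'Jones'),
       (2, 10, N'Jones Jr'),
       (3, 10, N'Smith-Jones'),
       (4, 10, N'Brown');

-- The original condition: any two spellings that are not identical count as
-- "different", so Jones / Jones Jr and Jones / Smith-Jones are returned too.
SELECT a.person_id, a.last_name,
       b.person_id AS other_id, b.last_name AS other_last_name
FROM @people AS a
INNER JOIN @people AS b
    ON a.family_id = b.family_id
WHERE a.last_name <> b.last_name;
```

With four distinct names in one family this returns 12 rows, because every pair appears twice (once in each order).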

Any ideas?

Use wildcards for the comparison. The following might work for what you want:

SELECT [fields]
FROM [table] a
INNER JOIN [table] b
    ON a.[family_id] = b.[family_id]
WHERE a.[last_name] NOT LIKE '%' + b.[last_name] + '%'
  AND b.[last_name] NOT LIKE '%' + a.[last_name] + '%';

Of course, pairs such as "Johns"/"Johnson" and "Martin"/"Martinez" will also be filtered out, because one name contains the other. I don't know if that is an issue for you.
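A self-contained way to try the wildcard approach, assuming the same hypothetical @people table variable and columns (person_id, family_id, last_name) in place of the real table. The extra condition a.person_id < b.person_id is an addition not in the answer above; it only keeps each pair from being reported twice.

```sql
-- Hypothetical sample data: @people and its columns are illustrative.
DECLARE @people TABLE (
    person_id int          NOT NULL PRIMARY KEY,
    family_id int          NOT NULL,
    last_name nvarchar(50) NOT NULL
);

INSERT INTO @people (person_id, family_id, last_name)
VALUES (1, 10, N'Jones'),
       (2, 10, N'Jones Jr'),
       (3, 10, N'Smith-Jones'),
       (4, 10, N'Brown'),
       (5, 20, N'Johns'),
       (6, 20, N'Johnson');

-- Keep a pair only when neither last name contains the other.
-- a.person_id < b.person_id (an addition) reports each pair once.
SELECT a.person_id, a.last_name,
       b.person_id AS other_id, b.last_name AS other_last_name
FROM @people AS a
INNER JOIN @people AS b
    ON a.family_id = b.family_id
   AND a.person_id < b.person_id
WHERE a.last_name NOT LIKE N'%' + b.last_name + N'%'
  AND b.last_name NOT LIKE N'%' + a.last_name + N'%';
```

On this data the query returns four pairs: Jones/Brown, Jones Jr/Smith-Jones, Jones Jr/Brown and Smith-Jones/Brown. Jones/Jones Jr, Jones/Smith-Jones and Johns/Johnson are all removed. Note that Jones Jr and Smith-Jones still count as "different" because neither string contains the other. Also, a last name containing %, _ or [ would be treated as a LIKE wildcard rather than a literal character.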

You could use DIFFERENCE. It compares the SOUNDEX codes of the two strings and returns the number of characters in those codes that match, from 0 (weak or no similarity) to 4 (strong similarity or identical codes). That fits your requirement of finding people in the same family with different last names: keep only the pairs whose last names do not sound alike, i.e. that score below 4.

SELECT [fields] from [table] a 
INNER JOIN [table] b
ON a.[family_id] = b.[family_id]
WHERE DIFFERENCE(a.[last_name], b.[last_name]) < 4
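Before settling on a threshold it may help to look at the actual scores. This sketch again assumes a hypothetical @people table variable with person_id, family_id and last_name columns. A SOUNDEX code starts with the first letter of the string, so "Smith-Jones" and "Jones" get codes beginning with S and J respectively; such a pair may well score below 4 and still be returned, so checking the scores against real data seems worthwhile.

```sql
-- Hypothetical sample data: @people and its columns are illustrative.
DECLARE @people TABLE (
    person_id int          NOT NULL PRIMARY KEY,
    family_id int          NOT NULL,
    last_name nvarchar(50) NOT NULL
);

INSERT INTO @people (person_id, family_id, last_name)
VALUES (1, 10, N'Jones'),
       (2, 10, N'Jones Jr'),
       (3, 10, N'Smith-Jones'),
       (4, 10, N'Brown');

-- Show each pair's SOUNDEX codes and DIFFERENCE score (0 to 4) so the
-- cut-off can be chosen from real data rather than guessed.
SELECT a.last_name, SOUNDEX(a.last_name) AS soundex_a,
       b.last_name AS other_last_name, SOUNDEX(b.last_name) AS soundex_b,
       DIFFERENCE(a.last_name, b.last_name) AS diff_score
FROM @people AS a
INNER JOIN @people AS b
    ON a.family_id = b.family_id
   AND a.person_id < b.person_id
ORDER BY diff_score DESC;
```

Rows with diff_score = 4 are the ones the answer's WHERE DIFFERENCE(...) < 4 would drop; everything else would be reported as a different last name.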


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM