Find matching persons from one table in another table (MS SQL Server)

Question

I have two tables:

Table "Person"

ID          FirstName  LastName
----------- ---------- ----------
1           Janez      Novak
2           Matija     Špacapan
3           Francka    Joras

Table "UserList"

ID    FullName
----- --------------------
1     Andrej Novak
2     Novak Peter Janez
3     Jana Novak
4     Andrej Kosir
5     Jan Balon
6     Francka Joras
7     France Joras

The query must return the IDs from both tables for every pair of rows where both FirstName and LastName from table Person appear in FullName in table UserList. The first name and last name must match exactly. FullName in table UserList can include a middle name, which should be ignored.

Match: Janez Novak = Janez Novak OR Novak Janez OR Janez Peter Novak

Not a match: Janez Novak <> Janeza Novak OR Jjanez Novak

Wanted results:

ID   FirstName  LastName  ID   FullName
---- ---------- --------- ---- -------------------
1    Janez      Novak     2    Novak Peter Janez
3    Francka    Joras     6    Francka Joras

This is my query:

SELECT
    A.ID
    ,A.FirstName
    ,A.LastName
    ,B.ID
    ,B.FullName
FROM
    dbo.UserList B
    CROSS JOIN dbo.Person A
WHERE
    (
    -- wrap every space-separated token of FullName in double quotes,
    -- then search for the quoted first and last name
    CHARINDEX('"'+A.FirstName+'"', '"'+REPLACE(B.FullName,' ','"')+'"') > 0
    AND CHARINDEX('"'+A.LastName+'"', '"'+REPLACE(B.FullName,' ','"')+'"') > 0
    )

The query works OK when there are not many records in the tables.

But my tables have: "Person" -> 400k and "UserList" -> 14k records.

Is my approach OK, or is there a more efficient way to do this? Thank you.

BR

Answer 1

Your schema is broken :p

There are various heuristics for doing the matching, but I expect you'll be able to find counterexamples that break whatever you try. For example, what about these four people: Peter Smith, Pete Smith, Peter Smithson, and Pete Smithson?

Here's a %LIKE% approach, which I'd expect to be slow.

SELECT p.ID, p.FirstName, p.LastName, u.ID, u.FullName,
    CASE WHEN COUNT(*) OVER (PARTITION BY p.ID) > 1 THEN 0 ELSE 1 END AS MatchIsUnique
FROM Person p
    INNER JOIN UserList u
        ON u.FullName LIKE p.FirstName + '%'
        AND u.FullName LIKE '%' + p.LastName

Here's a string manipulation approach based on the assumption that the space character is the delimiter.

SELECT p.ID, p.FirstName, p.LastName, u.ID, u.FullName,
    CASE WHEN COUNT(*) OVER (PARTITION BY p.ID) > 1 THEN 0 ELSE 1 END AS MatchIsUnique
FROM Person p
    INNER JOIN UserList u
        -- first token: everything before the first space
        ON p.FirstName = SUBSTRING(u.FullName, 0, CHARINDEX(' ', u.FullName))
        -- last token: everything after the last space
        AND p.LastName = SUBSTRING(u.FullName, LEN(u.FullName) - CHARINDEX(' ', REVERSE(u.FullName)) + 2, CHARINDEX(' ', REVERSE(u.FullName)))
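To make the matching rule of this approach concrete, here is a minimal sketch in Python (not T-SQL, so it can be run without a server). The helper name first_and_last is hypothetical, and the sketch assumes a single space as the delimiter. It shows that comparing only the first and last token finds "Francka Joras" but misses "Novak Peter Janez", because there the names appear in reverse order.

```python
def first_and_last(full_name: str) -> tuple[str, str]:
    """Return the first and last space-separated tokens of full_name."""
    tokens = full_name.split(" ")
    return tokens[0], tokens[-1]


# A few rows from the question's sample data: (ID, FirstName, LastName) and (ID, FullName).
persons = [(1, "Janez", "Novak"), (3, "Francka", "Joras")]
users = [(2, "Novak Peter Janez"), (6, "Francka Joras"), (7, "France Joras")]

# Compare every person with every user, as the join does.
matches = [
    (pid, uid)
    for pid, first, last in persons
    for uid, full_name in users
    if (first, last) == first_and_last(full_name)
]
print(matches)
```

Whether the reversed order should count as a match is a requirement decision; the question says it should, so this heuristic alone would not be enough.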

Probably also quite slow. Maybe you could speed it up by adding

SUBSTRING(FullName, 0, CHARINDEX(' ', FullName))

and

SUBSTRING(FullName, LEN(FullName) - CHARINDEX(' ', REVERSE(FullName)) + 2, CHARINDEX(' ', REVERSE(FullName)))

as computed columns on UserList and indexing them.
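The reason precomputed, indexed keys should help can be sketched in Python: the first and last token are extracted once per UserList row, and each Person row then becomes a single lookup instead of a comparison against every user. This is only an illustration of the idea, not of how SQL Server implements indexes; the dictionary users_by_key and the extra row with ID 8 are assumptions made for the example.

```python
from collections import defaultdict

# (ID, FullName); the row with ID 8 is hypothetical and only shows a middle name.
users = [(2, "Novak Peter Janez"), (6, "Francka Joras"), (8, "Francka Jr. Joras")]
# (ID, FirstName, LastName)
persons = [(1, "Janez", "Novak"), (3, "Francka", "Joras")]

# Precompute the (first token, last token) key once per user,
# which is the role the indexed computed columns would play.
users_by_key = defaultdict(list)
for uid, full_name in users:
    tokens = full_name.split(" ")
    users_by_key[(tokens[0], tokens[-1])].append(uid)

# Each person is now one dictionary lookup rather than a scan over all users.
matches = [
    (pid, uid)
    for pid, first, last in persons
    for uid in users_by_key.get((first, last), [])
]
print(matches)
```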

Answer 2

Create tables

create table persons (
  id int IDENTITY(1,1) PRIMARY KEY,
  FirstName nvarchar(32) NOT NULL,
  LastName nvarchar(32) NOT NULL
);

create table users (
  id int IDENTITY(1,1) PRIMARY KEY,
  FullName nvarchar(32) NOT NULL
);

Sample data

INSERT INTO persons (FirstName, LastName)
values
('Janez','Novak'),
('Matija','Špacapan'),
('Francka','Joras');

INSERT INTO users (FullName)
VALUES
('Andrej Novak'),
('Novak Peter Janez'),
('Jana Novak'),
('Andrej Kosir'),
('Jan Balon'),
('Francka Joras'),
('France Joras'),

/* --EDIT: added sample data for wildcard testing-- */
('Franckas Joras'), -- added 's' after firstname
('Francka AJoras'), -- added 'A' before lastname
('Franckas AJoras'), -- both above
('Francka Jr. Joras'), -- added just midname
('Franckas Jr. Joras'); -- added 's' after firstname & added midname as well

Query (matching names)

SELECT p.id, p.FirstName, p.LastName, u.id as user_id, u.FullName
FROM persons p, users u
WHERE
  -- EDIT
  /* changed wildcards (added spaces on both sides)
  + added 2 more conditions without wildcards */
  u.FullName LIKE CONCAT(p.FirstName, ' % ', p.LastName)
  OR
  u.FullName LIKE CONCAT(p.LastName, ' % ', p.FirstName)
  OR
  u.FullName LIKE CONCAT(p.FirstName, ' ', p.LastName)
  OR
  u.FullName LIKE CONCAT(p.LastName, ' ', p.FirstName)
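The four conditions can be emulated in Python to see which sample rows they accept. This is a sketch under assumptions: the helper names are hypothetical, the comparison is case-sensitive (the behaviour of LIKE in SQL Server depends on the collation), and names are assumed not to contain LIKE wildcard characters. The user IDs follow the insertion order of the sample data above.

```python
def like_first_any_last(full_name: str, first: str, last: str) -> bool:
    """Emulate full_name LIKE CONCAT(first, ' % ', last)."""
    prefix, suffix = first + " ", " " + last
    # The two literal spaces cannot overlap, so the string must hold both parts.
    return (
        len(full_name) >= len(prefix) + len(suffix)
        and full_name.startswith(prefix)
        and full_name.endswith(suffix)
    )


def is_match(full_name: str, first: str, last: str) -> bool:
    """Apply the four OR-ed conditions of the query."""
    return (
        like_first_any_last(full_name, first, last)
        or like_first_any_last(full_name, last, first)
        or full_name == f"{first} {last}"
        or full_name == f"{last} {first}"
    )


persons = [(1, "Janez", "Novak"), (2, "Matija", "Špacapan"), (3, "Francka", "Joras")]
full_names = [
    "Andrej Novak", "Novak Peter Janez", "Jana Novak", "Andrej Kosir",
    "Jan Balon", "Francka Joras", "France Joras", "Franckas Joras",
    "Francka AJoras", "Franckas AJoras", "Francka Jr. Joras", "Franckas Jr. Joras",
]
users = list(enumerate(full_names, start=1))  # IDs 1..12 in insertion order

matches = [
    (pid, uid)
    for pid, first, last in persons
    for uid, full_name in users
    if is_match(full_name, first, last)
]
print(matches)
```

The wildcard pattern alone does not accept "Francka Joras", because the pattern needs two separate spaces; that is why the two conditions without wildcards are required.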

Output

EDIT: output with new sample data (for wildcard testing)

Running example: SQL Fiddle

The example linked above runs on MySQL, but the code also works on SQL Server.

Answer 3

One method you could try is to split the full names into rows and then compare, selecting only those where both first and last name match:

select m.id Id, Max(m.firstname) FirstName, Max(m.lastname) LastName,
  u.id, Max(u.fullname) FullName
from userlist u
cross apply String_Split(u.fullname,' ') s
cross apply (
    select *
    from person p
    where p.firstname = s.value or p.lastname = s.value
)m
-- group per user AND person, so two hits must belong to the same person
group by u.id, m.id
having Count(*)=2;
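The split-and-compare idea can be sketched in Python as follows. It is an illustration under assumptions, not a translation of the query: the dictionary persons_by_name is a hypothetical stand-in for an index on the name columns, and tokens are split on whitespace. Only persons that share at least one token with a user are examined, which avoids comparing all 400k x 14k pairs.

```python
from collections import defaultdict

# Sample data from the question: (ID, FirstName, LastName) and (ID, FullName).
persons = [(1, "Janez", "Novak"), (2, "Matija", "Špacapan"), (3, "Francka", "Joras")]
users = [
    (1, "Andrej Novak"), (2, "Novak Peter Janez"), (3, "Jana Novak"),
    (4, "Andrej Kosir"), (5, "Jan Balon"), (6, "Francka Joras"), (7, "France Joras"),
]

# Index persons by each of their two names so a token lookup is cheap.
persons_by_name = defaultdict(set)
for pid, first, last in persons:
    persons_by_name[first].add(pid)
    persons_by_name[last].add(pid)

names = {pid: (first, last) for pid, first, last in persons}

matches = []
for uid, full_name in users:
    tokens = set(full_name.split())
    # Candidates are persons sharing at least one token with this user.
    candidates = set().union(*(persons_by_name.get(t, set()) for t in tokens))
    for pid in sorted(candidates):
        first, last = names[pid]
        # Both names must appear as whole tokens; middle names are ignored.
        if first in tokens and last in tokens:
            matches.append((pid, uid))

print(matches)
```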

Output:


Answer 1 (0, 2022-05-23 13:17:08)

Answer 2 (0, accepted, 2022-05-23 13:22:23)

Answer 3 (0, 2022-05-23 14:54:43)
