Matching a set of child records between two similar table hierarchies

Question

I have two similar table hierarchies:

Owner -> OwnerGroup -> Parent

and

Owner2 -> OwnerGroup2

I would like to determine whether there is an exact match for a set of Owners in Owner2, based on a set of values. There are approximately a million rows in each Owner table. Some OwnerGroups contain up to 100 Owners.

So basically, if there is an OwnerGroup that contains the Owners ("Smith", "John") and ("Smith", "Jane"), I want to know the ids of the OwnerGroup2s that are exact matches.

The first attempt at this was to generate a join per Owner (which required dynamic SQL to be generated in the application):

select og.id
from owner_group2 og
-- dynamic bit starts here
join owner2 o1 on
(og.id = o1.og_id) AND
(o1.given_names = 'JOHN' and o1.surname='SMITH')
-- dynamic bit ends here
join owner2 o2 on
(og.id = o2.og_id) AND
(o2.given_names = 'JANE' and o2.surname='SMITH');

This works fine for small numbers of owners, but in the scenario with 100 Owners in a group the query plan contains 100 nested loops and the query takes almost a minute to run.

Another option I had was to use something built around the intersect operator, e.g.:

select * from (
select o1.surname, o1.given_names
from owner1 o1
join owner_group1 og1 on o1.og_id = og1.id
where
og1.parent_id = 1936233
)
intersect
select o2.surname, o2.given_names
from owner2 o2
join owner_group2 og2 on og2.id = o2.og_id;

I'm not sure how to pull out the owner2.id in this scenario either, and it was still running in the 4-5 second range.

I feel like I am missing something obvious, so please feel free to suggest some better solutions!

Answer 1

You're on the right track with intersect; you just need to go a bit further. You need to join its results back to the owner_group2 table to find the ids.

You can use the listagg function to convert each group into a comma-separated list of its names (note: this requires Oracle 11g Release 2 or later). You can then take the intersection of these name lists to find the matches and join this back to the list built from owner_group2.

I've created a simplified example below; in it, "Dave, Jill" is the group that is present in both tables.

create table grps (id integer, name varchar2(100));
create table grps2 (id integer, name varchar2(100));

insert into grps values (1, 'Dave');
insert into grps values(1, 'Jill');

insert into grps values (2, 'Barry');
insert into grps values(2, 'Jane');

insert into grps2 values(3, 'Dave');
insert into grps2 values(3, 'Jill');

insert into grps2 values(4, 'Barry');

with grp1 as (
 SELECT id, listagg(name, ',') within group (order by name) n 
 FROM grps
 group by id
), grp2 as (
 SELECT id, listagg(name, ',') within group (order by name) n 
 FROM grps2
 group by id
)
SELECT * FROM grp2 
where  n in (
  -- find the duplicates
  select n from grp1 
  intersect
  select n from grp2
);
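Carried over to the tables in the question, the same idea might look roughly like the sketch below. It is only a sketch: the table and column names are copied from the question's queries, the parent id is the one used there, and the `|` and `,` separators are assumed never to occur inside a name. Bear in mind that listagg returns a VARCHAR2, so a group of 100 owners with long names can exceed the length limit and raise ORA-01489.

```sql
-- Sketch only: names come from the question's queries, not a verified schema.
-- Assumes '|' and ',' never appear inside surname or given_names.
with src as (
  -- one row per source group under the chosen parent, with its sorted name list
  select og1.id,
         listagg(o1.surname || '|' || o1.given_names, ',')
           within group (order by o1.surname, o1.given_names) as names
  from owner_group1 og1
  join owner1 o1 on o1.og_id = og1.id
  where og1.parent_id = 1936233
  group by og1.id
), tgt as (
  -- one row per candidate group, with its sorted name list
  select o2.og_id as id,
         listagg(o2.surname || '|' || o2.given_names, ',')
           within group (order by o2.surname, o2.given_names) as names
  from owner2 o2
  group by o2.og_id
)
-- groups match exactly when their sorted name lists are identical
select src.id as owner_group1_id, tgt.id as owner_group2_id
from src
join tgt on tgt.names = src.names;
```

Joining the two lists directly returns the matching pair of ids in one step, which answers the "how do I get the id out" part of the question.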

Note this will still require a full scan of owner_group2; I can't think of a way to avoid this, so your query is likely to remain slow.
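If the concatenated name lists get too long for listagg, a different technique (not the one used above) is to compare the sets by counting, often called relational division. The sketch below is an assumption-laden illustration: `:src_group_id` is a hypothetical bind variable holding the id of the source group, the column names are taken from the question, surname is assumed to be non-null, and no group is assumed to contain the same (surname, given_names) pair twice.

```sql
-- Sketch only: :src_group_id is a hypothetical bind variable.
-- Assumes surname is never null and names are unique within a group.
with src as (
  -- the set of owners we want to find an exact match for
  select o1.surname, o1.given_names
  from owner1 o1
  where o1.og_id = :src_group_id
)
select o2.og_id
from owner2 o2
left join src
  on src.surname = o2.surname
 and src.given_names = o2.given_names
group by o2.og_id
having count(*) = (select count(*) from src)  -- same number of owners
   and count(src.surname) = count(*);         -- every owner in the group is in src
```

A candidate group is returned only when it has the same number of owners as the source group and every one of its owners also appears in the source group, which under the uniqueness assumption means the two sets are equal. Like the listagg version, this still reads all of owner2.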

Answer 1 (0) 2013-02-04 13:01:57
