Get ID pairs between 2 tables with matching child records
I have 2 tables with the same structure.
FIELD 1 INT
FIELD 2 VARCHAR(32) -- is an MD5 hash
The query has to return the matching pairs of FIELD 1 values for records that have exactly the same combination of FIELD 2 values in both TABLE 1 and TABLE 2.
These tables are pretty large (1 million records between the two) but have been reduced to just an ID and a hash.
Example data:
TABLE 1
1 A
1 B
2 A
2 D
2 E
3 G
3 H
4 E
4 D
4 C
5 E
5 D
TABLE 2
8 A
8 B
9 E
9 D
9 C
10 F
11 G
11 H
12 B
12 D
13 A
13 B
14 E
14 A
The results of the query should be
8 1
9 4
11 3
13 1
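One way to pin down the requirement before writing any SQL is to treat each FIELD 1 value as owning a set of FIELD 2 values; two IDs pair up when those sets are equal. Below is a minimal Python sketch of that reading, using the example data. The names `TABLE_1`, `TABLE_2`, `group_hashes` and `matching_pairs` are made up for illustration, and the sketch assumes that repeated FIELD 2 values within one ID do not matter.

```python
from collections import defaultdict

# Example data from the question as (FIELD 1, FIELD 2) rows.
TABLE_1 = [(1, "A"), (1, "B"), (2, "A"), (2, "D"), (2, "E"), (3, "G"),
           (3, "H"), (4, "E"), (4, "D"), (4, "C"), (5, "E"), (5, "D")]
TABLE_2 = [(8, "A"), (8, "B"), (9, "E"), (9, "D"), (9, "C"), (10, "F"),
           (11, "G"), (11, "H"), (12, "B"), (12, "D"), (13, "A"), (13, "B"),
           (14, "E"), (14, "A")]


def group_hashes(rows):
    """Map each FIELD 1 value to the frozenset of its FIELD 2 values."""
    groups = defaultdict(set)
    for field_1, field_2 in rows:
        groups[field_1].add(field_2)
    return {field_1: frozenset(values) for field_1, values in groups.items()}


def matching_pairs(table_1, table_2):
    """Return sorted (TABLE 2 id, TABLE 1 id) pairs whose FIELD 2 sets are equal."""
    # Index TABLE 1 ids by their set of hashes so each TABLE 2 id is one lookup.
    ids_by_set = defaultdict(list)
    for field_1, values in group_hashes(table_1).items():
        ids_by_set[values].append(field_1)
    pairs = []
    for field_1, values in group_hashes(table_2).items():
        for other in ids_by_set.get(values, []):
            pairs.append((field_1, other))
    return sorted(pairs)


print(matching_pairs(TABLE_1, TABLE_2))
```

This is only a reference for checking small inputs against whatever query is used; it is not meant as a replacement for doing the work in the database.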
I have tried creating a concatenated string of FIELD 2 using a correlated sub-query and the FOR XML PATH string trick I read about on here, but that is very slow.
You can also try the following query:
SELECT t_2.Field_1, t_1.Field_1 --1
FROM table_1 t_1, table_2 t_2 --2
WHERE t_1.Field_2 = t_2.Field_2 --3
GROUP BY t_1.Field_1, t_2.Field_1 --4
HAVING COUNT(*) = (SELECT COUNT(*) --5
FROM Table_1 t_1_1 --6
WHERE t_1_1.Field_1 = t_1.Field_1) --7
AND COUNT(*) = (SELECT COUNT(*) --8
FROM Table_2 t_2_1 --9
WHERE t_2_1.Field_1 =t_2.Field_1) --10
Edit
First, the requested result set is the combination of Field1 values from both tables for which the respective Field2 values are exactly the same.
For that, you can use the method I posted above.
The query first joins the two tables on their field2 values (lines 1 to 3), then groups the rows by field1 from table1 and field1 from table2 (line 4).
Up to this step, the result contains every pair of field1 from table1 and field1 from table2 that has at least one matching field2 value.
After this, you just need to filter the result down to the correct pairs (those whose field2 values are exactly the same for the respective field1 values), which you can do with a condition on the row count.
My assumption here is that neither table has more than one row for the same field1 and field2 combination.
This means rows like the following will not be present in any of the tables:
1 b
1 b
If so, for a matching pair the number of rows joined on field2 must equal both the number of rows table1 has for its field1 value and the number of rows table2 has for its field1 value.
The query enforces this with the conditions on count(*) in the having clause (lines 5 to 10).
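The counting argument can be tried out on the example data. The sketch below runs the query above through Python's built-in sqlite3 module rather than SQL Server, so it only shows the logic and says nothing about performance on large tables. The helper name `run_query` and the added ORDER BY (there only to make the output order stable) are assumptions, not part of the original answer.

```python
import sqlite3

# The query from the answer, with an ORDER BY added (an assumption, for stable output).
QUERY = """
SELECT t_2.Field_1, t_1.Field_1
FROM table_1 t_1, table_2 t_2
WHERE t_1.Field_2 = t_2.Field_2
GROUP BY t_1.Field_1, t_2.Field_1
HAVING COUNT(*) = (SELECT COUNT(*)
                   FROM table_1 t_1_1
                   WHERE t_1_1.Field_1 = t_1.Field_1)
   AND COUNT(*) = (SELECT COUNT(*)
                   FROM table_2 t_2_1
                   WHERE t_2_1.Field_1 = t_2.Field_1)
ORDER BY t_2.Field_1, t_1.Field_1
"""


def run_query(rows_1, rows_2):
    """Load both row lists into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("table_1", rows_1), ("table_2", rows_2)):
            conn.execute(f"CREATE TABLE {name} (Field_1 INTEGER, Field_2 TEXT)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows_1 = [(1, "A"), (1, "B"), (2, "A"), (2, "D"), (2, "E"), (3, "G"),
          (3, "H"), (4, "E"), (4, "D"), (4, "C"), (5, "E"), (5, "D")]
rows_2 = [(8, "A"), (8, "B"), (9, "E"), (9, "D"), (9, "C"), (10, "F"),
          (11, "G"), (11, "H"), (12, "B"), (12, "D"), (13, "A"), (13, "B"),
          (14, "E"), (14, "A")]

print(run_query(rows_1, rows_2))

# Duplicate rows break the no-duplicates assumption: the join yields 2 x 2 = 4
# rows for the pair, which equals neither per-table count of 2.
print(run_query([(1, "b"), (1, "b")], [(8, "b"), (8, "b")]))
```

The second call shows why the no-duplicates assumption matters: two IDs with identical but repeated rows are not reported as a pair.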
Let me try to explain this version of the query:
select t1.field1 as t1field1, t2.field1 as t2field1
from (select t1.*,
             count(*) over (partition by field1) as NumField2
      from table1 t1
     ) t1 full outer join
     (select t2.*,
             count(*) over (partition by field1) as NumField2
      from table2 t2
     ) t2
     on t1.field2 = t2.field2
where t1.NumField2 = t2.NumField2
group by t1.Field1, t2.Field1
having count(t1.field2) = max(t1.NumField2) and
       count(t2.field2) = max(t2.NumField2)
(which is here at SQLFiddle).
The idea is to compare the following counts for each pair of field1 values: the number of field2 values on each, and the number of field2 values that they share. All of these have to be equal.
Each subquery counts the number of field2 values on each field1 value. For the first rows of your data, this produces:
1 A 2
1 B 2
2 A 3
2 D 3
2 E 3
. . .
And for the second table:
8 A 2
8 B 2
9 E 3
9 D 3
9 C 3
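As a rough illustration of what count(*) over (partition by field1) adds to each row, here is a small Python sketch that computes the same per-row count. The function name `with_num_field2` and the variable `table1_head` are hypothetical, chosen only for this example.

```python
from collections import Counter


def with_num_field2(rows):
    """Append to each (field1, field2) row the number of rows sharing its field1."""
    counts = Counter(field1 for field1, _ in rows)
    return [(field1, field2, counts[field1]) for field1, field2 in rows]


# The first rows of table1 from the question.
table1_head = [(1, "A"), (1, "B"), (2, "A"), (2, "D"), (2, "E")]

for row in with_num_field2(table1_head):
    print(*row)
```

Every row keeps its own values and gains the size of its group, which is what lets the later steps compare group sizes without a separate aggregation.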
Next, the full outer join is applied, requiring a match on both the count and the field2 value. This multiplies the data, producing rows such as:
1 A 2 8 A 2
1 B 2 8 B 2
2 A 3 NULL NULL NULL
2 D 3 9 D 3
2 E 3 9 E 3
NULL NULL NULL 9 C 3
And so on for all the possible combinations. Note that the NULLs appear due to the full outer join.
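To see where the NULLs come from, the sketch below performs a full outer join on field2 for a single pair of IDs in plain Python, with None standing in for NULL. It is a per-pair view, like the rows shown above, and not what the database does internally; `full_outer_join_pair` is a hypothetical helper that assumes field2 is unique within each ID.

```python
def full_outer_join_pair(rows1, rows2):
    """Full outer join of two lists of (field1, field2, NumField2) rows on field2.

    None stands in for SQL NULL. Assumes field2 is unique within each list.
    """
    by_field2 = {row[1]: row for row in rows2}
    missing = (None, None, None)
    joined = []
    matched = set()
    for row in rows1:
        partner = by_field2.get(row[1])
        if partner is not None:
            matched.add(row[1])
        joined.append(row + (partner if partner is not None else missing))
    # Rows on the right side that found no partner still appear once.
    for row in rows2:
        if row[1] not in matched:
            joined.append(missing + row)
    return joined


id_1 = [(1, "A", 2), (1, "B", 2)]
id_8 = [(8, "A", 2), (8, "B", 2)]
id_2 = [(2, "A", 3), (2, "D", 3), (2, "E", 3)]
id_9 = [(9, "E", 3), (9, "D", 3), (9, "C", 3)]

for row in full_outer_join_pair(id_2, id_9):
    print(row)
```

IDs 1 and 8 join without any None, while IDs 2 and 9 produce one unmatched row on each side. Note that in the query as posted, the comparison in the where clause is not true when either NumField2 is NULL, so such rows may already be dropped before grouping; the pair is still rejected, because its count of matching rows falls short of NumField2.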
Note that when you have a pair that matches, such as 1 and 8, there are no rows with NULL values. When you have a pair with the same counts that does not match, you get NULL values. When you have a pair with different counts, the rows are filtered out by the where clause.
The filtering aggregation step applies these rules to keep the pairs that meet the first condition (no NULL values) but not the second.
The having clause essentially removes any pair that has NULL values. When you count() a column, NULL values are not included. In that case, the count() on the column is fewer than the number of values expected (NumField2).
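The behaviour of count() on a column containing NULLs can be checked quickly. The sketch below uses Python's built-in sqlite3 module, which follows the same rule for count(column) as described here; the table `joined` and its column names are invented for the example and hold the rows for IDs 2 and 9 as seen from the table1 side.

```python
import sqlite3


def count_summary(rows):
    """Return (COUNT(*), COUNT(t2_field2), MAX(NumField2)) for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE joined (t1_field2 TEXT, t2_field2 TEXT, NumField2 INTEGER)"
        )
        conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT COUNT(*), COUNT(t2_field2), MAX(NumField2) FROM joined"
        ).fetchone()
    finally:
        conn.close()


# ID 2 has A, D, E; ID 9 shares only D and E, so A has no partner (None is NULL).
pair_2_9 = [("A", None, 3), ("D", "D", 3), ("E", "E", 3)]

total, non_null, expected = count_summary(pair_2_9)
print(total, non_null, expected)
```

Because the count of the nullable column is lower than NumField2, a having condition of the form count(column) = max(NumField2) rejects the pair.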