Finding duplicates in a one-to-many relationship in SQL

Question

I have two tables:

Table tTag
idTag int
otherColumns

and

Table tTagWord
idTagWord int
idTag int
idWord int
position int

For example:

(image of sample data)

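So each idTag has several idTagWord rows (an unknown number), and position matters. I'm trying to find the best-performing way to detect duplicates.

Two different idTag values are duplicates when they have the same idWord values in the same order (position).

What I've tried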
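To make the definition concrete, here is a minimal Python sketch. The sample rows and the helper names `word_sequence` and `is_duplicate` are made up for illustration and are not part of the question. It shows that a tag with the same words in a different order, or with only a prefix of the words, does not count as a duplicate.

```python
# Hypothetical sample rows: (idTag, idWord, position).
rows = [
    (1, 10, 1), (1, 20, 2),
    (2, 10, 1), (2, 20, 2),   # same words, same order as tag 1
    (3, 20, 1), (3, 10, 2),   # same words, different order
    (4, 10, 1),               # only a prefix of tag 1
]

def word_sequence(rows, id_tag):
    """Return the idWord values of one tag, ordered by position."""
    ordered = sorted((pos, word) for tag, word, pos in rows if tag == id_tag)
    return [word for _, word in ordered]

def is_duplicate(rows, tag_a, tag_b):
    """Two distinct tags are duplicates when their ordered word lists match."""
    return tag_a != tag_b and word_sequence(rows, tag_a) == word_sequence(rows, tag_b)

print(is_duplicate(rows, 1, 2))  # True
print(is_duplicate(rows, 1, 3))  # False: order differs
print(is_duplicate(rows, 1, 4))  # False: tag 4 has fewer words
```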

SELECT GROUP_CONCAT(DISTINCT tab.idTag SEPARATOR ',') INTO @idTagSet
FROM (  SELECT idTag,GROUP_CONCAT(idWord order by position ASC SEPARATOR ' ') AS Tag
        FROM tTagWord
        GROUP BY idTag) AS tab
INNER JOIN (SELECT idTag,GROUP_CONCAT(idWord order by position ASC SEPARATOR ' ') AS Tag
            FROM tTagWord
            GROUP BY idTag) AS tab2 ON tab.Tag = tab2.Tag
WHERE tab.idTag <> tab2.idTag;

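The query above returns the set of duplicated idTag values, so it works. But the performance is terrible. With 150,000 idTags it already takes a few minutes, and the table will soon hold millions of idTags.

I have also tried something like this answer: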

select idTag, GROUP_CONCAT(idWord order by position ASC SEPARATOR '-') AS idWordSet
from tTagWord
group by idTag
Having COUNT(idWordSet) > 1;

But I can't seem to find a way to make it work. Any ideas?

Answer 1

How about trying two GROUP BYs?

SELECT words, count(*), group_concat(idtag) as tags
FROM (SELECT idTag, GROUP_CONCAT(idWord order by position ASC SEPARATOR ' ') AS words
      FROM tTagWord
      GROUP BY idTag
     ) t
GROUP BY words
HAVING count(*) > 1;
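The logic of the two GROUP BYs can be sketched in plain Python, which may help when checking the SQL result on a small sample. This is an illustration only: the rows and the function name `find_duplicate_groups` are hypothetical, and the code mirrors the query's idea rather than replacing it.

```python
from collections import defaultdict

# Hypothetical sample rows: (idTag, idWord, position).
rows = [
    (1, 10, 1), (1, 20, 2),
    (2, 10, 1), (2, 20, 2),
    (3, 20, 1), (3, 10, 2),
    (4, 10, 1),
]

def find_duplicate_groups(rows):
    # Inner GROUP BY idTag: collect each tag's (position, idWord) pairs.
    words_by_tag = defaultdict(list)
    for id_tag, id_word, position in rows:
        words_by_tag[id_tag].append((position, id_word))

    # Outer GROUP BY words: group tags by their position-ordered word tuple.
    tags_by_words = defaultdict(list)
    for id_tag, pairs in words_by_tag.items():
        words = tuple(word for _, word in sorted(pairs))
        tags_by_words[words].append(id_tag)

    # HAVING count(*) > 1: keep only word sequences shared by several tags.
    return {w: sorted(t) for w, t in tags_by_words.items() if len(t) > 1}

print(find_duplicate_groups(rows))  # {(10, 20): [1, 2]}
```

One thing to check when running the SQL version in MySQL: GROUP_CONCAT output is truncated at `group_concat_max_len` (1024 by default), so very long word lists could compare equal after truncation unless that limit is raised.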

Answer 2

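This sort of query is sometimes called relational division, and there are a number of approaches: https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/

One example is: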

select
    t1.idTag as tag1,
    t2.IdTag as tag2
from
    tTagWord t1
        inner join
    tTagWord t2
        on t1.idWord = t2.idWord and
           t1.position = t2.position and
           t1.idTag < t2.idTag
group by
    t1.idTag,
    t2.idTag
having
    count(*) = (
        select
            count(*)
        from
            tTagWord t3
        where
            t3.idTag = t1.idTag
    ) and
    count(*) = (
        select
            count(*)
        from
            tTagWord t4
        where
            t4.idTag = t2.idTag
    );

Here is an example. I've put Gordon's query (Answer 1) in there too. They may have different performance characteristics.
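As a way to try the self-join query on a small sample, the sketch below runs it in an in-memory SQLite database through Python's `sqlite3` module. The use of SQLite and the sample rows are assumptions made for illustration; the question targets MySQL, and performance on SQLite says nothing about performance there.

```python
import sqlite3

# Hypothetical sample rows: (idTag, idWord, position).
rows = [
    (1, 10, 1), (1, 20, 2),
    (2, 10, 1), (2, 20, 2),   # duplicate of tag 1
    (3, 20, 1), (3, 10, 2),   # same words, different order
    (4, 10, 1),               # prefix of tags 1 and 2
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tTagWord ("
    "idTagWord INTEGER PRIMARY KEY, idTag INT, idWord INT, position INT)"
)
conn.executemany(
    "INSERT INTO tTagWord (idTag, idWord, position) VALUES (?, ?, ?)", rows
)

# Same query as in the answer, with an ORDER BY added for stable output.
query = """
SELECT t1.idTag AS tag1, t2.idTag AS tag2
FROM tTagWord t1
JOIN tTagWord t2
  ON t1.idWord = t2.idWord
 AND t1.position = t2.position
 AND t1.idTag < t2.idTag
GROUP BY t1.idTag, t2.idTag
HAVING COUNT(*) = (SELECT COUNT(*) FROM tTagWord t3 WHERE t3.idTag = t1.idTag)
   AND COUNT(*) = (SELECT COUNT(*) FROM tTagWord t4 WHERE t4.idTag = t2.idTag)
ORDER BY tag1, tag2
"""
pairs = conn.execute(query).fetchall()
print(pairs)  # [(1, 2)]
```

Both count checks in the HAVING clause are needed: tag 4 matches tag 1 on every one of its own rows, but tag 1 has more rows, so the pair is rejected.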

2 answers

Answer 1
3, accepted, 2014-09-16 21:04:04

Answer 2
2, 2014-09-16 21:18:12
