SQL: join two tables and map common values based on another column and their cumulative count

I have the following two tables:

Table 1:

id      document
----------------
A2      B200
A2      B6
A2      B2
A2      B3
A3      B2
A3      B400
A5      B100
A5      B500
A6      B6
A7      B200
A8      B6
A8      B2
A8      B3
A8      C1

Table 2:

id    name
--------------
A1      Jack
A2      Martin
A3      Jack
A4      Thomas
A5      Jack
A6      Thomas
A7      Thomas
A8      John
A9      John
A10     Kate

My filter should compare the document column against this list and keep only the matching rows:

WHERE table1.document IN ('B2','B400','B100','B500','B200','B6','B2','B3')

The result should look like this:

name1   name2   freq
--------------------
Jack    John    1
Martin  Jack    1
Martin  Thomas  2
Martin  John    3
Thomas  John    1

Some explanations:

We need to build a result table that pairs up the names that have documents in common, together with how many documents they share. First, we filter the document list with the WHERE ... IN clause to get the documents we want to map.

Then we keep only the documents whose count is greater than one, because such a document is shared by at least two ids.
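
The "count greater than one" step can be checked on its own. A minimal sketch, assuming the tables are named Table1 and Table2 and that the list values are string literals (the duplicate B2 is dropped here because it has no effect inside IN); the alias id_count is only for illustration:

```sql
-- Documents from the filter list that belong to more than one id
SELECT document, COUNT(DISTINCT id) AS id_count
FROM Table1
WHERE document IN ('B2','B400','B100','B500','B200','B6','B3')
GROUP BY document
HAVING COUNT(DISTINCT id) > 1;
```

On the sample data this should return B2 and B6 with three ids each, and B200 and B3 with two ids each. B400, B100 and B500 each belong to a single id and drop out.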

Then we look up the names of those ids in table2 and put them in the result table, along with the number of documents they have in common. Some names have multiple ids, so whenever we hit one of those, we add to the existing count.

For example, document B6 is assigned to ids A2 and A6, so they have this document in common. We create an entry in the result table with their corresponding names as name1 and name2 (order doesn't matter) and give it a frequency of 1. Looking further, document B200 is shared by A2 and A7. When we look up the names of these two ids, we see that we already have an entry for them, because they again correspond to Martin and Thomas, so we add to that entry's count and it becomes 2.

Another example is documents B6, B2 and B3, which are shared by A2 and A8 (Martin and John), so we create an entry for these two names and the count will be 3.
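
To trace the worked examples above, it can help to list every pair of ids that shares a filtered document before any grouping is applied. This is only a sketch under the same assumptions (tables named Table1 and Table2, quoted list values); the aliases a, b, na and nb are chosen here for illustration:

```sql
-- One row per (document, pair of ids with different names) that share it
SELECT a.document,
       a.id    AS id1, na.name AS name1,
       b.id    AS id2, nb.name AS name2
FROM Table1 a
INNER JOIN Table1 b  ON b.document = a.document
                    AND a.id < b.id            -- each pair of ids only once
INNER JOIN Table2 na ON na.id = a.id
INNER JOIN Table2 nb ON nb.id = b.id
                    AND na.name <> nb.name     -- skip pairs with the same name
WHERE a.document IN ('B2','B400','B100','B500','B200','B6','B3')
ORDER BY a.document, a.id, b.id;
```

On the sample data this should produce eight rows, which is also the sum of the freq column in the expected result. Three of them are the A2/A8 (Martin/John) rows for B2, B3 and B6.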

Here is some demo data.

You have to do the joins twice and then group by both names:

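SELECT t2a.name AS name1, t2b.name AS name2, COUNT(*) AS freq
FROM Table1 t1a
INNER JOIN Table2 t2a ON t2a.id = t1a.id
-- self-join on document to pair up the ids that share a document
INNER JOIN Table1 t1b ON t1b.document = t1a.document
-- name < name drops same-name pairs and keeps each pair in one order only
INNER JOIN Table2 t2b ON t2b.id = t1b.id AND t2a.name < t2b.name
WHERE t1a.document IN ('B2','B400','B100','B500','B200','B6','B3')
GROUP BY t2a.name, t2b.name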

You can do it like this:

CREATE TABLE Table1
    (id varchar(10), document varchar(10))

INSERT INTO Table1
VALUES
    ('A2', 'B200'),
    ('A2', 'B6'),
    ('A2', 'B2'),
    ('A2', 'B3'),
    ('A3', 'B2'),
    ('A3', 'B400'),
    ('A5', 'B100'),
    ('A5', 'B500'), 
    ('A6', 'B6'),
    ('A7', 'B200'),
    ('A8', 'B6'),
    ('A8', 'B2'),
    ('A8', 'B3'),    
    ('A8', 'C1')

CREATE TABLE Table2
    (id varchar(3), name varchar(10))

INSERT INTO Table2
VALUES
    ('A1', 'Jack'),
    ('A2', 'Martin'),
    ('A3', 'Jack'),
    ('A4', 'Thomas'),
    ('A5', 'Jack'),
    ('A6', 'Thomas'),
    ('A7', 'Thomas'),
    ('A8', 'John'),
    ('A9', 'John'),
    ('A10','Kate')


;WITH docs AS (
    -- keep only the documents named in the filter list
    SELECT id, document FROM Table1
    WHERE Table1.document IN ('B2','B400','B100','B500','B200','B6','B2','B3')
)
, user_docs AS (
    -- attach the owner's name to every filtered (id, document) row
    SELECT t2.id, t2.name, docs.document FROM docs
    INNER JOIN Table2 t2 ON t2.id = docs.id
)
, freq AS (
    -- pair ids that share a document; id > id counts each pair of ids once
    SELECT ud.name AS name1, ud1.name AS name2, COUNT(*) AS freq FROM user_docs ud
    INNER JOIN user_docs ud1 ON ud1.document = ud.document AND ud1.name <> ud.name AND ud1.id > ud.id
    GROUP BY ud.name, ud1.name
)
SELECT * FROM freq


DROP TABLE Table1, Table2
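
One thing to watch with the query above: each pair is ordered by id (ud1.id > ud.id), not by name. With other data the same two names could therefore appear both as (Martin, Thomas) and as (Thomas, Martin) in separate rows, for instance if a Thomas id sorted before a Martin id. A hedged variant that puts the two names in a fixed alphabetical order before grouping is sketched below. It assumes Table1 and Table2 still exist, so run it before the DROP TABLE:

```sql
;WITH user_docs AS (
    -- filtered (id, document) rows with the owner's name attached
    SELECT t2.id, t2.name, t1.document
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t2.id = t1.id
    WHERE t1.document IN ('B2','B400','B100','B500','B200','B6','B3')
)
SELECT
    -- smaller name first, so (X, Y) and (Y, X) fall into the same group
    CASE WHEN ud.name < ud1.name THEN ud.name  ELSE ud1.name END AS name1,
    CASE WHEN ud.name < ud1.name THEN ud1.name ELSE ud.name  END AS name2,
    COUNT(*) AS freq
FROM user_docs ud
INNER JOIN user_docs ud1
    ON  ud1.document = ud.document
    AND ud1.name <> ud.name
    AND ud1.id > ud.id                 -- still count each pair of ids once
GROUP BY
    CASE WHEN ud.name < ud1.name THEN ud.name  ELSE ud1.name END,
    CASE WHEN ud.name < ud1.name THEN ud1.name ELSE ud.name  END;
```

On the sample data the counts should be unchanged. Only the order of the names within a row differs, for example (John, Martin) instead of (Martin, John), which the question says does not matter.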
