SQL: join two tables and map common values based on another column and their cumulative count
I have the following two tables.
Table 1:
id document
--------------
A2 B200
A2 B6
A2 B2
A2 B3
A3 B2
A3 B400
A5 B100
A5 B500
A6 B6
A7 B200
A8 B6
A8 B2
A8 B3
A8 C1
Table 2:
id name
--------------
A1 Jack
A2 Martin
A3 Jack
A4 Thomas
A5 Jack
A6 Thomas
A7 Thomas
A8 John
A9 John
A10 Kate
My filter should compare the document column against this list:
WHERE table1.document IN ('B2','B400','B100','B500','B200','B6','B2','B3')
The result should look like this:
name1 name2 freq
--------------------
Jack John 1
Martin Jack 1
Martin Thomas 2
Martin John 3
Thomas John 1
Some explanations:
We need to build a results table that maps pairs of names that have documents in common, together with the frequency.
First we filter the document list with the WHERE IN clause to get the list of documents we want to map.
Then we keep the documents whose count is more than one, because such a document is shared between at least two ids.
Then we look up the names of those ids in table2 and put them in the results table, along with the count of the documents they have in common. Some names have multiple ids, so if we hit those, we add to the count.
For example, document B6 is assigned to ids A2 and A6, so they have this document in common. We create an entry in the results table with their corresponding names as name1 and name2 (order doesn't matter) and give it a frequency of 1. Looking further, document B200 is shared by A2 and A7. When we look up the names of these two ids, we see that we already have an entry with those names, because they again correspond to Martin and Thomas, so we add to their count and it becomes 2.
Another example: documents B6, B2 and B3 are shared by A2 and A8 (Martin and John), so we create an entry for these two names, and the count will be 3.
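The steps described above can be sketched outside the database to check the expected counts. This is only an illustration in Python, not the SQL being asked for; the function name pair_frequencies and the variable names are made up for this sketch, and the data is copied from the two tables above.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample data copied from Table 1 and Table 2 in the question.
table1 = [
    ("A2", "B200"), ("A2", "B6"), ("A2", "B2"), ("A2", "B3"),
    ("A3", "B2"), ("A3", "B400"), ("A5", "B100"), ("A5", "B500"),
    ("A6", "B6"), ("A7", "B200"),
    ("A8", "B6"), ("A8", "B2"), ("A8", "B3"), ("A8", "C1"),
]
table2 = {
    "A1": "Jack", "A2": "Martin", "A3": "Jack", "A4": "Thomas",
    "A5": "Jack", "A6": "Thomas", "A7": "Thomas", "A8": "John",
    "A9": "John", "A10": "Kate",
}
wanted = {"B2", "B400", "B100", "B500", "B200", "B6", "B3"}


def pair_frequencies(rows, names, docs):
    """Count shared documents per unordered pair of names (hypothetical helper)."""
    # Step 1: keep only the wanted documents and group the ids by document.
    ids_by_doc = defaultdict(set)
    for id_, doc in rows:
        if doc in docs:
            ids_by_doc[doc].add(id_)

    freq = Counter()
    # Steps 2 and 3: every pair of ids on the same document is one hit for
    # the matching pair of names. Documents held by a single id produce no
    # pairs, so they drop out on their own.
    for ids in ids_by_doc.values():
        for a, b in combinations(sorted(ids), 2):
            n1, n2 = names[a], names[b]
            if n1 != n2:  # two ids of the same name are not a pair
                freq[tuple(sorted((n1, n2)))] += 1
    return dict(freq)


result = pair_frequencies(table1, table2, wanted)
for (name1, name2), count in sorted(result.items()):
    print(name1, name2, count)
```

The names inside each pair are sorted alphabetically, so Martin/Jack shows up as Jack/Martin; the counts are the same as in the expected result table.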
You have to do the joins two times and then group by both names:
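SELECT t2a.name AS name1, t2b.name AS name2, COUNT(*) AS freq
FROM Table1 t1a
INNER JOIN Table2 t2a ON t2a.id = t1a.id
-- second pass over Table1: other ids holding the same document
INNER JOIN Table1 t1b ON t1b.document = t1a.document
-- t2a.name < t2b.name keeps one row per pair and drops self-pairs
INNER JOIN Table2 t2b ON t2b.id = t1b.id AND t2a.name < t2b.name
-- the document filter from the question
WHERE t1a.document IN ('B2','B400','B100','B500','B200','B6','B2','B3')
GROUP BY t2a.name, t2b.name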
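To try the double-join idea without a database server, here is a minimal sketch that runs it against the sample data in an in-memory SQLite database through Python's sqlite3 module. Using SQLite, the TEXT column types, the column aliases, the document filter and the ORDER BY are choices made for this sketch; they are not part of the answer as posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id TEXT, document TEXT);
CREATE TABLE Table2 (id TEXT, name TEXT);
INSERT INTO Table1 VALUES
  ('A2','B200'), ('A2','B6'), ('A2','B2'), ('A2','B3'),
  ('A3','B2'), ('A3','B400'), ('A5','B100'), ('A5','B500'),
  ('A6','B6'), ('A7','B200'),
  ('A8','B6'), ('A8','B2'), ('A8','B3'), ('A8','C1');
INSERT INTO Table2 VALUES
  ('A1','Jack'), ('A2','Martin'), ('A3','Jack'), ('A4','Thomas'),
  ('A5','Jack'), ('A6','Thomas'), ('A7','Thomas'), ('A8','John'),
  ('A9','John'), ('A10','Kate');
""")

# Table1 and Table2 are each joined twice: once for each side of the pair.
query = """
SELECT t2a.name AS name1, t2b.name AS name2, COUNT(*) AS freq
FROM Table1 t1a
INNER JOIN Table2 t2a ON t2a.id = t1a.id
INNER JOIN Table1 t1b ON t1b.document = t1a.document
INNER JOIN Table2 t2b ON t2b.id = t1b.id AND t2a.name < t2b.name
WHERE t1a.document IN ('B2','B400','B100','B500','B200','B6','B3')
GROUP BY t2a.name, t2b.name
ORDER BY name1, name2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because of the t2a.name < t2b.name condition, each pair is reported with the alphabetically smaller name first, for example ('John', 'Martin', 3) rather than Martin/John.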
Do it like this:
CREATE TABLE Table1
(id varchar(10), document varchar(10))
INSERT INTO Table1
VALUES
('A2', 'B200'),
('A2', 'B6'),
('A2', 'B2'),
('A2', 'B3'),
('A3', 'B2'),
('A3', 'B400'),
('A5', 'B100'),
('A5', 'B500'),
('A6', 'B6'),
('A7', 'B200'),
('A8', 'B6'),
('A8', 'B2'),
('A8', 'B3'),
('A8', 'C1')
CREATE TABLE Table2
(id varchar(3), name varchar(10))
INSERT INTO Table2
VALUES
('A1', 'Jack'),
('A2', 'Martin'),
('A3', 'Jack'),
('A4', 'Thomas'),
('A5', 'Jack'),
('A6', 'Thomas'),
('A7', 'Thomas'),
('A8', 'John'),
('A9', 'John'),
('A10','Kate')
;WITH docs AS (
    -- keep only the documents we want to map
    SELECT id, document FROM Table1
    WHERE document IN ('B2','B400','B100','B500','B200','B6','B2','B3')
)
, user_docs AS (
    -- attach the name that owns each id
    SELECT t2.id, t2.name, docs.document FROM docs
    INNER JOIN Table2 t2 ON t2.id = docs.id
)
, freq AS (
    -- pair names sharing a document; comparing names (not ids) keeps
    -- a single row per pair even when a name has several ids
    SELECT ud.name AS name1, ud1.name AS name2, COUNT(*) AS freq FROM user_docs ud
    INNER JOIN user_docs ud1 ON ud1.document = ud.document AND ud1.name > ud.name
    GROUP BY ud.name, ud1.name
)
SELECT * FROM freq;
DROP TABLE Table1, Table2;
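One thing to watch in pair queries like this: the condition that removes mirrored pairs should compare the same column that is grouped on. The sketch below uses a small, hypothetical user_docs table (not the data from the question) in an in-memory SQLite database via Python's sqlite3 module, and compares deduplicating by id with deduplicating by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: Thomas owns ids A1 and A7, Martin owns A2.
conn.executescript("""
CREATE TABLE user_docs (id TEXT, name TEXT, document TEXT);
INSERT INTO user_docs VALUES
  ('A1', 'Thomas', 'D1'),
  ('A2', 'Martin', 'D1'),
  ('A2', 'Martin', 'D2'),
  ('A7', 'Thomas', 'D2');
""")

# Same pairing query, with only the deduplication condition swapped.
template = """
SELECT ud.name, ud1.name, COUNT(*)
FROM user_docs ud
INNER JOIN user_docs ud1
  ON ud1.document = ud.document AND ud1.name != ud.name AND {dedup}
GROUP BY ud.name, ud1.name
ORDER BY 1, 2
"""
by_id = conn.execute(template.format(dedup="ud1.id > ud.id")).fetchall()
by_name = conn.execute(template.format(dedup="ud1.name > ud.name")).fetchall()

print(by_id)    # [('Martin', 'Thomas', 1), ('Thomas', 'Martin', 1)]
print(by_name)  # [('Martin', 'Thomas', 2)]
```

With the id comparison, the Martin/Thomas pair is split into two mirrored rows as soon as one name owns ids on both sides of the other; with the name comparison it stays one row with the combined count. On the sample data from the question the two conditions happen to give the same counts.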