Removing duplicates from result of multiple join on tables with different columns in MySQL
I am trying to write a single statement that pulls data from 3 related tables (related in that they all share a common string index). I am having trouble preventing MySQL from returning the product of two of the tables, which makes the result set much larger than I want. Each table has a different number of columns, and I would prefer not to use UNION anyway, because the data in each table is separate.
Here is an example:
Table X is the main table and has fields A, B.
Table Y has fields A, C, D.
Table Z has fields A, E, F, G.
- --
My ideal result would have the form:
A1 B1 C1 D1 E1 F1 G1
A1 B2 C2 D2 00 00 00
A2 B3 C3 D3 E2 F2 G2
A2 B4 00 00 E3 F3 G3
etc...
- --
Here is the simplest SQL I have tried that shows my problem (that is, it returns the product of Y * Z for each value of A):
SELECT DISTINCT *
FROM X
LEFT JOIN Y USING (A)
LEFT JOIN Z USING (A)
- --
I have tried adding a GROUP BY clause on fields of Y and Z. But if I group by only one column, it returns only the first result matched with each unique value in that column (i.e. A1 C1 E1, A1 C2 E1, A1 C3 E1). And if I group by two columns, it returns the product of the two tables again.
I've also tried running multiple SELECT statements inside the query and then joining the resulting tables, but I got the product of the tables as output again.
Basically, I want to merge the results of three SELECT statements into a single result, without getting every combination of the data. If I need to, I can resort to multiple queries. However, since the tables all share a common index, I feel there should be a way to do it in one query that I am missing.
Thanks for any help.
I don't know if I understand your problem, but why are you using a LEFT JOIN? The story sounds more like an INNER JOIN. Nothing here calls for a UNION.
[Edit] OK, I think I see what you want now. I've never tried what I am about to suggest, and what's more, some DBs don't support it (yet), but I think you want a windowing function.
WITH Y2 AS (SELECT Y.*, ROW_NUMBER() OVER (PARTITION BY A) AS YROW FROM Y),
Z2 AS (SELECT Z.*, ROW_NUMBER() OVER (PARTITION BY A) AS ZROW FROM Z)
SELECT COALESCE(Y2.A,Z2.A) AS A, Y2.C, Y2.D, Z2.E, Z2.F, Z2.G
FROM Y2 FULL OUTER JOIN Z2 ON Y2.A=Z2.A AND YROW=ZROW;
The idea is to print the list in as few rows as possible, right? So if A1 has 10 entries in Y and 7 in Z, then we get 10 rows, 3 of which have NULLs for the Z fields. This works in Postgres. I do not believe this syntax is available in MySQL.
Y:
a | d | c
---+---+----
1 | 1 | -1
1 | 2 | -1
2 | 0 | -1
Z:
a | f | g | e
---+---+---+---
1 | 9 | 9 | 0
2 | 1 | 1 | 0
3 | 0 | 1 | 0
Output of statement above:
a | c | d | e | f | g
---+----+---+---+---+---
1 | -1 | 1 | 0 | 9 | 9
1 | -1 | 2 | | |
2 | -1 | 0 | 0 | 1 | 1
3 | | | 0 | 0 | 1
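The row-numbering idea above can also be tried on an engine without FULL OUTER JOIN by combining a LEFT JOIN with an anti-join through UNION ALL. The following is only a sketch, run through Python's built-in sqlite3 module so it is self-contained. It assumes an engine with CTEs and window functions (SQLite 3.25 or later here; MySQL 8.0 added both, which is an observation of mine and not something tested in this thread). The `ORDER BY d` and `ORDER BY e` inside the window are my own additions to make the numbering deterministic, and the names `PAIRING_SQL`, `rn` and `rows` are illustrative.

```python
import sqlite3

# Sample data copied from the Y and Z tables shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE y (a INT, d INT, c INT);
CREATE TABLE z (a INT, f INT, g INT, e INT);
INSERT INTO y VALUES (1, 1, -1), (1, 2, -1), (2, 0, -1);
INSERT INTO z VALUES (1, 9, 9, 0), (2, 1, 1, 0), (3, 0, 1, 0);
""")

# Number the rows inside each value of `a`, then pair the k-th Y row with
# the k-th Z row. FULL OUTER JOIN is emulated in two parts:
#   part 1: every Y row plus its matching Z row, if any (LEFT JOIN)
#   part 2: the Z rows that found no Y partner (anti-join)
PAIRING_SQL = """
WITH y2 AS (SELECT y.*, ROW_NUMBER() OVER (PARTITION BY a ORDER BY d) AS rn FROM y),
     z2 AS (SELECT z.*, ROW_NUMBER() OVER (PARTITION BY a ORDER BY e) AS rn FROM z)
SELECT y2.a, y2.c, y2.d, z2.e, z2.f, z2.g
FROM y2 LEFT JOIN z2 ON z2.a = y2.a AND z2.rn = y2.rn
UNION ALL
SELECT z2.a, NULL, NULL, z2.e, z2.f, z2.g
FROM z2 LEFT JOIN y2 ON y2.a = z2.a AND y2.rn = z2.rn
WHERE y2.a IS NULL
ORDER BY 1, 3
"""

rows = conn.execute(PAIRING_SQL).fetchall()
for row in rows:
    print(row)
```

On this sample data the query returns the same four rows as the output listed above, with `None` in place of the blank cells. To bring in the main table X as well, the paired result could be LEFT JOINed to X on A.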
Yep, UNION is not the answer.
I'm thinking you want:
SELECT *
FROM x
JOIN y ON x.a = y.a
JOIN z ON x.a = z.a
GROUP BY x.a;
I found another way by adapting the query from this post; it can be used to merge two tables according to unique IDs.
Try this:
create table y
(
a int,
d int,
c int
)
create table z
(
a int,
f int,
g int,
e int
)
go
insert into y values(1,1,-1)
insert into y values(1,2,-1)
insert into y values(2,0,-1)
insert into z values(1,9,9,0)
insert into z values(2,1,1,0)
insert into z values(3,0,1,0)
go
select * from y;
select * from z;
WITH Y2 AS (SELECT Y.*, ROW_NUMBER() OVER (ORDER BY A) AS YROW FROM Y where A = 3),
Z2 AS (SELECT Z.*, ROW_NUMBER() OVER (ORDER BY A) AS ZROW FROM Z where A = 3)
SELECT COALESCE(Y2.A,Z2.A) AS A, Y2.C, Y2.D, Z2.E, Z2.F, Z2.G
FROM Y2 FULL OUTER JOIN Z2 ON Y2.A=Z2.A AND YROW=ZROW;
PostgreSQL is always the right answer to most MySQL issues, but your problem could have been solved this way:
The issue you experienced was that you had two left joins, i.e. A left join X left join Y, which inevitably gives you A x X x Y where you wanted (A x X) x (A x Y).
A simple solution could be:
SELECT x.A, x.B, x.C, x.D, y.E, y.F, y.G
FROM (SELECT A.A, A.B, X.C, X.D FROM A LEFT JOIN X ON A.A = X.A) x
INNER JOIN (SELECT A.A, Y.E, Y.F, Y.G FROM A LEFT JOIN Y ON A.A = Y.A) y
ON x.A = y.A;
For the test details:
CREATE TABLE A (A varchar(3),B varchar(3));
CREATE TABLE X (A varchar(3),C varchar(3), D varchar(3));
CREATE TABLE Y (A varchar(3),E varchar(3), F varchar(3), G varchar(3));
INSERT INTO A(A,B) VALUES ('A1','B1'), ('A2','B2'), ('A3','B3'), ('A4','B4');
INSERT INTO X(A,C,D) VALUES ('A1','C1','D1'), ('A3','C3','D3'), ('A4','C4','D4');
INSERT INTO Y(A,E,F,G) VALUES ('A1','E1','F1','G1'), ('A2','E2','F2','G2'), ('A4','E4','F4','G4');
SELECT x.A, x.B, x.C, x.D, y.E, y.F, y.G
FROM (SELECT A.A, A.B, X.C, X.D FROM A LEFT JOIN X ON A.A = X.A) x
INNER JOIN (SELECT A.A, Y.E, Y.F, Y.G FROM A LEFT JOIN Y ON A.A = Y.A) y
ON x.A = y.A;
In summary: yes, MySQL has many, many issues, but this is not one of them. Most of its issues concern more advanced stuff.
If I understand correctly, table X has a 1:n relationship with both tables Y and Z. So, the behaviour you see is expected. The result you get is a kind of cross product.
If X has Person data, Y has Address data for those persons and Z has Phone data for those persons, then it is natural for your query to show all combinations of addresses and phones for every person. If someone has 3 addresses and 4 phones in your tables, then the query shows 12 rows in the result.
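The 3 x 4 = 12 arithmetic can be checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module and not MySQL, on the assumption that plain LEFT JOIN ... USING behaves the same way in both. The table and column names (`person`, `address`, `phone`, `addr`, `num`) are hypothetical stand-ins for X, Y and Z.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins: person plays X, address plays Y, phone plays Z.
conn.executescript("""
CREATE TABLE person  (a TEXT, name TEXT);
CREATE TABLE address (a TEXT, addr TEXT);
CREATE TABLE phone   (a TEXT, num TEXT);
INSERT INTO person VALUES ('A1', 'someone');
""")
# One person with 3 addresses and 4 phones.
conn.executemany("INSERT INTO address VALUES ('A1', ?)",
                 [(f"addr{i}",) for i in range(1, 4)])
conn.executemany("INSERT INTO phone VALUES ('A1', ?)",
                 [(f"phone{i}",) for i in range(1, 5)])

# Both joins in one statement: each address is paired with each phone.
joined = conn.execute("""
SELECT * FROM person
LEFT JOIN address USING (a)
LEFT JOIN phone USING (a)
""").fetchall()

# One join per statement: the row counts add up but do not multiply.
addresses = conn.execute(
    "SELECT * FROM person LEFT JOIN address USING (a)").fetchall()
phones = conn.execute(
    "SELECT * FROM person LEFT JOIN phone USING (a)").fetchall()

print(len(joined), len(addresses), len(phones))  # 12 3 4
```

The combined statement yields 12 rows, while the two separate statements yield 3 and 4 rows.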
You could avoid it by either using a UNION query or issuing two queries:
SELECT X.*
, Y.*
FROM X
LEFT JOIN Y
ON Y.A = X.A
and:
SELECT X.*
, Z.*
FROM X
LEFT JOIN Z
ON Z.A = X.A