Using DISTINCT inner join in SQL

I have three tables, A, B and C, where A is many-to-one with B, and B is many-to-one with C. I'd like a list of all the C's that are referenced by A.

My tables look something like this: A[id, valueA, lookupB], B[id, valueB, lookupC], C[id, valueC]. I've written a query with two nested SELECTs, but I'm wondering whether it's possible to do an INNER JOIN with DISTINCT somehow.

SELECT valueC
FROM C
INNER JOIN
(
    SELECT DISTINCT lookupC
    FROM B
    INNER JOIN
    (
        SELECT DISTINCT lookupB
        FROM A
    ) A2 ON B.id = A2.lookupB
) B2 ON C.id = B2.lookupC
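You can try the nested query without the real tables. Below is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. SQLite stands in for the question's database, and both the `make_db` helper and its sample rows are hypothetical. The rows are chosen so that one C ("blue") is reachable only through a B row that no A row uses.

```python
import sqlite3

def make_db():
    """Tiny in-memory copies of A, B and C (hypothetical sample rows)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER REFERENCES C(id));
        CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER REFERENCES B(id));
        INSERT INTO C VALUES (1, 'red'), (2, 'green'), (3, 'blue');
        -- B 13 is the only path to 'blue', and no A row points at B 13
        INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1), (12, 'b12', 2), (13, 'b13', 3);
        INSERT INTO A VALUES (100, 'a100', 10), (101, 'a101', 10), (102, 'a102', 11), (103, 'a103', 12);
    """)
    return con

# The question's nested query: shrink A to distinct lookupB values,
# then shrink the matching B rows to distinct lookupC values, then join C.
NESTED = """
SELECT valueC
FROM C
INNER JOIN (
    SELECT DISTINCT lookupC
    FROM B
    INNER JOIN (SELECT DISTINCT lookupB FROM A) A2 ON B.id = A2.lookupB
) B2 ON C.id = B2.lookupC
"""

con = make_db()
nested = sorted(row[0] for row in con.execute(NESTED))
print(nested)
```

With this data the query returns only "green" and "red", because "blue" can be reached only through B 13.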

EDIT: The tables are fairly large: A has 500k rows, B 10k rows and C 100 rows. So if I do a basic inner join and only apply DISTINCT at the end, the join produces a lot of unnecessary rows, like this:

SELECT DISTINCT valueC
FROM C
INNER JOIN B ON C.id = B.lookupC
INNER JOIN A ON B.id = A.lookupB

This is very, very slow (orders of magnitude slower than the nested SELECT above).
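The sketch below shows where the extra rows come from. It runs the basic join with and without DISTINCT on hypothetical SQLite sample data built by a made-up `make_db` helper. Each A row reaches exactly one C, so the join conceptually produces one row per A row, and DISTINCT only reduces them afterwards. How much of that work a real optimizer actually does depends on the engine.

```python
import sqlite3

def make_db():
    """Hypothetical sample data: 4 A rows -> 3 distinct B rows -> 2 distinct C rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER REFERENCES C(id));
        CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER REFERENCES B(id));
        INSERT INTO C VALUES (1, 'red'), (2, 'green'), (3, 'blue');
        INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1), (12, 'b12', 2), (13, 'b13', 3);
        INSERT INTO A VALUES (100, 'a100', 10), (101, 'a101', 10), (102, 'a102', 11), (103, 'a103', 12);
    """)
    return con

con = make_db()

# Plain join: one output row per A row (each A has one B, each B has one C)
joined = sorted(row[0] for row in con.execute("""
    SELECT C.valueC
    FROM C
    INNER JOIN B ON C.id = B.lookupC
    INNER JOIN A ON B.id = A.lookupB
"""))

# The same join with DISTINCT applied to the final result
distinct = sorted(row[0] for row in con.execute("""
    SELECT DISTINCT C.valueC
    FROM C
    INNER JOIN B ON C.id = B.lookupC
    INNER JOIN A ON B.id = A.lookupB
"""))

print(len(joined), joined)
print(len(distinct), distinct)
```

Four joined rows collapse to two distinct values here. With the question's sizes, that becomes up to about 500k joined rows for at most 100 distinct C's.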

I ran a test on MS SQL Server 2005 with tables of these sizes: A 400K rows, B 26K rows and C 450 rows.

The estimated query plan indicated that the basic inner join would be 3 times slower than the nested subqueries. However, when I actually ran the queries, the basic inner join was twice as fast as the nested ones: it took 297 ms on very minimal server hardware.

What database are you using, and what times are you seeing? If you are seeing poor performance, I suspect it is an index problem.
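If it is an index problem, the columns to check are the ones the joins search on: A.lookupB and B.lookupC. The id columns are presumably already primary keys. Below is a minimal SQLite sketch. The index names are hypothetical, and whether the optimizer uses these indexes depends on the engine and its statistics, so the sketch prints the plan instead of assuming one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER REFERENCES C(id));
    CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER REFERENCES B(id));
    -- Hypothetical indexes on the foreign-key columns that the joins search on
    CREATE INDEX idx_a_lookupb ON A (lookupB);
    CREATE INDEX idx_b_lookupc ON B (lookupC);
""")

def index_names(table):
    # Each PRAGMA index_list row has the index name in its second column
    return {row[1] for row in con.execute(f"PRAGMA index_list({table})")}

print(index_names("A"), index_names("B"))

# Show the plan SQLite picks for the basic join; the wording varies by version
for row in con.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT C.valueC
    FROM C
    INNER JOIN B ON C.id = B.lookupC
    INNER JOIN A ON B.id = A.lookupB
"""):
    print(row[-1])  # the last column holds the human-readable detail
```

The two CREATE INDEX statements are also valid T-SQL. On SQL Server 2005 you would read the actual execution plan instead of EXPLAIN QUERY PLAN.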

I believe your 1:m relationships should already implicitly create DISTINCT JOINs.

But if your goal is just the C's in each A, it might be easier to use DISTINCT on the outermost query.

SELECT DISTINCT A.valueA, C.valueC
FROM C
    INNER JOIN B ON B.lookupC = C.id
    INNER JOIN A ON A.lookupB = B.id
ORDER BY A.valueA, C.valueC
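Adding A.valueA changes the shape of the result. The query returns one row per distinct (valueA, valueC) pair, not one row per C. Here is a quick check on hypothetical SQLite sample data (the `make_db` helper and its rows are made up):

```python
import sqlite3

def make_db():
    """Hypothetical sample rows for A, B and C."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER REFERENCES C(id));
        CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER REFERENCES B(id));
        INSERT INTO C VALUES (1, 'red'), (2, 'green'), (3, 'blue');
        INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1), (12, 'b12', 2), (13, 'b13', 3);
        INSERT INTO A VALUES (100, 'a100', 10), (101, 'a101', 10), (102, 'a102', 11), (103, 'a103', 12);
    """)
    return con

con = make_db()

# One row per distinct (valueA, valueC) pair, ordered for readability
pairs = con.execute("""
    SELECT DISTINCT A.valueA, C.valueC
    FROM C
        INNER JOIN B ON B.lookupC = C.id
        INNER JOIN A ON A.lookupB = B.id
    ORDER BY A.valueA, C.valueC
""").fetchall()

for value_a, value_c in pairs:
    print(value_a, value_c)
```

On the question's data this could return up to one row per A row (about 500k), so it only fits if you really want the per-A breakdown.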

SELECT DISTINCT C.valueC
FROM C
  LEFT JOIN B ON C.id = B.lookupC
  LEFT JOIN A ON B.id = A.lookupB
WHERE A.id IS NOT NULL

I don't see a good reason to limit the result sets of A and B, because what you want is a list of all C's that are referenced by A. I used DISTINCT on C.valueC because I guessed you wanted a unique list of C's.
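With LEFT JOINs, every C row survives. A C that no A references still comes through, with NULLs in A's columns (and in B's, if no B matches either). Since C.id is never NULL, a filter on C.id removes nothing. Filtering on a column from A, or using INNER JOINs, keeps only the C's that A references. Below is a small SQLite sketch on hypothetical sample data in which "blue" is deliberately unreferenced. The `make_db` and `distinct_c` helpers are made up for the illustration.

```python
import sqlite3

def make_db():
    """Hypothetical sample rows; 'blue' is reachable only via B 13, which no A uses."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER REFERENCES C(id));
        CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER REFERENCES B(id));
        INSERT INTO C VALUES (1, 'red'), (2, 'green'), (3, 'blue');
        INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1), (12, 'b12', 2), (13, 'b13', 3);
        INSERT INTO A VALUES (100, 'a100', 10), (101, 'a101', 10), (102, 'a102', 11), (103, 'a103', 12);
    """)
    return con

con = make_db()

def distinct_c(where):
    # 'where' is a fixed demo string here, never user input
    sql = f"""
        SELECT DISTINCT C.valueC
        FROM C
          LEFT JOIN B ON C.id = B.lookupC
          LEFT JOIN A ON B.id = A.lookupB
        WHERE {where}
    """
    return sorted(row[0] for row in con.execute(sql))

on_c = distinct_c("C.id IS NOT NULL")  # keeps every C, referenced or not
on_a = distinct_c("A.id IS NOT NULL")  # keeps only C's that some A reaches
print(on_c)
print(on_a)
```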


EDIT: I agree with your argument. Even if your solution looks a bit nested, it seems to be the best and fastest way to use your knowledge of the data and reduce the result sets.

There is no DISTINCT JOIN construct you could use, so just stay with what you already have :)
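For comparison only, since this was not suggested in the thread: the closest standard way to say "each C that some A reaches" without DISTINCT is a semi-join written with EXISTS. Because the outer query reads only C, each C row appears at most once. Below is a minimal SQLite sketch on hypothetical sample data (the `make_db` helper is made up). Whether this beats the nested SELECT on the question's engine and data is untested.

```python
import sqlite3

def make_db():
    """Hypothetical sample rows for A, B and C."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER REFERENCES C(id));
        CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER REFERENCES B(id));
        INSERT INTO C VALUES (1, 'red'), (2, 'green'), (3, 'blue');
        INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1), (12, 'b12', 2), (13, 'b13', 3);
        INSERT INTO A VALUES (100, 'a100', 10), (101, 'a101', 10), (102, 'a102', 11), (103, 'a103', 12);
    """)
    return con

con = make_db()

# Semi-join: keep a C row if at least one B -> A path exists for it.
# No DISTINCT is needed because duplicates are never produced.
semi = sorted(row[0] for row in con.execute("""
    SELECT C.valueC
    FROM C
    WHERE EXISTS (
        SELECT 1
        FROM B
        INNER JOIN A ON A.lookupB = B.id
        WHERE B.lookupC = C.id
    )
"""))
print(semi)
```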

Is this what you mean?

SELECT DISTINCT C.valueC
FROM 
C
INNER JOIN B ON C.id = B.lookupC
INNER JOIN A ON B.id = A.lookupB
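If valueC is unique per C row, this query and the question's nested version should return the same values. One rough way to check is to compare them on randomly generated SQLite data. The `random_db` and `results_match` helpers, the table sizes and the seeds are all arbitrary choices for this sketch. Matching results say nothing about which query is faster on a given engine.

```python
import random
import sqlite3

NESTED = """
SELECT valueC
FROM C
INNER JOIN (
    SELECT DISTINCT lookupC
    FROM B
    INNER JOIN (SELECT DISTINCT lookupB FROM A) A2 ON B.id = A2.lookupB
) B2 ON C.id = B2.lookupC
"""

FLAT = """
SELECT DISTINCT C.valueC
FROM C
INNER JOIN B ON C.id = B.lookupC
INNER JOIN A ON B.id = A.lookupB
"""

def random_db(seed, n_c=20, n_b=200, n_a=2000):
    """Fill A, B and C with random but valid foreign keys."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER);
        CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER);
    """)
    # valueC is unique per C row, so "one row per C" and "DISTINCT valueC" coincide
    con.executemany("INSERT INTO C VALUES (?, ?)",
                    [(i, f"c{i}") for i in range(1, n_c + 1)])
    con.executemany("INSERT INTO B VALUES (?, ?, ?)",
                    [(i, f"b{i}", rng.randint(1, n_c)) for i in range(1, n_b + 1)])
    con.executemany("INSERT INTO A VALUES (?, ?, ?)",
                    [(i, f"a{i}", rng.randint(1, n_b)) for i in range(1, n_a + 1)])
    return con

def results_match(seed, **sizes):
    con = random_db(seed, **sizes)
    nested = sorted(row[0] for row in con.execute(NESTED))
    flat = sorted(row[0] for row in con.execute(FLAT))
    return nested == flat

# Dense data (most C's referenced) and sparse data (only a few A rows)
print(all(results_match(s) for s in range(5)))
print(all(results_match(s, n_a=5) for s in range(5)))
```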
