
Outer join on the same table (SQL)

I am creating a time machine with C#. A time machine is a way of backing up my files so that I can access a specific file as it was at a specific time. Anyway, the way I am doing it is by finding all the files inside a directory and storing their information in a table named table1. So if, the first time I scan my computer, I only have 3 files, my table will look something like:

ID   FullName   DateModified   DateInsertedToDatabase
 1     C:\A       456588731             0
 2     C:\B       955588762             0
 3     C:\C       854587783             0
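The question only describes the scan in words (and the real code is C#). As a minimal sketch of the same idea, here is a Python version using the standard sqlite3 module. Everything in it is an assumption for illustration: the helper name record_scan, storing os.stat().st_mtime_ns as DateModified, and treating DateInsertedToDatabase as a scan counter that starts at 0, which is how the example tables use it.

```python
import os
import sqlite3
import tempfile


def record_scan(conn: sqlite3.Connection, root: str) -> int:
    """Hypothetical helper: record one row per file under `root` in table1.

    Assumptions (not from the question): DateModified is os.stat().st_mtime_ns,
    and each scan gets the next integer after the highest one already stored.
    Returns the scan number that was used.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS table1 (
               ID INTEGER PRIMARY KEY,
               FullName TEXT NOT NULL,
               DateModified INTEGER NOT NULL,
               DateInsertedToDatabase INTEGER NOT NULL)"""
    )
    # MAX() is NULL (None) on an empty table, so the first scan is 0.
    (last,) = conn.execute(
        "SELECT MAX(DateInsertedToDatabase) FROM table1"
    ).fetchone()
    scan = 0 if last is None else last + 1

    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rows.append((full, os.stat(full).st_mtime_ns, scan))

    conn.executemany(
        "INSERT INTO table1 (FullName, DateModified, DateInsertedToDatabase) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return scan


if __name__ == "__main__":
    # Demo: scan a throwaway directory twice.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.txt"), "w").close()
        demo = sqlite3.connect(":memory:")
        print(record_scan(demo, d), record_scan(demo, d))  # 0 1
        demo.close()
```

Calling record_scan once per backup produces a table shaped like the ones in the question. A real implementation would also have to cope with files that vanish or become unreadable between os.walk and os.stat, which this sketch ignores.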

Let's say that the next time I perform a backup I still have the same 3 files, but I have created a new file (X) and modified file C. As a result, my table should now look like:

    ID   FullName   DateModified   DateInsertedToDatabase
     1     C:\A       456588731             0
     2     C:\B       955588762             0
     3     C:\C       854587783             0
     4     C:\A       456588731             1
     5     C:\B       955588762             1
     6     C:\C       111122212             1
     7     C:\X       123212321             1

Now I would like to copy file C and file X, because those are the files that have been changed or created. How could I build a query that returns file X and file C? In other words, I want all the files that have DateInsertedToDatabase = 1 and that don't match (on FullName and DateModified) any file where DateInsertedToDatabase is less than 1.

In case I am not being clear, here is the continuation of my example: let's say I delete files B and C, modify file X, and create a new file Z. My table should then look like:

    ID   FullName   DateModified   DateInsertedToDatabase
     1     C:\A       456588731             0
     2     C:\B       955588762             0
     3     C:\C       854587783             0
     4     C:\A       456588731             1
     5     C:\B       955588762             1
     6     C:\C       111122212             1
     7     C:\X       123212321             1
     8     C:\A       456588731             2
     9     C:\X       898989898             2
     10    C:\Z       789564545             2

Here I would like to get files X and Z, because file X was modified and file Z was created. I would not want to get file A, because that file already exists with the same DateModified. How could I build that query?
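Both examples follow one rule: a row from scan N is wanted when its (FullName, DateModified) pair does not occur in any earlier scan. Here is that rule as a few lines of plain Python over the rows of the last table, assuming DateModified can be treated as an integer. It is only meant as a reference to check SQL answers against, not as a replacement for a query.

```python
from typing import List, Tuple

# (FullName, DateModified, DateInsertedToDatabase), copied from the table above.
Row = Tuple[str, int, int]

ROWS: List[Row] = [
    ("C:\\A", 456588731, 0),
    ("C:\\B", 955588762, 0),
    ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1),
    ("C:\\B", 955588762, 1),
    ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2),
    ("C:\\X", 898989898, 2),
    ("C:\\Z", 789564545, 2),
]


def changed_or_new(rows: List[Row], scan: int) -> List[str]:
    """Files in `scan` whose (FullName, DateModified) pair never appeared in an earlier scan."""
    earlier = {(name, mod) for name, mod, ins in rows if ins < scan}
    return sorted(
        name for name, mod, ins in rows if ins == scan and (name, mod) not in earlier
    )


print(changed_or_new(ROWS, 1))  # ['C:\\C', 'C:\\X']
print(changed_or_new(ROWS, 2))  # ['C:\\X', 'C:\\Z']
```

Deleted files (B and C in the third scan) simply have no row in the latest scan, so they never show up in the output.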

Hmm, I think I understand. You want all the files from the latest scan (MAX(DateInsertedToDatabase)) that don't have a previous row with the same FullName and DateModified?

You want to do what I call a "reverse inner join": basically a left join that filters out anything that would have matched in an inner join. There are other ways to do it as well (e.g. using subqueries).
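This pattern is more commonly called an anti-join. The subquery variant mentioned above can be written with NOT EXISTS; the sketch below runs it against SQLite (the database the asker uses) through Python's sqlite3 module, with the question's data loaded into a table1 whose column types are assumed.

```python
import sqlite3

# The question's rows: (FullName, DateModified, DateInsertedToDatabase).
ROWS = [
    ("C:\\A", 456588731, 0),
    ("C:\\B", 955588762, 0),
    ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1),
    ("C:\\B", 955588762, 1),
    ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2),
    ("C:\\X", 898989898, 2),
    ("C:\\Z", 789564545, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER PRIMARY KEY, FullName TEXT NOT NULL, "
    "DateModified INTEGER NOT NULL, DateInsertedToDatabase INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO table1 (FullName, DateModified, DateInsertedToDatabase) VALUES (?, ?, ?)",
    ROWS,
)

# Anti-join: keep latest-scan rows for which no earlier row has the same name and date.
ANTI_JOIN = """
SELECT t1.FullName, t1.DateModified
FROM table1 t1
WHERE t1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM table1)
  AND NOT EXISTS (
        SELECT 1
        FROM table1 t2
        WHERE t2.FullName = t1.FullName
          AND t2.DateModified = t1.DateModified
          AND t2.DateInsertedToDatabase < t1.DateInsertedToDatabase)
ORDER BY t1.FullName
"""

result = conn.execute(ANTI_JOIN).fetchall()
print(result)  # [('C:\\X', 898989898), ('C:\\Z', 789564545)]
```

Restricting t2 to earlier scans plays the same role as the ID inequality in the LEFT JOIN form. Both should return the same rows; which is faster depends on the engine and the indexes, so it is worth measuring rather than assuming.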

This is in T-SQL:

CREATE TABLE #mytemp
(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [FullName] [nvarchar](50) NOT NULL,
    DateModified [nvarchar](9) NOT NULL, 
    DateInsertedToDatabase [int] NOT NULL
)

INSERT INTO #mytemp (FullName, DateModified, DateInsertedToDatabase) VALUES
    ('C:\A', '456588731', 0), -- first scan
    ('C:\B', '955588762', 0),
    ('C:\C', '854587783', 0),
    ('C:\A', '456588731', 1), -- second scan: C modified, X created
    ('C:\B', '955588762', 1),
    ('C:\C', '111122212', 1),
    ('C:\X', '123212321', 1),
    ('C:\A', '456588731', 2), -- third scan: B and C deleted, X modified, Z created
    ('C:\X', '898989898', 2),
    ('C:\Z', '789564545', 2)

SELECT 
    temp1.*
FROM 
    #mytemp temp1
    LEFT JOIN #mytemp temp2 ON 
            temp1.ID != temp2.ID --don't match on the same two rows
            AND temp1.FullName = temp2.FullName --match based on full name
            AND temp1.DateModified = temp2.DateModified --and date modified
WHERE
    temp1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM #mytemp)
    AND temp2.ID IS NULL --filter out rows that would have matched on an INNER JOIN 

 DROP TABLE #mytemp
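The T-SQL above will not run unchanged on SQLite, which has no #temp tables or IDENTITY columns. A possible port, again driven from Python's sqlite3 for convenience, swaps in a TEMP table with an INTEGER PRIMARY KEY and writes the inequality as <> (the standard spelling; SQLite also accepts !=). The join conditions and the IS NULL filter are the same as in the answer; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for #mytemp: a TEMP table whose INTEGER PRIMARY KEY auto-assigns IDs 1, 2, ...
conn.execute(
    """CREATE TEMP TABLE mytemp (
           ID INTEGER PRIMARY KEY,
           FullName TEXT NOT NULL,
           DateModified TEXT NOT NULL,
           DateInsertedToDatabase INTEGER NOT NULL)"""
)
conn.executemany(
    "INSERT INTO mytemp (FullName, DateModified, DateInsertedToDatabase) VALUES (?, ?, ?)",
    [
        ("C:\\A", "456588731", 0),
        ("C:\\B", "955588762", 0),
        ("C:\\C", "854587783", 0),
        ("C:\\A", "456588731", 1),
        ("C:\\B", "955588762", 1),
        ("C:\\C", "111122212", 1),
        ("C:\\X", "123212321", 1),
        ("C:\\A", "456588731", 2),
        ("C:\\X", "898989898", 2),
        ("C:\\Z", "789564545", 2),
    ],
)

QUERY = """
SELECT temp1.*
FROM mytemp temp1
LEFT JOIN mytemp temp2
       ON temp1.ID <> temp2.ID                     -- don't match a row with itself
      AND temp1.FullName = temp2.FullName          -- same file ...
      AND temp1.DateModified = temp2.DateModified  -- ... same version
WHERE temp1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM mytemp)
  AND temp2.ID IS NULL                             -- keep rows that found no match
ORDER BY temp1.ID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(9, 'C:\\X', '898989898', 2), (10, 'C:\\Z', '789564545', 2)]
```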

I don't know SQLite, but I hope this will work anyway; it doesn't use anything fancy.

Select t1.* 
From Table1 t1
Left join Table1 t2
On t1.FullName = t2.FullName
And t1.DateInsertedToDatabase = t2.DateInsertedToDatabase + 1
Where t1.DateInsertedToDatabase = (select max(DateInsertedToDatabase) from Table1)
And (t1.DateModified <> t2.DateModified or t2.FullName is null)

Joining on DateInsertedToDatabase + 1 pairs each row with the same file's row from the previous scan. Then you filter for the highest DateInsertedToDatabase and keep either the records that have no match (they are new) or the records whose modified dates don't match (they were changed).
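A quick way to check this query is to run it on the data both before and after the third scan. The sketch below does that in SQLite via Python's sqlite3; the helper latest_changes is hypothetical and rebuilds an in-memory table1 (with assumed column types) on each call, and the ORDER BY is added only for stable output.

```python
import sqlite3

PREVIOUS_SCAN = """
SELECT t1.FullName, t1.DateModified
FROM table1 t1
LEFT JOIN table1 t2
       ON t1.FullName = t2.FullName
      AND t1.DateInsertedToDatabase = t2.DateInsertedToDatabase + 1  -- same file, previous scan
WHERE t1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM table1)
  AND (t1.DateModified <> t2.DateModified  -- modified since the previous scan
       OR t2.FullName IS NULL)             -- or absent from the previous scan (new)
ORDER BY t1.FullName
"""


def latest_changes(rows):
    """Hypothetical helper: load rows into a fresh in-memory table1 and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID INTEGER PRIMARY KEY, FullName TEXT NOT NULL, "
        "DateModified INTEGER NOT NULL, DateInsertedToDatabase INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO table1 (FullName, DateModified, DateInsertedToDatabase) "
        "VALUES (?, ?, ?)",
        rows,
    )
    result = conn.execute(PREVIOUS_SCAN).fetchall()
    conn.close()
    return result


# (FullName, DateModified, DateInsertedToDatabase) from the question.
ROWS = [
    ("C:\\A", 456588731, 0),
    ("C:\\B", 955588762, 0),
    ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1),
    ("C:\\B", 955588762, 1),
    ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2),
    ("C:\\X", 898989898, 2),
    ("C:\\Z", 789564545, 2),
]

print(latest_changes(ROWS[:7]))  # [('C:\\C', 111122212), ('C:\\X', 123212321)]
print(latest_changes(ROWS))      # [('C:\\X', 898989898), ('C:\\Z', 789564545)]
```

Because it only looks one scan back, this query relies on every scan recording every file, which is how the question's tables are built. It also differs from the anti-join answers in one case: a file that reverts to a version last seen two or more scans earlier is reported here, while the anti-join versions skip it because that (FullName, DateModified) pair already exists.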

Phil Sandler's answer works. This one does, too:

    SELECT FullName
      FROM table1
INNER JOIN (SELECT FullName, DateModified
              FROM table1
             WHERE DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM table1)) d
     USING (FullName, DateModified)
  GROUP BY FullName
    HAVING COUNT(1) = 1
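How this works: the derived table d holds the (FullName, DateModified) pairs from the latest scan, and joining it back to table1 on both columns collects every row that carries each pair. A count of 1 means the only match is the latest row itself, so that version is new. The sketch below runs the same query in SQLite from Python's sqlite3, on an assumed table1 loaded with the question's data, with an ORDER BY added for stable output.

```python
import sqlite3

# (FullName, DateModified, DateInsertedToDatabase) from the question.
ROWS = [
    ("C:\\A", 456588731, 0),
    ("C:\\B", 955588762, 0),
    ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1),
    ("C:\\B", 955588762, 1),
    ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2),
    ("C:\\X", 898989898, 2),
    ("C:\\Z", 789564545, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER PRIMARY KEY, FullName TEXT NOT NULL, "
    "DateModified INTEGER NOT NULL, DateInsertedToDatabase INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO table1 (FullName, DateModified, DateInsertedToDatabase) VALUES (?, ?, ?)",
    ROWS,
)

# Count all rows sharing each latest-scan (FullName, DateModified) pair; 1 means "new version".
GROUPED = """
    SELECT FullName
      FROM table1
INNER JOIN (SELECT FullName, DateModified
              FROM table1
             WHERE DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM table1)) d
     USING (FullName, DateModified)
  GROUP BY FullName
    HAVING COUNT(1) = 1
  ORDER BY FullName
"""

changed = conn.execute(GROUPED).fetchall()
print(changed)  # [('C:\\X',), ('C:\\Z',)]
```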

I modified it because I am working with a lot of files: the solution works well, but it is slow for queries that deal with a lot of records. Here is what I worked out.

Let's assume I have these records so far:

[Screenshot of the table1 records]

Select * from table1 WHERE DateInserted = 4
 and Path not in(
        select Path from table1 t1 
        where 
            DateInserted = 4 AND
            Path IN (Select Path from table1 where DateInserted<4) AND
            DateModified IN (Select DateModified from table1 where DateInserted<4)
    )

And that returns:

[Screenshot of the query result]

This query runs much faster. I will obviously have to replace the 4 with a variable in my code, but this is just to illustrate the changes I made.
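One caveat with this version: the Path IN (...) and DateModified IN (...) tests are evaluated independently, so a changed file is skipped whenever its path existed before and its new DateModified happens to equal the DateModified of any earlier row, even a row belonging to a different file. The sketch below reproduces that with a hypothetical two-file table using this query's column names (Path, DateModified, DateInserted), and contrasts it with a NOT EXISTS test that compares the pair together. It selects two columns instead of * for readable output and passes the scan number as a named parameter (:scan) instead of the literal 4. The composite index is an assumption that may help the NOT EXISTS lookup; it has not been benchmarked here.

```python
import sqlite3

# The query above, with the scan number as a parameter.
INDEPENDENT_IN = """
SELECT Path, DateModified FROM table1
WHERE DateInserted = :scan
  AND Path NOT IN (
        SELECT Path FROM table1
        WHERE DateInserted = :scan
          AND Path IN (SELECT Path FROM table1 WHERE DateInserted < :scan)
          AND DateModified IN (SELECT DateModified FROM table1 WHERE DateInserted < :scan))
ORDER BY Path
"""

# Pairwise check: skip a row only if the SAME path had the SAME DateModified earlier.
PAIRWISE = """
SELECT t1.Path, t1.DateModified FROM table1 t1
WHERE t1.DateInserted = :scan
  AND NOT EXISTS (
        SELECT 1 FROM table1 t2
        WHERE t2.Path = t1.Path
          AND t2.DateModified = t1.DateModified
          AND t2.DateInserted < :scan)
ORDER BY t1.Path
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Path TEXT NOT NULL, "
    "DateModified INTEGER NOT NULL, DateInserted INTEGER NOT NULL)"
)
# Hypothetical index covering the columns the NOT EXISTS probe filters on.
conn.execute("CREATE INDEX idx_path_mod ON table1 (Path, DateModified, DateInserted)")
conn.executemany(
    "INSERT INTO table1 (Path, DateModified, DateInserted) VALUES (?, ?, ?)",
    [
        ("C:\\A", 100, 0),
        ("C:\\B", 200, 0),
        ("C:\\A", 200, 1),  # A changed; its new DateModified equals B's old one
        ("C:\\B", 200, 1),  # B unchanged
    ],
)

print(conn.execute(INDEPENDENT_IN, {"scan": 1}).fetchall())  # []  (A is missed)
print(conn.execute(PAIRWISE, {"scan": 1}).fetchall())        # [('C:\\A', 200)]
```

With real modification timestamps such collisions may be rare, but files copied or extracted in one batch can share a timestamp, so the pairwise test is the safer choice if backups must not miss a changed file.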

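Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.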

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM