Overlap / intersection between subsets of rows in two tables

I have two tables in SQL Server. One contains the IDs of files and of the slides contained in those original files. The other describes "sections", each of which can contain slides from one or more of the files, potentially in arbitrary order, duplicated, and/or with some slides omitted.

Sample data looks like this:

FileSlide

FileID       SlideID
214          716
214          717
214          718
223          770
223          771
223          772
223          773
223          774
223          775

SectionSlide

SectionID    SlideID
527          716
527          718
527          717
527          770
527          773
527          774
527          775
527          774

I originally didn't need a "SectionFile" relation, but now I do need that information to see which files were chosen for a particular section, regardless of slide details. My problem is comparing the slide IDs in the SectionSlide and FileSlide tables to see whether the slides overlap for any given File-Section pair. I would like to find all File-Section pairs that share slides.

For the sample data above, output would look like this:

SectionFileCandidates

SectionID    FileID
527          214
527          223

What is the query to produce this output?

Is it possible to calculate a metric that indicates what proportion of the original file's slides exist in the section?

For the sample data above, output would look like this:

SectionFileCandidates

SectionID    FileID    Overlap
527          214       1.00
527          223       0.67

...that is, 3 out of 3 slides from file 214 are in section 527, and 4 out of 6 slides from file 223 are in section 527.
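To make the metric concrete, here is a minimal Python sketch of the arithmetic, using the sample rows above. It is only an illustration of the intended definition (distinct shared slides divided by the file's slide count), not the SQL being asked for; the helper names `group_slides` and `overlap_by_pair` are made up for this example.

```python
# Sample rows copied from the FileSlide and SectionSlide tables above.
file_slide = [(214, 716), (214, 717), (214, 718),
              (223, 770), (223, 771), (223, 772),
              (223, 773), (223, 774), (223, 775)]
section_slide = [(527, 716), (527, 718), (527, 717), (527, 770),
                 (527, 773), (527, 774), (527, 775), (527, 774)]


def group_slides(rows):
    """Map each ID in the first column to the set of its slide IDs."""
    groups = {}
    for owner_id, slide_id in rows:
        groups.setdefault(owner_id, set()).add(slide_id)
    return groups


def overlap_by_pair(file_rows, section_rows):
    """Return {(section_id, file_id): overlap} for pairs sharing a slide."""
    files = group_slides(file_rows)
    sections = group_slides(section_rows)
    result = {}
    for section_id, section_slides in sections.items():
        for file_id, file_slides in files.items():
            # Sets ignore the duplicated slide 774 in section 527.
            shared = file_slides & section_slides
            if shared:
                ratio = len(shared) / len(file_slides)
                result[(section_id, file_id)] = round(ratio, 2)
    return result


overlaps = overlap_by_pair(file_slide, section_slide)
for (section_id, file_id), value in sorted(overlaps.items()):
    print(section_id, file_id, f"{value:.2f}")
```

Because the slides are held in sets, a slide that appears twice in a section counts only once toward the overlap.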

I was originally trying to compare groups of rows using the OVER (PARTITION BY ...) clause, but could not figure it out.

How can I do these two queries?

Both queries are possible!

First query:

SELECT s.SectionID,
       f.FileID
FROM SectionSlide s
INNER JOIN FileSlide f ON s.SlideID = f.SlideID
GROUP BY s.SectionID, f.FileID

or

SELECT DISTINCT s.SectionID,
                f.FileID
FROM SectionSlide s
INNER JOIN FileSlide f ON s.SlideID = f.SlideID

Second query:

select s.SectionID, f.FileID,
       -- distinct slides shared with the section, divided by the file's slide count
       round(((count(distinct f.SlideID)*1.0) / aux.total), 2) as 'Overlap'
from SectionSlide s
inner join FileSlide f on f.SlideID = s.SlideID
-- aux: total number of slides in each file
inner join (select f.FileID, count(f.SlideID) as 'total'
            from FileSlide f
            group by f.FileID) aux on aux.FileID = f.FileID
group by f.FileID, s.SectionID, aux.total
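As a quick sanity check, both queries can be run against the sample data from the question. The sketch below uses Python's built-in `sqlite3` module with an in-memory database, which is an assumption made for convenience: SQLite stands in for SQL Server here, so this only checks the query logic, not T-SQL specifics. The string-literal aliases are written as plain identifiers, and an `ORDER BY` is added so the row order is deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FileSlide (FileID INTEGER, SlideID INTEGER);
    CREATE TABLE SectionSlide (SectionID INTEGER, SlideID INTEGER);
    INSERT INTO FileSlide VALUES
        (214, 716), (214, 717), (214, 718),
        (223, 770), (223, 771), (223, 772),
        (223, 773), (223, 774), (223, 775);
    INSERT INTO SectionSlide VALUES
        (527, 716), (527, 718), (527, 717), (527, 770),
        (527, 773), (527, 774), (527, 775), (527, 774);
""")

# First query: every Section-File pair that shares at least one slide.
pairs = conn.execute("""
    SELECT DISTINCT s.SectionID, f.FileID
    FROM SectionSlide s
    INNER JOIN FileSlide f ON s.SlideID = f.SlideID
    ORDER BY s.SectionID, f.FileID
""").fetchall()

# Second query: share of each file's slides that appear in the section.
overlaps = conn.execute("""
    SELECT s.SectionID, f.FileID,
           ROUND(COUNT(DISTINCT f.SlideID) * 1.0 / aux.total, 2) AS Overlap
    FROM SectionSlide s
    INNER JOIN FileSlide f ON f.SlideID = s.SlideID
    INNER JOIN (SELECT FileID, COUNT(SlideID) AS total
                FROM FileSlide
                GROUP BY FileID) aux ON aux.FileID = f.FileID
    GROUP BY f.FileID, s.SectionID, aux.total
    ORDER BY s.SectionID, f.FileID
""").fetchall()

print(pairs)
print(overlaps)
conn.close()
```

The `COUNT(DISTINCT ...)` matters here: slide 774 appears twice in section 527, and a plain `COUNT` would count it twice and inflate the ratio for file 223.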

I'm sort of confused by your question, but the query below should get you your desired results:

SELECT DISTINCT fs.FileId, ss.SectionId
FROM FileSlide fs
INNER JOIN SectionSlide ss
    ON fs.SlideId = ss.SlideId
