Overlap / intersection between subsets of rows in two tables
I have two tables in SQL Server. One contains the IDs of files and of the slides contained in those original files. The other contains "sections", which can include slides from one or more of the files, possibly in arbitrary order, duplicated, and/or with some slides removed.
Sample data looks like this:
FileSlide
FileID SlideID
214 716
214 717
214 718
223 770
223 771
223 772
223 773
223 774
223 775
SectionSlide
SectionID SlideID
527 716
527 718
527 717
527 770
527 773
527 774
527 775
527 774
I originally didn't need a "SectionFile" relation, but now I do need that information to see which files were chosen for a particular section, regardless of slide details. My problem is comparing the slide IDs in the SectionSlide and FileSlide tables to see whether there is any overlap between the slides in a given File-Section pair. I would like to find all File-Section pairs that share slides.
For the sample data above, the output would look like this:
SectionFileCandidates
SectionID FileID
527 214
527 223
What is the query to produce this output?
Is it possible to calculate a metric that indicates what proportion of the original file's slides appear in the section?
For the sample data above, the output would look like this:
SectionFileCandidates
SectionID FileID Overlap
527 214 1.00
527 223 0.67
...that is, 3 out of 3 slides from file 214 are in section 527, and 4 out of 6 slides from file 223 are in section 527.
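To make the requested metric concrete: for each File-Section pair it is the number of distinct slides the two share, divided by the number of slides in the file. Below is a minimal Python sketch of that arithmetic on the sample data. It is an illustration of the calculation only, not a database query, and the helper names group_slides and overlap_table are hypothetical.

```python
# Sample data from the question as (owner_id, slide_id) pairs.
file_slides = [(214, 716), (214, 717), (214, 718),
               (223, 770), (223, 771), (223, 772),
               (223, 773), (223, 774), (223, 775)]
section_slides = [(527, 716), (527, 718), (527, 717), (527, 770),
                  (527, 773), (527, 774), (527, 775), (527, 774)]


def group_slides(pairs):
    """Map each owner ID (a file or a section) to its set of distinct slide IDs."""
    groups = {}
    for owner_id, slide_id in pairs:
        groups.setdefault(owner_id, set()).add(slide_id)
    return groups


def overlap_table(file_slides, section_slides):
    """Return {(section_id, file_id): share of the file's slides found in the section}.

    Only pairs that share at least one slide are included. Because sets are
    used, the duplicated slide 774 in section 527 is counted once.
    """
    files = group_slides(file_slides)
    sections = group_slides(section_slides)
    result = {}
    for section_id, section_set in sections.items():
        for file_id, file_set in files.items():
            shared = section_set & file_set
            if shared:
                result[(section_id, file_id)] = round(len(shared) / len(file_set), 2)
    return result


print(overlap_table(file_slides, section_slides))
# {(527, 214): 1.0, (527, 223): 0.67}
```

Using distinct slide sets on both sides is what makes file 223 come out as 4/6 rather than 5/6 despite slide 774 appearing twice in the section.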
I was originally trying to compare groups of rows using the OVER (PARTITION BY ...) clause, but could not figure it out.
How can I write these two queries?
Both queries are possible!
First query:
SELECT s.SectionID,
f.FileID
FROM SectionSlide s
INNER JOIN FileSlide f ON s.SlideID = f.SlideID
GROUP BY s.SectionID, f.FileID
or
SELECT DISTINCT s.SectionID,
f.FileID
FROM SectionSlide s
INNER JOIN FileSlide f ON s.SlideID = f.SlideID
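Both forms of the first query can be checked against the sample data. This sketch runs them from Python against an in-memory SQLite database, which stands in for SQL Server here (an assumption; both queries use only standard SQL that either engine accepts).

```python
import sqlite3

# Sample data from the question.
FILE_SLIDES = [(214, 716), (214, 717), (214, 718),
               (223, 770), (223, 771), (223, 772),
               (223, 773), (223, 774), (223, 775)]
SECTION_SLIDES = [(527, 716), (527, 718), (527, 717), (527, 770),
                  (527, 773), (527, 774), (527, 775), (527, 774)]

GROUP_BY_QUERY = """
SELECT s.SectionID, f.FileID
FROM SectionSlide s
INNER JOIN FileSlide f ON s.SlideID = f.SlideID
GROUP BY s.SectionID, f.FileID
"""

DISTINCT_QUERY = """
SELECT DISTINCT s.SectionID, f.FileID
FROM SectionSlide s
INNER JOIN FileSlide f ON s.SlideID = f.SlideID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FileSlide (FileID INTEGER, SlideID INTEGER)")
conn.execute("CREATE TABLE SectionSlide (SectionID INTEGER, SlideID INTEGER)")
conn.executemany("INSERT INTO FileSlide VALUES (?, ?)", FILE_SLIDES)
conn.executemany("INSERT INTO SectionSlide VALUES (?, ?)", SECTION_SLIDES)

# Neither query has an ORDER BY, so sort the rows before comparing them.
pairs_group_by = sorted(conn.execute(GROUP_BY_QUERY).fetchall())
pairs_distinct = sorted(conn.execute(DISTINCT_QUERY).fetchall())
print(pairs_group_by)                    # [(527, 214), (527, 223)]
print(pairs_group_by == pairs_distinct)  # True
```

GROUP BY and DISTINCT collapse the duplicate join rows (one per shared slide) the same way here; GROUP BY becomes the more natural choice once aggregates are needed, as in the second query.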
Second query:
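select s.SectionID, f.FileID,
       -- distinct: a slide repeated in a section (e.g. 774 in 527) counts once;
       -- * 1.0 avoids integer division, which would truncate 4 / 6 to 0
       round((count(distinct f.SlideID) * 1.0) / aux.total, 2) as Overlap
from SectionSlide s
inner join FileSlide f on f.SlideID = s.SlideID
-- aux: total number of slides in each original file
inner join (select f.FileID, count(f.SlideID) as total
            from FileSlide f
            group by f.FileID) aux on aux.FileID = f.FileID
group by f.FileID, s.SectionID, aux.total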
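One way to sanity-check the second query is to run it on the question's sample data. The sketch below does that from Python with an in-memory SQLite database as a stand-in for SQL Server (an assumption; T-SQL-specific details such as decimal scale are not reproduced). Since the question mentions OVER (PARTITION BY ...), it also includes a hypothetical window-function variant, using an illustrative CTE named FileTotals (window functions need SQLite 3.25 or newer), plus a copy of the query without DISTINCT to show why DISTINCT matters.

```python
import sqlite3

# Sample data from the question.
FILE_SLIDES = [(214, 716), (214, 717), (214, 718),
               (223, 770), (223, 771), (223, 772),
               (223, 773), (223, 774), (223, 775)]
SECTION_SLIDES = [(527, 716), (527, 718), (527, 717), (527, 770),
                  (527, 773), (527, 774), (527, 775), (527, 774)]

# The answer's second query, with an ORDER BY added for a stable row order.
SUBQUERY_VERSION = """
select s.SectionID, f.FileID,
       round((count(distinct f.SlideID) * 1.0) / aux.total, 2) as Overlap
from SectionSlide s
inner join FileSlide f on f.SlideID = s.SlideID
inner join (select f.FileID, count(f.SlideID) as total
            from FileSlide f
            group by f.FileID) aux on aux.FileID = f.FileID
group by f.FileID, s.SectionID, aux.total
order by s.SectionID, f.FileID
"""

# Hypothetical window-function variant: count(*) over (partition by FileID)
# attaches each file's slide total to every row of that file.
WINDOW_VERSION = """
with FileTotals as (
    select FileID, SlideID,
           count(*) over (partition by FileID) as total
    from FileSlide
)
select s.SectionID, ft.FileID,
       round((count(distinct ft.SlideID) * 1.0) / ft.total, 2) as Overlap
from SectionSlide s
inner join FileTotals ft on ft.SlideID = s.SlideID
group by s.SectionID, ft.FileID, ft.total
order by s.SectionID, ft.FileID
"""

# Same as the subquery version but without DISTINCT: slide 774 counts twice.
NAIVE_VERSION = SUBQUERY_VERSION.replace("count(distinct f.SlideID)", "count(f.SlideID)")

conn = sqlite3.connect(":memory:")
conn.execute("create table FileSlide (FileID integer, SlideID integer)")
conn.execute("create table SectionSlide (SectionID integer, SlideID integer)")
conn.executemany("insert into FileSlide values (?, ?)", FILE_SLIDES)
conn.executemany("insert into SectionSlide values (?, ?)", SECTION_SLIDES)

rows = conn.execute(SUBQUERY_VERSION).fetchall()
window_rows = conn.execute(WINDOW_VERSION).fetchall()
naive_rows = conn.execute(NAIVE_VERSION).fetchall()

for section_id, file_id, overlap in rows:
    print(section_id, file_id, f"{overlap:.2f}")
# 527 214 1.00
# 527 223 0.67
print(rows == window_rows)                  # True
print([f"{r[2]:.2f}" for r in naive_rows])  # ['1.00', '0.83']
```

In the window variant the per-file total comes from the window function instead of a grouped subquery; the two agree here because FileSlide has one row per slide. Without DISTINCT, file 223 would report 5/6 instead of 4/6.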
I'm somewhat confused by your question, but the query below should give you the desired results:
SELECT DISTINCT fs.FileId, ss.SectionId
FROM FileSlide fs
INNER JOIN SectionSlide ss
ON fs.SlideId = ss.SlideId