Query with multiple permutations that depend on multiple rows

(SQL2014, if that makes a difference)

Say I have [tableA]:

id  ResultID  SampleID  ERRORCODE col4 colN

1   9001      1100      0         ...  ...
2   9002      1100      100       ...  ...
3   9003      1100      200       ...  ...
4   9004      1100      300       ...  ...
5   9005      1101      0         ...  ... 
6   9006      1101      0         ...  ...
7   9007      1101      0         ...  ...
8   9008      1101      0         ...  ...
9   9009      1102      0         ...  ...
10  9010      1102      100       ...  ...
11  9011      1102      200       ...  ...
12  9012      1102      0         ...  ...

and I want to produce a result that shows only the sample sets (identified by a common SampleID value) that contain all of the error codes 0, 100, 200 and 300, i.e. the above would reduce to:

id  ResultID  SampleID  ERRORCODE col4 colN

1   9001      1100      0         ...  ...
2   9002      1100      100       ...  ...
3   9003      1100      200       ...  ...
4   9004      1100      300       ...  ...

So I need a query that looks at multiple rows at a time and downselects the groups that have (i) certain key values in one column, here [ERRORCODE], and (ii) a consistent value in another column, here [SampleID]. I've looked at:

Query with multiple IN clause on multiple rows

But I didn't have any joy. The subquery that worked for oliboon only works on one row for me. Olga's code didn't work at all, and Aushin's produced unexpected results (and removing half the syntax didn't change them)!

N00b to SQL, so I'm a bit lost!

The "table" keyword used in a few of those solutions doesn't seem to work for me, and it may be that those answers were intent on creating a subtable that was then parsed further in subqueries. All I get is an "incorrect syntax near the keyword table" error if I try anything like:

select distinct SampleID from table [my].[db].[path].[tableA]

I thought a query of the format

SELECT *
From [tableA]
where
    [SampleID] in (Select [SampleID] from [tableA] where [ERRORCODE] = 0) and
    [SampleID] in (Select [SampleID] from [tableA] where [ERRORCODE] = 100) 

would have worked, but it only returns a result if the two [ERRORCODE] checks are for the same code, i.e. 100, which of course is useless. It seems to be checking line by line rather than doing the first part of the logic gate, then the second.

If I were able to complete the first part, where [ERRORCODE]=0, then downselect from those SampleIDs where [ERRORCODE]=100 and repeat, that would work. Not sure how to do that, though.
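
The narrowing described here (start from the SampleIDs that have code 0, keep only those that also have code 100, and repeat) is what a chain of INTERSECT operations expresses. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module with an in-memory copy of the sample table as a stand-in for the real database, and col4/colN are left out.

```python
import sqlite3

# In-memory stand-in for [tableA]; col4 ... colN are omitted (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, ResultID INTEGER, SampleID INTEGER, ERRORCODE INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        (1, 9001, 1100, 0), (2, 9002, 1100, 100), (3, 9003, 1100, 200), (4, 9004, 1100, 300),
        (5, 9005, 1101, 0), (6, 9006, 1101, 0), (7, 9007, 1101, 0), (8, 9008, 1101, 0),
        (9, 9009, 1102, 0), (10, 9010, 1102, 100), (11, 9011, 1102, 200), (12, 9012, 1102, 0),
    ],
)

# Each SELECT yields the SampleIDs having one code; INTERSECT keeps
# only the SampleIDs that appear in every one of those sets.
query = """
SELECT SampleID FROM tableA WHERE ERRORCODE = 0
INTERSECT
SELECT SampleID FROM tableA WHERE ERRORCODE = 100
INTERSECT
SELECT SampleID FROM tableA WHERE ERRORCODE = 200
INTERSECT
SELECT SampleID FROM tableA WHERE ERRORCODE = 300
"""
sample_ids = [row[0] for row in conn.execute(query)]
print(sample_ids)
```

On the sample rows only SampleID 1100 survives all four steps; the full rows can then be fetched with SampleID IN (...) around that query.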

Edit: Ach, FFS. It turns out that every single error code I was looking for is mutually exclusive with the others: no matter what combination, no two occur together. I'd assumed that, given the size of the DB I had, there would have been a combination somewhere.

I checked my query with error codes that I could see sitting beside each other in an unqualified SELECT *, and that proved it works.

Question is invalid, I suppose.

You can use top(1) with ties ... order by to skip all but the first SampleID + ERRORCODE combinations. Then count the rows that match the codes; the count must be exactly the number of codes.

with codes as (
    select 0 c union all
    select 100 union all
    select 200 union all
    select 300
),
errlog as ( 
   -- take only the first occurrence of SampleID + ERRORCODE
   select top(1) with ties id,  ResultID,  SampleID,  ERRORCODE, col4, colN
   from [my].[db].[path].[tableA]
   order by row_number() over(partition by SampleID, ERRORCODE order by ResultID)
)
select id,  ResultID,  SampleID,  ERRORCODE, col4, colN
from (
   select t.*, count(*) over(partition by t.SampleID) cnt
   from errlog t
   join codes on codes.c = t.ERRORCODE
) t
where cnt = (select count(*) from codes);
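
To see why the de-duplication step matters, here is a rough sketch run against the question's sample rows. It is not the query above verbatim: it runs on SQLite through Python's sqlite3 module (an assumption, chosen so the example is executable; window functions need SQLite 3.25 or later), and because SQLite has no TOP ... WITH TIES, a row_number() = 1 filter is swapped in for it. col4/colN are omitted.

```python
import sqlite3

# In-memory stand-in for [tableA]; col4 ... colN are omitted (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, ResultID INTEGER, SampleID INTEGER, ERRORCODE INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        (1, 9001, 1100, 0), (2, 9002, 1100, 100), (3, 9003, 1100, 200), (4, 9004, 1100, 300),
        (5, 9005, 1101, 0), (6, 9006, 1101, 0), (7, 9007, 1101, 0), (8, 9008, 1101, 0),
        (9, 9009, 1102, 0), (10, 9010, 1102, 100), (11, 9011, 1102, 200), (12, 9012, 1102, 0),
    ],
)

template = """
WITH codes AS (
    SELECT 0 AS c UNION ALL SELECT 100 UNION ALL SELECT 200 UNION ALL SELECT 300
),
errlog AS (
    -- first row per (SampleID, ERRORCODE); rn = 1 stands in for TOP(1) WITH TIES
    SELECT id, ResultID, SampleID, ERRORCODE
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (PARTITION BY SampleID, ERRORCODE ORDER BY ResultID) AS rn
        FROM tableA a
    )
    WHERE rn = 1
)
SELECT id
FROM (
    SELECT t.id, COUNT(*) OVER (PARTITION BY t.SampleID) AS cnt
    FROM {source} t
    JOIN codes ON codes.c = t.ERRORCODE
)
WHERE cnt = (SELECT COUNT(*) FROM codes)
ORDER BY id
"""

# Counting over the de-duplicated rows versus counting over the raw table.
deduped_ids = [r[0] for r in conn.execute(template.format(source="errlog"))]
naive_ids = [r[0] for r in conn.execute(template.format(source="tableA"))]
print(deduped_ids)
print(naive_ids)
```

With the de-duplicated rows only ids 1 to 4 (SampleID 1100) come back. Counting over the raw table returns all twelve rows, because 1101 and 1102 each have four rows whose codes are in the list even though they do not cover all four codes.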

I'm a little lost. If you want samples with all four error codes, then this should do what you want:

select a.*
from [tableA] a
where a.SampleID in (Select a2.SampleID from tableA a2 where a2.ERRORCODE = 0) and
      a.SampleID in (Select a2.SampleID from tableA a2 where a2.ERRORCODE = 100) and
      a.SampleID in (Select a2.SampleID from tableA a2 where a2.ERRORCODE = 200) and
      a.SampleID in (Select a2.SampleID from tableA a2 where a2.ERRORCODE = 300) ;

This should work, although the question claims that it does not. It will not necessarily have the best performance.
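
One way to check that claim is to run the same pattern against the question's sample rows. The snippet below is a small sketch that does so with Python's built-in sqlite3 module as a stand-in database (an assumption; col4/colN are omitted). The helper name rows_with_all is hypothetical and simply builds one IN subquery per code.

```python
import sqlite3

# In-memory stand-in for [tableA]; col4 ... colN are omitted (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, ResultID INTEGER, SampleID INTEGER, ERRORCODE INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        (1, 9001, 1100, 0), (2, 9002, 1100, 100), (3, 9003, 1100, 200), (4, 9004, 1100, 300),
        (5, 9005, 1101, 0), (6, 9006, 1101, 0), (7, 9007, 1101, 0), (8, 9008, 1101, 0),
        (9, 9009, 1102, 0), (10, 9010, 1102, 100), (11, 9011, 1102, 200), (12, 9012, 1102, 0),
    ],
)


def rows_with_all(codes):
    """Return ids of rows whose SampleID has at least one row for every code."""
    # One "SampleID IN (subquery)" condition per required code, ANDed together.
    condition = " AND ".join(
        "a.SampleID IN (SELECT a2.SampleID FROM tableA a2 WHERE a2.ERRORCODE = ?)"
        for _ in codes
    )
    sql = f"SELECT a.id FROM tableA a WHERE {condition} ORDER BY a.id"
    return [row[0] for row in conn.execute(sql, codes)]


print(rows_with_all((0, 100, 200, 300)))
print(rows_with_all((0, 100)))
```

With all four codes only the rows of SampleID 1100 (ids 1 to 4) are returned. With just 0 and 100, the rows of 1100 and 1102 are returned, so the two conditions are evaluated per sample rather than per line.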

I usually recommend aggregation to get the sample ids:

select sampleid
from tablea
where errorcode in (0, 100, 200, 300)
group by sampleid
having count(distinct errorcode) = 4;
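
As a quick illustration of what the aggregation computes, the sketch below runs it on the question's sample rows and also prints the distinct-code count per sample. It assumes Python's built-in sqlite3 module as a stand-in database and omits col4/colN.

```python
import sqlite3

# In-memory stand-in for [tableA]; col4 ... colN are omitted (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, ResultID INTEGER, SampleID INTEGER, ERRORCODE INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        (1, 9001, 1100, 0), (2, 9002, 1100, 100), (3, 9003, 1100, 200), (4, 9004, 1100, 300),
        (5, 9005, 1101, 0), (6, 9006, 1101, 0), (7, 9007, 1101, 0), (8, 9008, 1101, 0),
        (9, 9009, 1102, 0), (10, 9010, 1102, 100), (11, 9011, 1102, 200), (12, 9012, 1102, 0),
    ],
)

# Number of distinct wanted codes seen for each sample.
per_sample = conn.execute(
    """
    SELECT SampleID, COUNT(DISTINCT ERRORCODE)
    FROM tableA
    WHERE ERRORCODE IN (0, 100, 200, 300)
    GROUP BY SampleID
    ORDER BY SampleID
    """
).fetchall()

# Keep only the samples that cover all four codes.
matching = [
    row[0]
    for row in conn.execute(
        """
        SELECT SampleID
        FROM tableA
        WHERE ERRORCODE IN (0, 100, 200, 300)
        GROUP BY SampleID
        HAVING COUNT(DISTINCT ERRORCODE) = 4
        ORDER BY SampleID
        """
    )
]
print(per_sample)
print(matching)
```

Sample 1100 has four distinct codes, 1101 has one and 1102 has three, so only 1100 passes the having filter.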

If you want no duplicates as well as coverage of all the error codes, then use:

having count(distinct errorcode) = 4 and count(*) = 4

This also may not have the best performance in all cases. But the performance is predictable: it changes little based on the number of codes you are looking for. And the having clause can make this quite versatile.
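
To illustrate the stricter having condition, the sketch below compares it with the plain one. SampleID 1103 and its rows are hypothetical, added only so that there is a sample that covers all four codes but repeats one of them; the database is an in-memory SQLite stand-in driven from Python's sqlite3 module (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, ResultID INTEGER, SampleID INTEGER, ERRORCODE INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        # Sample 1100 from the question: each code exactly once.
        (1, 9001, 1100, 0), (2, 9002, 1100, 100), (3, 9003, 1100, 200), (4, 9004, 1100, 300),
        # Hypothetical sample 1103: all four codes, but 300 appears twice.
        (13, 9013, 1103, 0), (14, 9014, 1103, 100), (15, 9015, 1103, 200),
        (16, 9016, 1103, 300), (17, 9017, 1103, 300),
    ],
)

base = """
SELECT SampleID
FROM tableA
WHERE ERRORCODE IN (0, 100, 200, 300)
GROUP BY SampleID
HAVING {condition}
ORDER BY SampleID
"""

# Covers all four codes, duplicates allowed.
loose = [r[0] for r in conn.execute(base.format(condition="COUNT(DISTINCT ERRORCODE) = 4"))]
# Covers all four codes with exactly one row per code.
strict = [
    r[0]
    for r in conn.execute(
        base.format(condition="COUNT(DISTINCT ERRORCODE) = 4 AND COUNT(*) = 4")
    )
]
print(loose)
print(strict)
```

The plain condition accepts both samples, while adding count(*) = 4 rejects 1103 because it has five matching rows.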

Then, if you want all the original data, you can use join, in, or exists:

select a.*
from tablea a join
     (select sampleid
      from tablea
      where errorcode in (0, 100, 200, 300)
      group by sampleid
      having count(distinct errorcode) = 4
     ) a2
     on a2.sampleid = a.sampleid;
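
The in and exists forms are not spelled out above, so here is one possible way to write them, as a sketch rather than the only formulation. It is run on the question's sample rows through Python's built-in sqlite3 module as a stand-in database (an assumption), with col4/colN omitted.

```python
import sqlite3

# In-memory stand-in for [tableA]; col4 ... colN are omitted (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, ResultID INTEGER, SampleID INTEGER, ERRORCODE INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        (1, 9001, 1100, 0), (2, 9002, 1100, 100), (3, 9003, 1100, 200), (4, 9004, 1100, 300),
        (5, 9005, 1101, 0), (6, 9006, 1101, 0), (7, 9007, 1101, 0), (8, 9008, 1101, 0),
        (9, 9009, 1102, 0), (10, 9010, 1102, 100), (11, 9011, 1102, 200), (12, 9012, 1102, 0),
    ],
)

# IN: filter on the list of qualifying sample ids.
in_sql = """
SELECT a.id
FROM tableA a
WHERE a.SampleID IN (
    SELECT SampleID
    FROM tableA
    WHERE ERRORCODE IN (0, 100, 200, 300)
    GROUP BY SampleID
    HAVING COUNT(DISTINCT ERRORCODE) = 4
)
ORDER BY a.id
"""

# EXISTS: correlated subquery that aggregates only the current row's sample.
exists_sql = """
SELECT a.id
FROM tableA a
WHERE EXISTS (
    SELECT 1
    FROM tableA a2
    WHERE a2.SampleID = a.SampleID
      AND a2.ERRORCODE IN (0, 100, 200, 300)
    GROUP BY a2.SampleID
    HAVING COUNT(DISTINCT a2.ERRORCODE) = 4
)
ORDER BY a.id
"""

in_ids = [r[0] for r in conn.execute(in_sql)]
exists_ids = [r[0] for r in conn.execute(exists_sql)]
print(in_ids)
print(exists_ids)
```

Both forms return the four rows of SampleID 1100 on the sample data, the same rows the join version selects.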
  
