MSSQL query: how to find the incorrect row in each partition?

I need to find incorrect rows according to the logic.

The logic is:

  1. If the child has this row (I will call it the first row)

     | merit     | fruit | vegetable |
     | --------- | ----- | --------- |
     | behaviour | apple | cucumber  |

     then the row with merit = poem and fruit = apple must have vegetable = cucumber and nothing else (this is the second row):

     | merit | fruit | vegetable |
     | ----- | ----- | --------- |
     | poem  | apple | cucumber  |

  2. AND the time of the second row must be within 4 hours (earlier or later) of the time of the first row. A correct example:

     | child_id | date            | merit     | fruit | vegetable |
     | -------- | --------------- | --------- | ----- | --------- |
     | 2        | 1/26/2022 16:00 | poem      | apple | cucumber  |
     | 2        | 1/26/2022 18:00 | behaviour | apple | cucumber  |

     As we can see, the two rows are within the 4-hour interval.

I have this table:

| child_id  | date            | merit       | fruit   | vegetable |
| --------- | --------------- | ----------- | ------- | --------- |
| 1         | 1/27/2022 14:00 | behaviour   | apple   | cucumber  |
| 1         | 1/27/2022 15:00 | poem        | apple   | carrot    |
| 1         | 1/27/2022 17:00 | sleep       | apple   | ginger    |
| 1         | 1/27/2022 20:00 | competition | berry   | tomatoe   |
| 2         | 1/26/2022 13:00 | sleep       | apricot | tomatoe   |
| 2         | 1/30/2022 13:00 | poem        | apple   | cucumber  |
| 2         | 1/29/2022 13:00 | poem        | apple   | cucumber  |
| 2         | 1/26/2022 16:00 | poem        | apple   | cucumber  |
| 2         | 1/26/2022 18:00 | behaviour   | apple   | cucumber  |
| 2         | 1/26/2022 19:00 | present     | apple   | broccoli  |
| 3         | 1/25/2022 11:00 | present     | orange  | cucumber  |
| 3         | 1/25/2022 13:00 | poem        | apple   | ginger    |
| 3         | 1/25/2022 15:00 | behaviour   | apple   | cucumber  |
| 4         | 1/26/2022 14:00 | behaviour   | apple   | cucumber  |
| 4         | 1/27/2022 21:00 | poem        | apple   | carrot    |
| 4         | 1/27/2022 15:00 | poem        | apple   | carrot    |
| 4         | 1/27/2022 20:00 | sleep       | apple   | ginger    |
| 4         | 1/27/2022 21:00 | competition | berry   | tomatoe   |
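
To reproduce the table, a setup script along these lines should work. It is only a sketch: the column types and lengths are assumptions (the actual schema is not shown here), and the dates are written in ISO 8601 form so they parse the same way regardless of language settings.

```sql
-- Assumed schema: the types below are guesses, not the real definition.
CREATE TABLE example_1 (
    child_id  INT         NOT NULL,
    [date]    DATETIME    NOT NULL,
    merit     VARCHAR(20) NOT NULL,
    fruit     VARCHAR(20) NOT NULL,
    vegetable VARCHAR(20) NOT NULL
);

-- Same rows as the table above, in the same order.
INSERT INTO example_1 (child_id, [date], merit, fruit, vegetable) VALUES
(1, '2022-01-27T14:00:00', 'behaviour',   'apple',   'cucumber'),
(1, '2022-01-27T15:00:00', 'poem',        'apple',   'carrot'),
(1, '2022-01-27T17:00:00', 'sleep',       'apple',   'ginger'),
(1, '2022-01-27T20:00:00', 'competition', 'berry',   'tomatoe'),
(2, '2022-01-26T13:00:00', 'sleep',       'apricot', 'tomatoe'),
(2, '2022-01-30T13:00:00', 'poem',        'apple',   'cucumber'),
(2, '2022-01-29T13:00:00', 'poem',        'apple',   'cucumber'),
(2, '2022-01-26T16:00:00', 'poem',        'apple',   'cucumber'),
(2, '2022-01-26T18:00:00', 'behaviour',   'apple',   'cucumber'),
(2, '2022-01-26T19:00:00', 'present',     'apple',   'broccoli'),
(3, '2022-01-25T11:00:00', 'present',     'orange',  'cucumber'),
(3, '2022-01-25T13:00:00', 'poem',        'apple',   'ginger'),
(3, '2022-01-25T15:00:00', 'behaviour',   'apple',   'cucumber'),
(4, '2022-01-26T14:00:00', 'behaviour',   'apple',   'cucumber'),
(4, '2022-01-27T21:00:00', 'poem',        'apple',   'carrot'),
(4, '2022-01-27T15:00:00', 'poem',        'apple',   'carrot'),
(4, '2022-01-27T20:00:00', 'sleep',       'apple',   'ginger'),
(4, '2022-01-27T21:00:00', 'competition', 'berry',   'tomatoe');
```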

And the result I expect:

| child_id  | date             | merit | fruit | vegetable |
| --------- | --------------- | ----- | ----- | --------- |
| 1         | 1/27/2022 15:00 | poem  | apple | carrot    |
| 3         | 1/25/2022 13:00 | poem  | apple | ginger    |

I do not know how to find these rows for each child. I wrote this SQL and got stuck:

select * from example_1 where merit in ('behaviour', 'poem') 

Do I need partitions here?

This approach uses a correlated subquery. The outer query (alias B) applies the filters that do not depend on any other row: merit = 'poem' and vegetable <> 'cucumber'.

The EXISTS clause then checks that a matching first row exists for that candidate: the fruit matches, merit is 'behaviour', the child_id matches, and the difference in date is within 4 hours either way.

DEMO: DB Fiddle UK

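-- B: candidate 'poem' rows whose vegetable is not 'cucumber'
SELECT B.* 
FROM example_1 B
WHERE B.vegetable <> 'cucumber'
  AND B.merit = 'poem'
  -- A: a 'behaviour' row for the same child and fruit within 4 hours
  AND EXISTS (SELECT 1 
              FROM example_1 A
              WHERE A.fruit = B.fruit
                AND A.child_id = B.child_id
                AND A.merit = 'behaviour' 
                AND ABS(DATEDIFF(hour, A.date, B.date)) <= 4)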

Giving us:

+----------+-------------------------+-------+-------+-----------+
| child_id |          date           | merit | fruit | vegetable |
+----------+-------------------------+-------+-------+-----------+
|        1 | 2022-01-27 15:00:00.000 | poem  | apple | carrot    |
|        3 | 2022-01-25 13:00:00.000 | poem  | apple | ginger    |
+----------+-------------------------+-------+-------+-----------+
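
One caveat worth sketching: `DATEDIFF(hour, ...)` counts the hour boundaries crossed between the two values rather than the elapsed time, so a gap of 4 hours 59 minutes still reports 4. The sample data sits on whole hours, so the result above is unaffected. If the real data has minutes, a stricter variant could compare minutes against 240. The following assumes the table is named `example_1`, as in the question, and that the 4-hour limit is inclusive.

```sql
-- Hour boundaries crossed vs. elapsed minutes for a 4 h 59 min gap.
SELECT DATEDIFF(hour,   '2022-01-27T14:00:00', '2022-01-27T18:59:00') AS hour_boundaries, -- 4
       DATEDIFF(minute, '2022-01-27T14:00:00', '2022-01-27T18:59:00') AS elapsed_minutes; -- 299

-- Same query as above, but measured in minutes
-- (assumption: "within 4 hours" means at most 240 minutes apart).
SELECT B.*
FROM example_1 B
WHERE B.vegetable <> 'cucumber'
  AND B.merit = 'poem'
  AND EXISTS (SELECT 1
              FROM example_1 A
              WHERE A.fruit = B.fruit
                AND A.child_id = B.child_id
                AND A.merit = 'behaviour'
                AND ABS(DATEDIFF(minute, A.date, B.date)) <= 240);
```

On the sample table this returns the same two rows, because every timestamp there falls exactly on the hour.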

Another potential solution is joining the table to itself and keeping each 'poem' row that pairs with a conflicting 'behaviour' row:

SELECT e1.* 
FROM example_1 e1
   INNER JOIN example_1 e2
       ON e1.child_id = e2.child_id
       AND e1.fruit = e2.fruit
       AND e1.vegetable <> e2.vegetable
       AND e2.date BETWEEN DATEADD(HOUR, -4, e1.date) AND DATEADD(HOUR, 4, e1.date)
       AND e2.merit = 'behaviour'
WHERE e1.merit = 'poem'

The trick is mostly in the join criteria: each 'poem' row is matched to a 'behaviour' row for the same child and fruit within 4 hours either way, and the vegetable must differ between the two. An inner join is needed here, because a LEFT OUTER JOIN filtered on e2.child_id IS NULL would return the 'poem' rows that have no conflict, which is the opposite of what is asked. If a 'poem' row can have several conflicting 'behaviour' rows, it is returned once per match, so add DISTINCT in that case.
