
MSSQL query: how to find the incorrect row by each partition?

I need to find the rows that are incorrect according to the following logic:

  1. If a child has this row (I will call it the first row)

     | merit     | fruit | vegetable |
     | --------- | ----- | --------- |
     | behaviour | apple | cucumber  |

     then the same child's row with merit = poem and fruit = apple must have vegetable = cucumber and nothing else (I will call it the second row):

     | merit | fruit | vegetable |
     | ----- | ----- | --------- |
     | poem  | apple | cucumber  |

  2. AND the time of the second row must be within 4 hours, earlier or later, of the time of the first row. A correct example:

     | child_id | date            | merit     | fruit | vegetable |
     | -------- | --------------- | --------- | ----- | --------- |
     | 2        | 1/26/2022 16:00 | poem      | apple | cucumber  |
     | 2        | 1/26/2022 18:00 | behaviour | apple | cucumber  |

     As we can see, the two rows are within the 4-hour interval.

I have the table:

| child_id  | date            | merit       | fruit   | vegetable |
| --------- | --------------- | ----------- | ------- | --------- |
| 1         | 1/27/2022 14:00 | behaviour   | apple   | cucumber  |
| 1         | 1/27/2022 15:00 | poem        | apple   | carrot    |
| 1         | 1/27/2022 17:00 | sleep       | apple   | ginger    |
| 1         | 1/27/2022 20:00 | competition | berry   | tomatoe   |
| 2         | 1/26/2022 13:00 | sleep       | apricot | tomatoe   |
| 2         | 1/30/2022 13:00 | poem        | apple   | cucumber  |
| 2         | 1/29/2022 13:00 | poem        | apple   | cucumber  |
| 2         | 1/26/2022 16:00 | poem        | apple   | cucumber  |
| 2         | 1/26/2022 18:00 | behaviour   | apple   | cucumber  |
| 2         | 1/26/2022 19:00 | present     | apple   | broccoli  |
| 3         | 1/25/2022 11:00 | present     | orange  | cucumber  |
| 3         | 1/25/2022 13:00 | poem        | apple   | ginger    |
| 3         | 1/25/2022 15:00 | behaviour   | apple   | cucumber  |
| 4         | 1/26/2022 14:00 | behaviour   | apple   | cucumber  |
| 4         | 1/27/2022 21:00 | poem        | apple   | carrot    |
| 4         | 1/27/2022 15:00 | poem        | apple   | carrot    |
| 4         | 1/27/2022 20:00 | sleep       | apple   | ginger    |
| 4         | 1/27/2022 21:00 | competition | berry   | tomatoe   |
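
To reproduce the sample data, a setup script along these lines should work. It is only a sketch: the column types are assumptions (they are not stated anywhere in this post), and the table name example_1 is taken from the query further down.

```sql
-- Assumed schema: types are a guess, only the column names come from the table above.
CREATE TABLE example_1 (
    child_id  INT         NOT NULL,
    [date]    DATETIME    NOT NULL,
    merit     VARCHAR(20) NOT NULL,
    fruit     VARCHAR(20) NOT NULL,
    vegetable VARCHAR(20) NOT NULL
);

-- ISO 8601 literals avoid any dependence on the session's date format.
INSERT INTO example_1 (child_id, [date], merit, fruit, vegetable)
VALUES
    (1, '2022-01-27T14:00:00', 'behaviour',   'apple',   'cucumber'),
    (1, '2022-01-27T15:00:00', 'poem',        'apple',   'carrot'),
    (1, '2022-01-27T17:00:00', 'sleep',       'apple',   'ginger'),
    (1, '2022-01-27T20:00:00', 'competition', 'berry',   'tomatoe'),
    (2, '2022-01-26T13:00:00', 'sleep',       'apricot', 'tomatoe'),
    (2, '2022-01-30T13:00:00', 'poem',        'apple',   'cucumber'),
    (2, '2022-01-29T13:00:00', 'poem',        'apple',   'cucumber'),
    (2, '2022-01-26T16:00:00', 'poem',        'apple',   'cucumber'),
    (2, '2022-01-26T18:00:00', 'behaviour',   'apple',   'cucumber'),
    (2, '2022-01-26T19:00:00', 'present',     'apple',   'broccoli'),
    (3, '2022-01-25T11:00:00', 'present',     'orange',  'cucumber'),
    (3, '2022-01-25T13:00:00', 'poem',        'apple',   'ginger'),
    (3, '2022-01-25T15:00:00', 'behaviour',   'apple',   'cucumber'),
    (4, '2022-01-26T14:00:00', 'behaviour',   'apple',   'cucumber'),
    (4, '2022-01-27T21:00:00', 'poem',        'apple',   'carrot'),
    (4, '2022-01-27T15:00:00', 'poem',        'apple',   'carrot'),
    (4, '2022-01-27T20:00:00', 'sleep',       'apple',   'ginger'),
    (4, '2022-01-27T21:00:00', 'competition', 'berry',   'tomatoe');
```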

And the result I expect:

| child_id  | date            | merit | fruit | vegetable |
| --------- | --------------- | ----- | ----- | --------- |
| 1         | 1/27/2022 15:00 | poem  | apple | carrot    |
| 3         | 1/25/2022 13:00 | poem  | apple | ginger    |

I do not know how to find these rows for each child. I wrote this SQL and got stuck:

select * from example_1 where merit in ('behaviour', 'poem') 

Do I need partitions here?

This approach uses a correlated subquery. The outer query (alias B) applies the filters that do not depend on any other row: merit = 'poem' and vegetable <> 'cucumber'.

The EXISTS clause then checks that a matching first row exists for that candidate: the fruit matches, merit is 'behaviour', the child_id values match, and the two dates are within 4 hours of each other in either direction.

DEMO-DB Fiddle UK

SELECT B.*
FROM example_1 B
WHERE B.vegetable <> 'cucumber'   -- second row has the wrong vegetable
  AND B.merit = 'poem'
  AND EXISTS (SELECT 1
              FROM example_1 A    -- first row for the same child and fruit
              WHERE A.fruit = B.fruit
                AND A.child_id = B.child_id
                AND A.merit = 'behaviour'
                AND ABS(DATEDIFF(hour, A.date, B.date)) <= 4)

Giving us:

+----------+-------------------------+-------+-------+-----------+
| child_id |          date           | merit | fruit | vegetable |
+----------+-------------------------+-------+-------+-----------+
|        1 | 2022-01-27 15:00:00.000 | poem  | apple | carrot    |
|        3 | 2022-01-25 13:00:00.000 | poem  | apple | ginger    |
+----------+-------------------------+-------+-------+-----------+
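
One caveat on the 4-hour check: DATEDIFF(hour, ...) counts the hour boundaries crossed, not elapsed time, so 10:01 and 14:59 count as 4 hours apart although almost 5 hours have passed. If the timestamps are not always on the hour, comparing in minutes is probably closer to the intent. A sketch, assuming the table is named example_1 as in the question and that "within 4 hours" means at most 240 minutes; the extra A.vegetable = 'cucumber' filter is also an assumption, taken from the description of the first row:

```sql
SELECT B.*
FROM example_1 AS B
WHERE B.merit = 'poem'
  AND B.vegetable <> 'cucumber'
  AND EXISTS (SELECT 1
              FROM example_1 AS A
              WHERE A.child_id = B.child_id
                AND A.fruit = B.fruit
                AND A.merit = 'behaviour'
                AND A.vegetable = 'cucumber'   -- assumption: first row must be cucumber
                -- 240 minutes = 4 hours, compared at minute granularity
                AND ABS(DATEDIFF(minute, A.[date], B.[date])) <= 240);
```

On the sample data this returns the same two rows, since every timestamp there falls exactly on the hour.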

Another option is to join the table to itself and keep only the 'poem' rows that find a conflicting 'behaviour' row for the same child:

SELECT DISTINCT e1.*
FROM example_1 e1
   INNER JOIN example_1 e2
       ON e1.child_id = e2.child_id
       AND e1.fruit = e2.fruit
       AND e1.vegetable <> e2.vegetable
       AND e2.date BETWEEN DATEADD(HOUR, -4, e1.date) AND DATEADD(HOUR, 4, e1.date)
       AND e2.merit = 'behaviour'
WHERE e1.merit = 'poem'

The trick is mostly in the join criteria: they pair each 'poem' row with a 'behaviour' row for the same child and fruit within 4 hours in either direction, and a pair only survives when the vegetables differ. DISTINCT guards against duplicates when more than one 'behaviour' row qualifies for the same 'poem' row.


 