SQL: how can I exclude certain lines from an aggregated result?

The query I built returns a result like the one below:

SELECT name
      ,ARRAY_AGG(fruits ORDER BY time ASC) AS all_fruits
FROM table_fruits

name      all_fruits
Person A  Apple, Banana, Apple, Apple, Apple, Apple
Person B  Apple, Apple, Apple, Banana, Apple, Banana
Person C  Banana, Banana, Apple, Banana, Apple, Apple

I want to add one more column that shows the count of apples. However, I do not want to count an apple that is immediately followed by a banana. The additional column should therefore look like this:

name      all_fruits                                   count_of_apple
Person A  Apple, Banana, Apple, Apple, Apple, Apple    4
Person B  Apple, Apple, Apple, Banana, Apple, Banana   2
Person C  Banana, Banana, Apple, Banana, Apple, Apple  2

How would I do this in SQL? The source table includes the time at which each fruit was eaten.

You can proceed as follows:

  • for each row in your table, use the LEAD window function to look up the "fruits" value of the row that follows it
  • the last row for each name has no next value, so LEAD returns NULL; the COALESCE function replaces that NULL with the current "fruits" value
  • hence, inside a CASE expression, you can assign 1 to the new column when the current "fruits" value is 'Apple' and the next value is not 'Banana'
SELECT *,
       CASE WHEN fruits = 'Apple'
             AND COALESCE(LEAD(fruits) OVER(
                              PARTITION BY name 
                              ORDER     BY time), 
                          fruits)                  <> 'Banana'
            THEN 1 
       END AS apples_not_after_bananas
FROM table_fruits
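
To see what LEAD returns before the CASE expression is applied, here is a minimal sketch in PostgreSQL syntax. The inline rows reproduce Person A from the question; the timestamps are made up for illustration, since the question does not show them.

```sql
-- Hypothetical sample data: Person A's fruits with invented timestamps.
WITH table_fruits (name, fruits, time) AS (
    VALUES ('Person A', 'Apple',  TIMESTAMP '2023-01-01 08:00'),
           ('Person A', 'Banana', TIMESTAMP '2023-01-02 08:00'),
           ('Person A', 'Apple',  TIMESTAMP '2023-01-03 08:00'),
           ('Person A', 'Apple',  TIMESTAMP '2023-01-04 08:00'),
           ('Person A', 'Apple',  TIMESTAMP '2023-01-05 08:00'),
           ('Person A', 'Apple',  TIMESTAMP '2023-01-06 08:00')
)
SELECT name,
       fruits,
       time,
       -- fruit eaten next by the same person, NULL on the last row
       LEAD(fruits) OVER (PARTITION BY name ORDER BY time) AS next_fruit
FROM table_fruits
ORDER BY name, time;
```

With this data, next_fruit is 'Banana' on the first row only and NULL on the last row, so only the first apple is excluded and the remaining four are counted.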

After this step, you can take your own query and add:

  • the GROUP BY clause you missed, to aggregate over the "name" field
  • the SUM aggregate function over the 1s generated in the previous step for apples that were not followed by bananas
WITH cte AS (
    SELECT *,
           CASE WHEN fruits = 'Apple'
                 AND COALESCE(LEAD(fruits) OVER(
                                  PARTITION BY name 
                                  ORDER     BY time), 
                              fruits)                  <> 'Banana'
                THEN 1 
           END AS apples_not_after_bananas
    FROM table_fruits
)
SELECT name,
       ARRAY_AGG(fruits ORDER BY time ASC) AS all_fruits,
       SUM(apples_not_after_bananas)       AS count_of_apple
FROM cte
GROUP BY name
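
One edge case worth noting: because the CASE expression has no ELSE branch, a person with no countable apples gets NULL from SUM rather than 0. If 0 is preferred, wrapping the SUM in COALESCE is one option. The sketch below uses a hypothetical "Person D" with invented timestamps to show the difference in PostgreSQL syntax.

```sql
-- Hypothetical sample data: Person D's only apple is followed by a banana.
WITH table_fruits (name, fruits, time) AS (
    VALUES ('Person D', 'Apple',  TIMESTAMP '2023-01-01 08:00'),
           ('Person D', 'Banana', TIMESTAMP '2023-01-02 08:00')
),
cte AS (
    SELECT *,
           CASE WHEN fruits = 'Apple'
                 AND COALESCE(LEAD(fruits) OVER (
                                  PARTITION BY name
                                  ORDER     BY time),
                              fruits) <> 'Banana'
                THEN 1
           END AS apples_not_after_bananas
    FROM table_fruits
)
SELECT name,
       SUM(apples_not_after_bananas)              AS sum_only,        -- NULL for Person D
       COALESCE(SUM(apples_not_after_bananas), 0) AS count_of_apple   -- 0 for Person D
FROM cte
GROUP BY name;
```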

Check the demo here.


Edit: the banana came more than 1 day later

If you want to add this specific condition, or any other condition, you need to work inside the CASE expression, which currently has two conditions: one on the current fruit and one on the next fruit.

Checking whether the banana came more than 1 day later just means adding something like this:

           CASE WHEN fruits = 'Apple'
                 AND COALESCE(LEAD(fruits) OVER(
                                  PARTITION BY name 
                                  ORDER     BY time), 
                              fruits)                  <> 'Banana'
                 --AND <the difference between the current and the next time value is greater than 1 day>
                THEN 1 
           END AS apples_not_after_bananas
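
The placeholder can be filled in several ways depending on the exact rule. The sketch below assumes one possible reading, which is not confirmed by the question: an apple followed by a banana is still counted when that banana was eaten more than 1 day after the apple. Under that reading the time check is combined with the fruit check using OR. The "Person E" rows, the timestamps, and the with_next, next_fruit and next_time names are hypothetical; the syntax is PostgreSQL.

```sql
-- Hypothetical sample data with invented timestamps.
WITH table_fruits (name, fruits, time) AS (
    VALUES ('Person E', 'Apple',  TIMESTAMP '2023-01-01 08:00'),
           ('Person E', 'Banana', TIMESTAMP '2023-01-01 12:00'),  -- 4 hours after the apple
           ('Person E', 'Apple',  TIMESTAMP '2023-01-02 08:00'),
           ('Person E', 'Banana', TIMESTAMP '2023-01-05 08:00')   -- 3 days after the apple
),
with_next AS (
    SELECT *,
           LEAD(fruits) OVER w AS next_fruit,  -- next fruit for the same person
           LEAD(time)   OVER w AS next_time    -- when that next fruit was eaten
    FROM table_fruits
    WINDOW w AS (PARTITION BY name ORDER BY time)
)
SELECT name,
       SUM(CASE WHEN fruits = 'Apple'
                 AND (   COALESCE(next_fruit, fruits) <> 'Banana'
                      -- assumed rule: a banana more than 1 day later does not exclude the apple
                      OR next_time - time > INTERVAL '1 day')
                THEN 1
           END) AS count_of_apple
FROM with_next
GROUP BY name;
```

For this sample, the first apple is excluded (banana 4 hours later) and the second is counted (banana 3 days later), so count_of_apple is 1. If the intended rule is different, only the condition inside the parentheses needs to change.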
