Day difference of counts from prior 24 hours

I have a table that consolidates stats of different types from my other tables:

Table name: my_stats

lob         category            parameter                   total_count     timestamp                       day_difference
DSS         Industry            Advertising & Marketing     310057          2020-04-21 07:35:14.237987
DSS         Function            Administration              357351          2020-04-21 11:06:27.009658
DSS         Country             czechia                     321             2020-04-21 11:12:55.731648
DSS         Records per domain  apple.com                   65              2020-04-21 11:13:17.855059
DSS         Records per domain  Records per domain          5               2020-04-21 11:13:17.85510

DSS         Industry            Advertising & Marketing     310059          2020-04-21 10:36:14.237987
DSS         Function            Administration              357353          2020-04-21 14:08:26.009658
DSS         Country             czechia                     324             2020-04-21 14:11:55.731648
DSS         Records per domain  apple.com                   60              2020-04-21 14:08:17.855059
DSS         Records per domain  Records per domain          5               2020-04-21 14:14:17.85510

DSS         Industry            Advertising & Marketing     310058          2020-04-22 08:35:14.237987
DSS         Function            Administration              357312          2020-04-22 11:05:27.009658
DSS         Country             czechia                     201             2020-04-22 11:13:55.731648
DSS         Records per domain  apple.com                   55              2020-04-22 11:14:17.855059
DSS         Records per domain  Records per domain          2               2020-04-22 11:15:17.85510

my_stats gets updated every 3 hours, so new entries are added every 3 hours. I have to fill in the day_difference value.

day_difference is the row's total_count minus the total_count of the closest earlier row, for the same lob, category and parameter, that is at least 24 hours older.

The output table should be:

lob         category            parameter                   total_count     timestamp                       day_difference
DSS         Industry            Advertising & Marketing     310057          2020-04-21 07:35:14.237987      NULL
DSS         Function            Administration              357351          2020-04-21 11:06:27.009658      NULL
DSS         Country             czechia                     321             2020-04-21 11:12:55.731648      NULL
DSS         Records per domain  apple.com                   65              2020-04-21 11:13:17.855059      NULL
DSS         Records per domain  Records per domain          5               2020-04-21 11:13:17.85510       NULL

DSS         Industry            Advertising & Marketing     310059          2020-04-21 10:36:14.237987      NULL
DSS         Function            Administration              357353          2020-04-21 14:08:26.009658      NULL
DSS         Country             czechia                     324             2020-04-21 14:11:55.731648      NULL
DSS         Records per domain  apple.com                   60              2020-04-21 14:08:17.855059      NULL
DSS         Records per domain  Records per domain          5               2020-04-21 14:14:17.85510       NULL

DSS         Industry            Advertising & Marketing     310058          2020-04-22 08:35:14.237987      1
DSS         Function            Administration              357312          2020-04-22 11:05:27.009658      NULL
DSS         Country             czechia                     201             2020-04-22 11:13:55.731648      -120
DSS         Records per domain  apple.com                   55              2020-04-22 11:14:17.855059      -10
DSS         Records per domain  Records per domain          2               2020-04-22 11:15:17.85510       -3

If a row has no earlier row that is at least 24 hours older, keep day_difference = NULL.

Another corner case to consider: the difference must be taken against the row CLOSEST to the 24-hour mark, not against just any older row.
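To pin the rule down, here is a minimal Python sketch of the computation on a subset of the sample rows. It is only a reference for the expected values, not the SQL being asked for. The function name `day_differences` and the result keys are hypothetical, timestamps are truncated to whole seconds, and a row exactly 24 hours older is assumed to count as a match.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

# Subset of the sample rows: (lob, category, parameter, total_count, timestamp).
# Fractional seconds are dropped for brevity (assumption: they do not matter here).
ROWS = [
    ("DSS", "Industry", "Advertising & Marketing", 310057, "2020-04-21 07:35:14"),
    ("DSS", "Industry", "Advertising & Marketing", 310059, "2020-04-21 10:36:14"),
    ("DSS", "Industry", "Advertising & Marketing", 310058, "2020-04-22 08:35:14"),
    ("DSS", "Function", "Administration", 357351, "2020-04-21 11:06:27"),
    ("DSS", "Function", "Administration", 357353, "2020-04-21 14:08:26"),
    ("DSS", "Function", "Administration", 357312, "2020-04-22 11:05:27"),
    ("DSS", "Country", "czechia", 321, "2020-04-21 11:12:55"),
    ("DSS", "Country", "czechia", 324, "2020-04-21 14:11:55"),
    ("DSS", "Country", "czechia", 201, "2020-04-22 11:13:55"),
]


def day_differences(rows):
    """Return {(category, timestamp string): day_difference or None}."""
    # Group the measurements by (lob, category, parameter).
    series = defaultdict(list)
    for lob, category, parameter, count, ts in rows:
        series[(lob, category, parameter)].append((datetime.fromisoformat(ts), count))

    result = {}
    for (_lob, category, _parameter), points in series.items():
        points.sort()
        times = [t for t, _ in points]
        for t, count in points:
            # Number of points at or before (t - 24h); the last of them,
            # if any, is the closest point that is at least 24 hours old.
            i = bisect_right(times, t - timedelta(days=1))
            result[(category, str(t))] = count - points[i - 1][1] if i else None
    return result


if __name__ == "__main__":
    for key, diff in sorted(day_differences(ROWS).items()):
        print(key, diff)
```

The Function row of 2020-04-22 stays NULL because its nearest predecessor is one minute short of 24 hours old.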

Is there a way I can get this result in SQL?

This would have been a good spot to use a window function such as lag() with a range specification. Alas, Redshift only supports rows in the frame clause of window functions.

Here is an alternative that uses a correlated subquery:
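For illustration only, this is roughly what a range-framed window query could look like on an engine that accepts RANGE offsets. The sketch runs against SQLite through Python's sqlite3 module, assuming SQLite 3.28 or later; it is not Redshift syntax. It uses last_value() rather than lag(), because last_value() honours the frame, and it orders by epoch seconds because SQLite needs a numeric ORDER BY for a RANGE offset. The helper name `run` and the truncated timestamps are assumptions.

```python
import sqlite3

# Subset of the sample rows, timestamps truncated to whole seconds.
ROWS = [
    ("DSS", "Industry", "Advertising & Marketing", 310057, "2020-04-21 07:35:14"),
    ("DSS", "Industry", "Advertising & Marketing", 310059, "2020-04-21 10:36:14"),
    ("DSS", "Industry", "Advertising & Marketing", 310058, "2020-04-22 08:35:14"),
    ("DSS", "Function", "Administration", 357351, "2020-04-21 11:06:27"),
    ("DSS", "Function", "Administration", 357353, "2020-04-21 14:08:26"),
    ("DSS", "Function", "Administration", 357312, "2020-04-22 11:05:27"),
]

# The frame covers every row of the same series that is at least
# 86400 seconds (24 hours) older; last_value() picks the newest of them.
# An empty frame yields NULL, so the subtraction yields NULL as well.
QUERY = """
SELECT category,
       timestamp,
       total_count - last_value(total_count) OVER (
           PARTITION BY lob, category, parameter
           ORDER BY CAST(strftime('%s', timestamp) AS INTEGER)
           RANGE BETWEEN UNBOUNDED PRECEDING AND 86400 PRECEDING
       ) AS day_difference
FROM my_stats
ORDER BY category, timestamp
"""


def run(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_stats (lob TEXT, category TEXT, parameter TEXT, "
        "total_count INTEGER, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO my_stats VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```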

select
    s.*,
    total_count - (
        select total_count
        from my_stats s1
        where 
            s1.lob = s.lob
            and s1.category = s.category
            and s1.parameter = s.parameter
            and s1.timestamp < s.timestamp - interval '1 day'
        order by s1.timestamp desc
        limit 1
    ) day_diff
from my_stats s
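As a quick sanity check, the same correlated subquery can be run against a subset of the sample data. The sketch below uses an in-memory SQLite database through Python's sqlite3 module, so the date arithmetic is written as datetime(s.timestamp, '-1 day') instead of the interval syntax; that substitution, the helper name `run`, the index name, and the truncated timestamps are assumptions made for the demo.

```python
import sqlite3

# Subset of the sample rows, timestamps truncated to whole seconds.
ROWS = [
    ("DSS", "Country", "czechia", 321, "2020-04-21 11:12:55"),
    ("DSS", "Country", "czechia", 324, "2020-04-21 14:11:55"),
    ("DSS", "Country", "czechia", 201, "2020-04-22 11:13:55"),
    ("DSS", "Records per domain", "apple.com", 65, "2020-04-21 11:13:17"),
    ("DSS", "Records per domain", "apple.com", 60, "2020-04-21 14:08:17"),
    ("DSS", "Records per domain", "apple.com", 55, "2020-04-22 11:14:17"),
]

# For each row, subtract the count of the newest row of the same series
# that is more than one day older; no such row gives NULL.
QUERY = """
SELECT s.parameter,
       s.timestamp,
       s.total_count - (
           SELECT s1.total_count
           FROM my_stats s1
           WHERE s1.lob = s.lob
             AND s1.category = s.category
             AND s1.parameter = s.parameter
             AND s1.timestamp < datetime(s.timestamp, '-1 day')
           ORDER BY s1.timestamp DESC
           LIMIT 1
       ) AS day_diff
FROM my_stats s
ORDER BY s.parameter, s.timestamp
"""


def run(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_stats (lob TEXT, category TEXT, parameter TEXT, "
        "total_count INTEGER, timestamp TEXT)"
    )
    # Covering index so the subquery can be answered from the index alone.
    conn.execute(
        "CREATE INDEX idx_my_stats ON my_stats "
        "(lob, category, parameter, timestamp, total_count)"
    )
    conn.executemany("INSERT INTO my_stats VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```

The text comparison on timestamps works here only because every value uses the same 'YYYY-MM-DD HH:MM:SS' layout.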

For performance, you want an index on (lob, category, parameter, timestamp, total_count). Redshift has no conventional indexes, so a compound sort key on the leading columns would be the nearest equivalent there.

You can use RANK to identify the first record in the 24-hour lookback period, then do a simple subtraction. I'm assuming you're using Amazon Redshift SQL here. If you are not, replace NVL with ISNULL in MS SQL.

https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html

SELECT
  r.lob
 ,r.category
 ,r.parameter
 ,r.total_count
 ,r.timestamp
 ,i.day_difference
FROM dbo.my_stats r
LEFT JOIN (

  SELECT
    x.lob
   ,x.category
   ,x.parameter
   ,x.total_count
   ,x.timestamp
   ,RANK() OVER (PARTITION BY x.lob,x.category,x.parameter,x.timestamp ORDER BY p.timestamp) AS rank_order  --> rank lookback rows per current row
   ,x.total_count - p.total_count AS day_difference
  FROM dbo.my_stats x
  INNER JOIN dbo.my_stats p  --> lookback period
    ON p.lob       = x.lob
   AND p.category  = x.category
   AND p.parameter = x.parameter
   AND p.timestamp < x.timestamp
   AND p.timestamp > DATEADD(HOUR,-24,x.timestamp)

) i
 ON i.lob       = r.lob
AND i.category  = r.category
AND i.parameter = r.parameter
AND i.timestamp = r.timestamp
WHERE NVL(i.rank_order,1)=1
