
Grouping by rolling date interval in Netezza

I have a table in Netezza that looks like this:

Date         Stock    Return
2015-01-01   A        xxx
2015-01-02   A        xxx
2015-01-03   A        0
2015-01-04   A        0
2015-01-05   A        xxx
2015-01-06   A        xxx
2015-01-07   A        xxx
2015-01-08   A        xxx
2015-01-09   A        xxx
2015-01-10   A        0
2015-01-11   A        0
2015-01-12   A        xxx
2015-01-13   A        xxx
2015-01-14   A        xxx
2015-01-15   A        xxx
2015-01-16   A        xxx
2015-01-17   A        0
2015-01-18   A        0
2015-01-19   A        xxx
2015-01-20   A        xxx

The data represents stock returns for various stocks and dates. What I need to do is group the data by a given interval, and by day within that interval. An added difficulty is that the weekend rows (the 0s) have to be discounted (ignoring public holidays). Also, the start date of the first interval should be an arbitrary date.

For example, my output should look something like this:

Interval    Q01    Q02    Q03    Q04    Q05
1           xxx    xxx    xxx    xxx    xxx
2           xxx    xxx    xxx    xxx    xxx
3           xxx    xxx    xxx    xxx    xxx 
4           xxx    xxx    xxx    xxx    xxx

This output would represent intervals of 5 working days each, with averaged returns as the results. In terms of the raw data above, with a start date of 1 Jan, the first interval includes the 1st/2nd/5th/6th/7th (the 3rd and 4th are weekend days and are ignored); Q01 would be the 1st, Q02 the 2nd, Q03 the 5th, and so on. The second interval covers the 8th/9th/12th/13th/14th.

What I tried, unsuccessfully, was:

CEIL(CAST(EXTRACT(DOY FROM DATE) AS FLOAT) / CAST (10 AS FLOAT)) AS interval
EXTRACT(DAY FROM DATE) % 10 AS DAYinInterval

I also tried playing around with rolling counters, and for variable starting dates, setting my DOY to zero with something like this:

CEIL(CAST(EXTRACT(DOY FROM DATE) - EXTRACT(DOY FROM 'start-date') AS FLOAT) / CAST(10 AS FLOAT)) AS Interval
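
The trouble with both expressions is that DOY keeps counting through weekends, so the day-in-interval drifts by two after every weekend. A minimal probe that shows this (assuming the table is named stocks, as in the answer below):

-- Hypothetical probe: calendar-day binning counts Saturdays and Sundays too.
select date,
       extract(doy from date) as doy,
       ceil(cast(extract(doy from date) as float) / cast(10 as float)) as calendar_interval,
       extract(doy from date) % 10 as calendar_day_in_interval
from stocks
order by date;
-- 2015-01-05 comes out as day 5 of its bin even though it is only the
-- third working day after 2015-01-01.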

The one thing that came closest to what I would expect is this:

SUM(Number) OVER (PARTITION BY STOCK ORDER BY DATE ASC ROWS 10 PRECEDING) AS Counter

Unfortunately, it goes from 1 to 10 and is then followed by 11s, where it should start again from 1 to 10.
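
For reference, one way to make such a counter wrap instead of running on (a sketch, not from the original post) is to take row_number() modulo the interval length after filtering out the weekend rows:

-- Hedged sketch, assuming the sample table "stocks" from the answer below.
-- Integer division and modulo turn a per-stock working-day sequence into
-- an interval number and a 1..10 position within the interval.
select date,
       stock,
       return,
       ((row_number() over (partition by stock order by date asc) - 1) / 10) + 1
           as interval_no,
       ((row_number() over (partition by stock order by date asc) - 1) % 10) + 1
           as day_in_interval
from stocks
where extract(dow from date) not in (1, 7);  -- drop weekends, same filter as the answer below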

I would love to see how this can be implemented in an elegant way. Thanks.

I'm not entirely sure I understand the question, but I think I might, so I'm going to take a swing at this with some windowed aggregates and subqueries.

Here's the sample data, plugging in some random non-zero data for weekdays.

    DATE    | STOCK | RETURN
------------+-------+--------
 2015-01-01 | A     |     16
 2015-01-02 | A     |     80
 2015-01-03 | A     |      0
 2015-01-04 | A     |      0
 2015-01-05 | A     |     60
 2015-01-06 | A     |     25
 2015-01-07 | A     |     12
 2015-01-08 | A     |      1
 2015-01-09 | A     |     81
 2015-01-10 | A     |      0
 2015-01-11 | A     |      0
 2015-01-12 | A     |     35
 2015-01-13 | A     |     20
 2015-01-14 | A     |     69
 2015-01-15 | A     |     72
 2015-01-16 | A     |     89
 2015-01-17 | A     |      0
 2015-01-18 | A     |      0
 2015-01-19 | A     |    100
 2015-01-20 | A     |     67
(20 rows)

Here's my swing at it, with embedded comments.

select avg(return),
   date_period,
   day_period
from (
        -- use row_number to generate a sequential value for each DOW,
        -- with a WHERE to filter out the weekends
      select date,
         stock,
         return,
         date_period ,
         row_number() over (partition by date_period order by date asc) day_period
      from (
            -- bin out the entries by date_period using the first_value of the entire set as the starting point
            -- modulo 7
            select date,
               stock,
               return,
               date + (first_value(date) over (order by date asc) - date) % 7 date_period
            from stocks
            where date >= '2015-01-01'
            -- setting the starting period date here
         )
         foo
      where extract (dow from date) not in (1,7)
   )
   foo
group by date_period, day_period
order by date_period asc;
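
To see what the modulo binning does, take 2015-01-09: the first_value is 2015-01-01, so the date difference is -8 days, and -8 % 7 is -1 (the % here keeps the sign of the dividend), giving 2015-01-09 - 1 day = 2015-01-08, the start of the second weekly bin. Each date is snapped back to the bin boundary that sits a whole number of weeks after the starting date.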

The results:

    AVG     | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
  16.000000 | 2015-01-01  |          1
  80.000000 | 2015-01-01  |          2
  60.000000 | 2015-01-01  |          3
  25.000000 | 2015-01-01  |          4
  12.000000 | 2015-01-01  |          5
   1.000000 | 2015-01-08  |          1
  81.000000 | 2015-01-08  |          2
  35.000000 | 2015-01-08  |          3
  20.000000 | 2015-01-08  |          4
  69.000000 | 2015-01-08  |          5
  72.000000 | 2015-01-15  |          1
  89.000000 | 2015-01-15  |          2
 100.000000 | 2015-01-15  |          3
  67.000000 | 2015-01-15  |          4
(14 rows)

Changing the starting date to '2015-01-03' to see if it adjusts properly:

...
from stocks
where date >= '2015-01-03'
...

And the results:

    AVG     | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
  60.000000 | 2015-01-03  |          1
  25.000000 | 2015-01-03  |          2
  12.000000 | 2015-01-03  |          3
   1.000000 | 2015-01-03  |          4
  81.000000 | 2015-01-03  |          5
  35.000000 | 2015-01-10  |          1
  20.000000 | 2015-01-10  |          2
  69.000000 | 2015-01-10  |          3
  72.000000 | 2015-01-10  |          4
  89.000000 | 2015-01-10  |          5
 100.000000 | 2015-01-17  |          1
  67.000000 | 2015-01-17  |          2
(12 rows)
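
To get the exact layout from the question (one row per interval, columns Q01 to Q05), the same inner subqueries can be pivoted with conditional aggregation. This is a sketch, not part of the original answer; like the query above, it assumes a single stock, and the Q01-Q05 names come from the question:

-- Pivot day_period 1..5 into columns; avg() ignores the NULLs that the
-- CASE expressions produce for the other days.
select date_period,
       avg(case when day_period = 1 then return end) as q01,
       avg(case when day_period = 2 then return end) as q02,
       avg(case when day_period = 3 then return end) as q03,
       avg(case when day_period = 4 then return end) as q04,
       avg(case when day_period = 5 then return end) as q05
from (
      select date,
             return,
             date_period,
             row_number() over (partition by date_period order by date asc) day_period
      from (
            select date,
                   return,
                   date + (first_value(date) over (order by date asc) - date) % 7 date_period
            from stocks
            where date >= '2015-01-01'
         ) foo
      where extract(dow from date) not in (1,7)
   ) foo
group by date_period
order by date_period asc;

A sequential interval number (1, 2, 3, ...) could then be derived from date_period with another row_number() if the date itself isn't wanted.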
