match_recognize: collect row data into a single column

I'm following the tutorial for match_recognize found here:

create or replace temporary table stock_price_history (company text, price_date date, price int);
insert into stock_price_history values
    ('ABCD', '2020-10-01', 50),
    ('ABCD', '2020-10-02', 50),
    ('ABCD', '2020-10-03', 51),
    ('ABCD', '2020-10-04', 51),
    ('ABCD', '2020-10-05', 51),
    ('ABCD', '2020-10-06', 52),
    ('ABCD', '2020-10-07', 71),
    ('ABCD', '2020-10-08', 80),
    ('ABCD', '2020-10-09', 90),
    ('ABCD', '2020-10-10', 63),
    ('XYZ' , '2020-10-01', 24),
    ('XYZ' , '2020-10-02', 24),
    ('XYZ' , '2020-10-03', 37),
    ('XYZ' , '2020-10-04', 63),
    ('XYZ' , '2020-10-05', 65),
    ('XYZ' , '2020-10-06', 66),
    ('XYZ' , '2020-10-07', 50),
    ('XYZ' , '2020-10-08', 54),
    ('XYZ' , '2020-10-09', 30),
    ('XYZ' , '2020-10-10', 32);
    
select * from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number,
      price as all_price,
      first(price_date) as start_date,
      last(price_date) as end_date,
      count(*) as rows_in_sequence,
      count(row_with_price_stationary.*) as num_stationary,
      count(row_with_price_increase.*) as num_increases
    one row per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
order by company, match_number;

The code above is my version of the tutorial code. Everything works fine except the price as all_price part in the measures clause. What I want to do is collect all prices in the pattern and return them as an array in a single column. I know I can use all rows per match to get all rows, but that's not what I want.

How would I go about doing that?

You have to specify all rows per match; otherwise the per-row prices are lost when match_recognize collapses each match into a single row. You can then use array_agg with within group to collect the prices into a single array. Because this aggregation reduces the row count, you may want to aggregate the dates of these prices in the same way, something like this:

select   COMPANY
        ,array_agg(PRICE) within group (order by PRICE_DATE) as ALL_PRICE
        ,array_agg(PRICE_DATE) within group (order by PRICE_DATE) as ALL_PRICE_DATE
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number,
      price as all_price,
      first(price_date) as start_date,
      last(price_date) as end_date,
      count(*) as rows_in_sequence,
      count(row_with_price_stationary.*) as num_stationary,
      count(row_with_price_increase.*) as num_increases
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
group by company
order by company
;
COMPANY | ALL_PRICE | ALL_PRICE_DATE
ABCD | [ 50, 51, 51, 51, 52, 52, 71, 80 ] | [ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05", "2020-10-06", "2020-10-06", "2020-10-07", "2020-10-08" ]
XYZ | [ 24, 37, 63, 63, 65, 66 ] | [ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-04", "2020-10-05", "2020-10-06" ]
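
Note that the query above groups by company only, so each array merges every match found for that company. Because after match skip to last row_with_price_increase lets one match end on the row where the next one begins, the boundary row appears twice (52 for ABCD, 63 for XYZ). If one array per match is what is wanted, a minimal sketch is to add the match number to the grouping. This assumes the match_number measure can be used as an ordinary grouping column; START_DATE and END_DATE here are illustrative aliases computed with min/max rather than with the original measures.

```sql
-- Sketch: one array per (company, match) instead of one per company.
-- Assumes the stock_price_history table defined in the question.
select   COMPANY
        ,MATCH_NUMBER
        ,array_agg(PRICE) within group (order by PRICE_DATE) as ALL_PRICE
        ,min(PRICE_DATE) as START_DATE   -- first date in the match
        ,max(PRICE_DATE) as END_DATE     -- last date in the match
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
group by COMPANY, MATCH_NUMBER
order by COMPANY, MATCH_NUMBER
;
```

Judging from the combined output above, this should split ABCD into [ 50, 51, 51, 51, 52 ] and [ 52, 71, 80 ], and XYZ into [ 24, 37, 63 ] and [ 63, 65, 66 ].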

If you want to keep all rows, you can use the window function version of array_agg:

select   * exclude ALL_PRICE
        ,array_agg(PRICE) within group (order by PRICE_DATE) 
            over (partition by COMPANY) as ALL_PRICE
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number,
      price as all_price,
      first(price_date) as start_date,
      last(price_date) as end_date,
      count(*) as rows_in_sequence,
      count(row_with_price_stationary.*) as num_stationary,
      count(row_with_price_increase.*) as num_increases
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
order by company
;
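
The window version above partitions by company, so every row carries the same array covering all of that company's matches. A hedged variant, assuming per-match arrays are preferred, partitions by the match number as well. It drops the all_price measure so that no exclude is needed; otherwise the pattern and definitions are unchanged from the question.

```sql
-- Sketch: keep every matched row, and attach the array of prices
-- for the match that the row belongs to.
select   *
        ,array_agg(PRICE) within group (order by PRICE_DATE)
            over (partition by COMPANY, MATCH_NUMBER) as ALL_PRICE
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
order by company, match_number, price_date
;
```

A row that closes one match and opens the next is returned once per match, each time with that match's array.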
