
Aggregation in Postgres for finding last value

I have a single table in Postgres that holds aggregated data. The table has the following fields:

search_term --> a particular search term
date --> the date on which the search was performed
search_count --> how many times a search was performed with this search term
min_result_count --> the minimum number of results returned for the search term
max_result_count --> the maximum number of results returned for the search term
last_result_count --> the number of results returned when the last search was performed
zero_result_count --> how many times there were no results for this search term

where the combination of search_term and date is unique, meaning a search term is not repeated for a given date; instead, the existing row's values are updated.

I am trying to write a SQL query covering a period of 7 days that returns the following for each search term:
search_term
min_result_count
max_result_count
zero_result_count
last_result_count

I can compute most of these values with the aggregates MIN, MAX and SUM, but I am unable to find the value for last_result_count, since that requires picking only the last value.

Here is a sample table with the expected results:

search_term    search_count    min_rc    max_rc    zero_count    last_rc    date
---------------------------------------------------------------------------------------
term1          10              10        20        0              4        01-01-2020
term1          10              11        21        0              5        02-01-2020
term1          10              12        22        0              6        03-01-2020
term1          10              13        23        0              7        04-01-2020
term1          10              14        24        0              8        05-01-2020

term2          10              24        25        0              9        01-01-2020
term2          10              23        26        0              10       02-01-2020
term2          10              22        27        0              11       03-01-2020
term2          10              21        28        0              12       04-01-2020
term2          10              0         29        3              0        05-01-2020

If I run the query on 05-01-2020, I should get:

search_term    search_count    min_rc    max_rc    zero_count    last_rc
-------------------------------------------------------------------------
term1          50              10        24        0              8      
term2          50              0         29        3              0     

If I run the query on 04-01-2020, I should get:

search_term    search_count    min_rc    max_rc    zero_count    last_rc
-------------------------------------------------------------------------
term1          40              10        23        0              7      
term2          40              21        28        0              12     

If I run the query on 03-01-2020, I should get:

search_term    search_count    min_rc    max_rc    zero_count    last_rc
-------------------------------------------------------------------------
term1          30              10        22        0              6      
term2          30              22        27        0              11     
  • rc stands for result_count

And so on. Any help deriving last_result_count would be much appreciated.

You can use the ROW_NUMBER window function for this. ROW_NUMBER numbers the rows within each partition in the order you specify.

ROW_NUMBER() OVER (PARTITION BY search_term ORDER BY date) AS row_numbered_column

You can then group your data by search_term and take last_rc from the row whose row_numbered_column equals MAX(row_numbered_column), which is the row with the latest date.
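
The ROW_NUMBER idea can be tried out end to end with a small harness. This is a minimal sketch, not a definitive implementation: it uses Python's sqlite3 as a stand-in for Postgres, rewrites the dates as ISO strings (assuming the question's dates are dd-mm-yyyy), and assumes the final term2 row is dated 05-01-2020, as the expected results imply. The names make_conn, summary, rn and cnt are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (search_term, search_count, min_rc, max_rc,
# zero_count, last_rc, date). ISO dates are an assumption of this sketch.
ROWS = [
    ("term1", 10, 10, 20, 0, 4, "2020-01-01"),
    ("term1", 10, 11, 21, 0, 5, "2020-01-02"),
    ("term1", 10, 12, 22, 0, 6, "2020-01-03"),
    ("term1", 10, 13, 23, 0, 7, "2020-01-04"),
    ("term1", 10, 14, 24, 0, 8, "2020-01-05"),
    ("term2", 10, 24, 25, 0, 9, "2020-01-01"),
    ("term2", 10, 23, 26, 0, 10, "2020-01-02"),
    ("term2", 10, 22, 27, 0, 11, "2020-01-03"),
    ("term2", 10, 21, 28, 0, 12, "2020-01-04"),
    ("term2", 10, 0, 29, 3, 0, "2020-01-05"),
]

def make_conn():
    """Create an in-memory table t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (search_term TEXT, search_count INT, min_rc INT, "
        "max_rc INT, zero_count INT, last_rc INT, date TEXT, "
        "UNIQUE (search_term, date))"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn

# rn numbers each term's rows by date; cnt is the highest rn in the partition,
# so the row with rn = cnt is the latest one inside the date filter.
QUERY = """
WITH numbered AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY search_term ORDER BY date) AS rn,
           COUNT(*)     OVER (PARTITION BY search_term)               AS cnt
    FROM t
    WHERE date <= :as_of
)
SELECT search_term,
       SUM(search_count) AS search_count,
       MIN(min_rc)       AS min_rc,
       MAX(max_rc)       AS max_rc,
       SUM(zero_count)   AS zero_count,
       MAX(CASE WHEN rn = cnt THEN last_rc END) AS last_rc
FROM numbered
GROUP BY search_term
ORDER BY search_term
"""

def summary(conn, as_of):
    """Return one summary row per search term for all dates up to as_of."""
    return conn.execute(QUERY, {"as_of": as_of}).fetchall()

conn = make_conn()
for row in summary(conn, "2020-01-05"):
    print(row)
```

The CASE expression yields NULL for every row except the latest one, and MAX ignores NULLs, so the aggregate simply passes that one last_rc through.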

You could use window functions like below. This returns running totals per search term, one row per date.

SELECT search_term,
       SUM(search_count) OVER (PARTITION BY search_term ORDER BY date) AS search_count,
       MIN(min_rc) OVER (PARTITION BY search_term ORDER BY date) AS min_rc,
       MAX(max_rc) OVER (PARTITION BY search_term ORDER BY date) AS max_rc,
       zero_count,
       last_rc,
       date
FROM t
ORDER BY search_term, date

Result set:

search_term    search_count    min_rc    max_rc    zero_count    last_rc   date
term1          10              10        20         0              4       01-01-2020
term1          20              10        21         0              5       02-01-2020
term1          30              10        22         0              6       03-01-2020
term1          40              10        23         0              7       04-01-2020
term1          50              10        24         0              8       05-01-2020
term2          10              24        25         0              9       01-01-2020
term2          20              23        26         0              10      02-01-2020
term2          30              22        27         0              11      03-01-2020
term2          40              21        28         0              12      04-01-2020
term2          50              0         29         3              0       05-01-2020

Updated version, which keeps only the latest row per search term and also sums zero_count:

SELECT search_term, search_count, min_rc, max_rc, zero_count, last_rc
FROM
(SELECT search_term,
        SUM(search_count) OVER (PARTITION BY search_term ORDER BY date) AS search_count,
        MIN(min_rc) OVER (PARTITION BY search_term ORDER BY date) AS min_rc,
        MAX(max_rc) OVER (PARTITION BY search_term ORDER BY date) AS max_rc,
        SUM(zero_count) OVER (PARTITION BY search_term ORDER BY date) AS zero_count,
        last_rc,
        -- rnk = 1 marks the latest row of each search term
        RANK() OVER (PARTITION BY search_term ORDER BY date DESC) AS rnk,
        date
 FROM t
 WHERE date <= '05-01-2020'
 ) A
 WHERE A.rnk = 1
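
The running-window-plus-RANK approach can be checked against the question's expected results. The following is a rough sketch under assumptions: SQLite (via Python's sqlite3) stands in for Postgres, dates are rewritten as ISO strings, the last term2 row is taken to be dated 05-01-2020, and zero_count is summed with a running window so that it covers the whole range. make_conn and summary are illustrative names.

```python
import sqlite3

# Sample rows from the question: (search_term, search_count, min_rc, max_rc,
# zero_count, last_rc, date). ISO dates are an assumption of this sketch.
ROWS = [
    ("term1", 10, 10, 20, 0, 4, "2020-01-01"),
    ("term1", 10, 11, 21, 0, 5, "2020-01-02"),
    ("term1", 10, 12, 22, 0, 6, "2020-01-03"),
    ("term1", 10, 13, 23, 0, 7, "2020-01-04"),
    ("term1", 10, 14, 24, 0, 8, "2020-01-05"),
    ("term2", 10, 24, 25, 0, 9, "2020-01-01"),
    ("term2", 10, 23, 26, 0, 10, "2020-01-02"),
    ("term2", 10, 22, 27, 0, 11, "2020-01-03"),
    ("term2", 10, 21, 28, 0, 12, "2020-01-04"),
    ("term2", 10, 0, 29, 3, 0, "2020-01-05"),
]

def make_conn():
    """Create an in-memory table t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (search_term TEXT, search_count INT, min_rc INT, "
        "max_rc INT, zero_count INT, last_rc INT, date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn

# The inner query computes running aggregates in date order; on the latest
# row (rnk = 1) each running aggregate covers every row that passed the filter.
QUERY = """
SELECT search_term, search_count, min_rc, max_rc, zero_count, last_rc
FROM (
    SELECT search_term,
           SUM(search_count) OVER (PARTITION BY search_term ORDER BY date) AS search_count,
           MIN(min_rc)       OVER (PARTITION BY search_term ORDER BY date) AS min_rc,
           MAX(max_rc)       OVER (PARTITION BY search_term ORDER BY date) AS max_rc,
           SUM(zero_count)   OVER (PARTITION BY search_term ORDER BY date) AS zero_count,
           last_rc,
           RANK() OVER (PARTITION BY search_term ORDER BY date DESC) AS rnk
    FROM t
    WHERE date <= :as_of
) AS a
WHERE a.rnk = 1
ORDER BY search_term
"""

def summary(conn, as_of):
    """Return the latest running-total row per search term up to as_of."""
    return conn.execute(QUERY, {"as_of": as_of}).fetchall()

conn = make_conn()
for row in summary(conn, "2020-01-04"):
    print(row)
```

Because (search_term, date) is unique, RANK and ROW_NUMBER behave the same here; with duplicate dates, RANK would return several rows with rnk = 1.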

Here is another, simpler method; I realized what you wanted after your comment. The subquery is correlated only on search_term and repeats the date filter, because a grouped query cannot refer to the ungrouped t.date.

SELECT search_term,
       SUM(search_count) AS search_count,
       MIN(min_rc) AS min_rc,
       MAX(max_rc) AS max_rc,
       SUM(zero_count) AS zero_count,
       -- latest last_rc for this search term within the same date range
       (SELECT a.last_rc FROM t AS a
         WHERE a.search_term = t.search_term AND a.date <= '05-01-2020'
         ORDER BY a.date DESC LIMIT 1) AS last_rc,
       MAX(date) AS date
FROM t
WHERE date <= '05-01-2020'
GROUP BY search_term
ORDER BY search_term
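
The GROUP BY query with a correlated subquery for last_rc can be exercised the same way. This is only a sketch under assumptions: SQLite through Python's sqlite3 replaces Postgres, dates are ISO strings, the last term2 row is taken to be dated 05-01-2020, and the subquery is correlated on search_term alone with the date filter repeated inside it. make_conn and summary are hypothetical helper names.

```python
import sqlite3

# Sample rows from the question: (search_term, search_count, min_rc, max_rc,
# zero_count, last_rc, date). ISO dates are an assumption of this sketch.
ROWS = [
    ("term1", 10, 10, 20, 0, 4, "2020-01-01"),
    ("term1", 10, 11, 21, 0, 5, "2020-01-02"),
    ("term1", 10, 12, 22, 0, 6, "2020-01-03"),
    ("term1", 10, 13, 23, 0, 7, "2020-01-04"),
    ("term1", 10, 14, 24, 0, 8, "2020-01-05"),
    ("term2", 10, 24, 25, 0, 9, "2020-01-01"),
    ("term2", 10, 23, 26, 0, 10, "2020-01-02"),
    ("term2", 10, 22, 27, 0, 11, "2020-01-03"),
    ("term2", 10, 21, 28, 0, 12, "2020-01-04"),
    ("term2", 10, 0, 29, 3, 0, "2020-01-05"),
]

def make_conn():
    """Create an in-memory table t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (search_term TEXT, search_count INT, min_rc INT, "
        "max_rc INT, zero_count INT, last_rc INT, date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn

# Plain aggregates handle the totals; the scalar subquery looks up the newest
# last_rc for the group's search_term inside the same date range.
QUERY = """
SELECT search_term,
       SUM(search_count) AS search_count,
       MIN(min_rc)       AS min_rc,
       MAX(max_rc)       AS max_rc,
       SUM(zero_count)   AS zero_count,
       (SELECT a.last_rc
          FROM t AS a
         WHERE a.search_term = t.search_term
           AND a.date <= :as_of
         ORDER BY a.date DESC
         LIMIT 1) AS last_rc
FROM t
WHERE date <= :as_of
GROUP BY search_term
ORDER BY search_term
"""

def summary(conn, as_of):
    """Return one aggregated row per search term for dates up to as_of."""
    return conn.execute(QUERY, {"as_of": as_of}).fetchall()

conn = make_conn()
for row in summary(conn, "2020-01-03"):
    print(row)
```

The subquery runs once per group, so an index on (search_term, date) would likely help on a large table.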

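This is even simpler using the window function LAST_VALUE. LAST_VALUE needs an explicit frame that covers the whole partition, otherwise it returns the current row's value. It also cannot be combined with GROUP BY over the ungrouped last_rc, so the aggregates are written as window functions and DISTINCT collapses the identical rows.

SELECT DISTINCT search_term,
       SUM(search_count) OVER w AS search_count,
       MIN(min_rc) OVER w AS min_rc,
       MAX(max_rc) OVER w AS max_rc,
       SUM(zero_count) OVER w AS zero_count,
       LAST_VALUE(last_rc) OVER (PARTITION BY search_term ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_rc,
       MAX(date) OVER w AS date
FROM t
WHERE date <= '05-01-2020'
WINDOW w AS (PARTITION BY search_term)
ORDER BY search_term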
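
How the window frame affects LAST_VALUE is easy to miss, so here is a small sketch that contrasts the default frame with an explicit full-partition frame. It rests on several assumptions: SQLite (Python's sqlite3) stands in for Postgres, dates are ISO strings, the last term2 row is taken to be dated 05-01-2020, and the aggregates are written as window functions with DISTINCT instead of GROUP BY. The names make_conn, summary, NAIVE and lv are illustrative only.

```python
import sqlite3

# Sample rows from the question: (search_term, search_count, min_rc, max_rc,
# zero_count, last_rc, date). ISO dates are an assumption of this sketch.
ROWS = [
    ("term1", 10, 10, 20, 0, 4, "2020-01-01"),
    ("term1", 10, 11, 21, 0, 5, "2020-01-02"),
    ("term1", 10, 12, 22, 0, 6, "2020-01-03"),
    ("term1", 10, 13, 23, 0, 7, "2020-01-04"),
    ("term1", 10, 14, 24, 0, 8, "2020-01-05"),
    ("term2", 10, 24, 25, 0, 9, "2020-01-01"),
    ("term2", 10, 23, 26, 0, 10, "2020-01-02"),
    ("term2", 10, 22, 27, 0, 11, "2020-01-03"),
    ("term2", 10, 21, 28, 0, 12, "2020-01-04"),
    ("term2", 10, 0, 29, 3, 0, "2020-01-05"),
]

def make_conn():
    """Create an in-memory table t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (search_term TEXT, search_count INT, min_rc INT, "
        "max_rc INT, zero_count INT, last_rc INT, date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn

# Default frame: with ORDER BY, the frame ends at the current row, so
# LAST_VALUE just echoes each row's own last_rc.
NAIVE = """
SELECT search_term, date, last_rc,
       LAST_VALUE(last_rc) OVER (PARTITION BY search_term ORDER BY date) AS lv
FROM t
"""

# Explicit frame: LAST_VALUE sees the whole partition and returns the newest
# last_rc on every row; DISTINCT then collapses the identical rows.
QUERY = """
SELECT DISTINCT search_term,
       SUM(search_count) OVER (PARTITION BY search_term) AS search_count,
       MIN(min_rc)       OVER (PARTITION BY search_term) AS min_rc,
       MAX(max_rc)       OVER (PARTITION BY search_term) AS max_rc,
       SUM(zero_count)   OVER (PARTITION BY search_term) AS zero_count,
       LAST_VALUE(last_rc) OVER (
           PARTITION BY search_term ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_rc
FROM t
WHERE date <= :as_of
ORDER BY search_term
"""

def summary(conn, as_of):
    """Return one summary row per search term for dates up to as_of."""
    return conn.execute(QUERY, {"as_of": as_of}).fetchall()

conn = make_conn()
for row in summary(conn, "2020-01-05"):
    print(row)
```

An equivalent choice is FIRST_VALUE with ORDER BY date DESC, which works with the default frame because the newest row is then the first row of every frame.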

Result set from any of the updated versions, run as of 05-01-2020:

search_term search_count    min_rc  max_rc  zero_count  last_rc
term1       50              10      24      0           8
term2       50              0       29      3           0
