
How do I write a query that takes a count of records of one date plus the previous 4 days for every date in a table?

I'm trying to write a query against the following table that, for every day (date(timestamp)), counts the records for that day and the prior 4 days; in other words, a rolling 5-day count of records. Every time I run the calculation, the result is slightly off.

Symbol | timestamp | high | number | date(timestamp) | date(timestamp - interval 1 day) | date(timestamp - interval 2 day) | date(timestamp - interval 3 day) | date(timestamp - interval 4 day)
SPY | 2021-04-26 04:00:00+00:00 | 416.97 | 1 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 06:20:00+00:00 | 416.91 | 2 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:00:00+00:00 | 416.84 | 3 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:05:00+00:00 | 416.8 | 4 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:10:00+00:00 | 416.81 | 5 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:15:00+00:00 | 416.78 | 6 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:20:00+00:00 | 416.75 | 7 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:25:00+00:00 | 416.54 | 8 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:30:00+00:00 | 416.51 | 9 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:35:00+00:00 | 416.34 | 10 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22
SPY | 2021-04-26 08:40:00+00:00 | 416.33 | 11 | 2021-04-26 | 2021-04-25 | 2021-04-24 | 2021-04-23 | 2021-04-22

The following query returns counts, but they are slightly off. For example, for 2021-04-30 the count should match max(number) (the highest enumerated row number), but it does not. Please help.

select date(t.timestamp),date(t.timestamp - interval 4 day), 
(count(t.timestamp)+count(t.timestamp - interval 4 day)+count(t.timestamp - interval 3 day)+count(t.timestamp - interval 2 day)+count(t.timestamp - interval 1 day)) as cntrecords, 
max(number), min(number)
from
(SELECT Symbol,
timestamp, 
high, 
ROW_NUMBER() OVER (ORDER BY timestamp) AS number,
date(timestamp),
date(timestamp - interval 4 day)
#BETWEEN DATE_SUB(date(timestamp), INTERVAL 4 DAY) AND date(timestamp)
FROM test.rawdata 
#WHERE date(timestamp) BETWEEN DATE_SUB('2021-04-30', INTERVAL 4 DAY) AND '2021-04-30'
#group by date(timestamp)
order by timestamp) as t
group by date(t.timestamp);

We will start by getting the counts per symbol per day with a simple GROUP BY query. If you want the results for the week commencing 26 April 2021, remember to include the four days before the start of the desired date range -

SELECT `Symbol`, DATE(`timestamp`) `date`, COUNT(*) `count`
FROM `test`.`rawdata`
WHERE `timestamp` BETWEEN '2021-04-22 00:00:00' AND '2021-05-02 23:59:59'
GROUP BY `Symbol`, `date`;

Now we can use SUM as a window function with a 5 day frame to get your counts -

SELECT *, SUM(`count`) OVER (
    PARTITION BY `Symbol`
    ORDER BY `date` ASC
    RANGE INTERVAL 4 DAY PRECEDING
) `5_day_count`
FROM (
    SELECT `Symbol`, DATE(`timestamp`) `date`, COUNT(*) `count`
    FROM `test`.`rawdata`
    WHERE `timestamp` BETWEEN '2021-04-22 00:00:00' AND '2021-05-02 23:59:59'
    GROUP BY `Symbol`, `date`
) tbl;

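The above query also returns rows for the leading days (22 to 25 April), which we can remove by adding another level of nesting. The date filter cannot go in the WHERE or HAVING clause of the window query itself, because both are applied before window functions are evaluated, so the leading days would be dropped before they could contribute to the sums -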

SELECT *
FROM (
    SELECT *, SUM(`count`) OVER (
        PARTITION BY `Symbol`
        ORDER BY `date` ASC
        RANGE INTERVAL 4 DAY PRECEDING
    ) `5_day_count`
    FROM (
        SELECT `Symbol`, DATE(`timestamp`) `date`, COUNT(*) `count`
        FROM `test`.`rawdata`
        WHERE `timestamp` BETWEEN '2021-04-22 00:00:00' AND '2021-05-02 23:59:59'
        GROUP BY `Symbol`, `date`
    ) tbl
) t2
WHERE `date` BETWEEN '2021-04-26' AND '2021-05-02';
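
To sanity-check what the nested query should return, the same logic can be sketched outside the database. The following is a minimal Python illustration, not part of the original answer: the function name `rolling_day_counts` and the sample timestamps are made up, and a single symbol is assumed, so there is no equivalent of PARTITION BY.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def rolling_day_counts(timestamps, window_days=5):
    """Map each day that has records to the record count for that day
    plus the (window_days - 1) calendar days before it."""
    per_day = Counter(ts.date() for ts in timestamps)  # GROUP BY DATE(timestamp)
    result = {}
    for day in sorted(per_day):
        # Same frame as RANGE INTERVAL 4 DAY PRECEDING: calendar days, not rows.
        result[day] = sum(
            per_day.get(day - timedelta(days=offset), 0)
            for offset in range(window_days)
        )
    return result

# Hypothetical sample data for one symbol (not taken from the question).
sample = [
    datetime(2021, 4, 22, 9, 0), datetime(2021, 4, 22, 9, 5),
    datetime(2021, 4, 23, 9, 0),
    datetime(2021, 4, 26, 4, 0), datetime(2021, 4, 26, 6, 20),
    datetime(2021, 4, 26, 8, 0),
    datetime(2021, 4, 27, 9, 0),
]
counts = rolling_day_counts(sample)

# Equivalent of the outer WHERE: keep only the requested week.
week = {d: c for d, c in counts.items()
        if date(2021, 4, 26) <= d <= date(2021, 5, 2)}
```

With this sample, 26 April counts its own 3 records plus the 1 and 2 records from 23 and 22 April, while the leading days themselves are filtered out of `week`.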

As an alternative approach, you could use a recursive CTE to build a list of date ranges, then join the table to those ranges and group by them -

WITH RECURSIVE `cte` (`date`, `start`, `end`) AS (
    SELECT
        CAST('2021-04-26' AS DATE),
        CAST('2021-04-22 00:00:00' AS DATETIME),
        CAST('2021-04-26 23:59:59' AS DATETIME)
    UNION ALL
    SELECT
        `date` + INTERVAL 1 DAY,
        `start` + INTERVAL 1 DAY,
        `end` + INTERVAL 1 DAY
    FROM `cte`
    WHERE `date` < '2021-05-02'
)
SELECT `rawdata`.`symbol`, `cte`.`date`, COUNT(*) `count`
FROM `cte`
JOIN `test`.`rawdata`
    ON `rawdata`.`timestamp` BETWEEN `cte`.`start` AND `cte`.`end`
GROUP BY `rawdata`.`symbol`, `cte`.`date`;
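
The range-join idea can also be sketched in Python, purely as an illustration. The helper names `build_ranges` and `counts_by_range` and the sample data are hypothetical, and a single symbol is assumed.

```python
from datetime import date, datetime, time, timedelta

def build_ranges(first_day, last_day, lookback_days=4):
    """Yield (day, start, end) tuples, one per reporting day,
    mirroring the rows produced by the recursive CTE."""
    day = first_day
    while day <= last_day:
        start = datetime.combine(day - timedelta(days=lookback_days), time.min)
        end = datetime.combine(day, time(23, 59, 59))
        yield day, start, end
        day += timedelta(days=1)

def counts_by_range(timestamps, first_day, last_day):
    """Count the records falling in each range. Ranges with no matching
    records are omitted, as they would be with an inner JOIN."""
    result = {}
    for day, start, end in build_ranges(first_day, last_day):
        matched = sum(1 for ts in timestamps if start <= ts <= end)
        if matched:
            result[day] = matched
    return result

# Hypothetical sample data for one symbol (not taken from the question).
sample = [
    datetime(2021, 4, 22, 9, 0), datetime(2021, 4, 22, 9, 5),
    datetime(2021, 4, 23, 9, 0),
    datetime(2021, 4, 26, 4, 0), datetime(2021, 4, 26, 6, 20),
    datetime(2021, 4, 26, 8, 0),
    datetime(2021, 4, 27, 9, 0),
]
result = counts_by_range(sample, date(2021, 4, 26), date(2021, 5, 2))
```

One difference from the window-function version shows up here: a day with no records of its own (28 April in this sample) still gets a row as long as some record falls inside its 5-day range, whereas the window-function query only returns days that are present in the data.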

This was not stated in the question, but I'm guessing that columns like number and date(timestamp - interval 3 day) were added to try and help solve this problem, or are results of an intermediate query. Either way, they are unnecessary.

Let's solve this problem for a table that has only symbol, timestamp, and high, using a window function with a frame specification:

-- Get the counts for each day (factored out as a CTE)
WITH Counts AS (
    SELECT
        symbol,
        DATE(timestamp) AS day,
        COUNT(*) AS day_count
    FROM
        prices
    GROUP BY symbol, day
)
SELECT
    symbol,
    day,
    -- Use the window function with a frame spec to sum, for each row,
    -- the counts for that day and the 4 preceding days
    SUM(day_count) OVER (
        PARTITION BY symbol
        ORDER BY day
        RANGE BETWEEN INTERVAL '4' DAY PRECEDING AND CURRENT ROW
    ) AS five_day_count  -- alias added for readability
FROM
    Counts;

Note that I'm using RANGE instead of ROWS because a ROWS frame counts rows rather than calendar days, so any day with no data would throw off the counts.

Also note that the query above will not work in MariaDB, which does not yet support RANGE frames over date and time columns with interval offsets.

You can play around with this example in this DB Fiddle: https://www.db-fiddle.com/f/ctN927ouAMHrWK1QJQD6Z2/0

(Note I only used 1 day in the range there so I didn't have to generate as much test data.)
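
To make the RANGE versus ROWS point concrete, here is a small Python sketch of the two frame types applied to per-day counts with a gap. It is only an illustration: the function names `rows_frame` and `range_frame` and the data are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical per-day counts; there are no rows for 24 and 25 April.
day_counts = {
    date(2021, 4, 20): 5,
    date(2021, 4, 21): 1,
    date(2021, 4, 22): 2,
    date(2021, 4, 23): 1,
    date(2021, 4, 26): 3,
    date(2021, 4, 27): 1,
}

def rows_frame(day_counts, preceding=4):
    """Like ROWS 4 PRECEDING: sum the current row and the 4 rows before it,
    regardless of how far back in time those rows are."""
    days = sorted(day_counts)
    return {
        day: sum(day_counts[d] for d in days[max(0, i - preceding): i + 1])
        for i, day in enumerate(days)
    }

def range_frame(day_counts, preceding=4):
    """Like RANGE INTERVAL 4 DAY PRECEDING: sum only the rows whose day lies
    within the 4 calendar days before the current day, or on it."""
    return {
        day: sum(count for d, count in day_counts.items()
                 if day - timedelta(days=preceding) <= d <= day)
        for day in day_counts
    }

by_rows = rows_frame(day_counts)
by_range = range_frame(day_counts)
```

For 27 April the ROWS-style frame reaches back to 21 April and gives 8, while the RANGE-style frame stops at 23 April and gives 5. The two agree only where there is no gap in the preceding days, as for 23 April here.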


 