简体   繁体   中英

Crafting a Subquery-able UNION ALL based on the results of a query

Data

I have a couple of tables like so:

CREATE TABLE cycles (
  `cycle` varchar(6) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `cycle_type` varchar(140) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `start` date DEFAULT NULL,
  `end` date DEFAULT NULL
);

CREATE TABLE rsvn (
  `str` varchar(140) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `start_date` date DEFAULT NULL,
  `end_date` date DEFAULT NULL
);

INSERT INTO `cycles` (`cycle`, `cycle_type`, `start`, `end`) values
('202013', 'a', '2021-01-04', '2021-01-31'),
('202013', 'b', '2021-01-04', '2021-01-31'),
('202101', 'a', '2021-01-04', '2021-01-31'),
('202101', 'b', '2021-01-04', '2021-01-31'),
('202102', 'a', '2021-02-01', '2021-02-28'),
('202102', 'b', '2021-02-01', '2021-02-28'),
('202103', 'a', '2021-03-01', '2021-03-28'),
('202103', 'b', '2021-03-01', '2021-03-28');

INSERT INTO `rsvn` (str, start_date, end_date) values
('STR01367', '2020-12-07', '2020-06-21'),
('STR00759', '2020-12-07', '2021-04-25'),
('STR01367', '2021-01-04', '2021-09-12'),
('STR01367', '2021-06-21', '2022-02-27');

Desired Results

For any given cycle, I want to count the number of occurrences of str across cycles. So between cycle 2108 - 2108 (one cycle), I see:

str count
STR01367 1
STR00759 1

And from between 2108 - 2109 (two cycles) I see:

str count
STR01367 2
STR00759 1

What I've tried

I'm trying to figure out how to dynamically obtain those results. I don't see any options outside a UNION ALL query (one query for each cycles), so I tried writing a PROCEDURE. However, that didn't work because I want to do post-processing on the query results, and I don't believe you can use the results of a PROCEDURE in a CTE or subquery.

My PROCEDURE (works, can't include results in a subquery like SELECT * FROM call count_cycles (?)):

CREATE PROCEDURE `count_cycles`(start_cycle CHAR(6), end_cycle CHAR(6))
BEGIN
    SET @cycles := (
        SELECT CONCAT('WITH installed_cycles_count AS (',
            GROUP_CONCAT(
                CONCAT('
        SELECT rsvn.str, 1 AS installed_cycles
        FROM rsvn
        WHERE "', `cy`.`start`, '" BETWEEN rsvn.start_date AND COALESCE(rsvn.end_date, "9999-01-01")
           OR "', `cy`.`end`, '" BETWEEN rsvn.start_date AND COALESCE(rsvn.end_date, "9999-01-01")
        GROUP BY rsvn.str
    '
                )
                 SEPARATOR ' UNION ALL '
            ),
    ')

    SELECT
             store.chain AS "Chain"
            ,store.division AS "Division"
            ,dividers_store AS "Store"
            ,SUM(installed_cycles) AS "Installed Cycles"
    FROM installed_cycles_count r
    LEFT JOIN store ON store.name = r.dividers_store
    GROUP BY dividers_store
    ORDER BY chain, division, dividers_store, installed_cycles'
        )
        FROM cycles `cy`
        WHERE `cy`.`cycle_type` = 'Ad Cycle'
            AND `cy`.`cycle` >= CONCAT('20', RIGHT(start_cycle, 4))
            AND `cy`.`cycle` <= CONCAT('20', RIGHT(end_cycle, 4))
        GROUP BY `cy`.`cycle_type`
    );

    EXECUTE IMMEDIATE @cycles;
END

Alternatively, I attempted to use a recursive query to obtain my results by incrementing my cycle. This gave me the cycles I wanted:

WITH RECURSIVE xyz AS (
    SELECT cy.`cycle`, cy.`start`, cy.`end`
    FROM cycles cy
    WHERE cycle_type = 'Ad Cycle'
    AND `cycle` = '202101'

    UNION ALL
    
    SELECT cy.`cycle`, cy.`start`, cy.`end`
    FROM xyz
    JOIN cycles cy
        ON cy.`cycle` = increment_cycle(xyz.`cycle`, 1)
        AND cy.`cycle_type` = 'Ad Cycle'
    WHERE cy.`cycle` <= '202110'
)
SELECT * FROM xyz;

But I can't get it working when I add in the reservations table: infinite loop?

WITH RECURSIVE xyz AS (
    SELECT cy.`cycle`, 'dr.dividers_store', 1 AS installed_cycles
    FROM cycles cy
    LEFT JOIN rsvn dr
        ON cy.`start` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
            OR cy.`end` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
    WHERE cy.`cycle_type` = 'Ad Cycle'
        AND cy.`cycle` = '202101'

    UNION ALL

    SELECT cy.`cycle`, 'dr.dividers_store', 1 AS installed_cycles
    FROM xyz
    JOIN cycles cy
        ON cy.`cycle` = increment_cycle(xyz.`cycle`, 1)
        AND cy.`cycle_type` = 'Ad Cycle'
    LEFT JOIN rsvn dr
        ON cy.`start` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
            OR cy.`end` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
    WHERE cy.`cycle` <= '202102'
)
SELECT * FROM xyz

What options do I have to get the results I need, in such a way that I can use them in a CTE or subquery?

The results I am looking for are easily obtained via a two-stage grouping. Something like this:

WITH sbc AS (
    SELECT cy.`cycle`, dr.str, 1 AS 'count'
    FROM cycles cy
    LEFT JOIN rsvn dr
        ON cy.`start` BETWEEN dr.start_date AND dr.end_date
            OR cy.`end` BETWEEN dr.start_date AND dr.end_date
    WHERE cy.`cycle_type` = 'Ad Cycle'
        AND cy.`cycle` BETWEEN '202201' AND '202205'
    GROUP BY cy.`cycle`, dr.str
    ORDER BY dr.str, cy.`cycle`
)
SELECT `cycle`, str, SUM(`count`) as `count`
FROM sbc
GROUP BY str

The CTE produces one result per rsvn per cycle. Afterwards all that is needed is to group by store and count the number of occurrences.

Besides being simpler, I suspect that this query is faster than the union concept I was stuck on when I asked the question, since among other things the server does not need to perform a union on multiple grouping queries. However, I do not understand how MariaDB optimizes such queries, and while I am curious I don't have the time to run benchmarks to find out.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM