
Way to optimize this CASE WHEN SQL query

WITH cte AS 
(   
    SELECT
        channelGrouping,
        visitnumber AS times,
        COUNT(*) AS number_of_visitor   
    FROM
        `bigquery-public-data.google_analytics_sample.*`   
    WHERE
        _TABLE_SUFFIX BETWEEN 'ga_sessions_20160801' AND 'ga_sessions_20171213'
        AND FullvisitorID IN (SELECT fullVisitorID
                              FROM `bigquery-public-data.google_analytics_sample.*`
                              WHERE _TABLE_SUFFIX BETWEEN 'ga_sessions_20160801'
                                                      AND 'ga_sessions_20171213'
                              GROUP BY fullvisitorID
                              HAVING COUNT(fullvisitorid) > 1)    
    GROUP BY
        channelgrouping,
        visitnumber   
    ORDER BY
        channelgrouping,
        times
) 
SELECT   
    *,   
    (number_of_visitor * 100 /
        (CASE
             WHEN channelgrouping = 'Organic Search' 
                 THEN (SELECT number_of_visitor 
                       FROM cte 
                       WHERE times = 1 AND channelgrouping = 'Organic Search')
             WHEN channelgrouping = 'Social' 
                 THEN (SELECT number_of_visitor
                       FROM cte
                       WHERE times = 1 AND channelgrouping = 'Social')
             WHEN channelgrouping = 'Direct' 
                 THEN (SELECT number_of_visitor 
                       FROM cte 
                       WHERE times = 1 AND channelgrouping = 'Direct')
             WHEN channelgrouping = 'Referral' 
                 THEN (SELECT number_of_visitor
                       FROM cte
                       WHERE times = 1 AND channelgrouping = 'Referral')
             WHEN channelgrouping = 'Paid Search' 
                 THEN (SELECT number_of_visitor 
                       FROM cte 
                       WHERE times = 1 AND channelgrouping = 'Paid Search')
             WHEN channelgrouping = 'Affiliates' 
                 THEN (SELECT number_of_visitor
                       FROM cte
                       WHERE times = 1 AND channelgrouping = 'Affiliates')
             WHEN channelgrouping = 'Display' 
                 THEN (SELECT number_of_visitor 
                       FROM cte 
                       WHERE times = 1 AND channelgrouping = 'Display')
             WHEN channelgrouping = '(Other)' 
                 THEN (SELECT number_of_visitor
                       FROM cte
                       WHERE times = 1 AND channelgrouping = '(Other)')
         END)) AS retention_rate 
FROM   
    cte 
WHERE   
    times > 1 
ORDER BY   
    cte.channelgrouping, cte.times

There are only 8 channelGrouping values, so I can list them all. I just wonder whether there is a better way to reduce the repetition. What if there were 100 channelGrouping values?

There is no need to repeat the same query multiple times in the CASE statement. In fact, there doesn't appear to be any need for a CASE statement at all, since the inner query can reference the outer query and you say these are the only possible values.

You should be able to do something more like:

WITH cte AS (
...
)
SELECT *,
    number_of_visitor * 100 /
    (
        -- first-visit count (times = 1) for the same channel as the outer row
        SELECT number_of_visitor
        FROM cte cte2
        WHERE cte2.times = 1 AND cte2.channelgrouping = cte1.channelgrouping
    ) AS retention_rate
FROM cte cte1
WHERE
    cte1.times > 1
ORDER BY
    cte1.channelgrouping, cte1.times

I should note that this reduces the repetition in the existing query, which I think is the intention of your question. I'm clarifying because "optimize", when talking about queries, usually has a performance connotation. What I'm suggesting will make the query easier to read and easier to maintain ("What if there are 100 channelgroupings?"), but won't necessarily improve performance.
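If the correlated subquery itself becomes a concern, one possible alternative is a window function that carries each channel's first-visit count onto every row of that channel. The following is only a sketch, not a measured improvement: the inline rows in cte are hypothetical sample data standing in for the question's CTE, and the names with_base and first_visit_count are made up for illustration. It also assumes every channel has exactly one times = 1 row, which the GROUP BY in the original CTE should guarantee for channels that have one at all.

```sql
-- Hypothetical sample rows standing in for the question's cte.
WITH cte AS (
  SELECT 'Direct' AS channelgrouping, 1 AS times, 200 AS number_of_visitor UNION ALL
  SELECT 'Direct', 2, 50 UNION ALL
  SELECT 'Direct', 3, 20 UNION ALL
  SELECT 'Social', 1, 100 UNION ALL
  SELECT 'Social', 2, 10
),
with_base AS (
  SELECT
    *,
    -- Copy the times = 1 count onto every row of the same channel.
    -- This has to happen before the times > 1 filter removes that row.
    MAX(IF(times = 1, number_of_visitor, NULL))
      OVER (PARTITION BY channelgrouping) AS first_visit_count
  FROM cte
)
SELECT
  channelgrouping,
  times,
  number_of_visitor,
  number_of_visitor * 100 / first_visit_count AS retention_rate
FROM with_base
WHERE times > 1
ORDER BY channelgrouping, times
```

The window is computed in a separate step because a WHERE times > 1 filter in the same SELECT would drop the times = 1 rows before the window could see them. Like the subquery version, this needs no list of channel names, so it works the same for 8 channels or 100.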
