WITH cte AS
(
SELECT
channelGrouping,
visitnumber AS times,
COUNT(*) AS number_of_visitor
FROM
`bigquery-public-data.google_analytics_sample.*`
WHERE
_TABLE_SUFFIX BETWEEN 'ga_sessions_20160801' AND 'ga_sessions_20171213'
AND FullvisitorID IN (SELECT fullVisitorID
FROM `bigquery-public-data.google_analytics_sample.*`
WHERE _TABLE_SUFFIX BETWEEN 'ga_sessions_20160801'
AND 'ga_sessions_20171213'
GROUP BY fullvisitorID
HAVING COUNT(fullvisitorid) > 1)
GROUP BY
channelgrouping,
visitnumber
ORDER BY
channelgrouping,
times
)
SELECT
*,
(number_of_visitor * 100 /
(CASE
WHEN channelgrouping = 'Organic Search'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Organic Search')
WHEN channelgrouping = 'Social'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Social')
WHEN channelgrouping = 'Direct'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Direct')
WHEN channelgrouping = 'Referral'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Referral')
WHEN channelgrouping = 'Paid Search'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Paid Search')
WHEN channelgrouping = 'Affiliates'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Affiliates')
WHEN channelgrouping = 'Display'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = 'Display')
WHEN channelgrouping = '(Other)'
THEN (SELECT number_of_visitor
FROM cte
WHERE times = 1 AND channelgrouping = '(Other)')
END)) AS retention_rate
FROM
cte
WHERE
times > 1
ORDER BY
cte.channelgrouping, cte.times
There are only 8 channelgrouping values, so I can list them. I just wonder whether there is a better way to reduce the repetition. What if there were 100 channelgroupings?
There is no need to repeat the same query multiple times in the CASE statement. In fact, there doesn't appear to be any need for a CASE statement at all, since the inner query can reference the current row of the outer query (a correlated subquery) and you say these are the only possible values.
You should be able to do something more like:
WITH cte AS (
...
)
SELECT
  *,
  -- Divide by the visitor count at times = 1 for this row's channel
  number_of_visitor * 100 /
  (
    SELECT number_of_visitor
    FROM cte cte2
    WHERE cte2.times = 1 AND cte1.channelgrouping = cte2.channelgrouping
  ) AS retention_rate
FROM cte cte1
WHERE
  cte1.times > 1
ORDER BY
  cte1.channelgrouping, cte1.times
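A window function is another way to get the same per-channel denominator without a subquery. The sketch below is not part of the original query: it swaps the correlated subquery for MAX(IF(...)) OVER (PARTITION BY channelgrouping). The inline cte holds made-up rows so that the example runs on its own, and with_base and first_visit_count are names invented here for illustration. I haven't compared the two versions on the real dataset.

```sql
-- Toy stand-in for the cte in the question (made-up counts, illustration only)
WITH cte AS (
  SELECT * FROM UNNEST([
    STRUCT('Direct' AS channelgrouping, 1 AS times, 200 AS number_of_visitor),
    STRUCT('Direct', 2, 50),
    STRUCT('Direct', 3, 20),
    STRUCT('Social', 1, 100),
    STRUCT('Social', 2, 10)
  ])
),
-- Hypothetical helper CTE: attach each channel's times = 1 count to every row
with_base AS (
  SELECT
    *,
    MAX(IF(times = 1, number_of_visitor, NULL))
      OVER (PARTITION BY channelgrouping) AS first_visit_count
  FROM cte
)
SELECT
  channelgrouping,
  times,
  number_of_visitor,
  number_of_visitor * 100 / first_visit_count AS retention_rate
FROM with_base
WHERE times > 1
ORDER BY channelgrouping, times
```

The window function sits in its own CTE because WHERE is applied before window functions are evaluated. Filtering on times > 1 in the same SELECT would drop the times = 1 rows before the window could see them.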
I should note that removing the CASE this way reduces the repetition in the existing query, which I think is the intention of your question. I'm clarifying because "optimize", when talking about queries, usually has a performance connotation. What I'm suggesting will make the query easier to read and easier to maintain ("What if there are 100 channelgroupings?"), but it won't necessarily improve performance.