I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?
Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?
SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count
FROM sessions
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) ;
SELECT session_key, COUNT( view_id ) AS view_count
FROM sessions
LEFT JOIN UNNEST( views ) views
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' )
GROUP BY session_key;
Thank you
Working queries:
a)
SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
b)
SELECT visitId, COUNT( hitNumber ) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )
GROUP BY visitId
The first query looks shorter and more concise, but let's also look at the explain tab:
It also looks better at execution time! Probably because the second query has a GROUP BY visitId
, that forces BigQuery to look if there are any other sessions with the same id.
But if you are looking for an even more concise option:
SELECT visitId, ARRAY_LENGTH(hits) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
It is not just about which way better?
- it is also about which way reflects your goal?
because results of those are different! And you can see this in Felipe's answer - first query returns 63 rows and second query returns 62 rows
So, the first query just returns as many rows as your sessions
table has along with count of entries in array filed.
Whereas the second query, in addition to above, groups all rows and aggregates respective counts
Of course, if your table has all visitId unique - this produces the same result
But because this extra grouping - I would expect second query to be more expensive
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.