简体   繁体   中英

Bigquery SQL - Is it better to unnest in SELECT or JOIN?

I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?

Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?

SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count 
FROM sessions 
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) ;

SELECT session_key, COUNT( view_id ) AS view_count 
FROM sessions 
  LEFT JOIN UNNEST( views ) views 
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) 
GROUP BY session_key; 

Thank you

Working queries:

a)

SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`

b)

SELECT visitId, COUNT( hitNumber ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )  
GROUP BY visitId

The first query looks shorter and more concise, but let's also look at the explain tab:

a) 在此输入图像描述

b) 在此输入图像描述

It also looks better at execution time! Probably because the second query has a GROUP BY visitId , that forces BigQuery to look if there are any other sessions with the same id.

But if you are looking for an even more concise option:

SELECT visitId, ARRAY_LENGTH(hits) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`

It is not just about which way better? - it is also about which way reflects your goal? because results of those are different! And you can see this in Felipe's answer - first query returns 63 rows and second query returns 62 rows

So, the first query just returns as many rows as your sessions table has along with count of entries in array filed.
Whereas the second query, in addition to above, groups all rows and aggregates respective counts

Of course, if your table has all visitId unique - this produces the same result
But because this extra grouping - I would expect second query to be more expensive

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM