I'm writing a query in BigQuery to export the distinct pages each session visited, in the order the pages were visited (ascending by PageVisit_time). Below is what I wrote and its output:
SELECT DISTINCT
fullVisitorId||'.'||visitStartTime||'.'||visitNumber AS session_id,
page.pagePath,
MIN(DATETIME_ADD(DATETIME(TIMESTAMP_SECONDS(visitStartTime),"America/New_York"), INTERVAL hits.time MILLISECOND)) AS PageVisit_time
from `xx.xx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE hits.type = "PAGE"
and date = '20220403'
group by 1,2
order by 1, 3 desc
Output of the above query:
session_id | pagePath | PageVisit_time |
---|---|---|
123 | /point | 2022-04-03T11:26:13.719000 |
123 | /point | 2022-04-03T11:27:15.653820 |
123 | /point-ad | 2022-04-03T11:34:10.000000 |
123 | /point-ad-next | 2022-04-03T12:38:15.82340 |
123 | /point | 2022-04-03T12:50:18.123820 |
I want to keep only distinct session_id and pagePath combinations in the final output, and I also want to make sure the pagePath rows are listed in the same sequence (ascending by PageVisit_time). Any suggestions on how to make this work?
Ideal output:
session_id | pagePath |
---|---|
123 | /point |
123 | /point-ad |
123 | /point-ad-next |
If I understand correctly, you want the first row for each pagePath in the session, along with the other attributes from those events.
If so, you can use the array_agg function with an ORDER BY and LIMIT 1 to pick the first row you want. I'm simplifying the query here, so you can modify it based on your needs.
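SELECT
fullVisitorId||'.'||visitStartTime||'.'||visitNumber AS session_id,
hits.page.pagePath,
-- Keep only the earliest hit per (session_id, pagePath).
-- [OFFSET(0)] unwraps the one-element array into a struct so its fields can be referenced.
array_agg(struct(
hits.time as hit_time, -- milliseconds since visitStartTime
attribute1, -- attribute1, attribute2, attribute are placeholders for the columns you need
attribute2,
attribute
) order by hits.time limit 1)[OFFSET(0)] as attr
from `xx.xx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE hits.type = "PAGE"
and date = '20220403'
group by 1,2
-- visitStartTime is constant within a session, so order by the hit time instead
order by session_id, attr.hit_time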
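If only the two columns from the ideal output are needed, the same idea can be sketched without array_agg: take the earliest hit time per page and sort on it in an outer query. This is a minimal sketch, assuming the standard GA export schema in which hits.time is the number of milliseconds since the session started; the alias first_hit_ms is introduced here purely for illustration, and the table name and date filter are copied from the question.

```sql
-- One row per (session, page), ordered by the first time the page was hit.
SELECT session_id, pagePath
FROM (
  SELECT
    -- Same session key as in the question, written with explicit casts.
    CONCAT(fullVisitorId, '.',
           CAST(visitStartTime AS STRING), '.',
           CAST(visitNumber AS STRING)) AS session_id,
    hits.page.pagePath AS pagePath,
    -- first_hit_ms (illustrative name): earliest hit on this page in the session
    MIN(hits.time) AS first_hit_ms
  FROM `xx.xx.ga_sessions_*`,
    UNNEST(hits) AS hits
  WHERE hits.type = 'PAGE'
    AND date = '20220403'
  GROUP BY 1, 2
)
ORDER BY session_id, first_hit_ms
```

The sort column stays in the subquery, so the final result contains only session_id and pagePath while still following the visit order.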