
BigQuery Standard SQL: filter out duplicates while keeping the order of one column

I'm writing a query in BigQuery to export the distinct pages visited in each session, ordered by when each page was visited (PageVisit_time, ascending). Below is my query and its output:

SELECT DISTINCT
    fullVisitorId||'.'||visitStartTime||'.'||visitNumber AS session_id,
    page.pagePath,
    MIN(DATETIME_ADD(DATETIME(TIMESTAMP_SECONDS(visitStartTime), "America/New_York"), INTERVAL hits.time MILLISECOND)) AS PageVisit_time
from `xx.xx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE hits.type = "PAGE"
    and date = '20220403'
group by 1,2
order by 1, 3 desc

Output of the above query:

session_id  pagePath        PageVisit_time
123         /point          2022-04-03T11:26:13.719000
123         /point          2022-04-03T11:27:15.653820
123         /point-ad       2022-04-03T11:34:10.000000
123         /point-ad-next  2022-04-03T12:38:15.82340
123         /point          2022-04-03T12:50:18.123820

I want to keep only distinct (session_id, pagePath) pairs in the final output, and I also want the pagePath values listed in the same order (ascending by PageVisit_time). Any suggestions on how to make this work?

Ideal output:

session_id  pagePath
123         /point
123         /point-ad
123         /point-ad-next
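A minimal way to check the logic locally, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for BigQuery: load the five sample rows from the output above into a hypothetical page_visits table, group by session and page, and order each group by its earliest visit time, MIN(PageVisit_time), ascending. The table name and the helper distinct_pages_in_visit_order are made up for illustration, and the query has not been tested on BigQuery itself.

```python
import sqlite3

# Sample rows copied from the question's output: (session_id, pagePath, PageVisit_time).
ROWS = [
    ("123", "/point", "2022-04-03T11:26:13.719000"),
    ("123", "/point", "2022-04-03T11:27:15.653820"),
    ("123", "/point-ad", "2022-04-03T11:34:10.000000"),
    ("123", "/point-ad-next", "2022-04-03T12:38:15.82340"),
    ("123", "/point", "2022-04-03T12:50:18.123820"),
]

# One row per (session, page), ordered by the page's FIRST visit time, ascending.
# The time column is only used for ordering, so it is not in the SELECT list.
QUERY = """
SELECT session_id, pagePath
FROM page_visits
GROUP BY session_id, pagePath
ORDER BY session_id, MIN(PageVisit_time) ASC
"""


def distinct_pages_in_visit_order(rows):
    """Return [(session_id, pagePath), ...] with duplicates removed, in first-visit order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE page_visits (session_id TEXT, pagePath TEXT, PageVisit_time TEXT)"
        )
        conn.executemany("INSERT INTO page_visits VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for session_id, page_path in distinct_pages_in_visit_order(ROWS):
        print(session_id, page_path)
```

Because ordering uses each page's first visit, the later revisit to /point at 12:50 does not move it down the list.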

If I understand correctly, you want the first row for each pagePath in a session, along with the other attributes of that event.
If so, you can use the ARRAY_AGG function to pick that first row. I've simplified the query here, so adapt it to your needs.

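SELECT
    fullVisitorId||'.'||visitStartTime||'.'||visitNumber AS session_id,
    page.pagePath,
    -- Earliest hit for this (session, page). All hits in a session share visitStartTime,
    -- so order by hits.time (milliseconds since the session started) instead.
    -- [OFFSET(0)] unwraps the one-element array into a STRUCT so its fields can be used below.
    array_agg(struct(
        hits.time AS hit_time,
        DATETIME_ADD(DATETIME(TIMESTAMP_SECONDS(visitStartTime), "America/New_York"), INTERVAL hits.time MILLISECOND) AS PageVisit_time
        -- add any other hit-level attributes you need here
    ) order by hits.time limit 1)[OFFSET(0)] as attr
from `xx.xx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE hits.type = "PAGE"
    and date = '20220403'
group by 1,2
order by session_id, attr.hit_time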
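To illustrate what the ARRAY_AGG(... ORDER BY ... LIMIT 1) pattern computes, here is a small plain-Python emulation (an illustration only, not BigQuery code): for each (session_id, pagePath) it keeps the whole earliest row, then sorts by session and that row's time. The sample rows come from the question's output; the function name first_hit_per_page is hypothetical.

```python
# Hit-level rows copied from the question's output: (session_id, pagePath, PageVisit_time).
HITS = [
    ("123", "/point", "2022-04-03T11:26:13.719000"),
    ("123", "/point", "2022-04-03T11:27:15.653820"),
    ("123", "/point-ad", "2022-04-03T11:34:10.000000"),
    ("123", "/point-ad-next", "2022-04-03T12:38:15.82340"),
    ("123", "/point", "2022-04-03T12:50:18.123820"),
]


def first_hit_per_page(hits):
    """Emulate ARRAY_AGG(row ORDER BY time LIMIT 1)[OFFSET(0)] grouped by (session_id, pagePath)."""
    first = {}
    for row in hits:
        session_id, page_path, visit_time = row
        key = (session_id, page_path)
        # Keep only the earliest row seen so far for this (session, page) pair.
        # ISO 8601 strings of this shape sort chronologically as plain text.
        if key not in first or visit_time < first[key][2]:
            first[key] = row
    # Equivalent of ORDER BY session_id, <time of the kept row>.
    return sorted(first.values(), key=lambda r: (r[0], r[2]))


if __name__ == "__main__":
    for row in first_hit_per_page(HITS):
        print(*row)
```

Keeping the whole earliest row, rather than just its time, is what lets the other hit attributes come along, which is why they are wrapped in a STRUCT inside ARRAY_AGG.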

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 