BigQuery: finding sessions that have visited both pageA (contains keyword "main") and pageB (contains keyword "side")
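On BQ, I'm trying to find sessions that visited both pageA (URL contains the keyword "main") and pageB (URL contains the keyword "side"), along with the pages those sessions visited. My logic is this: first I find the sessions that visited a pageA (URL contains the keyword "main"), then I do a join to find out which other pages on the site those sessions visited. Here is my query: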
SELECT a.sessionID, b.pagepath
FROM (
  SELECT
    CONCAT(fullVisitorID, CAST(visitID AS STRING), date) AS sessionID,
    hits.page.pagepath AS pagepath
  FROM
    `xx.xxx.ga_sessions_*`,
    UNNEST(hits) AS hits
  WHERE
    totals.visits = 1
    AND hits.page.pagepath LIKE '%main%'
    AND _TABLE_SUFFIX BETWEEN '20220214' AND '20220225'
) a
LEFT JOIN (
  SELECT
    CONCAT(fullVisitorID, CAST(visitID AS STRING), date) AS sessionID,
    hits.page.pagepath AS pagepath
  FROM
    `xx.xxx.ga_sessions_*`,
    UNNEST(hits) AS hits
  WHERE
    totals.visits = 1
    AND _TABLE_SUFFIX BETWEEN '20220214' AND '20220225'
) b
ON a.sessionID = b.sessionID
ORDER BY 1 DESC
I'm attaching a sample output here:
sessionID | Pagepath |
---|---|
123 | /main-size |
123 | /main-size |
456 | /main-size |
456 | /main-size |
456 | /side-hide |
456 | /side-build |
456 | /June-event |
In this case, session 456 meets my criteria because it visited both a page containing "main" and a page containing "side". How can I query this output so that I get only the result below?
sessionID | Pagepath |
---|---|
456 | /main-size |
456 | /main-size |
456 | /side-hide |
456 | /side-build |
456 | /June-event |
Consider the query below.
SELECT * EXCEPT(path)
FROM sample_table, UNNEST([REGEXP_EXTRACT(Pagepath, r'(main|side)')]) path
QUALIFY COUNT(DISTINCT path) OVER (PARTITION BY sessionID) = 2
-- Query results
+-----+-----------+-------------+
| Row | sessionID | Pagepath |
+-----+-----------+-------------+
| 1 | 456 | /main-size |
| 2 | 456 | /main-size |
| 3 | 456 | /side-hide |
| 4 | 456 | /June-event |
| 5 | 456 | /side-build |
+-----+-----------+-------------+
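The query above works because REGEXP_EXTRACT returns the first matching keyword ('main' or 'side') or NULL when neither is present, and COUNT(DISTINCT ...) ignores NULLs, so the count reaches 2 only for sessions that hit both kinds of page. One caveat: a single path containing both keywords (for example a hypothetical '/main-side') would contribute only its first match. As an alternative, here is a minimal sketch that tests each keyword separately with window functions. The sample_table CTE is hypothetical data mirroring the sample in the question, and the names flagged, main_hits, and side_hits are made up for illustration.

```sql
-- Hypothetical sample data mirroring the table in the question.
WITH sample_table AS (
  SELECT '123' AS sessionID, '/main-size' AS Pagepath UNION ALL
  SELECT '123', '/main-size' UNION ALL
  SELECT '456', '/main-size' UNION ALL
  SELECT '456', '/main-size' UNION ALL
  SELECT '456', '/side-hide' UNION ALL
  SELECT '456', '/side-build' UNION ALL
  SELECT '456', '/June-event'
),
flagged AS (
  SELECT
    sessionID,
    Pagepath,
    -- Count, per session, the rows whose path contains each keyword.
    COUNTIF(Pagepath LIKE '%main%') OVER (PARTITION BY sessionID) AS main_hits,
    COUNTIF(Pagepath LIKE '%side%') OVER (PARTITION BY sessionID) AS side_hits
  FROM sample_table
)
-- Keep every row of the sessions that saw at least one page of each kind.
SELECT sessionID, Pagepath
FROM flagged
WHERE main_hits > 0 AND side_hits > 0
```

On this sample data, session 123 is dropped (it never visited a "side" page) and all rows of session 456 are kept. To run it against the Google Analytics export, the sample_table CTE could presumably be replaced by the unfiltered subquery b from the question.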