How do I use the lag function to skip a row? PostgreSQL 9.3
I am currently trying to write a query against a table that records users' web browsing behaviour. The table looks like this:
**RecordID RespondentID DeviceID UTCTimestamp Domain**
1 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 goodreads.com
2 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 goodreads.com
3 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 gr-assets.com
4 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 gr-assets.com
5 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:23 itunes.apple.com
6 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:23 itunes.apple.com
7 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:51 samplicio.us
8 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:51 samplicio.us
Thanks to everyone's help, I was able to get this far:
**RecordID RespondentID UTCTimestamp Source Domain To Domain RecordID**
2 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 goodreads.com gr-assets.com 3
4 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 gr-assets.com itunes.apple.com 5
6 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:23 itunes.apple.com samplicio.us 7
"To Domain" is the value from the next row whose domain is different.
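The problem is that, although the result looks correct, it actually skips the first record entirely. Given this data set, the "Domain" of row 1 is joined against the "Domain" of row 2, and because the two are equal that pair is dropped. Row 2 is then joined with row 3, which is why the first result row shows RecordID 2. I would like to refine this further: since the domain is the same, my result should start from RecordID 1 and skip RecordID 2, so it should look like this: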
**RecordID RespondentID UTCTimestamp Source Domain To Domain**
1 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 goodreads.com gr-assets.com
3 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 gr-assets.com itunes.apple.com
5 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:23 itunes.apple.com samplicio.us
I tried to skip RecordID 2 with the query below, but I get a SQL error on 'prev_name'.
SELECT t1."RecordID",
       t1."RespondentID",
       t1."UTCTimestamp",
       t1."Domain" as "Source Domain",
       t2."Domain" as "To Domain",
       t2."RecordID",
       lag(t1."Domain", 1) over (order by t1."RecordID") as prev_name
from public."Traffic - Mobile" as t1
join public."Traffic - Mobile" as t2
  on t2."RespondentID" = t1."RespondentID"
 AND t2."DeviceID" = t1."DeviceID"
 AND t2."RecordID" = t1."RecordID" + 1
 AND t1."Domain" <> t2."Domain"
 AND t2."UTCTimestamp" >= t1."UTCTimestamp"
 AND t2."Sequence" - t1."Sequence" = 1
 AND t1."RecordID" < 13
 AND t1."Domain" <> prev_name;
What am I doing wrong?
The final result I ultimately want to reach is shown below:
**RecordID RespondentID UTCTimestamp Source Domain To Domain Final Destination**
1 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 goodreads.com gr-assets.com samplicio.us
3 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 gr-assets.com itunes.apple.com samplicio.us
5 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:23 itunes.apple.com samplicio.us samplicio.us
The additional column is called "Final Destination". It lets me group the 3 transactions together as one path leading to samplicio.us.
Thanks in advance.
Try this:
with t1 as
(
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        row_number() over (partition by respondentid, deviceid
                           order by utctimestamp, recordid) as user_seq,
        row_number() over (partition by respondentid, deviceid, domain
                           order by utctimestamp, recordid) as user_domain_seq
    from traffic_mobile
)
select *
from
(
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        lead(domain) over (partition by respondentid, deviceid
                           order by user_seq) as next_domain,
        last_value(domain) over (partition by respondentid, deviceid
                                 order by user_seq
                                 rows between unbounded preceding
                                          and unbounded following) as final_domain
    from t1
    where user_domain_seq = 1
) t2
where t2.next_domain is not null
sqlfiddle: sqlfiddle.com/#!17/dc248/3
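On the error itself: in PostgreSQL an output alias such as prev_name is not visible in the JOIN ... ON or WHERE clause of the same query level, and window functions like lag() are evaluated only after joining and filtering, so the comparison has to be made in an outer query. The following is a minimal sketch of that idea rather than the query from the fiddle. It assumes the lowercase traffic_mobile table and column names used in this answer, and the names flagged, firsts, prev_domain, source_domain, to_domain and final_destination are made up for illustration.

```sql
-- Sketch only: assumes the traffic_mobile table and lowercase columns from the answer.
with flagged as (
    -- Step 1: compute lag() in its own query level so the alias can be filtered on later.
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        lag(domain) over (partition by respondentid, deviceid
                          order by utctimestamp, recordid) as prev_domain
    from traffic_mobile
),
firsts as (
    -- Step 2: keep only the first row of each run of consecutive identical domains.
    select *
    from flagged
    where prev_domain is null
       or domain <> prev_domain
)
select *
from (
    -- Step 3: look ahead to the next kept row and to the last kept row per user/device.
    select
        recordid,
        respondentid,
        utctimestamp,
        domain as source_domain,
        lead(domain) over (partition by respondentid, deviceid
                           order by utctimestamp, recordid) as to_domain,
        last_value(domain) over (partition by respondentid, deviceid
                                 order by utctimestamp, recordid
                                 rows between unbounded preceding
                                          and unbounded following) as final_destination
    from firsts
) t
where to_domain is not null
order by respondentid, recordid;
```

On the eight sample rows from the question this keeps RecordIDs 1, 3 and 5. Unlike the user_domain_seq = 1 filter, it would also keep a domain a second time if the user comes back to it after visiting something else, which may or may not be what you want.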
P.S. For users who have only 1 entry in the traffic_mobile table, the query returns no rows. If you need them, the query will have to be extended to include them.
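One possible way to include those users, as a rough sketch only: count how many first-visit rows each respondent/device has and let a row through when that count is 1, even though its next_domain is null. This reuses the `with t1 as` query from this answer; the extra column n_domains is a name invented here, and whether such rows belong in the result is an assumption about the requirement.

```sql
-- Sketch only: the answer's query plus one extra column, n_domains (hypothetical name).
with t1 as (
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        row_number() over (partition by respondentid, deviceid
                           order by utctimestamp, recordid) as user_seq,
        row_number() over (partition by respondentid, deviceid, domain
                           order by utctimestamp, recordid) as user_domain_seq
    from traffic_mobile
)
select *
from (
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        lead(domain) over (partition by respondentid, deviceid
                           order by user_seq) as next_domain,
        last_value(domain) over (partition by respondentid, deviceid
                                 order by user_seq
                                 rows between unbounded preceding
                                          and unbounded following) as final_domain,
        -- Rows remaining after the filter below are first visits,
        -- so this counts distinct domains per respondent/device.
        count(*) over (partition by respondentid, deviceid) as n_domains
    from t1
    where user_domain_seq = 1
) t2
where t2.next_domain is not null
   or t2.n_domains = 1;  -- also keep users who only ever visited one domain
```

For such a user the extra row has a null next_domain and a final_domain equal to its own domain.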