Update value based on value from another record of same table
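Here I have a sample table of website visitors. As we can see, visitors sometimes do not provide their email. They may also switch to a different email address over time.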
I want to update this table based on the following requirements:
I am wondering if there is a way to do this in Redshift or T-SQL?
Thanks, everyone!
If we assume that the table is named Visits and that its primary key consists of the columns Visitor_id and Activity_Date, then you can do the following in T-SQL:
update a
set a.Email = coalesce(
        -- select the email used previously
        (
            select top 1 Email from Visits
            where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
            order by Activity_Date desc
        ),
        -- if there was no email used previously then select the email used next
        (
            select top 1 Email from Visits
            where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
            order by Activity_Date
        )
    )
from Visits a
where a.Email is null;
-- The same rule written with joins and row_number()
update v
set Email = vv.Email
from Visits v
join (
    select
        v.Visitor_id,
        coalesce(a.Email, b.Email) as Email,
        v.Activity_Date,
        -- Row_num = 1 is the latest previous email, or the earliest next one
        row_number() over (partition by v.Visitor_id, v.Activity_Date
                           order by a.Activity_Date desc, b.Activity_Date) as Row_num
    from Visits v
    -- previous visits with email
    left join Visits a
        on a.Visitor_id = v.Visitor_id
        and a.Email is not null
        and a.Activity_Date < v.Activity_Date
    -- next visits with email if there are no previous visits
    left join Visits b
        on b.Visitor_id = v.Visitor_id
        and b.Email is not null
        and b.Activity_Date > v.Activity_Date
        and a.Visitor_id is null
    where v.Email is null
) vv
    on vv.Visitor_id = v.Visitor_id
    and vv.Activity_Date = v.Activity_Date
where
    vv.Row_num = 1;
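To try the first statement above without a SQL Server instance, here is a minimal sketch that runs the same correlated-subquery logic in SQLite through Python's sqlite3 module. SQLite is only a stand-in, not one of the engines the question asks about, so `top 1` becomes `limit 1` and the aliased `update a ... from Visits a` form is replaced by referring to the table name directly. The sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; the original table contents are not available.
rows = [
    (1, "2020-01-01", None),             # no earlier email: takes the next one
    (1, "2020-01-02", "a@old.example"),
    (1, "2020-01-03", None),             # takes the previous email
    (1, "2020-01-04", "a@new.example"),
    (1, "2020-01-05", None),             # takes the previous (newer) email
    (2, "2020-01-01", None),             # visitor 2 never gives an email
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Visits ("
    " Visitor_id integer, Activity_Date text, Email text,"
    " primary key (Visitor_id, Activity_Date))"
)
conn.executemany("insert into Visits values (?, ?, ?)", rows)

conn.execute(
    """
    update Visits
    set Email = coalesce(
        -- the email used previously
        (select p.Email from Visits p
          where p.Email is not null
            and p.Visitor_id = Visits.Visitor_id
            and p.Activity_Date < Visits.Activity_Date
          order by p.Activity_Date desc limit 1),
        -- otherwise the email used next
        (select n.Email from Visits n
          where n.Email is not null
            and n.Visitor_id = Visits.Visitor_id
            and n.Activity_Date > Visits.Activity_Date
          order by n.Activity_Date limit 1)
    )
    where Email is null
    """
)

filled = conn.execute(
    "select Visitor_id, Activity_Date, Email from Visits"
    " order by Visitor_id, Activity_Date"
).fetchall()
for row in filled:
    print(row)
```

A visitor with no email on any row stays null, because both subqueries return nothing and `coalesce` then yields null.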
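For each visitor_id, you can update null email values with the previous non-null value. If there is none, you use the next non-null value instead. You can get these values as follows: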
select
    v.*, v_prev.email prev_email, v_next.email next_email
from
    visits v
    left join visits v_prev on v.visitor_id = v_prev.visitor_id
        and v_prev.activity_date = (select max(v2.activity_date) from visits v2
                                    where v2.visitor_id = v.visitor_id
                                      and v2.activity_date < v.activity_date
                                      and v2.email is not null)
    left join visits v_next on v.visitor_id = v_next.visitor_id
        and v_next.activity_date = (select min(v2.activity_date) from visits v2
                                    where v2.visitor_id = v.visitor_id
                                      and v2.activity_date > v.activity_date
                                      and v2.email is not null)
where
    v.email is null
In SQL Server or Redshift, you can use a subquery to compute the email:
select t.*,
       coalesce(email,
                -- the email of the last non-null row before this one
                max(email) over (partition by visitor_id, grp),
                -- otherwise the visitor's first email
                max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
               ) as imputed_email
from (select t.*,
             min(case when email is not null then activity_date end) over
                 (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
             count(email) over
                 (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
      from t
     ) t;
You can then use this in an update:
update t
set email = tt.imputed_email
from (select t.*,
             coalesce(email,
                      max(email) over (partition by visitor_id, grp),
                      max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
                     ) as imputed_email
      from (select t.*,
                   min(case when email is not null then activity_date end) over
                       (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
                   count(email) over
                       (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
            from t
           ) t
     ) tt
where tt.visitor_id = t.visitor_id and tt.activity_date = t.activity_date and t.email is null;
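The grp column is what makes the window-function version work: count(email) ignores nulls, so the running count only increases on rows that have an email, and every null row inherits the group number of the last non-null row before it. A minimal Python sketch of that idea for a single visitor, on invented data that is already sorted by activity date (the function name `impute_emails` is hypothetical, not something from the answer):

```python
from itertools import accumulate

def impute_emails(emails):
    """Fill None entries for one visitor; `emails` is ordered by activity date."""
    # Running count of non-null emails, like
    # count(email) over (partition by visitor_id order by activity_date).
    grp = list(accumulate(1 if e is not None else 0 for e in emails))
    # Each group number >= 1 contains exactly one non-null email, so this
    # plays the role of max(email) over (partition by visitor_id, grp).
    by_group = {g: e for g, e in zip(grp, emails) if e is not None}
    # The visitor's first email: the fallback for leading nulls (grp == 0).
    first_email = next((e for e in emails if e is not None), None)
    return [e if e is not None else by_group.get(g, first_email)
            for e, g in zip(emails, grp)]

print(impute_emails([None, "a@old.example", None, "a@new.example", None]))
```

For that input the group numbers are 0, 1, 1, 2, 2: the leading null falls back to the first email, and each later null takes the email that opened its group.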