Update value based on value from another record of same table
Here I have a sample table of website visitors. As we can see, visitors sometimes don't provide their email. They may also switch to a different email address over time.
I want to update this table according to the following requirements:
I was wondering whether there is a way to do this in Redshift or T-SQL?
Thanks, everyone!
If we suppose that the table is named Visits and that its primary key consists of the columns Visitor_id and Activity_Date, then you can do the following in T-SQL:
update a
set a.Email = coalesce(
    -- select the email used previously
    (
        select top 1 Email from Visits
        where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
        order by Activity_Date desc
    ),
    -- if there was no email used previously then select the email used next
    (
        select top 1 Email from Visits
        where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
        order by Activity_Date
    )
)
from Visits a
where a.Email is null;
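Since the sample table is not reproduced here, the following is a minimal sketch for checking the "previous email, otherwise next email" logic on made-up data. It is an assumption-laden stand-in, not the T-SQL itself: it uses Python's built-in sqlite3 module, `limit 1` in place of `top 1`, and hypothetical rows and email addresses.

```python
import sqlite3

# Hypothetical sample data (not from the original question).
rows = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@old.example"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "a@new.example"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),  # visitor who never gave an email
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Visits ("
    " Visitor_id integer, Activity_Date text, Email text,"
    " primary key (Visitor_id, Activity_Date))"
)
conn.executemany("insert into Visits values (?, ?, ?)", rows)

# Same idea as the T-SQL statement: take the closest earlier email,
# and fall back to the closest later email when there is none.
conn.execute(
    """
    update Visits
    set Email = coalesce(
        (select p.Email from Visits p
          where p.Email is not null
            and p.Visitor_id = Visits.Visitor_id
            and p.Activity_Date < Visits.Activity_Date
          order by p.Activity_Date desc limit 1),
        (select n.Email from Visits n
          where n.Email is not null
            and n.Visitor_id = Visits.Visitor_id
            and n.Activity_Date > Visits.Activity_Date
          order by n.Activity_Date limit 1)
    )
    where Email is null
    """
)

filled = conn.execute(
    "select Visitor_id, Activity_Date, Email from Visits"
    " order by Visitor_id, Activity_Date"
).fetchall()
for row in filled:
    print(row)
```

In this sketch the leading null for visitor 1 picks up the first later email, the other nulls pick up the most recent earlier one, and visitor 2 stays null because there is nothing to copy from.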
update v
set Email = vv.Email
from Visits v
join (
    select
        v.Visitor_id,
        coalesce(a.Email, b.Email) as Email,
        v.Activity_Date,
        row_number() over (partition by v.Visitor_id, v.Activity_Date
                           order by a.Activity_Date desc, b.Activity_Date) as Row_num
    from Visits v
    -- previous visits with email
    left join Visits a
        on a.Visitor_id = v.Visitor_id
        and a.Email is not null
        and a.Activity_Date < v.Activity_Date
    -- next visits with email if there are no previous visits
    left join Visits b
        on b.Visitor_id = v.Visitor_id
        and b.Email is not null
        and b.Activity_Date > v.Activity_Date
        and a.Visitor_id is null
    where v.Email is null
) vv
    on vv.Visitor_id = v.Visitor_id
    and vv.Activity_Date = v.Activity_Date
where
    vv.Row_num = 1;
For each visitor_id, you can update the null email value with the previous non-null value. If there is none, use the next non-null value. You can get those values as follows:
select
    v.*, v_prev.email prev_email, v_next.email next_email
from
    visits v
    left join visits v_prev on v.visitor_id = v_prev.visitor_id
        and v_prev.activity_date = (select max(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date < v.activity_date and v2.email is not null)
    left join visits v_next on v.visitor_id = v_next.visitor_id
        and v_next.activity_date = (select min(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date > v.activity_date and v2.email is not null)
where
    v.email is null
In SQL Server or Redshift, you can use a subquery to calculate the email:
select t.*,
       coalesce(email,
                max(email) over (partition by visitor_id, grp),
                max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
               ) as imputed_email
from (select t.*,
             min(case when email is not null then activity_date end) over
                 (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
             count(email) over
                 (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
      from t
     ) t;
You can then use this in an update:
update t
    set email = tt.imputed_email
    from (select t.*,
                 coalesce(email,
                          max(email) over (partition by visitor_id, grp),
                          max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
                         ) as imputed_email
          from (select t.*,
                       min(case when email is not null then activity_date end) over
                           (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
                       count(email) over
                           (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
                from t
               ) t
         ) tt
    where tt.visitor_id = t.visitor_id and
          tt.activity_date = t.activity_date and
          t.email is null;
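To see how the window-function approach groups rows, here is a small sketch that exposes the intermediate grp and first_email_date columns. It is only an illustration under assumptions: it runs on Python's built-in sqlite3 module (which needs SQLite 3.25 or later for window functions) rather than SQL Server or Redshift, and the rows and email addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (visitor_id integer, activity_date text, email text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [  # hypothetical sample data
        (1, "2020-01-01", None),
        (1, "2020-01-02", "a@old.example"),
        (1, "2020-01-03", None),
        (1, "2020-01-04", "a@new.example"),
        (1, "2020-01-05", None),
        (2, "2020-01-01", None),
    ],
)

# grp counts the non-null emails seen so far, so each null row shares a
# group with the most recent earlier email. Rows before the first email
# fall in group 0 and use the email found at first_email_date instead.
query = """
select visitor_id, activity_date, email, grp, first_email_date,
       coalesce(email,
                max(email) over (partition by visitor_id, grp),
                max(case when activity_date = first_email_date then email end)
                    over (partition by visitor_id)) as imputed_email
from (select t.*,
             min(case when email is not null then activity_date end) over
                 (partition by visitor_id order by activity_date
                  rows between unbounded preceding and current row) as first_email_date,
             count(email) over
                 (partition by visitor_id order by activity_date
                  rows between unbounded preceding and current row) as grp
      from t) s
order by visitor_id, activity_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The last column of each printed row is the imputed email; the grp column shows which earlier email, if any, a null row is tied to.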