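Filtering a Query based on a Date and Window function in Snowflake
I have been asked to pull information about three different types of clients from the last year (visited once, visited fewer than 10 times, and visited 10 or more times), to see whether their likelihood of returning is related to several different factors.
For that reason I have built a fairly extensive query. Currently it joins three tables: client information, visit information, and staff information. I created a calculated column in the select statement: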
COUNT(DISTINCT visitno) OVER(PARTITION BY clientid) as totalvisits
Now I just need to group by totalvisits and filter by the date they visited.
I tried:
where visitdate> 01/01/2021
group by totalvisits
having total visits<10
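But I get an error stating that visitno is not a valid group by expression.
What might I be doing wrong?
In Snowflake, you can use the QUALIFY clause to filter on the result of a window function, after the window aggregation has been computed.
So the query would look like this: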
SELECT
clientid,
COUNT(DISTINCT visitno) OVER(PARTITION BY clientid) as totalvisits
FROM <your_table>
WHERE visitdate >= '2021-01-01'::date
AND visitdate < '2022-01-01'::date
QUALIFY totalvisits < 10;
*Make sure visitdate has a date type beforehand, though!
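If visitdate turns out to be stored as text, one option is to convert it before comparing. The following is only a sketch: the raw_visits data, the visitdate_text column name, and the 'YYYY-MM-DD' format are assumptions made up for illustration, so adjust them to the real table.

```sql
-- Hypothetical sample data: visitdate_text is a string column, not a DATE.
WITH raw_visits(clientid, visitno, visitdate_text) AS (
    SELECT * FROM VALUES
        (1, 101, '2021-03-05'),
        (1, 102, '2021-07-19'),
        (2, 201, '2020-11-30'),
        (2, 202, 'not a date')
)
SELECT
    clientid,
    visitno,
    -- TRY_TO_DATE returns NULL instead of raising an error when parsing fails
    TRY_TO_DATE(visitdate_text, 'YYYY-MM-DD') AS visitdate
FROM raw_visits
-- NULL comparisons are not true, so unparseable rows are filtered out here
WHERE TRY_TO_DATE(visitdate_text, 'YYYY-MM-DD') >= '2021-01-01'::date
  AND TRY_TO_DATE(visitdate_text, 'YYYY-MM-DD') <  '2022-01-01'::date;
```

Using TRY_TO_DATE rather than a plain cast means a single malformed value does not fail the whole query, at the cost of silently dropping that row from the date filter.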
[Referring to the comment below]: if you want to see the all-time total visits plus the total visits for a given year, you can do the following:
SELECT
clientid,
YEAR(visitdate) as visit_date_year,
COUNT(DISTINCT visitno) OVER (PARTITION BY clientid) as totalvisits,
COUNT(DISTINCT visitno) OVER (PARTITION BY clientid, YEAR(visitdate)) as total_visits_by_year
FROM <your_table>
QUALIFY total_visits_by_year < 10;
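For comparison, QUALIFY behaves like wrapping the windowed query in a sub-select and filtering it with WHERE. The sketch below shows that equivalent form; the visits data and the windowed alias are made up for illustration.

```sql
-- Hypothetical sample data standing in for <your_table>.
WITH visits(clientid, visitno, visitdate) AS (
    SELECT * FROM VALUES
        (1, 101, '2020-05-01'::date),
        (1, 102, '2021-06-01'::date),
        (2, 201, '2021-02-01'::date)
)
SELECT clientid, visit_date_year, totalvisits, total_visits_by_year
FROM (
    -- The window functions are computed in the inner query...
    SELECT
        clientid,
        YEAR(visitdate) AS visit_date_year,
        COUNT(DISTINCT visitno) OVER (PARTITION BY clientid) AS totalvisits,
        COUNT(DISTINCT visitno) OVER (PARTITION BY clientid, YEAR(visitdate)) AS total_visits_by_year
    FROM visits
) AS windowed
-- ...and filtered in the outer one, which is what QUALIFY does in a single step.
WHERE total_visits_by_year < 10;
```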
OK, let's make some fake data and do something with it:
WITH fake_data(client_id, visit_date) as (
SELECT * FROM VALUES
-- this person has visited once
(1, '2022-04-14'::date),
-- this person has visited 3 times in the year
(3, '2022-04-13'::date),
(3, '2022-03-13'::date),
(3, '2022-02-13'::date),
-- this person is a huge visitor, but 1 visit is outside the last year
(5, '2022-04-12'::date),
(5, '2022-03-12'::date),
(5, '2022-02-12'::date),
(5, '2022-01-12'::date),
(5, '2020-02-12'::date)
)
SELECT *,
count(distinct visit_date) over (partition by client_id) as total_visits
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
This gives:
CLIENT_ID | VISIT_DATE | TOTAL_VISITS |
---|---|---|
1 | 2022-04-14 | 1 |
3 | 2022-04-13 | 3 |
3 | 2022-03-13 | 3 |
3 | 2022-02-13 | 3 |
5 | 2022-04-12 | 4 |
5 | 2022-03-12 | 4 |
5 | 2022-02-12 | 4 |
Now to put them into those groups/categories.
SELECT *,
count(distinct visit_date) over (partition by client_id) as total_visits,
case
when total_visits = 1 then 1
when total_visits <= 3 then 2
when total_visits > 3 then 3
end as group_id
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
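The thresholds above (1, up to 3, more than 3) are sized for the fake data. For the categories named in the question (visited once, fewer than 10 times, 10 or more times), the same CASE could be adapted roughly as follows. This is a sketch only: the category labels and the visit_category column name are assumptions, not something from the original query.

```sql
WITH fake_data(client_id, visit_date) AS (
    SELECT * FROM VALUES
        (1, '2022-04-14'::date),
        (3, '2022-04-13'::date),
        (3, '2022-03-13'::date),
        (3, '2022-02-13'::date)
)
SELECT
    client_id,
    visit_date,
    COUNT(DISTINCT visit_date) OVER (PARTITION BY client_id) AS total_visits,
    -- Hypothetical labels for the three client types from the question
    CASE
        WHEN total_visits = 1 THEN 'visited once'
        WHEN total_visits < 10 THEN 'visited fewer than 10 times'
        ELSE 'visited 10 or more times'
    END AS visit_category
FROM fake_data
WHERE visit_date >= DATEADD('year', -1, '2022-04-14'::date /* CURRENT_DATE */);
```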
Now for some math. I will wrap this in a sub-select (and also push a few extra things into it):
WITH fake_data(client_id, visit_date) as (
SELECT * FROM VALUES
-- this person has visited once
(1, '2022-04-14'::date),
-- this person has visited 3 times in the year
(3, '2022-04-13'::date),
(3, '2022-04-11'::date),
(3, '2022-04-09'::date),
-- this person is a huge visitor, but 1 visit is outside the last year
(5, '2022-04-12'::date),
(5, '2022-03-12'::date),
(5, '2022-02-12'::date),
(5, '2022-01-12'::date),
(5, '2020-02-12'::date)
)
SELECT group_id
,count(distinct client_id) as count_of_group_members
,sum(total_visits) as sum_of_group_visit
,avg(visit_gap_in_days) as avg_group_day_diff
,stddev(visit_gap_in_days) as stddev_group_day_diff
FROM (
SELECT *,
count(distinct visit_date) over (partition by client_id) as total_visits,
case
when total_visits = 1 then 1
when total_visits <= 3 then 2
when total_visits > 3 then 3
end as group_id,
lag(visit_date) over (partition by client_id order by visit_date) as prior_visit_date,
datediff('day', prior_visit_date, visit_date) as visit_gap_in_days
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
)
GROUP BY 1
ORDER BY 1
GROUP_ID | COUNT_OF_GROUP_MEMBERS | SUM_OF_GROUP_VISIT | AVG_GROUP_DAY_DIFF | STDDEV_GROUP_DAY_DIFF |
---|---|---|---|---|
1 | 1 | 1 | | |
2 | 1 | 9 | 2 | 0 |
3 | 1 | 16 | 30 | 1.732050808 |
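Whoa, the sum of visits is wrong: every row already carries its client's total, so I have summed my sums (3 rows x 3 visits = 9, and 4 rows x 4 visits = 16).
So here, given count(distinct visitno),
I cannot sum it, because that becomes a sum of sums, and I cannot use count(*), because we just noted there are duplicates (otherwise the distinct would not be needed). And I assume you are not de-duplicating the rows because there are some "other details you need".
But anyway. That is the great thing about SQL: you can answer any question, but you have to know the question and understand the data, so that you know which assumptions can safely be applied to your data.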
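One way around the sum-of-sums problem, if the per-visit details are not needed at that step, is to collapse to one row per client before grouping. This is a sketch under that assumption, not necessarily what the real query needs; the per_client CTE name is made up for illustration.

```sql
WITH fake_data(client_id, visit_date) AS (
    SELECT * FROM VALUES
        (1, '2022-04-14'::date),
        (3, '2022-04-13'::date),
        (3, '2022-04-11'::date),
        (3, '2022-04-09'::date),
        (5, '2022-04-12'::date),
        (5, '2022-03-12'::date),
        (5, '2022-02-12'::date),
        (5, '2022-01-12'::date),
        (5, '2020-02-12'::date)
),
-- One row per client, so each client's total is counted exactly once
per_client AS (
    SELECT
        client_id,
        COUNT(DISTINCT visit_date) AS total_visits
    FROM fake_data
    WHERE visit_date >= DATEADD('year', -1, '2022-04-14'::date /* CURRENT_DATE */)
    GROUP BY client_id
)
SELECT
    CASE
        WHEN total_visits = 1 THEN 1
        WHEN total_visits <= 3 THEN 2
        ELSE 3
    END AS group_id,
    COUNT(*) AS count_of_group_members,
    SUM(total_visits) AS sum_of_group_visit
FROM per_client
GROUP BY 1
ORDER BY 1;
```

With the fake data this should give visit sums of 1, 3 and 4 for groups 1, 2 and 3, rather than 1, 9 and 16. The day-gap statistics would still have to be computed from the per-visit rows, for example in a separate sub-select joined back on client_id.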