
Filtering a Query based on a Date and Window function in Snowflake

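I've been asked to pull information about three different types of clients from the past year (those who visited once, those who visited fewer than 10 times, and those who visited 10 or more times) to see whether their likelihood of returning relates to several different factors.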

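For that reason I've built a very broad query. It currently joins three tables: client info, visit info, and staff info. In the SELECT statement I created a calculated column: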

COUNT(DISTINCT visitno) OVER(PARTITION BY clientid) as totalvisits
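To make the behaviour concrete, here's a minimal Python sketch (plain Python, not Snowflake) of what that windowed count returns. The function name `window_distinct_count` and the sample rows are hypothetical. The point is that a window function keeps every input row and repeats the per-client value on each one, instead of collapsing to one row per client the way GROUP BY does.

```python
from collections import defaultdict


def window_distinct_count(rows):
    """Emulate COUNT(DISTINCT visitno) OVER (PARTITION BY clientid).

    rows: list of (clientid, visitno) tuples (hypothetical joined rows).
    Returns one (clientid, visitno, totalvisits) tuple per input row,
    in the original order: the window does not remove any rows.
    """
    distinct_visits = defaultdict(set)
    for clientid, visitno in rows:
        distinct_visits[clientid].add(visitno)
    return [(c, v, len(distinct_visits[c])) for c, v in rows]


# Visit 11 appears twice, as it might after joining to a staff table.
rows = [(1, 10), (2, 11), (2, 11), (2, 12)]
for row in window_distinct_count(rows):
    print(row)
# (1, 10, 1)
# (2, 11, 2)
# (2, 11, 2)
# (2, 12, 2)
```

DISTINCT stops the duplicated visit 11 from being counted twice, but the output still has four rows, one per joined row.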

Now I just need to group by totalvisits and filter by the date of the visit.

I tried:

where visitdate> 01/01/2021
group by totalvisits
having total visits<10

but I get an error saying that visitno is not a valid GROUP BY expression.

What am I doing wrong?

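In Snowflake, you can use the QUALIFY clause to filter on window functions after the window aggregation has been computed. WHERE, GROUP BY and HAVING are all evaluated before window functions, which is also why your GROUP BY failed: once a query has a GROUP BY, every column used outside a regular aggregate (including visitno inside the window function) has to be grouped.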

So the query would look like this:

SELECT
  clientid,
  COUNT(DISTINCT visitno) OVER(PARTITION BY clientid) as totalvisits
FROM <your_table>
WHERE visitdate >= '2021-01-01'::date
  AND visitdate < '2022-01-01'::date
QUALIFY totalvisits < 10;

*Make sure visitdate actually has a DATE type beforehand, though!
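QUALIFY is Snowflake-specific, so to show what it does, here is the same logic run in SQLite through Python's standard sqlite3 module. SQLite has no QUALIFY clause, so the window is computed in a derived table and filtered in the outer WHERE, which is the portable equivalent. Assumptions: a hypothetical `visits` table with ISO-8601 text dates (which compare correctly as strings in SQLite), COUNT(*) instead of COUNT(DISTINCT ...) because SQLite does not allow DISTINCT in window aggregates, a threshold of 2 as a small stand-in for 10, and SQLite 3.25 or newer for window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (clientid INTEGER, visitno INTEGER, visitdate TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, 100, "2021-03-01"),
        (2, 200, "2021-02-01"),
        (2, 201, "2021-05-01"),
        (2, 202, "2020-12-31"),  # outside 2021: removed by WHERE before the window runs
        (3, 300, "2021-06-01"),
    ],
)

# Inner query: WHERE keeps the half-open range [2021-01-01, 2022-01-01),
# then the window counts the remaining rows per client.
# Outer WHERE: plays the role of Snowflake's QUALIFY, filtering on the window result.
query = """
SELECT clientid, totalvisits
FROM (
    SELECT clientid,
           COUNT(*) OVER (PARTITION BY clientid) AS totalvisits
    FROM visits
    WHERE visitdate >= '2021-01-01'
      AND visitdate <  '2022-01-01'
) AS windowed
WHERE totalvisits < 2
ORDER BY clientid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (3, 1)]
```

Client 2 has two visits in 2021 (its 2020 visit was dropped first), so it fails the outer filter; QUALIFY just lets Snowflake do this without the extra nesting.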

[Following up on the comments below]: if you want to see the all-time total visits plus the total visits for each year, you can do the following:

SELECT
  clientid,
  YEAR(visitdate) as visit_date_year,
  COUNT(DISTINCT visitno) OVER (PARTITION BY clientid) as totalvisits,
  COUNT(DISTINCT visitno) OVER (PARTITION BY clientid, YEAR(visitdate)) as total_visits_by_year
FROM <your_table>
QUALIFY total_visits_by_year < 10;
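A rough Python equivalent of those two windows, assuming hypothetical (clientid, visitno, visitdate) rows; `add_visit_totals` is an illustrative name, and the filter uses a threshold of 2 as a small stand-in for 10:

```python
from datetime import date


def add_visit_totals(rows):
    """Emulate both windows from the query:
    COUNT(DISTINCT visitno) OVER (PARTITION BY clientid)
    COUNT(DISTINCT visitno) OVER (PARTITION BY clientid, YEAR(visitdate))
    Returns (clientid, year, totalvisits, total_visits_by_year) per input row.
    """
    by_client, by_client_year = {}, {}
    for clientid, visitno, visitdate in rows:
        by_client.setdefault(clientid, set()).add(visitno)
        by_client_year.setdefault((clientid, visitdate.year), set()).add(visitno)
    return [
        (c, d.year, len(by_client[c]), len(by_client_year[(c, d.year)]))
        for c, _, d in rows
    ]


rows = [
    (1, 1, date(2020, 5, 1)),
    (1, 2, date(2021, 5, 1)),
    (1, 3, date(2021, 6, 1)),
    (2, 4, date(2021, 7, 1)),
]
annotated = add_visit_totals(rows)

# Equivalent of QUALIFY total_visits_by_year < 2 (stand-in for < 10).
kept = [r for r in annotated if r[3] < 2]
print(kept)  # [(1, 2020, 3, 1), (2, 2021, 1, 1)]
```

Client 1's 2020 row survives even though the client has 3 visits overall, because the filter only looks at the per-year count.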

OK, let's make some fake data and work through it:

WITH fake_data(client_id, visit_date) as (
    SELECT * FROM VALUES
    -- this person has visited once
    (1, '2022-04-14'::date),
    -- this person has visited 3 times in the year
    (3, '2022-04-13'::date),
    (3, '2022-03-13'::date),
    (3, '2022-02-13'::date),
    -- this person is a huge visitor, but 1 visit falls outside the last year.
    (5, '2022-04-12'::date),
    (5, '2022-03-12'::date),
    (5, '2022-02-12'::date),
    (5, '2022-01-12'::date),
    (5, '2020-02-12'::date)
)
SELECT *,
    count(distinct visit_date) over (partition by client_id) as total_visits
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)

Boom:

CLIENT_ID  VISIT_DATE  TOTAL_VISITS
1          2022-04-14  1
3          2022-04-13  3
3          2022-03-13  3
3          2022-02-13  3
5          2022-04-12  4
5          2022-03-12  4
5          2022-02-12  4
5          2022-01-12  4
5个 2022-02-12 4个

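Now put them into those groups/categories (with the thresholds scaled down to fit the fake data: 1 visit, 2 to 3 visits, more than 3):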

SELECT *,
    count(distinct visit_date) over (partition by client_id) as total_visits,
    case 
        when total_visits = 1 then 1
        when total_visits <= 3 then 2
        when total_visits > 3 then 3
    end as group_id
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
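The CASE can be mirrored by a small function, sketched here in Python with a hypothetical `upper` parameter: the default of 10 gives the question's buckets (once, fewer than 10, 10 or more), and `upper=4` gives the fake-data buckets used above. Because CASE takes the first matching WHEN, a single visit lands in group 1 even though 1 is also fewer than 10.

```python
def visit_group(total_visits, upper=10):
    """Mirror the CASE expression (hypothetical helper).

    Assumes total_visits >= 1, which holds for any client row that
    survived the WHERE.
      1                 -> group 1 (visited once)
      2 .. upper - 1    -> group 2
      upper and above   -> group 3
    """
    if total_visits == 1:  # checked first, like the first WHEN
        return 1
    if total_visits < upper:
        return 2
    return 3


print([visit_group(n, upper=4) for n in (1, 2, 3, 4)])  # [1, 2, 2, 3]
print([visit_group(n) for n in (1, 9, 10)])             # [1, 2, 3]
```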

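Now some math. I'll wrap it in a sub-select (and push a few more things into it). Note that client 3's dates differ from the first data set: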

WITH fake_data(client_id, visit_date) as (
    SELECT * FROM VALUES
    -- this person has visited once
    (1, '2022-04-14'::date),
    -- this person has visited 3 times in the year
    (3, '2022-04-13'::date),
    (3, '2022-04-11'::date),
    (3, '2022-04-09'::date),
    -- this person is a huge visitor, but 1 visit falls outside the last year.
    (5, '2022-04-12'::date),
    (5, '2022-03-12'::date),
    (5, '2022-02-12'::date),
    (5, '2022-01-12'::date),
    (5, '2020-02-12'::date)
)
SELECT group_id
    ,count(distinct client_id) as count_of_group_members
    ,sum(total_visits) as sum_of_group_visit
    ,avg(visit_gap_in_days) as avg_group_day_diff
    ,stddev(visit_gap_in_days) as stddev_group_day_diff
FROM (
SELECT *,
    count(distinct visit_date) over (partition by client_id) as total_visits,
    case 
        when total_visits = 1 then 1
        when total_visits <= 3 then 2
        when total_visits > 3 then 3
    end as group_id,
    lag(visit_date) over (partition by client_id order by visit_date) as prior_visit_date,
    datediff('day', prior_visit_date, visit_date) as visit_gap_in_days
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
)
GROUP BY 1
ORDER BY 1
GROUP_ID  COUNT_OF_GROUP_MEMBERS  SUM_OF_GROUP_VISIT  AVG_GROUP_DAY_DIFF  STDDEV_GROUP_DAY_DIFF
1         1                       1                   NULL                NULL
2         1                       9                   2                   0
3         1                       16                  30                  1.732050808
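To check the day-gap numbers, here is a Python sketch of the same pipeline over the second data set: WHERE, per-client count, the CASE bucket, lag() plus datediff('day', ...), then AVG and STDDEV per group. Snowflake's STDDEV is the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes; with gaps of 31, 28 and 31 days that gives the square root of 3, matching group 3's 1.732. The helper names are illustrative.

```python
from datetime import date
from statistics import mean, stdev

# The second fake_data set from the query above.
fake_data = [
    (1, date(2022, 4, 14)),
    (3, date(2022, 4, 13)), (3, date(2022, 4, 11)), (3, date(2022, 4, 9)),
    (5, date(2022, 4, 12)), (5, date(2022, 3, 12)), (5, date(2022, 2, 12)),
    (5, date(2022, 1, 12)), (5, date(2020, 2, 12)),
]
cutoff = date(2021, 4, 14)  # dateadd('year', -1, '2022-04-14')


def group_id(total_visits):
    # Same thresholds as the CASE in the query: 1, 2 to 3, more than 3.
    return 1 if total_visits == 1 else 2 if total_visits <= 3 else 3


# WHERE, then order by client and date (the lag() window's order).
recent = sorted(r for r in fake_data if r[1] >= cutoff)

# Dates are distinct per client here, so counting rows matches count(distinct visit_date).
totals = {}
for c, _ in recent:
    totals[c] = totals.get(c, 0) + 1

# lag() + datediff('day', ...): gap to the same client's previous visit.
# Each client's first visit has no predecessor (NULL), so it adds no gap.
gaps_by_group, previous = {}, {}
for c, d in recent:
    gaps = gaps_by_group.setdefault(group_id(totals[c]), [])
    if c in previous:
        gaps.append((d - previous[c]).days)
    previous[c] = d

summary = {}
for g, gaps in sorted(gaps_by_group.items()):
    avg = mean(gaps) if gaps else None            # AVG over only NULLs -> NULL
    sd = stdev(gaps) if len(gaps) >= 2 else None  # stdev needs two values; None otherwise
    summary[g] = (avg, sd)
print(summary)
```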

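Wowzers, the visit sums are wrong: I've summed my sums. Each client's total_visits is repeated on every one of its rows, so SUM returns the total multiplied by itself (3 x 3 = 9, 4 x 4 = 16) instead of 3 and 4.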

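So here, given count(distinct visitno), I can't sum it, because that becomes a sum of sums; and I can't use count(*), because we've just noted there are duplicate rows (otherwise distinct wouldn't be needed). I'm also assuming you haven't removed those duplicate rows because they carry some "other details you need".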
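One way to get the per-group visit totals right, assuming each client's total_visits is what should be added up, is to collapse the window output to one row per client first and only then sum. In SQL that would be a SELECT DISTINCT (or a GROUP BY client_id) in another sub-select; here is the idea as a Python sketch over the window output of the query above, typed in by hand:

```python
# Window output: one row per visit, with the client's total repeated on each row.
rows = [  # (client_id, group_id, total_visits)
    (1, 1, 1),
    (3, 2, 3), (3, 2, 3), (3, 2, 3),
    (5, 3, 4), (5, 3, 4), (5, 3, 4), (5, 3, 4),
]

# Wrong: summing the repeated per-row total gives total_visits squared per client.
wrong = {}
for _, g, t in rows:
    wrong[g] = wrong.get(g, 0) + t

# Collapse to one row per distinct (client_id, group_id, total_visits) first, then sum.
right = {}
for _, g, t in set(rows):
    right[g] = right.get(g, 0) + t

print(dict(sorted(wrong.items())))  # {1: 1, 2: 9, 3: 16}
print(dict(sorted(right.items())))  # {1: 1, 2: 3, 3: 4}
```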

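But anyway. That's the great thing about SQL: you can answer any question, but you have to know the question and understand the data, so that you know which assumptions hold for your data.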

