SQL multi query

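I need some help getting this right in a single query (if possible). (This is a theoretical example; I assume event_name contains events such as registration, action, etc.)

I have 3 columns: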

-user_id
-event_timestamp
-event_name

From these 3 columns, we need to create a new table with 4 new columns:

-year and month of user registration
-number of new user registrations in this month
-number of users who returned in the second calendar month after registration
-return probability

The result must look like this:

2019-1 | 1 | 1 | 100%
2019-2 | 3 | 2 | 67%
2019-3 | 2 | 0 | 0%

What I have done so far: I am using this toy example of my possible main table:

CREATE TABLE `main` (
  `event_timestamp` timestamp,
  `user_id` int(10),
  `event_name` char(12)
) DEFAULT CHARSET=utf8;
INSERT INTO `main` (`event_timestamp`, `user_id`, `event_name`) VALUES
  ('2019-01-23 20:02:21.550', '1', 'registration'),
  ('2019-01-24 20:03:21.550', '2', 'action'),
  ('2019-02-21 20:04:21.550', '3', 'registration'),
  ('2019-02-22 20:05:21.550', '4', 'registration'),
  ('2019-02-23 20:06:21.550', '5', 'registration'),
  ('2019-02-23 20:06:21.550', '1', 'action'),
  ('2019-02-24 20:07:21.550', '6', 'action'),
  ('2019-03-20 20:08:21.550', '3', 'action'),
  ('2019-03-21 20:09:21.550', '4', 'action'),
  ('2019-03-22 20:10:21.550', '9', 'action'),
  ('2019-03-23 20:11:21.550', '10', 'registration'),
  ('2019-03-22 20:10:21.550', '4', 'action'),
  ('2019-03-22 20:10:21.550', '5', 'action'),
  ('2019-03-24 20:11:21.550', '11', 'registration');

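I am trying to test some queries to create the 4 new columns:

This is column 1: we select the month and year from the timestamp where the event is a registration (I guess), but I need to summarize by month (e.g. 2019-11, 2019-12).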

SELECT DATE_FORMAT(event_timestamp, '%Y-%m') AS column_1 FROM main
WHERE event_name='registration';

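For column 2, we need to count, for each month, the users whose event_name is registration in that month, or... we could try to search for the first activity by user_id, but I don't know how to do that.

Here are some ideas...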

SELECT COUNT(DISTINCT user_id) AS user_count
FROM main
GROUP BY MONTH(event_timestamp);
SELECT COUNT(DISTINCT user_id) AS user_count FROM main
WHERE event_name='registration';
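
A minimal sketch of how the two ideas above might be combined, assuming MySQL and the toy `main` table from this question. The aliases `reg_month`, `first_month`, `first_seen`, `new_users` and `f` are illustrative names, not part of the original schema. The first query counts registrations per month; the second treats a user's earliest event of any kind as their "first activity".

```sql
-- Columns 1 and 2: registrations summarized per calendar month.
SELECT DATE_FORMAT(event_timestamp, '%Y-%m') AS reg_month,
       COUNT(DISTINCT user_id)               AS new_users
FROM main
WHERE event_name = 'registration'
GROUP BY DATE_FORMAT(event_timestamp, '%Y-%m')
ORDER BY reg_month;

-- Alternative: take each user's earliest event as their first activity,
-- then count users per month of that first activity.
SELECT DATE_FORMAT(first_seen, '%Y-%m') AS first_month,
       COUNT(*)                         AS new_users
FROM (
    SELECT user_id, MIN(event_timestamp) AS first_seen
    FROM main
    GROUP BY user_id
) AS f
GROUP BY DATE_FORMAT(first_seen, '%Y-%m')
ORDER BY first_month;
```

On the toy data the first query gives 1, 3 and 2 new users for 2019-01, 2019-02 and 2019-03. The second gives 2, 4 and 3, because users 2, 6 and 9 have actions but no registration row.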

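For column 3, we need to compare each user_id that has event_name registration in one month against any events in the following month, so that we get the users who returned in the next month.

Any idea how to create this query?

This is how to calculate column #4: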

SELECT *,
ROUND ((column_3/column_2)*100) AS column_4
FROM main;
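
The ratio above refers to column_2 and column_3, which do not exist in `main` itself, so it has to run over a derived table that already holds the monthly counts. A hedged sketch, assuming MySQL: the derived table `monthly` and its column names are hypothetical, and its rows are simply the expected result from this question, hard-coded for illustration.

```sql
SELECT reg_month,
       new_users,
       returned_users,
       -- NULLIF turns a zero denominator into NULL instead of raising a division error
       ROUND(100 * returned_users / NULLIF(new_users, 0)) AS return_pct
FROM (
    SELECT '2019-1' AS reg_month, 1 AS new_users, 1 AS returned_users
    UNION ALL SELECT '2019-2', 3, 2
    UNION ALL SELECT '2019-3', 2, 0
) AS monthly;
```

This returns 100, 67 and 0 for the three months. A month with no registrations would yield NULL rather than an error.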

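I hope you find the following answer helpful.

The first column is an extraction of the year and month. The new_users column is a COUNT of the unique user IDs where the event is 'registration'; it needs DISTINCT because a user can be duplicated by the JOIN after taking several actions in the following month. The returned_users column is the number of users who have an action in the month after registration. It also needs a DISTINCT clause, because one user can have multiple actions in a month. The last column is the probability you asked for, computed from the two previous columns.

The JOIN clause is a self-join that matches users who took at least one action in the month after registration.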

SELECT CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp)),
  COUNT(DISTINCT(CASE WHEN  A.event_name LIKE 'registration' THEN A.user_id END)) AS new_users,
  COUNT(DISTINCT B.user_id) AS returned_users,
  CASE WHEN COUNT(DISTINCT(CASE WHEN  A.event_name LIKE 'registration' THEN A.user_id END))=0 THEN 0 ELSE COUNT(DISTINCT B.user_id)/COUNT(DISTINCT(CASE WHEN  A.event_name LIKE 'registration' THEN A.user_id END))*100 END AS My_Ratio
FROM main AS A
LEFT JOIN main AS B
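ON A.user_id=B.user_id
  -- compare full year-month periods so that December -> January also matches
  AND PERIOD_DIFF(EXTRACT(YEAR_MONTH FROM B.event_timestamp), EXTRACT(YEAR_MONTH FROM A.event_timestamp))=1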
  AND A.event_name='registration' AND B.event_name='action'
GROUP BY CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp))
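
To see why the DISTINCT clauses matter, the self-join can be inspected for a single user. This is only an illustrative check against the toy data, assuming MySQL; `joined_rows` and `distinct_users` are made-up aliases.

```sql
-- User 4 registered in February and has two 'action' rows in March,
-- so the self-join produces two rows for that one user.
SELECT COUNT(B.user_id)          AS joined_rows,
       COUNT(DISTINCT B.user_id) AS distinct_users
FROM main AS A
JOIN main AS B
  ON A.user_id = B.user_id
 AND A.event_name = 'registration'
 AND B.event_name = 'action'
WHERE A.user_id = 4;
```

Here `joined_rows` is 2 while `distinct_users` is 1. Without DISTINCT, user 4 would be counted twice among both the new users and the returned users.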

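What we want to do is use window functions and aggregation: window functions to get the earliest registration date, then some conditional aggregation.

One challenge is the handling of calendar months. To deal with this, we truncate the dates to the start of the month to make the date arithmetic easier: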

select yyyymm_reg, count(*) as regs_in_month,
       sum( month_2 > 0 ) as visits_2months,
       avg( month_2 > 0 ) as return_rate_2months
from (select m.user_id, m.yyyymm_reg,
             max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 1) ) as month_1,
             max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 2) ) as month_2,
             max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 3) ) as month_3
      from (select m.*,
                   cast(concat(extract(year_month from event_timestamp), '01') as date) as yyyymm,
                   cast(concat(extract(year_month from min(case when event_name = 'registration' then event_timestamp end) over (partition by user_id)), '01') as date) as yyyymm_reg
            from main m
           ) m
      where m.yyyymm_reg is not null
      group by m.user_id, m.yyyymm_reg
     ) u
group by u.yyyymm_reg;

Here is a db<>fiddle.
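
The month-truncation step used in the query above can be checked in isolation. A small sketch, assuming MySQL; the timestamps are illustrative values and the column aliases are made up for this example.

```sql
SELECT
    -- EXTRACT(YEAR_MONTH ...) yields 201902; appending '01' gives '20190201'
    CAST(CONCAT(EXTRACT(YEAR_MONTH FROM '2019-02-23 20:06:21'), '01') AS DATE) AS month_start,
    -- raw timestamps less than one full month apart: TIMESTAMPDIFF returns 0
    TIMESTAMPDIFF(MONTH, '2019-01-23 20:02:21', '2019-02-21 20:04:21') AS raw_diff,
    -- the same two months truncated to their first day: exactly 1 month apart
    TIMESTAMPDIFF(MONTH, '2019-01-01', '2019-02-01') AS truncated_diff;
```

TIMESTAMPDIFF counts only completed months, so an event late in January and another early in the last third of February can still be 0 months apart. Truncating both sides to the first of the month makes the result a calendar-month difference.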

Here you go, done in T-SQL:

;with cte as (
    -- one row per month and user, with registration and action counts
    select form, user_id,
           sum(count_regs) as count_regs,
           sum(count_action) as count_action
    from (
        select FORMAT(event_timestamp, 'yyyy-MM') as form, user_id, event_name,
               CASE WHEN event_name = 'registration' THEN 1 ELSE 0 END as count_regs,
               CASE WHEN event_name = 'action' THEN 1 ELSE 0 END as count_action
        from main
    ) a
    group by form, user_id
)
select final.form, final.count_regs, final.count_action,
       ((CAST(final.count_action as float) / (CASE WHEN final.count_regs = '0' THEN '1' ELSE final.count_regs END)) * 100) as probability
from (
    -- join each month to the same user's rows in the following month
    select a.form,
           sum(a.count_regs) count_regs,
           CASE WHEN sum(b.count_action) is null then '0' else sum(b.count_action) end count_action
    from cte a
    left join cte b
      ON a.user_id = b.user_id
     and DATEADD(month, 1, CONVERT(date, a.form + '-01')) = CONVERT(date, b.form + '-01')
    group by a.form
) final
where final.count_regs != '0' or final.count_action != '0'
