How do I separate a SQL partition by category: "row_number() over (partition by…"?

I am working with some app event data and want to group the events of a specific action into event sets, in order to grab the most recent event set. The customer (customer_id) starts an event set with 'step 1' (EventStep) and can go all the way through step 4, or can drop out at any step along the way. An event set can be triggered by one of a few actions (EventTrigger).

Goal: Grab all the steps of the most recent event set, and identify its date (based on Timestamp) and EventTrigger.

Issue

There should be only one EventTrigger for each event set, but the way my code is written, it combines event steps from different EventTriggers (if the customer advanced further in a previous attempt than in the most recent attempt). How do I ensure the event steps are grouped by the EventTrigger?

My code

SELECT * FROM (
    SELECT customer_id
         , EventStep
         , Timestamp
         , EventTrigger
         , ROW_NUMBER() OVER (PARTITION BY customer_id, EventStep ORDER BY Timestamp DESC) AS row_num
    FROM xxx_table
) xxx
WHERE row_num = 1
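
To see why the query above mixes event sets, here is a small reproduction. It is only a sketch: the sample rows, the trigger names (TriggerA, TriggerB), the 'Step N' spelling of the step values and the use of Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) are all assumptions made for illustration, not the original data.

```python
import sqlite3

# In-memory table with hypothetical sample data (not the asker's real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany(
    "INSERT INTO xxx_table VALUES (?, ?, ?, ?)",
    [
        # Older event set: the customer reached step 3.
        (1, "Step 1", "2020-01-01 10:00:00", "TriggerA"),
        (1, "Step 2", "2020-01-01 10:01:00", "TriggerA"),
        (1, "Step 3", "2020-01-01 10:02:00", "TriggerA"),
        # Most recent event set: the customer dropped out after step 2.
        (1, "Step 1", "2020-01-02 09:00:00", "TriggerB"),
        (1, "Step 2", "2020-01-02 09:01:00", "TriggerB"),
    ],
)

# Same logic as the query in the question: latest row per (customer, step).
query = """
SELECT customer_id, EventStep, Timestamp, EventTrigger
FROM (
    SELECT customer_id, EventStep, Timestamp, EventTrigger,
           ROW_NUMBER() OVER (PARTITION BY customer_id, EventStep
                              ORDER BY Timestamp DESC) AS row_num
    FROM xxx_table
) xxx
WHERE row_num = 1
ORDER BY EventStep
"""
mixed = conn.execute(query).fetchall()
for row in mixed:
    print(row)
```

Because the partition is per step, 'Step 3' has only one candidate row, the one from the older TriggerA set, so it is returned alongside the TriggerB rows for steps 1 and 2.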

Actual Results

[Image 1: actual results]

Wanted Results

[Image 2: wanted results]

The ID field is something I created that labels the events in the order they happened, so that you can better visualize what I'm looking for.

I think you want:

SELECT xxx.*
FROM (SELECT xxx.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id, EventStep ORDER BY Timestamp DESC) AS seqnum
      FROM xxx_table xxx
     ) xxx
WHERE seqnum = 1;

Nothing in your question suggests that aggregation is necessary.

EDIT:

Are you just looking for dense_rank()?

select xxx.*,
       dense_rank() over (partition by customer_id order by Timestamp) as seqnum
from xxx_table xxx;

To paraphrase, you want the most recent events for each customer, stopping at the most recent 'Step 1'?

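SELECT
  xxx_table.*
FROM
  xxx_table
INNER JOIN
(
  -- The most recent 'Step 1' per customer marks the start of the latest event set
  SELECT customer_id, MAX(Timestamp) AS Timestamp
    FROM xxx_table
   WHERE EventStep = 'Step 1'
   GROUP BY customer_id
)
  AS cust_endpoint
     ON  cust_endpoint.customer_id  = xxx_table.customer_id
     -- Keep only the events at or after that starting point
     AND cust_endpoint.Timestamp   <= xxx_table.Timestamp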
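
The idea in the paraphrase above (keep only the rows at or after each customer's most recent 'Step 1') can also be written with a window function instead of a join. The following is a minimal sketch, not a definitive answer: the sample rows, the trigger names, the last_start column alias and the 'Step 1' spelling are assumptions, and it uses Python's built-in sqlite3 module (SQLite 3.25 or later) only so that it can be run end to end.

```python
import sqlite3

# In-memory table with hypothetical sample data (not the asker's real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany(
    "INSERT INTO xxx_table VALUES (?, ?, ?, ?)",
    [
        (1, "Step 1", "2020-01-01 10:00:00", "TriggerA"),
        (1, "Step 2", "2020-01-01 10:01:00", "TriggerA"),
        (1, "Step 3", "2020-01-01 10:02:00", "TriggerA"),
        (1, "Step 1", "2020-01-02 09:00:00", "TriggerB"),
        (1, "Step 2", "2020-01-02 09:01:00", "TriggerB"),
        (2, "Step 1", "2020-01-03 08:00:00", "TriggerC"),
        (2, "Step 2", "2020-01-03 08:01:00", "TriggerC"),
    ],
)

# last_start = timestamp of the customer's most recent 'Step 1'.
# The ISO-formatted text timestamps compare correctly as strings.
query = """
SELECT customer_id, EventStep, Timestamp, EventTrigger
FROM (
    SELECT t.*,
           MAX(CASE WHEN EventStep = 'Step 1' THEN Timestamp END)
               OVER (PARTITION BY customer_id) AS last_start
    FROM xxx_table t
) s
WHERE Timestamp >= last_start
ORDER BY customer_id, Timestamp
"""
latest_sets = conn.execute(query).fetchall()
for row in latest_sets:
    print(row)
```

On this sample data, customer 1 keeps only the two TriggerB rows and the older TriggerA set is dropped. Customers with no 'Step 1' row at all get a NULL last_start and are filtered out, which may or may not be what is wanted.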
