How to separate SQL partitions by category: "row_number() over (partition by..."?

Question

I am working with some app events data and want to group together the event set of a specific action, in order to grab the most recent event set. The customer (customer_id) starts an event set with 'step 1' (EventStep) and can go all the way through step 4 (or can drop out at any step along the way). An event set can be triggered by one of a few actions (EventTrigger).
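Because every event set begins with 'step 1', one common way to group the rows is a gaps-and-islands running count: count each customer's 'step 1' rows in time order, so every later step inherits the number of the set it belongs to. The sketch below illustrates this with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name xxx_table comes from the question; the sample rows, the 'Step N' spelling, the trigger names and the set_num column are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (customer_id, EventStep, Timestamp, EventTrigger).
# Customer 1 reaches Step 4 via TriggerA, then restarts via TriggerB.
ROWS = [
    (1, "Step 1", "2020-01-01 10:00", "TriggerA"),
    (1, "Step 2", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 3", "2020-01-01 10:02", "TriggerA"),
    (1, "Step 4", "2020-01-01 10:03", "TriggerA"),
    (1, "Step 1", "2020-02-01 09:00", "TriggerB"),
    (1, "Step 2", "2020-02-01 09:01", "TriggerB"),
    (2, "Step 1", "2020-01-05 08:00", "TriggerC"),
    (2, "Step 2", "2020-01-05 08:01", "TriggerC"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany("INSERT INTO xxx_table VALUES (?, ?, ?, ?)", ROWS)

# Each 'Step 1' row opens a new event set, so a running count of
# 'Step 1' rows per customer (in time order) numbers the sets 1, 2, ...
LABEL_SETS = """
SELECT customer_id, EventStep, Timestamp, EventTrigger,
       SUM(CASE WHEN EventStep = 'Step 1' THEN 1 ELSE 0 END) OVER (
           PARTITION BY customer_id
           ORDER BY Timestamp
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS set_num
FROM xxx_table
ORDER BY customer_id, Timestamp
"""

labeled = conn.execute(LABEL_SETS).fetchall()
for row in labeled:
    print(row)
```

In this sample, each (customer_id, set_num) pair ends up with a single EventTrigger. If two steps could share the same Timestamp, the ORDER BY would need a tie-breaker.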

Goal: Grab all the steps of the most recent event set, and identify its date (based on Timestamp) and its EventTrigger.
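One way to express this goal, sketched under the same assumptions as before (SQLite 3.25+, hypothetical rows, 'Step N' spelling, invented trigger names, hypothetical set_num / last_set / set_start columns): number the event sets with a running count of 'Step 1' rows, keep only the highest set number per customer, and report the set's start time next to each step.

```python
import sqlite3

# Hypothetical sample rows: (customer_id, EventStep, Timestamp, EventTrigger).
ROWS = [
    (1, "Step 1", "2020-01-01 10:00", "TriggerA"),
    (1, "Step 2", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 3", "2020-01-01 10:02", "TriggerA"),
    (1, "Step 4", "2020-01-01 10:03", "TriggerA"),
    (1, "Step 1", "2020-02-01 09:00", "TriggerB"),
    (1, "Step 2", "2020-02-01 09:01", "TriggerB"),
    (2, "Step 1", "2020-01-05 08:00", "TriggerC"),
    (2, "Step 2", "2020-01-05 08:01", "TriggerC"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany("INSERT INTO xxx_table VALUES (?, ?, ?, ?)", ROWS)

LATEST_SET = """
WITH labeled AS (
    -- number each customer's event sets by counting 'Step 1' rows so far
    SELECT customer_id, EventStep, Timestamp, EventTrigger,
           SUM(CASE WHEN EventStep = 'Step 1' THEN 1 ELSE 0 END) OVER (
               PARTITION BY customer_id ORDER BY Timestamp
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS set_num
    FROM xxx_table
), ranked AS (
    -- the highest set number per customer is the most recent set
    SELECT labeled.*, MAX(set_num) OVER (PARTITION BY customer_id) AS last_set
    FROM labeled
)
-- window functions run after WHERE, so set_start is the first
-- Timestamp within the kept (most recent) set
SELECT customer_id, EventStep, Timestamp, EventTrigger,
       MIN(Timestamp) OVER (PARTITION BY customer_id) AS set_start
FROM ranked
WHERE set_num = last_set
ORDER BY customer_id, Timestamp
"""

latest = conn.execute(LATEST_SET).fetchall()
for row in latest:
    print(row)
```

Every returned row of a customer carries one EventTrigger, and set_start gives the date of that set.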

Issue

There should only be one EventTrigger per event set, but the way my code is written, it combines event steps from different EventTriggers (when the customer advanced further in a previous attempt than in the most recent one). How do I ensure the event steps are grouped by EventTrigger?

My code

SELECT * FROM (
    SELECT customer_id
         , EventStep
         , Timestamp
         , EventTrigger
         , ROW_NUMBER() OVER (PARTITION BY customer_id, EventStep ORDER BY Timestamp DESC) AS row_num
    FROM xxx_table
) xxx
WHERE row_num = 1
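The mixing comes from PARTITION BY customer_id, EventStep: each step is ranked on its own, so row_num = 1 picks the latest 'Step 3' and 'Step 4' even when they belong to an older attempt. A small sqlite3 reproduction with hypothetical rows ('Step N' spelling and trigger names invented; SQLite 3.25+ assumed) shows this; the ORDER BY was added only to make the output deterministic.

```python
import sqlite3

# Hypothetical rows: customer 1 got further with TriggerA than with the newer TriggerB.
ROWS = [
    (1, "Step 1", "2020-01-01 10:00", "TriggerA"),
    (1, "Step 2", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 3", "2020-01-01 10:02", "TriggerA"),
    (1, "Step 4", "2020-01-01 10:03", "TriggerA"),
    (1, "Step 1", "2020-02-01 09:00", "TriggerB"),
    (1, "Step 2", "2020-02-01 09:01", "TriggerB"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany("INSERT INTO xxx_table VALUES (?, ?, ?, ?)", ROWS)

# The query from the question, plus an ORDER BY for stable output.
QUESTION_QUERY = """
SELECT * FROM (
    SELECT customer_id, EventStep, Timestamp, EventTrigger,
           ROW_NUMBER() OVER (PARTITION BY customer_id, EventStep
                              ORDER BY Timestamp DESC) AS row_num
    FROM xxx_table
) xxx
WHERE row_num = 1
ORDER BY customer_id, EventStep
"""

rows = conn.execute(QUESTION_QUERY).fetchall()
for customer_id, step, ts, trigger, _row_num in rows:
    # Steps 1-2 come from TriggerB, Steps 3-4 from the older TriggerA attempt.
    print(customer_id, step, ts, trigger)
```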

Actual Results

Image 1: Actual Results

Wanted Results

Image 2

The ID field is something I created; it labels the events in the order they happened, so that you can better visualize what I'm looking for.
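If the ID is meant as a per-customer sequence in time order (an assumption; the question does not say whether it restarts for each customer), it can be derived with ROW_NUMBER() instead of being maintained by hand. A minimal sqlite3 sketch (SQLite 3.25+), with hypothetical rows:

```python
import sqlite3

# Hypothetical sample rows: (customer_id, EventStep, Timestamp, EventTrigger).
ROWS = [
    (1, "Step 1", "2020-01-01 10:00", "TriggerA"),
    (1, "Step 2", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 1", "2020-02-01 09:00", "TriggerB"),
    (2, "Step 1", "2020-01-05 08:00", "TriggerC"),
    (2, "Step 2", "2020-01-05 08:01", "TriggerC"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany("INSERT INTO xxx_table VALUES (?, ?, ?, ?)", ROWS)

# ID restarts at 1 for every customer and follows Timestamp order.
WITH_ID = """
SELECT ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY Timestamp) AS ID,
       customer_id, EventStep, Timestamp, EventTrigger
FROM xxx_table
ORDER BY customer_id, Timestamp
"""

rows = conn.execute(WITH_ID).fetchall()
for row in rows:
    print(row)
```

Dropping PARTITION BY customer_id would instead number all events in one global sequence.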

Answer 1

I think you want:

SELECT xxx.*
FROM (SELECT xxx.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id, EventStep ORDER BY Timestamp DESC) AS seqnum
      FROM xxx_table xxx
     ) xxx
WHERE seqnum = 1;

Nothing in your question suggests that aggregation is necessary.

EDIT:

Are you just looking for dense_rank()?

select xxx.*,
       dense_rank() over (partition by customer_id order by Timestamp) as seqnum
from xxx_table xxx;
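For reference, dense_rank() numbers each customer's rows by Timestamp and gives tied timestamps the same number without leaving gaps, whereas row_number() always hands out distinct numbers. The sqlite3 sketch below (SQLite 3.25+; hypothetical rows with a deliberate tie) compares the two. On its own, this numbering does not mark where an event set begins.

```python
import sqlite3

# Hypothetical rows; Step 2 and Step 3 share a Timestamp on purpose.
ROWS = [
    (1, "Step 1", "2020-01-01 10:00", "TriggerA"),
    (1, "Step 2", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 3", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 4", "2020-01-01 10:02", "TriggerA"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany("INSERT INTO xxx_table VALUES (?, ?, ?, ?)", ROWS)

COMPARE = """
SELECT EventStep, Timestamp,
       dense_rank() OVER (PARTITION BY customer_id ORDER BY Timestamp) AS dense_seq,
       row_number() OVER (PARTITION BY customer_id ORDER BY Timestamp) AS row_seq
FROM xxx_table
ORDER BY Timestamp, EventStep
"""

rows = conn.execute(COMPARE).fetchall()
for row in rows:
    # Tied rows share dense_seq; their row_seq order is arbitrary.
    print(row)
```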

Answer 2

To paraphrase: you want the most recent events for each customer, stopping at the most recent Step 1?

SELECT
  xxx_table.*
FROM
  xxx_table
INNER JOIN
(
  -- timestamp of each customer's most recent 'Step 1'
  SELECT customer_id, MAX(timestamp) AS timestamp
    FROM xxx_table
   WHERE EventStep = 'Step 1'
   GROUP BY customer_id
)
  AS cust_endpoint
     ON  cust_endpoint.customer_id  = xxx_table.customer_id
     -- keep only the events at or after that 'Step 1'
     AND xxx_table.timestamp       >= cust_endpoint.timestamp
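The join approach can be checked on sample data in an in-memory SQLite database. The sketch below uses a version of the join that takes MAX(timestamp) of 'Step 1' with GROUP BY customer_id and keeps events at or after that point. The rows, trigger names and 'Step N' spelling are hypothetical, and it relies on Timestamp being an ISO-formatted string (or a real timestamp type) so that >= compares chronologically.

```python
import sqlite3

# Hypothetical rows: customer 1 restarts with a different trigger.
ROWS = [
    (1, "Step 1", "2020-01-01 10:00", "TriggerA"),
    (1, "Step 2", "2020-01-01 10:01", "TriggerA"),
    (1, "Step 3", "2020-01-01 10:02", "TriggerA"),
    (1, "Step 4", "2020-01-01 10:03", "TriggerA"),
    (1, "Step 1", "2020-02-01 09:00", "TriggerB"),
    (1, "Step 2", "2020-02-01 09:01", "TriggerB"),
    (2, "Step 1", "2020-01-05 08:00", "TriggerC"),
    (2, "Step 2", "2020-01-05 08:01", "TriggerC"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_table "
    "(customer_id INTEGER, EventStep TEXT, Timestamp TEXT, EventTrigger TEXT)"
)
conn.executemany("INSERT INTO xxx_table VALUES (?, ?, ?, ?)", ROWS)

JOIN_QUERY = """
SELECT xxx_table.*
FROM xxx_table
INNER JOIN (
    -- most recent 'Step 1' per customer
    SELECT customer_id, MAX(timestamp) AS timestamp
    FROM xxx_table
    WHERE EventStep = 'Step 1'
    GROUP BY customer_id
) AS cust_endpoint
    ON  cust_endpoint.customer_id = xxx_table.customer_id
    AND xxx_table.timestamp >= cust_endpoint.timestamp
ORDER BY xxx_table.customer_id, xxx_table.timestamp
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

Unlike the running-count approach, this needs no window functions, so it also works on databases that lack them.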


Answer 1: score 0, 2020-04-06 13:53:01
Answer 2: score 0, 2020-04-06 14:11:05
