Generate an order rank column with dplyr based on changes in a grouping variable

Question

I'm having some trouble using dplyr to generate a rank column on a tbl_df object built from a log of transactions for a particular consumer. The data I have looks like this:

                                        consumerid merchant_id      eventtimestamp merchant_visit_rank
                                              (chr)       (int)              (time)          (dbl)
            1  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-15 13:33:00              0
            2  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:03              1
            3  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:41              0
            4  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:05              1
            5  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:55              1
            6  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 14:15:56              0
            7  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:18              1
            8  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:19              0
            9  004a5cc3-3d60-4d14-85b3-706e454aae13          54 2015-01-21 13:52:24              0
            10 004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:29              0
            ..                                  ...         ...                 ...            ...

I want to generate a merchant visit rank that tells me the order in which each merchant was visited during the transaction session, with the rank going up by one every time merchant_id changes from one row to the next. In our case the correct ranking would look like this:

                                        consumerid merchant_id      eventtimestamp merchant_visit_rank
                                              (chr)       (int)              (time)          (dbl)
            1  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-15 13:33:00              1
            2  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:03              2
            3  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:41              2
            4  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:05              3
            5  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:55              3
            6  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 14:15:56              3
            7  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:18              4
            8  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:19              4
            9  004a5cc3-3d60-4d14-85b3-706e454aae13          54 2015-01-21 13:52:24              5
            10 004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:29              6
            ..                                  ...         ...                 ...            ...

I have tried using the window functions in dplyr like this:

            measure_media_interaction %>% 
              #selecting the fields we wish from the dataframe
              select(consumerid,merchant_id,eventtimestamp) %>%
              #mutate a placeholder column to be used for the rank 
              mutate(merchant_visit = 0) %>% 
              #sort them by consumer and timestamp
              arrange(consumerid,eventtimestamp) %>%
              #change the column so it shows that this merchant was the first this consumer visited 
              #or not 
              mutate(merchant_visit = 
                       ifelse(lead(merchant_id)!=merchant_id,merchant_visit,merchant_visit+1))

However, I am stuck and don't know how to do this efficiently. Any ideas?

Answer 1

Here is a solution. We use lag to test whether merchant_id differs from the previous row, and cumsum to increment the counter at each change.

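measure_media_interaction %>% 
  select(consumerid, merchant_id, eventtimestamp) %>%
  arrange(consumerid, eventtimestamp) %>%
  # lag() gives NA for the first row, so drop that comparison with [-1] and
  # start the counter at 1; cumsum() then adds 1 at every change of merchant_id
  mutate(merchant_visit = cumsum(c(1, (merchant_id != lag(merchant_id))[-1])))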
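
Note that the pipeline above runs one counter over the whole table, which is fine for the single consumer shown in the question. If the log holds several consumers and the rank should restart for each one (an assumption, since the question only shows one consumer), a minimal sketch is to add group_by(consumerid) before the mutate. The toy data frame `visits` and its timestamps are made up for illustration; only the merchant_id sequence of the first consumer is taken from the question.

```r
library(dplyr)

# Hypothetical toy data: consumer "a" repeats the merchant_id sequence from
# the question; consumer "b" is invented to show the counter restarting.
visits <- tibble(
  consumerid = rep(c("a", "b"), times = c(10, 3)),
  merchant_id = c(52, 56, 56, 52, 52, 52, 58, 58, 54, 58, 52, 52, 56),
  eventtimestamp = as.POSIXct("2015-01-15 13:33:00", tz = "UTC") + seq_len(13) * 60
)

ranked <- visits %>%
  arrange(consumerid, eventtimestamp) %>%
  group_by(consumerid) %>%
  # Compare each row with the previous one inside the group. The first row is
  # compared with its own value (the lag default), so it counts as "no change".
  mutate(
    changed = merchant_id != lag(merchant_id, default = first(merchant_id)),
    merchant_visit_rank = 1L + cumsum(changed)
  ) %>%
  ungroup() %>%
  select(-changed)

print(ranked)
```

Passing a default to lag avoids the NA in the first row, so the c(1, ...[-1]) workaround is not needed in this variant.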
