R: How to group rows in a dataframe, identify the rows meeting a condition, then delete the prior rows for that group?

Question

I have a dataframe of customers (identified by ID number), the number of units of two products they bought in each of four years, and a final column identifying the year in which new customers first purchased (the 'key' column). The problem: the dataframe includes rows from the years prior to new customers purchasing for the first time. I need to delete these rows. For example, this dataframe:

   customer year item.A item.B          key
1         1 2000     NA     NA         <NA>
2         1 2001     NA     NA         <NA>
3         1 2002      1      5 new.customer
4         1 2003      2      6         <NA>
5         2 2000     NA     NA         <NA>
6         2 2001     NA     NA         <NA>
7         2 2002     NA     NA         <NA>
8         2 2003      2      7 new.customer
9         3 2000      2      4         <NA>
10        3 2001      6      4         <NA>
11        3 2002      2      5         <NA>
12        3 2003      1      8         <NA>

needs to look like this:

  customer year item.A item.B          key
1        1 2002      1      5 new.customer
2        1 2003      2      6         <NA>
3        2 2003      2      7 new.customer
4        3 2000      2      4         <NA>
5        3 2001      6      4         <NA>
6        3 2002      2      5         <NA>
7        3 2003      1      8         <NA>

I thought I could do this using dplyr/tidyr - a combination of group, lead/lag, and slice (or perhaps filter and drop_na) - but I can't figure out how to delete backwards in the customer group once I've identified the rows meeting the condition "key"=="new.customer". Thanks for any suggestions (code for the full dataframe below; there the item columns are named C and D and the key value is "new").

a<-c(1,1,1,1,2,2,2,2,3,3,3,3)
b<-c(2000,2001,2002,2003,2000,2001,2002,2003,2000,2001,2002,2003)
c<-c(NA,NA,1,2,NA,NA,NA,2,2,6,2,1)
d<-c(NA,NA,5,6,NA,NA,NA,7,4,4,5,8)
e<-c(NA,NA,"new",NA,NA,NA,NA,"new",NA,NA,NA,NA) 
df <- data.frame("customer" =a, "year" = b, "C" = c, "D" = d,"key"=e)
df

Answer 1

As a first step, I mark existing customers (customer 3 in this case) in the key column. Then, within each customer, I keep only the rows from the first non-NA key onward:

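library(dplyr)

df %>% 
  group_by(customer) %>% 
  mutate(
    key = as.character(key), # can be avoided if key is a character to begin with
    # a customer whose first row already has purchases is marked "existing"
    key = ifelse(row_number() == 1 & (!is.na(C) | !is.na(D)), "existing", key)
  ) %>% 
  # cumsum() stays 0 until the first non-NA key, so earlier rows are dropped
  filter(cumsum(!is.na(key)) > 0) %>% 
  ungroup()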

# A tibble: 7 x 5
  customer  year     C     D key     
     <dbl> <dbl> <dbl> <dbl> <chr>   
1        1  2002     1     5 new     
2        1  2003     2     6 NA      
3        2  2003     2     7 new     
4        3  2000     2     4 existing
5        3  2001     6     4 NA      
6        3  2002     2     5 NA      
7        3  2003     1     8 NA
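
For comparison, here is a minimal base R sketch of the same cumulative-sum idea. It is a variation, not the code above: instead of relabelling the key column, it treats any row with a non-NA value in C or D as a purchase and keeps everything from each customer's first purchase onward. It assumes the rows are already sorted by year within each customer and that pre-purchase rows are NA in both item columns. The names purchased, seen, and result are made up for illustration.

```r
# Rebuild the example data from the question (C and D hold the unit counts)
df <- data.frame(
  customer = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  year     = rep(2000:2003, times = 3),
  C        = c(NA, NA, 1, 2, NA, NA, NA, 2, 2, 6, 2, 1),
  D        = c(NA, NA, 5, 6, NA, NA, NA, 7, 4, 4, 5, 8),
  key      = c(NA, NA, "new", NA, NA, NA, NA, "new", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: a row counts as a purchase if either product has a recorded value
purchased <- !is.na(df$C) | !is.na(df$D)

# Per customer, the running count is 0 before the first purchase and >= 1 after it
seen <- ave(as.integer(purchased), df$customer, FUN = cumsum)

# Keep only the rows at or after each customer's first purchase
result <- df[seen > 0, ]
rownames(result) <- NULL
result
```

On the example data this should keep the same seven rows as the dplyr version, with the key column left unchanged (customer 3 keeps NA rather than "existing").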

Solution 1
1 Accepted 2018-11-17 01:30:39
