[英]R: How can I group rows in a dataframe, ID rows meeting a condition, then delete prior rows for the group?
I have a dataframe of customers (identified by ID number), the number of units of two products they bought in each of four years, and a final column identifying the year in which new customers first purchased (the 'key' column). 我有一个客户数据框(由ID号标识),他们在四年中每年购买的两种产品的单位数,以及最后一列,用于标识新客户首次购买的年份(“关键”列)。 The problem: the dataframe includes rows from the years prior to new customers purchasing for the first time. 问题是:数据框包含新客户首次购买之前的年份中的行。 I need to delete these rows. 我需要删除这些行。 For example, this dataframe: 例如,此数据框:
customer year item.A item.B key
1 1 2000 NA NA <NA>
2 1 2001 NA NA <NA>
3 1 2002 1 5 new.customer
4 1 2003 2 6 <NA>
5 2 2000 NA NA <NA>
6 2 2001 NA NA <NA>
7 2 2002 NA NA <NA>
8 2 2003 2 7 new.customer
9 3 2000 2 4 <NA>
10 3 2001 6 4 <NA>
11 3 2002 2 5 <NA>
12 3 2003 1 8 <NA>
needs to look like this: 需要看起来像这样:
customer year item.A item.B key
1 1 2002 1 5 new.customer
2 1 2003 2 6 <NA>
3 2 2003 2 7 new.customer
4 3 2000 2 4 <NA>
5 3 2001 6 4 <NA>
6 3 2002 2 5 <NA>
7 3 2003 1 8 <NA>
I thought I could do this using dplyr/tidyr - a combination of group, lead/lag, and slice (or perhaps filter and drop_na) but I can't figure out how to delete backwards in the customer group once I've identified the rows meeting the condition "key"=="new.customer". 我以为可以使用dplyr / tidyr做到这一点-组,超前/滞后和切片(或filter和drop_na)的组合,但是一旦确定了客户组,我就无法弄清楚如何向后删除满足条件“键” ==“ new.customer”的行。 Thanks for any suggestions (code for the full dataframe below). 感谢您的任何建议(以下完整数据框的代码)。
a<-c(1,1,1,1,2,2,2,2,3,3,3,3)
b<-c(2000,2001,2002,2003,2000,2001,2002,2003,2000,2001,2002,2003)
c<-c(NA,NA,1,2,NA,NA,NA,2,2,6,2,1)
d<-c(NA,NA,5,6,NA,NA,NA,7,4,4,5,8)
e<-c(NA,NA,"new",NA,NA,NA,NA,"new",NA,NA,NA,NA)
df <- data.frame("customer" =a, "year" = b, "C" = c, "D" = d,"key"=e)
df
As a first step I am marking existing customers (customer 3 in this case) in the key column - 首先,我要在键列中标记现有客户(在这种情况下为客户3)-
df %>%
group_by(customer) %>%
mutate(
key = as.character(key), # can be avoided if key is a character to begin with
key = ifelse(row_number() == 1 & (!is.na(C) | !is.na(D)), "existing", key)
) %>%
filter(cumsum(!is.na(key)) > 0) %>%
ungroup()
# A tibble: 7 x 5
customer year C D key
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 2002 1 5 new
2 1 2003 2 6 NA
3 2 2003 2 7 new
4 3 2000 2 4 existing
5 3 2001 6 4 NA
6 3 2002 2 5 NA
7 3 2003 1 8 NA
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.