R delete rows based on values in previous rows

I am new to R and am trying to delete rows based on the values of previous rows. Sample data:

Cust_ID | Date                 | Value
500219  | 2016-04-11 12:00:00  | 0
500219  | 2016-04-12 16:00:00  | 0
500219  | 2016-04-14 11:00:00  | 1
500219  | 2016-04-15 12:00:00  | 1
500219  | 2016-05-23 09:00:00  | 0
500219  | 2016-05-02 19:00:00  | 0
500220  | 2016-04-11 12:00:00  | 0
500220  | 2016-04-14 11:00:00  | 1
500220  | 2016-04-15 12:00:00  | 1
500220  | 2016-05-23 09:00:00  | 0
500220  | 2016-05-02 19:00:00  | 0

For each Cust_ID, I would like to keep only the rows up to and including those where Value = 1, dropping the rows that come after, giving the result:

Cust_ID | Date                 | Value
500219  | 2016-04-11 12:00:00  | 0
500219  | 2016-04-12 16:00:00  | 0
500219  | 2016-04-14 11:00:00  | 1
500219  | 2016-04-15 12:00:00  | 1
500220  | 2016-04-11 12:00:00  | 0
500220  | 2016-04-14 11:00:00  | 1
500220  | 2016-04-15 12:00:00  | 1

Any help would be appreciated!
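To run the answers below, the sample data can be rebuilt as a data frame along these lines. This is only a sketch: the original post does not show how the data was created, so the integer columns and the character Date column are assumptions (they match the column types printed in the dplyr answer). The first and last answers call this object `df`, while the data.table and dplyr answers call it `df1`.

```r
# Sample data from the question, rebuilt as a plain data frame.
# Column types are assumed; Date is kept as character.
df <- data.frame(
  Cust_ID = c(rep(500219L, 6), rep(500220L, 5)),
  Date = c("2016-04-11 12:00:00", "2016-04-12 16:00:00", "2016-04-14 11:00:00",
           "2016-04-15 12:00:00", "2016-05-23 09:00:00", "2016-05-02 19:00:00",
           "2016-04-11 12:00:00", "2016-04-14 11:00:00", "2016-04-15 12:00:00",
           "2016-05-23 09:00:00", "2016-05-02 19:00:00"),
  Value = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)
```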

Here is a split-apply-combine method that, for each customer, keeps every row whose Value is 1 as well as the rows before the first 1.

# split data by customer ID
myList <- split(df, df$Cust_ID)
# loop through ID list, drop desired rows, rbind resulting list
dfNew <- do.call(rbind, lapply(myList, function(i) {
                               drop <- which(i$Value==1)
                               i[c(1:drop[1], drop[-1]),]}))

which returns

dfNew
         Cust_ID                   Date Value
500219.1  500219  2016-04-11 12:00:00       0
500219.2  500219  2016-04-12 16:00:00       0
500219.3  500219  2016-04-14 11:00:00       1
500219.4  500219  2016-04-15 12:00:00       1
500220.7  500220  2016-04-11 12:00:00       0
500220.8  500220  2016-04-14 11:00:00       1
500220.9  500220  2016-04-15 12:00:00       1

Note that this solution will not work if any customer ID never has a Value equal to 1: which() then returns an empty vector, drop[1] is NA, and 1:drop[1] raises an error.
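A minimal illustration of that failure, using a hypothetical Value vector `v` that never reaches 1 (the vector and the use of `try()` are for demonstration only):

```r
# Hypothetical customer whose Value never equals 1
v <- c(0, 0, 0)
drop <- which(v == 1)                 # integer(0): no matching positions
first <- drop[1]                      # NA: nothing to index in an empty vector
res <- try(1:first, silent = TRUE)    # 1:NA raises an error
inherits(res, "try-error")            # TRUE
```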


If you want to retain customers whose Value never reaches 1, keeping all of their rows, then use

dfNew <- do.call(rbind, lapply(myList, function(i) {
                               drop <- which(i$Value==1)
                               if(length(drop) != 0) i[c(1:drop[1], drop[-1]),]
                               else i}))

We can use data.table. Convert the 'data.frame' to a 'data.table' ( setDT(df1) ). Grouped by 'Cust_ID', we take the sequence from 1 up to the largest index where 'Value' is 1, look up the corresponding row indices ( .I ), and use them to subset the rows. Customers with no 'Value' of 1 keep all of their rows.

library(data.table)
setDT(df1)[df1[,  if(any(Value == 1)) .I[seq(max(which(Value == 1)))]
                                 else .I[1:.N] , by = Cust_ID]$V1]
#      Cust_ID                Date Value
#1:  500219 2016-04-11 12:00:00     0
#2:  500219 2016-04-12 16:00:00     0
#3:  500219 2016-04-14 11:00:00     1
#4:  500219 2016-04-15 12:00:00     1
#5:  500220 2016-04-11 12:00:00     0
#6:  500220 2016-04-14 11:00:00     1
#7:  500220 2016-04-15 12:00:00     1

Or using a similar approach with dplyr

library(dplyr)
df1 %>% 
     group_by(Cust_ID) %>% 
     slice(if(any(Value==1)) seq(max(which(Value==1))) else row_number())
#   Cust_ID                Date Value
#     <int>               <chr> <int>
#1  500219 2016-04-11 12:00:00     0
#2  500219 2016-04-12 16:00:00     0
#3  500219 2016-04-14 11:00:00     1
#4  500219 2016-04-15 12:00:00     1
#5  500220 2016-04-11 12:00:00     0
#6  500220 2016-04-14 11:00:00     1
#7  500220 2016-04-15 12:00:00     1
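The split-apply-combine answer and the max-based data.table/dplyr answers agree on the sample data, but they can differ when a 0 sits between two 1s. A small sketch with a hypothetical Value vector for a single customer (`v`, `ones`, and the two index names are made up for illustration):

```r
v <- c(0, 1, 0, 1, 0)   # hypothetical Value column for one customer
ones <- which(v == 1)   # positions 2 and 4

# Split-apply-combine logic: rows up to the first 1, plus any later 1s
idx_first_answer <- c(1:ones[1], ones[-1])   # 1 2 4

# data.table / dplyr logic: every row up to the last 1
idx_max_based <- seq(max(ones))              # 1 2 3 4
```

The first variant drops the 0 at position 3, while the second keeps it, so which one is right depends on how such rows should be treated.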

Looping approach:

cust <- 0
keep <- FALSE
keepers <- vector(mode = "logical", length = nrow(df))

## walk through the dataframe backwards
## (assumes each customer's rows are contiguous)
for(rec in rev(seq_len(nrow(df))))
{
  ## have we been working with this customer?
  if(df$Cust_ID[rec] == cust)
  {
    ## keep the row if it is a 1, or if a later row for this customer was a 1
    if(df$Value[rec] == 1 || keep)
    {
      keepers[rec] <- TRUE
      keep <- TRUE
    }
  }
  else
  {
    ## first (i.e. last in file order) row of a new customer
    cust <- df$Cust_ID[rec]
    if(df$Value[rec] == 1)
    {
      keepers[rec] <- TRUE
      keep <- TRUE
    }
    else
    {
      keep <- FALSE
    }
  }
}

df <- df[keepers,]
df
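The backward walk can also be expressed without an explicit loop. The following is a sketch of an equivalent vectorized form, not part of the original answer; it uses base R's `ave()`, `rev()`, and `cummax()` on a small hypothetical data frame:

```r
# Small hypothetical data set with two customers
df <- data.frame(
  Cust_ID = c(1, 1, 1, 1, 2, 2, 2),
  Value   = c(0, 1, 1, 0, 0, 1, 0)
)

# Within each customer, flag a row if it or any later row has Value == 1:
# reverse the 0/1 indicator, take the running maximum, and reverse back.
keepers <- ave(as.integer(df$Value == 1), df$Cust_ID,
               FUN = function(v) rev(cummax(rev(v)))) == 1

df[keepers, ]
```

Like the loop, this drops every row of a customer whose Value never equals 1. Unlike the loop, it does not require each customer's rows to be contiguous.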
