R: delete rows based on values in previous rows

Question

I am new to R and am trying to delete rows based on the values of previous rows. Sample data:

Cust_ID | Date                 | Value
500219  | 2016-04-11 12:00:00  | 0
500219  | 2016-04-12 16:00:00  | 0
500219  | 2016-04-14 11:00:00  | 1
500219  | 2016-04-15 12:00:00  | 1
500219  | 2016-05-23 09:00:00  | 0
500219  | 2016-05-02 19:00:00  | 0
500220  | 2016-04-11 12:00:00  | 0
500220  | 2016-04-14 11:00:00  | 1
500220  | 2016-04-15 12:00:00  | 1
500220  | 2016-05-23 09:00:00  | 0
500220  | 2016-05-02 19:00:00  | 0
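
For anyone who wants to reproduce this, here is a minimal sketch of the sample data as a data frame. The object name `df` and the column types (integer IDs and values, character dates) are assumptions on my part; the original post only shows the printed table.

```r
# Recreate the sample table; `df` and the column types are assumptions
df <- data.frame(
  Cust_ID = rep(c(500219L, 500220L), times = c(6, 5)),
  Date = c("2016-04-11 12:00:00", "2016-04-12 16:00:00",
           "2016-04-14 11:00:00", "2016-04-15 12:00:00",
           "2016-05-23 09:00:00", "2016-05-02 19:00:00",
           "2016-04-11 12:00:00", "2016-04-14 11:00:00",
           "2016-04-15 12:00:00", "2016-05-23 09:00:00",
           "2016-05-02 19:00:00"),
  Value = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)
```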

I would like to keep only the rows up to and including the last Value = 1 for each Cust_ID, giving the result:

Cust_ID | Date                 | Value
500219  | 2016-04-11 12:00:00  | 0
500219  | 2016-04-12 16:00:00  | 0
500219  | 2016-04-14 11:00:00  | 1
500219  | 2016-04-15 12:00:00  | 1
500220  | 2016-04-11 12:00:00  | 0
500220  | 2016-04-14 11:00:00  | 1
500220  | 2016-04-15 12:00:00  | 1

Any help would be appreciated!

Answer 1

Here is a split-apply-combine method that, for each customer, keeps every row where Value is 1 as well as the rows before the first 1.

# split data by customer ID
myList <- split(df, df$Cust_ID)
# loop through ID list, drop desired rows, rbind resulting list
dfNew <- do.call(rbind, lapply(myList, function(i) {
                               drop <- which(i$Value==1)
                               i[c(1:drop[1], drop[-1]),]}))

which returns

dfNew
         Cust_ID                   Date Value
500219.1  500219  2016-04-11 12:00:00       0
500219.2  500219  2016-04-12 16:00:00       0
500219.3  500219  2016-04-14 11:00:00       1
500219.4  500219  2016-04-15 12:00:00       1
500220.7  500220  2016-04-11 12:00:00       0
500220.8  500220  2016-04-14 11:00:00       1
500220.9  500220  2016-04-15 12:00:00       1

Note that this solution will fail if any customer ID never has a Value equal to 1: drop is then empty, drop[1] is NA, and 1:drop[1] is an error.
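
To see the failure in isolation, here is a small sketch using a plain vector for a hypothetical customer whose values are all 0. The names `value`, `first` and `result` are made up for illustration.

```r
# Values for a hypothetical customer that never reaches 1
value <- c(0, 0, 0)

drop  <- which(value == 1)  # integer(0): no matching positions
first <- drop[1]            # indexing an empty vector yields NA

# 1:NA is an error, which is what breaks the row subsetting step
result <- tryCatch(1:first, error = function(e) "error")
```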

If you want to retain customers that never reach the 1 threshold, then use

dfNew <- do.call(rbind, lapply(myList, function(i) {
                               drop <- which(i$Value==1)
                               if(length(drop) != 0) i[c(1:drop[1], drop[-1]),]
                               else i}))
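
As a quick check of the guarded version, here is a sketch on a small made-up data frame in which customer 2 never reaches 1. The toy data and the helper name `keep_rows` are my own assumptions, not part of the original data.

```r
# Toy data: customer 1 has a single 1, customer 2 has none
df <- data.frame(
  Cust_ID = c(1, 1, 1, 2, 2),
  Value   = c(0, 1, 0, 0, 0)
)

# Same logic as above, pulled into a named helper for readability
keep_rows <- function(i) {
  drop <- which(i$Value == 1)
  if (length(drop) != 0) i[c(1:drop[1], drop[-1]), ] else i
}

dfNew <- do.call(rbind, lapply(split(df, df$Cust_ID), keep_rows))
dfNew
```

Customer 1 loses its trailing 0 row, while customer 2 comes back unchanged.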

Answer 2

We can use data.table. Convert the 'data.frame' to a 'data.table' ( setDT(df1) ). Grouped by 'Cust_ID', we build the sequence from 1 to the position of the last row where 'Value' is 1, use it to pick the corresponding row indices ( .I ), and then subset the data.table with those indices. Groups that contain no 1 keep all of their rows.

library(data.table)
setDT(df1)[df1[,  if(any(Value == 1)) .I[seq(max(which(Value == 1)))]
                                 else .I[1:.N] , by = Cust_ID]$V1]
#      Cust_ID                Date Value
#1:  500219 2016-04-11 12:00:00     0
#2:  500219 2016-04-12 16:00:00     0
#3:  500219 2016-04-14 11:00:00     1
#4:  500219 2016-04-15 12:00:00     1
#5:  500220 2016-04-11 12:00:00     0
#6:  500220 2016-04-14 11:00:00     1
#7:  500220 2016-04-15 12:00:00     1

Or using a similar approach with dplyr

library(dplyr)
df1 %>% 
     group_by(Cust_ID) %>% 
     slice(if(any(Value==1)) seq(max(which(Value==1))) else row_number())
#   Cust_ID                Date Value
#     <int>               <chr> <int>
#1  500219 2016-04-11 12:00:00     0
#2  500219 2016-04-12 16:00:00     0
#3  500219 2016-04-14 11:00:00     1
#4  500219 2016-04-15 12:00:00     1
#5  500220 2016-04-11 12:00:00     0
#6  500220 2016-04-14 11:00:00     1
#7  500220 2016-04-15 12:00:00     1
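
The index expression shared by both versions can be tried on a plain vector. This is only a base R sketch of what `seq(max(which(Value == 1)))` selects within one group; the vectors `Value` and `Value2` are made up. It also shows one way this differs from the split-apply-combine answer: a 0 sitting between two 1s is kept here, but dropped by `c(1:drop[1], drop[-1])`.

```r
# One customer's values, as in the sample data
Value <- c(0, 0, 1, 1, 0, 0)
ones     <- which(Value == 1)  # positions 3 and 4
last_one <- max(ones)          # 4
idx      <- seq(last_one)      # 1 2 3 4: everything up to the last 1

# A made-up group with a 0 between two 1s
Value2 <- c(0, 1, 0, 1, 0)
idx_seq <- seq(max(which(Value2 == 1)))  # 1 2 3 4
drop    <- which(Value2 == 1)
idx_alt <- c(1:drop[1], drop[-1])        # 1 2 4
```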

Answer 3

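Looping approach (this assumes the rows of each customer are contiguous; customers with no Value equal to 1 are dropped entirely):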

cust <- 0
keep <- FALSE
keepers <- vector(mode = "logical", length = nrow(df))

## walk through the dataframe backwards
for(rec in rev(seq_len(nrow(df))))
{
  ## have we been working with this customer?
  if(df[rec,]$Cust_ID == cust)
  {
    ## keep this row if it is a 1, or if a 1 was already seen further down
    if(df[rec,]$Value == 1 || keep)
    {
      keepers[rec] <- TRUE
      keep <- TRUE
    }
  }
  else
  {
    ## first (i.e. last) row of a new customer: reset the flag
    cust <- df[rec,]$Cust_ID
    if(df[rec,]$Value == 1)
    {
      keepers[rec] <- TRUE
      keep <- TRUE
    }
    else
    {
      keep <- FALSE
    }
  }
}

df <- df[keepers,]
df
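
For comparison, the same keep/drop decision can be sketched without an explicit loop. This is an alternative I am suggesting, not the method above: it uses base R's `ave` with a reversed cumulative sum to count, per customer, how many 1s occur at or after each row. The toy data is made up.

```r
# Toy data: two customers, each ending with a 0 after their last 1
df <- data.frame(
  Cust_ID = c(1, 1, 1, 1, 2, 2, 2),
  Value   = c(0, 1, 1, 0, 0, 1, 0)
)

# Per customer: number of 1s at or after each row
ones_ahead <- ave(as.integer(df$Value == 1), df$Cust_ID,
                  FUN = function(v) rev(cumsum(rev(v))))

# Keep a row if at least one 1 occurs at or after it
keepers <- ones_ahead > 0
result  <- df[keepers, ]
result
```

Like the loop, this drops customers that never have a 1, because every row in such a group has a count of 0.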
