R: a clever way to clean a data frame

Question

I have a data frame with two columns. The first is an index column that indexes rows in a second data frame. Each of these rows contains a specific event; which event it is, is coded in the second column, here named code_start_stop.

Example:

index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

Start and stop codes come in pairs, i.e. 2001 and 1001, 2002 and 1002, etc. The aim: if rows are enclosed by a start marker (here 2006) and the respective next stop marker (here 1006), those rows should be removed from the data frame. Note that start and stop markers always occur in pairs.

Any suggestions for a clever way to do this are appreciated. Thanks!

Answer 1

Your question is a little confusing, so please correct me if I got it wrong. The following should work:

startm <- 2006  # start marker
endm   <- 1006  # end marker

# look for the rows that contain the markers
index1 <- which(replace_all[, 2] == startm)
index2 <- which(replace_all[, 2] == endm)

# subset accordingly (assumes each marker occurs exactly once)
replace_all <- replace_all[-(index1:index2), ]

Note: This also removes the rows containing the markers. If you only want to remove the rows between the markers, add +1/-1 to the indices at the subsetting step.
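The +1/-1 variant from the note can be sketched as follows. This is only an illustration under the assumption that each marker occurs exactly once; the name `kept` and the guard for adjacent markers are additions that are not in the answer above. The guard matters because `(index1 + 1):(index2 - 1)` counts downwards when the stop marker directly follows the start marker.

```r
# Example data from the question
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

startm <- 2006  # start marker
endm   <- 1006  # end marker

# Assumption: each marker occurs exactly once in the column
index1 <- which(replace_all$code_start_stop == startm)
index2 <- which(replace_all$code_start_stop == endm)

if (index2 - index1 > 1) {
  # drop only the rows strictly between the two markers
  kept <- replace_all[-((index1 + 1):(index2 - 1)), ]
} else {
  # markers are adjacent, so there is nothing to remove
  kept <- replace_all
}

print(kept)
```

With the example data, the marker rows themselves (2006 and 1006) stay in `kept`, together with the rows after the stop marker.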

Answer 2

My solution is now based on maRtin's suggestion and seems to work pretty well.

I do the following, going through all pairs of start and end markers:

to_delete <- c()
## Care = 2001/1001
startm1 <- 2001
endm1 <- 1001
index1 <- which(replace_all[, 2] == startm1)
index2 <- which(replace_all[, 2] == endm1)
# seq_along() yields an empty loop when no start marker is found
for (i in seq_along(index1)) {
  # only collect rows if something lies strictly between the two markers
  if (index2[i] - index1[i] > 1) {
    to_delete <- c(to_delete, (index1[i] + 1):(index2[i] - 1))
  }
}

... go through all other pairs of start/stop markers in the same way, and then remove to_delete:

if (length(to_delete) != 0) {
  replace_all <- replace_all[-to_delete, ]
}
# keep only the index column
replace_all <- replace_all[, 1]
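Repeating the block for every marker pair could probably be folded into one helper. The sketch below is not the code from either answer: `remove_enclosed`, its parameters, and the variable `cleaned` are hypothetical names, and it assumes that every stop code equals its start code minus 1000 (as in 2001/1001 and 2006/1006). It pairs each start marker with the next matching stop marker that follows it, rather than pairing markers by position.

```r
# Hypothetical helper: drop all rows lying strictly between a start marker
# and the next matching stop marker.
# Assumption: stop code = start code - offset (e.g. 2006 pairs with 1006).
remove_enclosed <- function(df, start_codes, offset = 1000,
                            code_col = "code_start_stop") {
  codes <- df[[code_col]]
  enclosed <- rep(FALSE, nrow(df))
  for (s in which(codes %in% start_codes)) {
    # rows holding the stop code that belongs to this start code
    stops <- which(codes == codes[s] - offset)
    # first such row after the start marker (NA if there is none)
    e <- stops[stops > s][1]
    if (!is.na(e) && e - s > 1) {
      enclosed[(s + 1):(e - 1)] <- TRUE
    }
  }
  df[!enclosed, , drop = FALSE]
}

# Example data from the question
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

cleaned <- remove_enclosed(replace_all, start_codes = c(2001, 2004, 2006))
print(cleaned)
```

Marking rows in a logical vector avoids the special case of an empty `to_delete`, since negative indexing with an empty vector would return zero rows instead of the full data frame.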

Answer 1
0 2016-04-08 17:40:16

Answer 2
0 2016-04-09 11:55:59
