R: Clever way to clean a data frame

I have a data frame with two columns. The first is an index column that points to rows in a second data frame; each of those rows contains a specific event. Which event it is, is coded in the second column, here named code_start_stop.

Example:

index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

The codes come in start/stop pairs, i.e. 2001 and 1001, 2002 and 1002, etc. The aim is to remove any rows that are enclosed by a start marker (here, for example, 2006) and the respective next stop marker (here 1006). Note that start and stop markers always come in pairs.

Any suggestions for a clever way to do this are appreciated. Thanks!

Your question is a little confusing, so please correct me if I got it wrong. The following should work:

startm <- 2006  # start marker
endm   <- 1006  # end marker

# find the rows that contain the markers
# (this assumes each marker occurs exactly once)
index1 <- which(replace_all[, 2] == startm)
index2 <- which(replace_all[, 2] == endm)

# drop everything from the start marker row to the end marker row
replace_all <- replace_all[-(index1:index2), ]

Note: This also removes the rows containing the markers. If you want to remove only the rows between the markers, add +1/-1 at the subsetting step.

The solution is now based on maRtin's suggestion and seems to work pretty well.

I do the following for every pair of start and end markers:
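A minimal sketch of the +1/-1 variant mentioned in the note, using the example data from the question. The names start_row, stop_row, inner and kept are introduced here for illustration only, and taking the first match with [1] is an assumption for the case where a marker occurs more than once.

```r
# Example data from the question
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

# First occurrence of each marker (NA if the marker is absent)
start_row <- which(replace_all$code_start_stop == 2006)[1]
stop_row  <- which(replace_all$code_start_stop == 1006)[1]

# Remove only the rows strictly between the markers; keep the marker rows
if (!is.na(start_row) && !is.na(stop_row) && stop_row - start_row > 1) {
  inner <- (start_row + 1):(stop_row - 1)
  kept <- replace_all[-inner, ]
} else {
  kept <- replace_all
}
```

The guard matters because (start_row + 1):(stop_row - 1) counts backwards when the two markers are adjacent, which would select the wrong rows.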

to_delete <- c()
## Care = 2001/1001
startm1 <- 2001
endm1 <- 1001
index1 <- which(replace_all[, 2] == startm1)
index2 <- which(replace_all[, 2] == endm1)
# seq_along() yields an empty loop when index1 is empty,
# so no separate length check is needed
for (i in seq_along(index1)) {
  # only collect rows if at least one row lies strictly between the pair
  if (index2[i] - index1[i] > 1) {
    to_delete <- c(to_delete, (index1[i] + 1):(index2[i] - 1))
  }
}

... go through all other pairs of start/stop markers in the same way, then remove the rows collected in to_delete:

# drop the collected rows, if any
if (length(to_delete) != 0) {
  replace_all <- replace_all[-to_delete, ]
}
# keep only the index column
replace_all <- replace_all[, 1]
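The per-pair steps above could also be wrapped in a loop over all start codes. The following is only a sketch: the function name rows_between_markers is hypothetical, and it assumes that every stop code equals its start code minus 1000, as in the pairs 2001/1001 and 2006/1006 from the question. Unlike the positional pairing index1[i]/index2[i], it matches each start marker with the next stop marker that follows it.

```r
# Example data from the question
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

# Hypothetical helper: returns the row numbers lying strictly between
# each start marker and the next matching stop marker.
rows_between_markers <- function(codes, start_codes) {
  to_delete <- integer(0)
  for (start_code in start_codes) {
    stop_code <- start_code - 1000      # assumption: stop = start - 1000
    starts <- which(codes == start_code)
    stops  <- which(codes == stop_code)
    for (s in starts) {
      e <- stops[stops > s][1]          # next stop after this start, or NA
      if (!is.na(e) && e - s > 1) {
        to_delete <- c(to_delete, (s + 1):(e - 1))
      }
    }
  }
  unique(to_delete)
}

to_delete <- rows_between_markers(replace_all$code_start_stop,
                                  c(2001, 2004, 2006))

# Negative indexing with an empty vector would drop every row, so guard it
cleaned <- if (length(to_delete) > 0) replace_all[-to_delete, ] else replace_all
```

With the example data, the rows enclosed by 2006 and 1006 are removed while the marker rows themselves stay; cleaned$index then gives the remaining index values.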
