
How to find and delete a certain number of rows with the same consecutive value in a column in a dataframe in R?

In my dataframe there is a column with "Sound" and "Response" as values. Ideally, the pattern is two Sounds followed by one Response. However, sometimes there are three Sounds followed by a Response.

How can I tell R to raise a flag whenever it finds this pattern in my data? I need to look at each case individually before I can delete the third Sound row.

df <- data.frame(V1=rep("SN", 7),  
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"), 
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100), 
             stringsAsFactors=FALSE) 

V1       V3        V4
SN    Sound XYZc02i03
SN    Sound XYZq02i03
SN Response       200
SN    Sound ZYXc01i30
SN    Sound ZYXq01i30
SN    Sound ZYXc01i35
SN Response       100     

So, after finding three consecutive Sounds and deleting the last one of them (i.e. the one just before the following Response), I should have the desired pattern like this:

V1       V3        V4
SN    Sound XYZc02i03
SN    Sound XYZq02i03
SN Response       200
SN    Sound ZYXc01i30
SN    Sound ZYXq01i30
SN Response       100  

I'm sorry that I keep posting these basic questions. Any ideas are, as always, greatly appreciated!

cumsum(rle(df$V3)$lengths)[rle(df$V3)$lengths == 3]
[1] 6

This returns the end position of every run of exactly three identical values, i.e. the rows where "Sound" is the third in a row. You can then delete those rows or add a column that marks these positions.
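As a minimal sketch of that last step: mark the positions in a new column first so each case can be inspected, then drop the flagged rows. The column name `flag` and the variables `runs`, `third` and `cleaned` are made up for illustration. Restricting the match to runs of "Sound" is an added assumption, so that three Responses in a row would not be flagged.

```r
df <- data.frame(V1 = rep("SN", 7),
                 V3 = c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
                 V4 = c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
                 stringsAsFactors = FALSE)

# Compute the run-length encoding once and reuse it
runs <- rle(df$V3)

# End positions of runs that are exactly three "Sound" rows long
third <- cumsum(runs$lengths)[runs$lengths == 3 & runs$values == "Sound"]

# Mark those rows for manual inspection (hypothetical column name)
df$flag <- seq_len(nrow(df)) %in% third
df[df$flag, ]

# After checking each case, drop the flagged rows and the helper column
cleaned <- df[!df$flag, names(df) != "flag"]
cleaned
```

If some flagged rows turn out to be legitimate, set `df$flag` to `FALSE` for those rows before the final subsetting step.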

I think this will work, although there are probably much simpler solutions:

df <- data.frame(V1=rep("SN", 7),  
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"), 
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100), 
             stringsAsFactors=FALSE)

df

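# my.run[i] counts the consecutive "Sound" rows ending at row i (0 for any other value)
my.run <- rep(0, nrow(df))

my.run[1] <- if (df$V3[1] == "Sound") 1 else 0

# seq_len(...)[-1] is 2, 3, ..., nrow(df) and is empty when there is only one row
for (i in seq_len(nrow(df))[-1]) {
     my.run[i] <- if (df$V3[i] == "Sound") my.run[i-1] + 1 else 0
}

# Keep every row that is not the third (or later) "Sound" in a run
df2 <- df[my.run < 3,]
df2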
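The same running count can probably be computed without a loop. The sketch below is an alternative, not the answerer's code: it relies on base R's `rle()` and `sequence()`, and the names `pos_in_run` and `sound_run` are illustrative.

```r
df <- data.frame(V1 = rep("SN", 7),
                 V3 = c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
                 V4 = c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
                 stringsAsFactors = FALSE)

# Position of each row within its run of identical V3 values
pos_in_run <- sequence(rle(df$V3)$lengths)

# Running count of consecutive "Sound" rows; 0 for any other value
sound_run <- ifelse(df$V3 == "Sound", pos_in_run, 0)

# Rows to inspect before deleting: the third (or later) "Sound" in a run
df[sound_run >= 3, ]

# Same filter as the loop version
df2 <- df[sound_run < 3, ]
df2
```

Like the loop, this drops every "Sound" from the third onward within a run, so a run of four Sounds would lose two rows rather than one.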
