
How to find and delete a certain number of rows with the same consecutive value in a column in a dataframe in R?

In my dataframe there is a column with "Sound" and "Response" as values. Ideally, the pattern is two Sounds followed by one Response. Sometimes, however, there are three Sounds followed by a Response.

How can I tell R to raise a flag whenever it finds this pattern in my data? I need to look at each case individually before I can delete the third Sound-row.

df <- data.frame(V1=rep("SN", 7),
                 V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
                 V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
                 stringsAsFactors=FALSE)

V1       V3        V4
SN    Sound XYZc02i03
SN    Sound XYZq02i03
SN Response       200
SN    Sound ZYXc01i30
SN    Sound ZYXq01i30
SN    Sound ZYXc01i35
SN Response       100     

So, after finding three consecutive Sounds and deleting the last one of them (i.e. the one just before the following Response), I should have the desired pattern like this:

V1       V3        V4
SN    Sound XYZc02i03
SN    Sound XYZq02i03
SN Response       200
SN    Sound ZYXc01i30
SN    Sound ZYXq01i30
SN Response       100  

I'm sorry that I keep posting these basic questions. Any ideas are, as always, greatly appreciated!

r <- rle(df$V3)
cumsum(r$lengths)[r$values == "Sound" & r$lengths == 3]
[1] 6

This returns the row positions at which a run of exactly three "Sound" values ends, i.e. the third "Sound" in a row. You can then delete those rows or add a column that marks them for review.
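Since the question asks for a flag rather than an immediate deletion, the positions can be turned into a logical column. This is only a minimal sketch in base R: the column name `flag` and the variable names `third` and `cleaned` are made up for illustration and are not part of the original post.

```r
df <- data.frame(V1 = rep("SN", 7),
                 V3 = c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
                 V4 = c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
                 stringsAsFactors = FALSE)

r <- rle(df$V3)

# Row index of the last element of every run of exactly three "Sound" rows
third <- cumsum(r$lengths)[r$values == "Sound" & r$lengths == 3]

# Hypothetical column name: TRUE marks a row to inspect by hand
df$flag <- seq_len(nrow(df)) %in% third

# Rows to review before deleting anything
df[df$flag, ]

# Once reviewed, drop the flagged rows and the helper column
cleaned <- df[!df$flag, c("V1", "V3", "V4")]
```

Keeping the flag as a column means the data stays intact until each case has been checked.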

I think this will work, although there are probably much simpler solutions:

df <- data.frame(V1=rep("SN", 7),  
             V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"), 
             V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100), 
             stringsAsFactors=FALSE)

df

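# my.run[i] counts how many consecutive "Sound" rows end at row i
my.run <- rep(0, nrow(df))

my.run[1] <- if (df$V3[1] == 'Sound') 1 else 0

# seq_len(nrow(df))[-1] is empty for a one-row data frame, unlike 2:nrow(df)
for (i in seq_len(nrow(df))[-1]) {

     my.run[i] <- if (df$V3[i] == 'Sound') my.run[i-1] + 1 else 0

}

# keep every "Response" row and only the first two "Sound" rows of each run
df2 <- df[my.run < 3,]
df2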
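The same running count can probably be obtained without an explicit loop. The sketch below assumes base R only and uses `sequence()` on the run lengths from `rle()` to get each row's position within its run. The names `pos` and `extra` are invented here for illustration.

```r
df <- data.frame(V1 = rep("SN", 7),
                 V3 = c("Sound", "Sound", "Response", "Sound", "Sound", "Sound", "Response"),
                 V4 = c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", "ZYXc01i35", 100),
                 stringsAsFactors = FALSE)

r <- rle(df$V3)

# Position of each row inside its run of identical values
pos <- sequence(r$lengths)

# TRUE for the third or any later "Sound" within a run
extra <- df$V3 == "Sound" & pos > 2

# Row numbers to inspect before deleting
which(extra)

df2 <- df[!extra, ]
df2
```

Like the loop, this treats every Sound after the second in a run as surplus, so a run of four Sounds would have two rows flagged.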

