
Delete rows in dataframe based on values in multiple previous rows/columns

I have the following dataframe:

   x  y  z
1  a  c  0
2  a  c  0
3  a  c  1
4  a  c  0
5  a  c  0
6  b  c  0
7  b  c  0
8  b  c  0
9  b  c  1
10 b  c  0
11 b  c  0
12 b  c  0
13 a  d  0
14 a  d  0
15 a  d  0
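
For anyone who wants to reproduce this, the data frame above could be built roughly like this. The name df is just an assumption here; the answers below call it df or df1.

```r
# Rebuild the example data: 5 rows of (a, c), 7 rows of (b, c), 3 rows of (a, d)
df <- data.frame(
  x = c(rep("a", 5), rep("b", 7), rep("a", 3)),
  y = c(rep("c", 12), rep("d", 3)),
  z = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
)

which(df$z == 1)   # rows 3 and 9 hold the 1s
```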

I would like to delete rows for which there is a 1 in a previous row of column z with the same values in columns x and y. For example, for Row 10, I want to search Rows 1:9 for a row in which x = "b", y = "c", and z equals 1. If such a row exists in Rows 1:9, I want to delete Row 10.

Rows 4, 5, 10, 11, and 12 would therefore be removed, giving the following dataframe:

   x  y  z
1  a  c  0
2  a  c  0
3  a  c  1
4  b  c  0
5  b  c  0
6  b  c  0
7  b  c  1
8  a  d  0
9  a  d  0
10 a  d  0

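We can do this with data.table. Within each (x, y) group, the inner expression flags every row that comes after a 1 in z, .I returns the row numbers of those rows, and the negative index drops them:

library(data.table)
# Per (x, y) group: diff(z == 1) is -1 on the row right after a 1, so
# cummin(...) < 0 is TRUE from that row onward; .I gives the global
# row numbers of those rows, which the negative index then removes.
setDT(df1)[-df1[, .I[cummin(c(0, diff(z==1)))<0], .(x, y)]$V1]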
#    x y z
# 1: a c 0
# 2: a c 0
# 3: a c 1
# 4: b c 0
# 5: b c 0
# 6: b c 0
# 7: b c 1
# 8: a d 0
# 9: a d 0
#10: a d 0
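
To see why the inner expression picks out the right rows, it can help to run it on a single group's z outside data.table. This is only a sketch in base R; the variable names is_one, step and drop are made up for illustration.

```r
# z values for the x = "a", y = "c" group from the question
z <- c(0, 0, 1, 0, 0)

is_one <- z == 1               # FALSE FALSE TRUE FALSE FALSE
step   <- c(0, diff(is_one))   # 0 0 1 -1 0: -1 marks the row right after the 1
drop   <- cummin(step) < 0     # FALSE FALSE FALSE TRUE TRUE

which(drop)                    # 4 5, the rows to remove in this group
```

One thing to be aware of: if a group has two 1s in a row, e.g. z = c(0, 1, 1, 0), the second 1 is not flagged by this expression, because diff() is 0 between them. That case does not occur in the example data.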

Here is a base R method that uses ave for grouping, interaction to construct the groups, and a bit of logical manipulation in an anonymous function. ave returns a vector of 1s and 0s, which as.logical converts into a logical vector that is then used for subsetting.

The anonymous function c(1,head(cummin(i != 1), -1)) returns a 1 for the first element of each group, as it will always be kept. For the remaining elements, cummin(i != 1) stays 1 until the first 1 in z and is 0 from that row onward. head(..., -1) drops the final value and the leading 1 shifts everything down by one row, so each row is judged by the rows before it rather than by its own value. The row containing the 1 is therefore kept, and every later row in the group gets a 0.

df[as.logical(ave(df$z, interaction(df$x, df$y),
                  FUN=function(i) c(1,head(cummin(i != 1), -1)))), ]
   x y z
1  a c 0
2  a c 0
3  a c 1
6  b c 0
7  b c 0
8  b c 0
9  b c 1
13 a d 0
14 a d 0
15 a d 0
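
The anonymous function can also be tried on a single group's z to check the flags it produces. The sketch below gives it the hypothetical name keep_flags, and adds an alternative formulation with cumsum (keep_alt, also a made-up name) that should give the same result.

```r
# keep_flags is a hypothetical name for the anonymous function above
keep_flags <- function(i) c(1, head(cummin(i != 1), -1))

z_bc <- c(0, 0, 0, 1, 0, 0, 0)   # z for the x = "b", y = "c" group
keep_flags(z_bc)                 # 1 1 1 1 0 0 0

# Same idea with cumsum: keep a row if no earlier row in the group had z == 1
keep_alt <- function(i) cumsum(i == 1) - (i == 1) == 0
keep_alt(z_bc)                   # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
```

Either function can be passed as FUN to ave; wrapping the result in as.logical works for both.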

I am not sure I understand your question, but if you want to drop all rows where z = 1, you can get the indices of the rows to keep with

which(nameofdataframe$z != 1)

If you want to combine several conditions, you can join them with & like this:

which(nameofdataframe$z != 1 & nameofdataframe$x == "b")
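
Note that which() only returns row numbers; to get the filtered data frame you still have to index with them. A small sketch on made-up data (the values here are not from the question):

```r
# Toy data frame; the contents are invented for illustration
nameofdataframe <- data.frame(x = c("a", "a", "b", "b"),
                              z = c(0, 1, 0, 1))

idx <- which(nameofdataframe$z != 1 & nameofdataframe$x == "b")
idx                       # 3
nameofdataframe[idx, ]    # the single row with x = "b" and z = 0
```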

Hope this helps!
