
How do I subset a data frame in R based on the next occurrence?

Allow me to elaborate. I have a data frame with 4 columns, and one of the columns has NAs in it. When NAs occur, they always occur in groups. I am looping through this data frame row by row, looking at that column. As soon as I find an NA, I want to subset the data frame from that row to the row with the last NA of that group, i.e. the last NA before I reach a normal value again.

So for example, let's say we look at my data frame df:

    C1 C2 C3 C4 C5 C6
R1   2  1  2  1  0  0
R2   2  2  1  1  0  0
R3   0  0  1  1  2  1
R4   2  2  1 NA  0  0
R5   0  0  1 NA  2  1
R6   0  0  1 NA  2  1
R7   2  2  1 NA  0  0
R8   0  0  1  1  2  1
R9   2  1  2  1  0  0
R10  2  2  1  1  0  0
R11  0  0  1  1  2  1
R12  2  2  1 NA  0  0
R13  0  0  1 NA  2  1
R14  0  0  1 NA  2  1
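
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes that the R1 to R14 labels are row names and that every column is numeric, neither of which is stated in the post.

```r
# Rebuild the example data frame from the printed table.
# Assumption: R1..R14 are row names and all columns are numeric.
df <- data.frame(
  C1 = c(2, 2, 0, 2, 0, 0, 2, 0, 2, 2, 0, 2, 0, 0),
  C2 = c(1, 2, 0, 2, 0, 0, 2, 0, 1, 2, 0, 2, 0, 0),
  C3 = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1),
  C4 = c(1, 1, 1, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA),
  C5 = c(0, 0, 2, 0, 2, 2, 0, 2, 0, 0, 2, 0, 2, 2),
  C6 = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1),
  row.names = paste0("R", 1:14)
)

# Row numbers where C4 is NA: two separate groups.
which(is.na(df$C4))
```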

As I loop through df row by row, I come across the first NA in row 4. I then want to subset df from row 4 to row 7, which is where the last NA in this particular group of NAs is.

Subset:

   C1 C2 C3 C4 C5 C6
R4  2  2  1 NA  0  0
R5  0  0  1 NA  2  1
R6  0  0  1 NA  2  1
R7  2  2  1 NA  0  0

Notice that I did not subset all of the rows with NA, only the current "group" of NAs I was looking at. Rows 12-14 are not part of this subset.

How do I do this?

One way is to store the row indices of each run of consecutive NAs in a list, and then subset however you want later (using lapply or an explicit for loop):

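isna <- is.na(df$C4)   # TRUE where C4 is NA
idx <- which(isna)     # row numbers of all NAs
rr <- rle(isna)        # runs of consecutive TRUE / FALSE values
# label every NA row with the number of its run, then split by that label
# (seq_len instead of seq, so a column without any NA gives an empty list)
idx <- split(idx, rep(seq_len(sum(rr$values)), rr$lengths[rr$values]))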
# $`1`
# [1] 4 5 6 7

# $`2`
# [1] 12 13 14
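
If it is not obvious why the rle() step works, it may help to look at what it returns. The sketch below uses a plain vector c4 as a stand-in for df$C4 (that name is made up for the illustration) and derives the first and last row of each NA run, which is the "from row 4 to row 7" form used in the question.

```r
# c4 is a hypothetical stand-in for df$C4 from the example.
c4 <- c(1, 1, 1, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA)
isna <- is.na(c4)

rr <- rle(isna)
rr$values    # FALSE  TRUE FALSE  TRUE
rr$lengths   # 3 4 4 3

# Last and first row of every run, NA or not.
ends   <- cumsum(rr$lengths)
starts <- ends - rr$lengths + 1

# Keep only the runs whose value is TRUE, i.e. the NA runs.
na_starts <- starts[rr$values]   # 4 12
na_ends   <- ends[rr$values]     # 7 14
```

Both forms describe the same groups: na_starts[1]:na_ends[1] gives the same rows as the first element of idx.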

Each element of idx holds the row numbers of one group of NAs. Now you can subset:

using lapply:

oo <- lapply(idx, function(ix) {
    this_sub <- df[ix, ]
    # do whatever you want
})

using a for loop:

for (i in seq_along(idx)) {
    this_sub <- df[idx[[i]], ]
    # do whatever you want
}
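
If you would rather stay with the row-by-row loop from the question and only need the group that starts at the row you are currently on, something like the following might do. na_run_from is a made-up helper name, not an existing R function, and the sketch assumes that i is a valid row where the value is NA.

```r
# Hypothetical helper: given a vector x and a position i where x[i] is NA,
# return the positions from i up to the last NA of that run.
na_run_from <- function(x, i) {
  stopifnot(is.na(x[i]))
  # Positions after i that hold a normal (non-NA) value.
  nxt <- which(!is.na(x) & seq_along(x) > i)
  # The run ends just before the first of them, or at the end of x.
  last <- if (length(nxt) > 0) nxt[1] - 1 else length(x)
  i:last
}

# c4 is a stand-in for df$C4 from the example.
c4 <- c(1, 1, 1, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA)
first_na <- which(is.na(c4))[1]     # 4
rows <- na_run_from(c4, first_na)   # 4 5 6 7
# df[rows, ] would then be the subset for the current group.
```

Calling na_run_from(c4, 12) gives 12 13 14, because that run extends to the end of the column.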

If you want a data frame containing all rows that have NA in column 'C4', you can do:

df[which(is.na(df$C4)), ]

where df is your data frame. Note that this returns every NA row at once (rows 4-7 and 12-14 in the example), not just one group.
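
To get from there back to separate groups, one option is to start a new group wherever the row numbers of the NA rows stop being consecutive. A minimal sketch, assuming at least one NA is present; the cut-down data frame with a row column is made up for the illustration and only mirrors C4 from the example.

```r
# Small stand-in for the example: a row counter plus the C4 column.
df <- data.frame(
  row = 1:14,
  C4  = c(1, 1, 1, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA)
)

# All NA rows at once: rows 4-7 and 12-14 together.
all_na <- df[is.na(df$C4), ]

# A new group starts whenever the row number jumps by more than 1.
group <- cumsum(c(TRUE, diff(all_na$row) != 1))   # 1 1 1 1 2 2 2

by_group <- split(all_na, group)
by_group[[1]]$row   # 4 5 6 7
```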

Hope it helps.


 