Allow me to elaborate. Basically, I have a data frame with 4 columns, and one of the columns has NAs in it. When NAs do occur, they always occur in groups. I am looping through this data frame row by row, looking at that column. As soon as I find an NA, I want to subset the data frame from that row to the row with the last NA of that group, i.e. the last one before I reach a normal value again.
So for example, let's say we look at my data frame df:
    C1 C2 C3 C4 C5 C6
R1   2  1  2  1  0  0
R2   2  2  1  1  0  0
R3   0  0  1  1  2  1
R4   2  2  1 NA  0  0
R5   0  0  1 NA  2  1
R6   0  0  1 NA  2  1
R7   2  2  1 NA  0  0
R8   0  0  1  1  2  1
R9   2  1  2  1  0  0
R10  2  2  1  1  0  0
R11  0  0  1  1  2  1
R12  2  2  1 NA  0  0
R13  0  0  1 NA  2  1
R14  0  0  1 NA  2  1
As I loop through df row by row, I come across the first NA in row 4. I then want to subset df from row 4 to row 7, which is where the last NA in this particular group of NAs is.
Subset:
R4   2  2  1 NA  0  0
R5   0  0  1 NA  2  1
R6   0  0  1 NA  2  1
R7   2  2  1 NA  0  0
Notice that I did not subset all of the rows with NA, only the current "group" of NA I was looking at. I did not subset from rows 12-14.
How do I do this?
One way is to store the indices of consecutive NAs in a list and then subset however you want later (using lapply or an explicit for-loop):
isna <- is.na(df$C4)
idx <- which(isna)
rr <- rle(isna)
idx <- split(idx, rep(seq_len(sum(rr$values)), rr$lengths[rr$values]))
# $`1`
# [1] 4 5 6 7
# $`2`
# [1] 12 13 14
The elements of idx are row numbers, one vector per group of NAs. Now you can subset:
Using lapply:
oo <- lapply(idx, function(ix) {
    this_sub <- df[ix, ]
    # do whatever you want
})
Using a for-loop:
for (i in seq_along(idx)) {
    this_sub <- df[idx[[i]], ]
    # do whatever you want
}
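To make the above easier to try out, here is a rough, self-contained sketch. It rebuilds the example data frame from the question (values copied from the table) and pulls out only the first group of NAs. The names groups and first_group are my own, chosen for illustration, and I use seq_len() instead of seq() on the assumption that it behaves better when the column has no NAs at all.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  C1 = c(2, 2, 0, 2, 0, 0, 2, 0, 2, 2, 0, 2, 0, 0),
  C2 = c(1, 2, 0, 2, 0, 0, 2, 0, 1, 2, 0, 2, 0, 0),
  C3 = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1),
  C4 = c(1, 1, 1, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA),
  C5 = c(0, 0, 2, 0, 2, 2, 0, 2, 0, 0, 2, 0, 2, 2),
  C6 = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1),
  row.names = paste0("R", 1:14)
)

isna <- is.na(df$C4)   # TRUE where C4 is NA
rr   <- rle(isna)      # run lengths of consecutive TRUE/FALSE values
idx  <- which(isna)    # row numbers of all NAs

# One group label per run of NAs, repeated once per row in that run.
groups <- split(idx, rep(seq_len(sum(rr$values)), rr$lengths[rr$values]))

# Only the first group of NA rows (rows 4 to 7), not rows 12 to 14.
first_group <- df[groups[[1]], ]
print(first_group)
```

The second group is available in the same way as df[groups[[2]], ].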
If you want a data frame containing all rows that have NA in column 'C4', you can do:
df[which(is.na(df$C4)), ]
where df is your data frame.
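Note that this returns every NA row at once, so separate groups of NAs end up in the same data frame. A small sketch to illustrate the difference, using only columns C3 and C4 from the question to keep it short. The diff()/cumsum() labelling is an alternative to rle() that I am adding here for illustration, and the names all_na, run_id and runs are made up.

```r
# Two columns from the question's example are enough to show the idea.
df <- data.frame(
  C3 = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1),
  C4 = c(1, 1, 1, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA),
  row.names = paste0("R", 1:14)
)

# All NA rows at once: both groups are mixed together.
all_na <- df[which(is.na(df$C4)), ]

# Start a new run id wherever the NA row numbers stop being consecutive.
idx    <- which(is.na(df$C4))
run_id <- cumsum(c(TRUE, diff(idx) != 1))
runs   <- split(idx, run_id)

# Rows of the first group only.
print(df[runs[[1]], ])
```

If you only ever need all NA rows together, the one-liner is enough; the grouping step matters only when each group has to be handled separately.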
Hope it helps.