My data looks like this:
data <- data.frame(
  value = runif(10),
  id = c("junk","start","1","2","end","morejunk","junk","start","4","end")
)
I want to use filter() to extract everything from the id "start" until the id "end". The problem is that the number of observations between the starting row and the ending row varies, so I can't filter every x rows. Is there a way to use filter() so that I can specify from = "start" until = "end"?
You can first identify where "start" and "end" are, then use those indices in pairs to index the data.frame. This assumes that every "start" has a corresponding "end". Note that each extracted block includes the "start" and "end" rows themselves.
set.seed(0L)
data <- data.frame(
  value = runif(10),
  id = c("junk","start","1","2","end","morejunk","junk","start","4","end")
)
idx <- which(data$id %in% c("start", "end"))
lapply(split(idx, ceiling(seq_along(idx)/2)), function(x) data[x[1]:x[2],])
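The pairing step in the code above is easier to follow with the intermediate values spelled out. This is a minimal sketch using only the id column from the example; the names pairs and blocks are introduced here for illustration.

```r
# Same id column as in the example above
id <- c("junk","start","1","2","end","morejunk","junk","start","4","end")

# Row positions of every marker, in row order: 2 5 8 10
idx <- which(id %in% c("start", "end"))

# seq_along(idx) / 2 is 0.5, 1, 1.5, 2; ceiling() turns that into 1, 1, 2, 2,
# so split() groups consecutive positions two at a time
pairs <- split(idx, ceiling(seq_along(idx) / 2))

# Expand each (start, end) pair into the ids it spans, markers included
blocks <- lapply(pairs, function(x) id[x[1]:x[2]])
print(blocks)
```

Because the grouping is purely positional, an unmatched "start" or "end" would shift every later pair, which is why the one-to-one assumption matters.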
You can use which to identify the row indices with "start" and "end", apply : in parallel via Map, unlist to simplify the list to a vector, and pass the result to slice, which leaves only the rows between the markers:
library(dplyr)
set.seed(47)
data <- data.frame(
  value = runif(10),
  id = c("junk","start","1","2","end","morejunk","junk","start","4","end")
)
data %>% slice(unlist(Map(`:`,
                          which(.$id == 'start') + 1,
                          which(.$id == 'end') - 1)))
#> # A tibble: 3 × 2
#> value id
#> <dbl> <fctr>
#> 1 0.7615020 1
#> 2 0.8224916 2
#> 3 0.5433097 4
or in base R,
data[unlist(Map(`:`,
                which(data$id == 'start') + 1,
                which(data$id == 'end') - 1)), ]
#> value id
#> 3 0.7615020 1
#> 4 0.8224916 2
#> 9 0.5433097 4
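To see how the row indices are built, the same steps can be run one at a time on the id column alone. This is a sketch for illustration; the variable names starts, ends, ranges and rows are not part of the answer above.

```r
id <- c("junk","start","1","2","end","morejunk","junk","start","4","end")

# First row after each "start" and last row before each "end"
starts <- which(id == "start") + 1   # 3 9
ends   <- which(id == "end") - 1     # 4 9

# Map() calls `:` on the pairs (3, 4) and (9, 9), giving one range per block
ranges <- Map(`:`, starts, ends)

# Flatten the list of ranges into a single vector of row numbers
rows <- unlist(ranges)               # 3 4 9

print(id[rows])
```

One caveat: if an "end" directly follows its "start", the start index plus one is larger than the end index minus one, and : counts downward (for example 3:2 is 3 2), so the marker rows themselves would be selected. Data with empty blocks would need a guard for that case.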