I have the following vector called input:
input <- c(1,2,1,NA,3,2,NA,1,5,6,NA,2,2)
[1] 1 2 1 NA 3 2 NA 1 5 6 NA 2 2
I would like to split this vector into multiple vectors by each NA. So the desired output should look like this:
> output
[[1]]
[1] 1 2 1
[[2]]
[1] 3 2
[[3]]
[1] 1 5 6
[[4]]
[1] 2 2
As you can see, every time an NA appears, a new vector starts. Does anyone know how to split a vector by each NA into multiple vectors?
Using a similar logic to @tpetzoldt, but removing the NAs before the split:
split(na.omit(input), cumsum(is.na(input))[!is.na(input)])
$`0`
[1] 1 2 1
$`1`
[1] 3 2
$`2`
[1] 1 5 6
$`3`
[1] 2 2
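To see why this works, it may help to look at the intermediate values. The sketch below only unrolls the one-liner above; the variable names (is_na, grp, keep, values, ids) are mine and are not part of the original answer.

```r
input <- c(1, 2, 1, NA, 3, 2, NA, 1, 5, 6, NA, 2, 2)

is_na <- is.na(input)   # TRUE at every NA position
grp   <- cumsum(is_na)  # running count of NAs seen so far
# grp: 0 0 0 1 1 1 2 2 2 2 3 3 3

keep   <- !is_na
values <- input[keep]   # the non-NA values: 1 2 1 3 2 1 5 6 2 2
ids    <- grp[keep]     # their group ids:   0 0 0 1 1 2 2 2 3 3

res <- split(values, ids)
```

Each NA bumps the running count by one, so all values between two NAs share the same id, and split groups them accordingly.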
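One way could go like follows: flag the NAs with is.na, take the cumsum of those flags to get a group id, split by that id, and then drop the NAs from each group:
input <- c(1,2,1,NA,3,2,NA,1,5,6,NA,2,2)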
tmp <- cumsum(is.na(input))
lapply(split(input, tmp), na.omit)
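One thing to be aware of: splitting first and dropping the NAs afterwards behaves differently from dropping the NAs first when two NAs are adjacent. The small vector x below is a made-up example, not taken from the question.

```r
x <- c(1, NA, NA, 2)

# Split first, drop NAs afterwards: the gap between the two
# adjacent NAs survives as an empty group.
a <- lapply(split(x, cumsum(is.na(x))), na.omit)
length(a)  # 3 groups, the middle one has length 0

# Drop NAs first, then split: only non-empty groups remain.
b <- split(x[!is.na(x)], cumsum(is.na(x))[!is.na(x)])
length(b)  # 2 groups
```

Which behaviour you want depends on whether an empty run between adjacent NAs is meaningful for your data. Note also that na.omit leaves an na.action attribute on every group it removed something from.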
This one is verbose and overcomplicated, but I find it easier to think about such problems this way, so I wanted to share it:
library(tidyverse)
tibble(input) %>%
group_by(id = cumsum(is.na(input))) %>%
na.omit %>%
group_split() %>%
map(.,~(.x %>%select(-id))) %>%
map(.,~(.x %>%pull))
[[1]]
[1] 1 2 1
[[2]]
[1] 3 2
[[3]]
[1] 1 5 6
[[4]]
[1] 2 2
Here's a solution that is not verbose:
strsplit(paste(input, collapse = " "), " NA ")
[[1]]
[1] "1 2 1" "3 2" "1 5 6" "2 2"
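Note that this returns character strings rather than numeric vectors. If numbers are needed, one possible follow-up (my addition, not part of the original answer) is to split each chunk on the space and convert it back:

```r
input <- c(1, 2, 1, NA, 3, 2, NA, 1, 5, 6, NA, 2, 2)

# Same idea as above: paste everything into one string, cut at " NA "
chunks <- strsplit(paste(input, collapse = " "), " NA ")[[1]]

# Cut each chunk at the spaces and turn the pieces back into numbers
res <- lapply(strsplit(chunks, " "), as.numeric)
```

This sketch assumes the vector neither starts nor ends with NA and has no adjacent NAs, since the " NA " pattern needs a space on both sides. Values also take a round trip through text, so it is probably best kept to simple cases like this one.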
Another way, quite similar to those of @tpetzoldt and @tmfmnk, also removing the NAs:
. <- is.na(input)
split(input[!.], cumsum(.)[!.])
#$`0`
#[1] 1 2 1
#
#$`1`
#[1] 3 2
#
#$`2`
#[1] 1 5 6
#
#$`3`
#[1] 2 2
Or the other way round
i <- !is.na(input)
split(input[i], cumsum(!i)[i])
or even
i <- is.na(input)
j <- which(!i)
split(input[j], cumsum(i)[j])
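All of these variants return a list named by the group id ($`0`, $`1`, ...), while the question shows an unnamed list ([[1]], [[2]], ...). If the names are unwanted, wrapping the call in unname should give the requested shape. A minimal sketch:

```r
input <- c(1, 2, 1, NA, 3, 2, NA, 1, 5, 6, NA, 2, 2)

i <- is.na(input)
# unname() drops the "0", "1", ... names that split() derives
# from the group ids
res <- unname(split(input[!i], cumsum(i)[!i]))
```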
Benchmark
set.seed(42)
n <- 1e5
input <- sample(c(1:9, NA), n, TRUE)
library(tidyverse) #for TarJae
bench::mark(check = FALSE,
tmfmnk = split(na.omit(input), cumsum(is.na(input))[!is.na(input)]),
tpetzoldt = {tmp <- cumsum(is.na(input))
lapply(split(input, tmp), na.omit)},
TarJae = {tibble(input) %>%
group_by(id = cumsum(is.na(input))) %>%
na.omit %>%
group_split() %>%
map(.,~(.x %>%select(-id))) %>%
map(.,~(.x %>%pull))},
ChrisR = strsplit(paste(input, collapse = " "), " NA "), #Returns String
Thomas = split(na.omit(input), findInterval(seq_along(input)[!is.na(input)], which(is.na(input)))),
GKi1 = {. <- is.na(input); split(input[!.], cumsum(.)[!.])},
GKi2 = {i <- !is.na(input); split(input[i], cumsum(!i)[i])},
GKi3 = {i <- is.na(input); j <- which(!i); split(input[j], cumsum(i)[j])}
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
# <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
#1 tmfmnk 7.28ms 8.25ms 45.7 7.93MB 5.96 23 3 503.5ms
#2 tpetzoldt 46.65ms 49.07ms 19.8 4.4MB 5.95 10 3 504.4ms
#3 TarJae 14.17s 14.17s 0.0706 98.25MB 3.74 1 53 14.2s
#4 ChrisR 17.92ms 18.47ms 54.2 1.8MB 0 28 0 516.6ms
#5 Thomas 7.78ms 7.92ms 113. 8.71MB 25.8 57 13 503.7ms
#6 GKi1 6.71ms 6.84ms 81.6 6.63MB 7.96 41 4 502.3ms
#7 GKi2 6.71ms 6.81ms 136. 6.63MB 11.9 69 6 506ms
#8 GKi3 6.6ms 6.71ms 143. 5.52MB 11.9 72 6 502.8ms
In terms of iterations per second, GKi3 is in this case about 1.2 times faster than Thomas, 2.5 times faster than ChrisR, 3 times faster than tmfmnk, 7 times faster than tpetzoldt and about 2000 times faster than TarJae.
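Because the benchmark runs with check = FALSE, bench::mark does not compare the results of the expressions. A quick sanity check in base R, as a sketch on a smaller sample (the size 1000 and the variable names are my choice), could look like this:

```r
set.seed(42)
input <- sample(c(1:9, NA), 1000, TRUE)

na  <- is.na(input)
pos <- seq_along(input)[!na]

tmfmnk <- split(na.omit(input), cumsum(na)[!na])
thomas <- split(na.omit(input), findInterval(pos, which(na)))
gki    <- split(input[!na], cumsum(na)[!na])

# These three remove the NAs before splitting, so they should agree
identical(tmfmnk, gki)
identical(tmfmnk, thomas)
```

The tpetzoldt variant is left out of this comparison on purpose: it splits before removing the NAs, so it can return extra empty groups for adjacent NAs, and ChrisR returns strings.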
We can use split + findInterval as well:
> split(na.omit(input), findInterval(seq_along(input)[!is.na(input)], which(is.na(input))))
$`0`
[1] 1 2 1
$`1`
[1] 3 2
$`2`
[1] 1 5 6
$`3`
[1] 2 2
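For readers unfamiliar with findInterval: for each position of a non-NA value it counts how many NA positions lie at or before it, which yields the same group ids as the cumsum approach. The breakdown below is only an illustration; na_pos, val_pos and ids are names I introduced.

```r
input <- c(1, 2, 1, NA, 3, 2, NA, 1, 5, 6, NA, 2, 2)

na_pos  <- which(is.na(input))   # positions of the NAs:  4 7 11
val_pos <- which(!is.na(input))  # positions of the values: 1 2 3 5 6 8 9 10 12 13

# Number of NA positions that are <= each value position
ids <- findInterval(val_pos, na_pos)
# ids: 0 0 0 1 1 2 2 2 3 3

res <- split(input[val_pos], ids)
```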
One way to split a vector at each NA value is to use the split function in R, grouping the non-NA values by a running count of the NAs seen so far.
Here is an example of how you could do this:
is_na <- is.na(input)
output <- split(input[!is_na], cumsum(is_na)[!is_na])
This creates a list called output that contains multiple vectors, each holding a run of consecutive non-NA values from the input vector; runs are separated by one or more NA values.
You can then access each vector in the list using indexing, for example:
output[[1]] # access the first vector in the list
output[[2]] # access the second vector in the list
I hope this helps. Let me know if you have any questions.