
How to select increasing values from two vectors - 'weaving two vectors'

I have two vectors of different lengths. The values in both vectors are sorted in increasing order:

v1 <- c(1:5, 11:18)
v2 <- c(2, 7, 8, 14)
v1         
# [1]  1  2  3  4  5 11 12 13 14 15 16 17 18
v2
# [1]  2  7  8 14

Starting with the first element of v1, I want to alternate between the two vectors and select one element from each vector at a time. Each value selected should be larger than the preceding selected value.

The desired sequence after 'weaving' the two vectors:

c(1, 2, 3, 7, 11, 14, 15)

Thus, we start with the first element in v1 (1). Then the next element should be selected from v2 and be larger than the preceding selected value; we pick 2 from v2 (2 > 1). The next value should come from v1 and be larger than 2: we pick 3 from v1. Then 7 from v2 (7 > 3), 11 from v1 (11 > 7), and so on, alternating between the vectors and picking increasing values.

When there are no more elements in v2 that are greater than the last value picked from v1, we stop selecting values. Thus, in this case, 15 is the last value we pick from v1 (16, 17 and 18 are discarded).
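
For reference, the selection rule described above can be spelled out as a plain loop. This is only a sketch of the rule, not the vectorized solution being asked for; the name weave_loop is a hypothetical helper, not a function from any package, and it assumes both inputs are already sorted in increasing order.

```r
v1 <- c(1:5, 11:18)
v2 <- c(2, 7, 8, 14)

# Hypothetical helper: alternate between v1 and v2, each time taking the
# smallest value that is strictly larger than the previously selected one.
weave_loop <- function(v1, v2) {
  vecs <- list(v1, v2)
  out <- v1[0]        # empty vector of the same type as v1
  last <- NULL        # nothing selected yet
  which_vec <- 1L     # start with v1
  repeat {
    v <- vecs[[which_vec]]
    cand <- if (is.null(last)) v else v[v > last]
    if (length(cand) == 0L) break   # no larger value left: stop
    last <- cand[1L]                # inputs are sorted, so this is the smallest
    out <- c(out, last)
    which_vec <- 3L - which_vec     # switch 1 -> 2 or 2 -> 1
  }
  out
}

weave_loop(v1, v2)
# [1]  1  2  3  7 11 14 15
```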



I would prefer vectorized operations instead of unnecessary loops.

Extra: my example uses integers, but my actual data are well-structured time values that I can pass directly to functions from the 'lubridate' package. Is there any function that can do the job?

Q1) Is there an existing function that does this? Q2) Is there a way to do this in a vectorized approach instead of looping and trimming the input vector after each loop?

See if this is general enough:

# pad v2 with NA to the length of v1 (this assumes v1 is the longer vector)
# and bind the two vectors into a matrix
m <- cbind(v1, v2[1:length(v1)])

# 'weave' the two vectors and bind with a vector index
m2 <- cbind(c(t(m)), 1:2)

# remove NA and duplicates
m3 <- m2[!is.na(m2[ , 1]) & !duplicated(m2[ , 1]), ]

# order 
m3 <- m3[order(m3[ , 1]), ]

# to pick values from every other vector,
# create a run-length id based on the vector index,
# remove duplicates of it, and use as index 
m3[!duplicated(cumsum(c(1L, m3[ , 2][-nrow(m3)] != m3[ , 2][-1]))), 1]
# [1]  1  2  3  7 11 14 15
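
The last step is the least obvious one, so here it is in isolation on a small made-up index vector. The values in g are illustrative only; they stand for the vector index column (1 = value came from v1, 2 = value came from v2) after sorting by value.

```r
# Illustrative vector index after ordering by value (not taken from v1/v2)
g <- c(1, 2, 1, 1, 1, 2, 2, 1)

# Compare each element with its predecessor; a new run starts at every change.
# The cumulative sum of those change flags gives a run-length id.
run_id <- cumsum(c(1L, g[-length(g)] != g[-1]))
run_id
# [1] 1 2 3 3 3 4 4 5

# Keeping only the first row of each run yields one value per run,
# which alternates between the two source vectors.
keep <- !duplicated(run_id)
which(keep)
# [1] 1 2 3 6 8
```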

Same idea, but slightly more compact with data.table:

library(data.table)
m <- cbind(v1, v2[1:length(v1)])
d <- data.table(v = c(t(m)), g = 1:2)
d2 <- d[!is.na(v) & !duplicated(v), ]
setorder(d2, v)
d2[ , .SD[1], by = rleid(g)]$v
# [1]  1  2  3  7 11 14 15
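
Regarding the time values mentioned in the question: one possible way to reuse the base R steps is to run them on the underlying numeric values and convert back afterwards. The sketch below assumes the data are POSIXct date-times (which is what lubridate parsers such as ymd_hms() return); the function name weave and the example timestamps are made up for illustration. It also pads both vectors to a common length, so either one may be the longer. Values that occur in both vectors are de-duplicated exactly as in the code above.

```r
# Hypothetical wrapper around the base R steps shown above
weave <- function(v1, v2) {
  n <- max(length(v1), length(v2))
  # pad both vectors with NA to a common length and bind them to a matrix
  m <- cbind(as.numeric(v1)[seq_len(n)], as.numeric(v2)[seq_len(n)])
  # 'weave' the two vectors and bind with a vector index
  m2 <- cbind(c(t(m)), 1:2)
  # remove NA and duplicates, then order by value
  m3 <- m2[!is.na(m2[, 1]) & !duplicated(m2[, 1]), , drop = FALSE]
  m3 <- m3[order(m3[, 1]), , drop = FALSE]
  # keep the first value of each run of the vector index
  g <- m3[, 2]
  keep <- !duplicated(cumsum(c(1L, g[-length(g)] != g[-1])))
  m3[keep, 1]
}

# Made-up example: minutes after midnight, mirroring v1 and v2
base_time <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
t1 <- base_time + 60 * c(1:5, 11:18)
t2 <- base_time + 60 * c(2, 7, 8, 14)

# weave() returns seconds since the epoch; convert back to POSIXct
res <- as.POSIXct(weave(t1, t2), origin = "1970-01-01", tz = "UTC")
format(res, "%H:%M")
# [1] "00:01" "00:02" "00:03" "00:07" "00:11" "00:14" "00:15"
```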

