
How to combine multiple vectors such that elements of each vector are distributed as equally as possible?

Let's say I have two or more vectors, each with two or more elements (a single factor level per vector), e.g.

v1 = c("a", "a", "a")
v2 = c("b", "b")

I want to merge all vectors so that the elements of each group are spread as evenly as possible across the result.

For the simple example above there would be a single solution:

c("a", "b", "a", "b", "a")

If v1 = c("a", "a", "a", "a") any of these

c("a", "b", "a", "b", "a", "a")
c("a", "b", "a", "a", "b", "a")
c("a", "a", "b", "a", "b", "a")

would be the best solution. Is there a built-in function that can do this? Any ideas how to implement it?

This would work for two vectors. It alternates the two values and appends the surplus of the more frequent one at the end.

v1 = c("a", "a", "a")
v2 = c("b", "b")

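distribute_equally <- function(v1, v2) {
  v3 <- c(v1, v2)
  # Counts per value, sorted ascending so the rarer value comes first
  tab <- sort(table(v3))
  # Alternate both values min(tab) times, then append the surplus of the more frequent one
  c(rep(names(tab), min(tab)), rep(names(tab)[2], diff(range(tab))))
}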

distribute_equally(v1, v2)
#[1] "b" "a" "b" "a" "a"

distribute_equally(c('a', 'a'), c('b', 'b'))
#[1] "a" "b" "a" "b"
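
For more than two vectors, one possible generalization (not part of the answer above, only a sketch) is to give each element an ideal fractional position in the merged vector, (i - 0.5) / n for the i-th of n elements in its group, and then sort all elements by that position. The name distribute_evenly is made up for illustration.

```r
# Sketch: spread each group's elements over (0, 1) and interleave by position.
# distribute_evenly is a hypothetical helper, not a built-in function.
distribute_evenly <- function(...) {
  groups <- list(...)
  # The i-th of n elements in a group gets the target position (i - 0.5) / n
  pos <- unlist(lapply(groups, function(g) (seq_along(g) - 0.5) / length(g)))
  vals <- unlist(groups)
  # order() keeps ties in their original order, so earlier vectors win ties
  vals[order(pos)]
}

distribute_evenly(c("a", "a", "a"), c("b", "b"))
#[1] "a" "b" "a" "b" "a"

distribute_evenly(c("a", "a", "a", "a"), c("b", "b"))
#[1] "a" "b" "a" "a" "b" "a"
```

Both results match solutions listed in the question. Whether this is optimal for every combination of group sizes is not shown here.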

Thinking of the problem in terms of experimental design optimization, we can get a general solution using the MaxProQQ function in the MaxPro package.

Each position in the merged vector can be thought of as coming from a discrete quantitative factor, and the values in your v1, v2, etc. can be thought of as levels of a qualitative factor. Here's some example code (MaxProQQ takes integer factors instead of characters, but you can convert the result back afterward):

library(MaxPro)

set.seed(1)

v1 <- rep(1, sample.int(10, 1))
v2 <- rep(2, sample.int(10, 1))
v3 <- rep(3, sample.int(10, 1))
v4 <- rep(4, sample.int(10, 1))

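vComb <- c(v1, v2, v3, v4)
# Column 1: position (discrete numeric factor); column 2: shuffled group labels (nominal factor, p_nom = 1)
vMerge1234 <- MaxProQQ(cbind(seq_along(vComb), sample(vComb, length(vComb))), p_nom = 1)$Design
# Sort rows by position and keep only the label column
vMerge1234 <- vMerge1234[order(vMerge1234[,1]),][,2]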

> vMerge1234
 [1] 4 3 4 2 4 3 4 1 2 4 3 4 2 4 3 1 4 3 2 4 1 3 4
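
The conversion between character labels and integer codes can be sketched with a factor. The merged_codes vector below is a hand-written placeholder standing in for the reordered result, not actual output of MaxProQQ.

```r
v1 <- c("a", "a", "a")
v2 <- c("b", "b")

# Encode the labels as integers via a factor
f <- factor(c(v1, v2))
codes <- as.integer(f)
codes
#[1] 1 1 1 2 2

# Placeholder (assumption) for the reordered integer vector from the optimization
merged_codes <- c(1, 2, 1, 2, 1)

# Map the integer codes back to the original labels
merged <- levels(f)[merged_codes]
merged
#[1] "a" "b" "a" "b" "a"
```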

Generate, say, 100 random permutations of c(v1, v2) (i.e. samples without replacement), giving a 5x100 matrix m with one column per permutation. Then pick the column for which the sum, over each group, of the variances of the frequencies is smallest. If there are more than two vectors, just concatenate them in the line marked ## and the rest of the code stays the same.

set.seed(123)
v1 = c("a", "a", "a")
v2 = c("b", "b")

v <- c(v1, v2) ##
m <- replicate(100, sample(v))
varsum <- apply(m, 2, function(x) {
  f <- factor(x, levels = unique(v))
  sum(tapply(f, v, function(x) var(table(x))))
})
m[, which.min(varsum)]
## [1] "a" "a" "b" "b" "a"
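
To compare candidate orderings independently of how they were generated, one option is to score how regular the gaps between repeated values are. This is a different criterion from the one used in the code above and is offered only as a sketch. The name gap_score is made up.

```r
# Hypothetical helper: sum, over each distinct value, of the variance of the
# gaps between its occurrences (including the gaps to both ends of the vector).
# Lower scores mean the values are spread more evenly.
gap_score <- function(x) {
  n <- length(x)
  sum(sapply(unique(x), function(lab) {
    pos <- which(x == lab)
    var(diff(c(0, pos, n + 1)))
  }))
}

gap_score(c("a", "b", "a", "b", "a")) < gap_score(c("a", "a", "b", "b", "a"))
#[1] TRUE
```

Such a score could be applied to every column of m with apply(m, 2, gap_score) in place of varsum.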
