How to combine multiple vectors such that elements of each vector are distributed as equally as possible?
Let's say I have two or more vectors, each with two or more elements of a single factor, e.g.
v1 = c("a", "a", "a")
v2 = c("b", "b")
What I want to do is merge all the vectors and distribute the elements of each group as equally as possible.
For the simple example above there would be a single solution:
c("a", "b", "a", "b", "a")
If v1 = c("a", "a", "a", "a") instead, any of these
c("a", "b", "a", "b", "a", "a")
c("a", "b", "a", "a", "b", "a")
c("a", "a", "b", "a", "b", "a")
would be the best solution. Is there a built-in function that can do this? Any ideas how to implement it?
This works for two vectors:
v1 = c("a", "a", "a")
v2 = c("b", "b")
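distribute_equally <- function(v1, v2) {
  v3 <- c(v1, v2)
  tab <- sort(table(v3))  # counts per value, least frequent first
  # alternate the two values min(tab) times, then append the surplus
  # copies of the more frequent value at the end
  c(rep(names(tab), min(tab)), rep(names(tab)[2], diff(range(tab))))
}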
distribute_equally(v1, v2)
#[1] "b" "a" "b" "a" "a"
distribute_equally(c('a', 'a'), c('b', 'b'))
#[1] "a" "b" "a" "b"
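The function above is limited to two vectors and appends the surplus copies at the end. A possible generalization, sketched here as an assumption rather than as part of this answer: give the i-th of the n copies of a value the target position (i - 0.5) / n, then sort all elements by that key so every group is spread proportionally. The helper name `interleave_evenly` is hypothetical.

```r
# Hypothetical helper (not from the answer above): merges any number of
# vectors and spreads the copies of each value as evenly as possible.
interleave_evenly <- function(...) {
  v <- c(...)                                   # merge all input vectors
  n <- ave(seq_along(v), v, FUN = length)       # size of each element's group
  i <- ave(seq_along(v), v, FUN = seq_along)    # occurrence index within group
  # the i-th of n copies gets the target position (i - 0.5) / n in (0, 1)
  key <- (i - 0.5) / n
  v[order(key)]                                 # ties keep their input order
}

interleave_evenly(c("a", "a", "a"), c("b", "b"))
#[1] "a" "b" "a" "b" "a"
interleave_evenly(c("a", "a", "a", "a"), c("b", "b"))
#[1] "a" "b" "a" "a" "b" "a"
```

Because `order()` leaves ties in their original order, equal group sizes alternate in input order, so `interleave_evenly(c('a', 'a'), c('b', 'b'))` gives `"a" "b" "a" "b"`, the same as `distribute_equally`.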
Thinking of the problem in terms of experimental design optimization, we can get a general solution using the MaxProQQ function in the MaxPro package.
Each position in the merged vector can be thought of as coming from a discrete quantitative factor, and the factors from your v1, v2, etc. can be thought of as qualitative factors. Here's some example code (MaxProQQ takes integer factors instead of characters, but you can convert them afterward):
library(MaxPro)
set.seed(1)
v1 <- rep(1, sample.int(10, 1))
v2 <- rep(2, sample.int(10, 1))
v3 <- rep(3, sample.int(10, 1))
v4 <- rep(4, sample.int(10, 1))
vComb <- c(v1, v2, v3, v4)
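# column 1: positions 1..n; column 2: the group codes in random order (the nominal factor, p_nom = 1)
vMerge1234 <- MaxProQQ(cbind(1:length(vComb), sample(vComb, length(vComb))), p_nom = 1)$Design
# sort the optimized design by position and keep only the group codes
vMerge1234 <- vMerge1234[order(vMerge1234[,1]),][,2]
vMerge1234
# [1] 4 3 4 2 4 3 4 1 2 4 3 4 2 4 3 1 4 3 2 4 1 3 4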
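Converting the integer design back to characters is a single indexing step. A minimal sketch, assuming the hypothetical labels "a" to "d" stand for the codes 1 to 4, and reusing the design printed above instead of rerunning MaxProQQ:

```r
# Integer design as printed above; code k stands for the elements of vk
vMerge1234 <- c(4, 3, 4, 2, 4, 3, 4, 1, 2, 4, 3, 4, 2, 4, 3, 1, 4, 3, 2, 4, 1, 3, 4)
# Hypothetical character labels for v1, v2, v3 and v4
grp_labels <- c("a", "b", "c", "d")
# Indexing the label vector by the integer codes converts the whole design
merged <- grp_labels[vMerge1234]
merged
table(merged)  # counts: a = 3, b = 4, c = 6, d = 10
```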
Generate, say, 100 samples without replacement from c(v1, v2), giving m, a 5x100 matrix with one column per sample. Then find the column for which the sum of the variances of the frequencies over each group is minimized. If there are more than two vectors, just concatenate them in the line marked ## and the rest of the code stays the same.
set.seed(123)
v1 = c("a", "a", "a")
v2 = c("b", "b")
v <- c(v1, v2) ##
m <- replicate(100, sample(v))
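varsum <- apply(m, 2, function(x) {
  f <- factor(x, levels = unique(v))
  # split the shuffled column into the position blocks defined by v and
  # sum the variances of the value counts within each block
  sum(tapply(f, v, function(x) var(table(x))))
})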
m[, which.min(varsum)]
## [1] "a" "a" "b" "b" "a"
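The variance criterion above scores each position block separately, and with this seed it picks "a" "a" "b" "b" "a" rather than "a" "b" "a" "b" "a". A hypothetical alternative score, offered as an assumption and not part of the original answer, measures spacing directly: for each value, take the gaps between its consecutive positions (padded with 0 and length + 1 at the ends) and sum the variances of those gaps. The names `gap_score` and `best` are illustrative.

```r
# Hypothetical spacing score: smaller means each value is spaced more evenly
gap_score <- function(x) {
  N <- length(x)
  sum(vapply(unique(x), function(val) {
    gaps <- diff(c(0, which(x == val), N + 1))  # distances between occurrences
    var(gaps)
  }, numeric(1)))
}

gap_score(c("a", "b", "a", "b", "a"))  # 1/3
gap_score(c("a", "a", "b", "b", "a"))  # 2

# Same sampling approach as above, with the score swapped in
set.seed(123)
v1 <- c("a", "a", "a")
v2 <- c("b", "b")
v <- c(v1, v2)
m <- replicate(100, sample(v))
best <- m[, which.min(apply(m, 2, gap_score))]
best
```

There are only 10 distinct arrangements of three "a" and two "b", and "a" "b" "a" "b" "a" is the unique minimum of this score, so 100 random draws will almost certainly include it.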