
How can I get a list of all possible partitions of a vector in R when the vector is large?

I am trying to use R to find all possible ways to partition a vector x of length n into at most m parts. I know how to do this when n is small:

library(partitions)
x <- c(10, 20, 30, 40)
n <- length(x)
m <- 3

# Integer partitions of n into at most m parts (one block-size pattern per column)
parts <- restrictedparts(n, m)
# Expand each pattern into every assignment of the n items to blocks
sets <- setparts(parts)

In this example the value of sets is:

[1,] 1 1 1 1 2 1 1 1 1 1 1 2 2 2
[2,] 1 1 1 2 1 2 1 2 2 1 2 1 1 3
[3,] 1 2 1 1 1 2 2 1 3 2 1 3 1 1
[4,] 1 1 2 1 1 1 2 2 1 3 3 1 3 1

Each column of sets describes one unique arrangement: it tells me which part each item in x is allocated to.

The problem occurs when n is large:

n <- 15
m <- 4
parts <- restrictedparts(n, m)
# This expression will max out your CPU usage and eventually run out of memory.
sets <- setparts(parts)

How can I do this operation without running out of memory? I doubt there's a fast way to do it, so I suspect I'll have to do it in batches and write to disk.

If, like me, you are not a superstar in combinatorics but you trust that partitions has it right, then at least you can make use of the package's code to compute the final number of partitions. Here I hacked the setparts function so that, instead of the partitions themselves, it returns the number of partitions:

num.partitions <- function (x) {
    if (length(x) == 1) {
        if (x < 1) {
            stop("if single value, x must be >= 1")
        }
        else if (x == 1) {
            out <- 1
        }
        else return(Recall(parts(x)))
    }
    if (is.matrix(x)) {
        out <- sum(apply(x, 2, num.partitions))
    }
    else {
        x   <- sort(x[x > 0], decreasing = TRUE)
        out <- factorial(sum(x))/(prod(c(factorial(x), 
                                         factorial(table(x)))))
    }
    return(out)
}
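For a hand check of the closed-form branch, here is a small base-R sketch that does not need the package. It assumes the count for one block-size pattern is n! divided by the product of the factorials of the block sizes and of the factorials of how often each size repeats, which is what the else branch above computes. The name count_for_shape and the hard-coded list of patterns are mine, for illustration only.

```r
# Number of ways to split sum(shape) labelled items into blocks of the given sizes.
# count_for_shape is a hypothetical helper name, not part of the partitions package.
count_for_shape <- function(shape) {
  shape <- shape[shape > 0]                       # ignore zero padding
  repeats <- as.vector(table(shape))              # how often each block size occurs
  factorial(sum(shape)) / prod(factorial(shape), factorial(repeats))
}

# The four block-size patterns of 4 items into at most 3 blocks, written out by hand.
shapes <- list(c(4), c(3, 1), c(2, 2), c(2, 1, 1))
counts <- sapply(shapes, count_for_shape)
counts        # 1 4 3 6
sum(counts)   # 14
```

The pattern (2, 1, 1), for example, gives 4! / (2! 1! 1! * 1! 2!) = 6, and the four patterns together add up to the 14 columns shown in the question.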

Let's check that the function is returning the correct number of partitions:

num.partitions(restrictedparts(4, 3))
# [1] 14
ncol(setparts(restrictedparts(4, 3)))
# [1] 14

num.partitions(restrictedparts(8, 4))
# [1] 2795
ncol(setparts(restrictedparts(8, 4)))
# [1] 2795

Now let's look at your large case:

num.partitions(restrictedparts(15, 4))
# [1] 44747435

That is indeed quite a lot of partitions... However well or badly setparts is written, the output cannot fit in a single array:

sets <- matrix(1, 15, 44747435)
# Error in matrix(1, 15, 44747435) : 
#  cannot allocate vector of length 671211525

So yes, you'd have to write your own algorithm and store the results in a list of matrices or, if that is too much for your memory, write them to a file, if that's really what you want to do. Otherwise, given the rather large number of partitions and whatever it is you want to do with them, go back to the drawing board...
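One possible shape for such an algorithm, as a rough base-R sketch rather than anything taken from the partitions package: visit the partitions one at a time as label vectors (item i goes to block labels[i], and a new block number is only opened once all lower numbers are in use), and flush them to a text file in fixed-size chunks. The function names, the callback argument and the chunk size are all assumptions of mine, and the block numbering will not match the column layout of setparts.

```r
# Call visit(labels) once for every partition of n items into at most m blocks.
# for_each_set_partition is a hypothetical name; visit is a user-supplied callback.
for_each_set_partition <- function(n, m, visit) {
  labels <- integer(n)
  fill <- function(i, used) {
    if (i > n) {
      visit(labels)
      return(invisible(NULL))
    }
    # Item i may join any block already in use, or open the next one (up to m).
    for (b in seq_len(min(used + 1L, m))) {
      labels[i] <<- b
      fill(i + 1L, max(used, b))
    }
  }
  fill(1L, 0L)
  invisible(NULL)
}

# Stream all partitions to a text file, one partition per line, chunk by chunk.
write_set_partitions <- function(n, m, path, chunk_size = 10000L) {
  con <- file(path, open = "wt")
  on.exit(close(con))
  buffer <- matrix(0L, nrow = chunk_size, ncol = n)
  filled <- 0L
  total <- 0
  flush <- function() {
    if (filled > 0L) {
      write.table(buffer[seq_len(filled), , drop = FALSE], con,
                  row.names = FALSE, col.names = FALSE)
      filled <<- 0L
    }
  }
  for_each_set_partition(n, m, function(p) {
    filled <<- filled + 1L
    total <<- total + 1
    buffer[filled, ] <<- p
    if (filled == chunk_size) flush()
  })
  flush()   # write whatever is left in the last, partly filled chunk
  total
}

# Small case from the question: 4 items, at most 3 blocks.
count <- 0
for_each_set_partition(4, 3, function(p) count <<- count + 1)
count   # 14
```

Only one chunk is held in memory at a time, so memory use no longer grows with the number of partitions. I have not tuned this for speed; with tens of millions of partitions a recursive R loop like this will be slow.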

If you want to calculate them in batches, this appears to be possible for at least some of the columns. I was not able to complete the calculation for several of the individual columns of restrictedparts(15,4) on a machine like yours. Up to column 40 I could succeed in batches of 5-10 columns at a time, but above that there were several single columns that reported their number of columns and then threw a malloc error. So you may simply need a bigger machine. On my Mac, which has 32 GB, constructing the 53rd column consumed half of the memory. The column counts estimated on the big machine agreed with what the 4 GB machine reported:

> ncol( setparts( restrictedparts(15,4)[,53]))
[1] 6306300
R(317,0xa077a720) malloc: *** mmap(size=378380288) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
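A back-of-the-envelope check on those numbers, as a base-R sketch. It assumes the result is stored as 4 bytes per cell (R's integer type); I have not confirmed that this is how setparts stores its output, and matrix_bytes is just a helper name made up for the arithmetic.

```r
# Approximate payload of a rows x cols matrix, ignoring R's small object header.
# matrix_bytes is a hypothetical helper, not a function from any package.
matrix_bytes <- function(rows, cols, bytes_per_cell) rows * cols * bytes_per_cell

one_column_int <- matrix_bytes(15, 6306300, 4)    # result for column 53 alone
whole_int      <- matrix_bytes(15, 44747435, 4)   # every partition, integer storage
whole_dbl      <- matrix_bytes(15, 44747435, 8)   # every partition, double storage

one_column_int          # 378378000 bytes
whole_int / 1e9         # about 2.7 GB
whole_dbl / 1e9         # about 5.4 GB
```

The single-column figure lands within a few kilobytes of the 378380288 bytes in the failed mmap call above, which is consistent with the error coming from allocating the result matrix itself. Working copies made during construction come on top of that.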

(I offer no opinion on whether this is a sensible project.)

As I could not install the partitions package (missing libraries), I came up with this:

 ## Recursive function to get all partitions of a vector 
 ## Returns a list of logical vectors
 parts <- function(x) { 
   if (length(x) == 1) return(list(FALSE, TRUE))
   do.call(c, lapply(parts(x[-1]), function(y) list(c(FALSE, y), c(TRUE, y))))
 }

This function takes a vector and returns a list of logical vectors of the same length. Each logical vector marks which elements go into one of two groups, so the list has 2^n entries, one per way of splitting x in two (including the splits that leave one group empty). It can't handle huge n, but on my PC it runs n=19 in less than a second.

If you want only the non-empty partitions, with no duplicates, use:

 partitions <- parts(x)
 partitions <- partitions[1:(length(partitions)/2)][-1]
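To make the filtering step concrete, here is a small worked case on a three-element vector. The helper is the same recursive function as above, repeated so the snippet runs on its own; the use of split() to turn a mask into two groups is my own illustration.

```r
# Same recursive helper as above: one logical mask per subset of x.
parts <- function(x) {
  if (length(x) == 1) return(list(FALSE, TRUE))
  do.call(c, lapply(parts(x[-1]), function(y) list(c(FALSE, y), c(TRUE, y))))
}

x <- c(10, 20, 30)
all_masks <- parts(x)   # 2^3 = 8 masks

# The first half holds the masks whose last entry is FALSE, so each mirror-image
# pair is kept once; dropping element 1 removes the all-FALSE mask.
unique_masks <- all_masks[1:(length(all_masks) / 2)][-1]

# Turn each mask into the two groups it describes.
groups <- lapply(unique_masks, function(mask) split(x, mask))
length(unique_masks)   # 3, i.e. 2^(3 - 1) - 1
groups[[1]]            # TRUE group: 10, FALSE group: 20 30
```

This only covers splits into exactly two groups, so it answers the m = 2 case of the question rather than the general "at most m" case.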
