Selecting n weighted elements by index from a very large array in MATLAB

Question

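Suppose I have a very large square matrix M(i, j) in which each element represents the probability of that element being chosen in a weighted random selection. I need to sample n elements of the matrix (by their (i, j) indices) with replacement. The weights change on every iteration of the main loop.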

Currently, I am using something like the following:

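% Column weights: the mean of each column
for m = 1:M_size
    xMean(m) = mean(M(:, m));
end

% Draw n column indices j, weighted by the column means
[~, j_list] = histc(rand(n, 1), cumsum([0; xMean'./sum(xMean)']));
% For each drawn column, draw one row index i, weighted by that column's entries
for c = 1:n
    [~, i_list(c)] = ...
      histc(rand(1, 1), cumsum([0; M(:, j_list(c))./sum(M(:, j_list(c)))]));
end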
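The two-stage draw above (column first, then row within that column) gives each pair (i, j) probability proportional to M(i, j). That is why sampling a single linear index over all of M is a valid replacement for it. Write $c_j = \sum_i M(i,j)$ for the column sums, $N$ for the number of rows (so the column mean is $\bar{x}_j = c_j / N$), and $T = \sum_{i,j} M(i,j)$. Then:

$$
P(j) = \frac{\bar{x}_j}{\sum_k \bar{x}_k} = \frac{c_j}{T}, \qquad
P(i \mid j) = \frac{M(i,j)}{c_j}, \qquad
P(i,j) = P(j)\,P(i \mid j) = \frac{M(i,j)}{T}.
$$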

But this seems like a clumsy approach, and the for loops also make it slow. Is there a more efficient way? Perhaps by vectorizing the matrix somehow?

*Edit: I should mention that I don't have access to the Statistics Toolbox.

Thanks in advance.

Answer 1

randsample is your friend here. I would convert to linear indices and then back to subscripts, like this:

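% Sample n linear indices with replacement, weighted by the entries of M
selected_indexes = randsample(1:numel(M), n, true, M(:));
% Convert the linear indices back to (row, column) subscripts
[sub_i, sub_j] = ind2sub(size(M), selected_indexes);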

You may need to transpose M in places to get the dimensions you want.
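randsample is part of the Statistics Toolbox, which the asker says they do not have. Below is a minimal toolbox-free sketch of the same linear-index idea. It assumes M is non-negative and not all zero. It builds one cumulative distribution over M(:) and bins uniform draws with histc, the same function the question already uses. The values of M and n are hypothetical test data.

```matlab
% Toolbox-free weighted sampling with replacement over all entries of M.
% Assumes M is non-negative and not all zero.
M = abs(randn(50));          % hypothetical example weights
n = 8;                       % number of samples

w = M(:);                            % weights in linear-index order
cdf = cumsum(w);
edges = [0; cdf / cdf(end)];         % nondecreasing, last edge exactly 1
% A draw in [edges(k), edges(k+1)) falls in bin k, i.e. linear index k;
% zero-weight entries give zero-width bins and are never chosen
[~, selected_indexes] = histc(rand(n, 1), edges);
[sub_i, sub_j] = ind2sub(size(M), selected_indexes);
```

No loops remain. Because the CDF is rebuilt from M on each call, the weights can change between iterations of the main loop.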

Answer 2

% M is ii x jj; n is the number of samples and need not equal jj
ii = size(M, 1);
xMean = transpose(mean(M, 1));   % jj x 1 vector of column means
% Draw n column indices (j_list is n x 1), weighted by the column means
[~, j_list] = histc(rand(n, 1), cumsum([0; xMean ./ sum(xMean)]));
% Normalized CDF of every selected column at once: (ii+1) x n
cols = M(:, j_list);
cumsumvals = cumsum([zeros(1, n); cols ./ kron(sum(cols, 1), ones(ii, 1))], 1);
% histc won't accept a matrix-valued edges argument, so one call per
% sample remains; you'll need to look into hist3 for that
i_list = zeros(n, 1);
for c = 1:n
    [~, i_list(c)] = ...
      histc(rand(1, 1), cumsumvals(:, c));
end

So this gets closer, but you would need hist3 to vectorize it fully.
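hist3 is also a Statistics Toolbox function. The sketch below is an alternative, not the answer's method, and removes the remaining loop without it. For each sample, it counts how many entries of the column's CDF lie at or below that sample's uniform draw; one plus that count is the sampled row. It assumes M is non-negative and not all zero, and it uses memory proportional to (rows x n). The values of M and n are hypothetical test data. Column sums replace column means here, which gives the same column probabilities.

```matlab
% Loop-free two-stage sampling: column first, then row within the column.
% Assumes M is non-negative and not all zero.
M = abs(randn(50));                  % hypothetical example weights
n = 8;
ii = size(M, 1);

% Stage 1: column indices weighted by column sums (same as column means)
colSum = sum(M, 1)';
colEdges = [0; cumsum(colSum)];
colEdges = colEdges / colEdges(end); % last edge exactly 1
[~, j_list] = histc(rand(n, 1), colEdges);

% Stage 2: CDF of each selected column (ii x n), last row exactly 1
C = cumsum(M(:, j_list), 1);
C = bsxfun(@rdivide, C, C(end, :));
u = rand(1, n);                      % one uniform draw per sample
% Row = 1 + number of CDF entries at or below the draw
i_list = 1 + sum(bsxfun(@le, C, u), 1)';
```

Row r is chosen exactly when C(r-1, c) <= u(c) < C(r, c), which matches the per-sample histc call in the loop above.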

Answer 3

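I think I can actually solve this by de-vectorizing it. That is, by using only preallocated arrays and simple operations, you can strip out all the high-level calls and expensive operations and reduce the work to what is strictly necessary.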

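The core of the algorithm is:

1. Compute the sum of the weights.
2. Pick n random numbers between 0 and the sum of the weights, and sort them.
3. Implement a cumsum loop by hand. However, instead of storing all the cumulative sums, store the index at which the running sum crosses from below the current random number to at or above it.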
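To make step 3 concrete, here is a tiny worked example. It uses hypothetical weights and hand-picked thresholds in place of random ones, and shows which index each threshold maps to. For clarity it calls find on a full cumsum; the hand-written loop is meant to reach the same indices without storing every running sum.

```matlab
% Worked example of step 3 with hypothetical weights and thresholds.
w = [2 1 3 4];                 % weights; running sums are 2, 3, 6, 10
r = [1.5; 6.5; 9];             % sorted "random" numbers in (0, sum(w))
S = cumsum(w);
% For each threshold, the first index whose running sum reaches it
idx = arrayfun(@(t) find(S >= t, 1), r);
% idx is [1; 4; 4]
```

Both 6.5 and 9 land on element 4, because the running sum only reaches them after that element is added. With replacement, the same element can legitimately be picked more than once.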

In code (with some timing), it looks like this:

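tic
for ixTiming = 1:1000

    M = abs(randn(50));       % example weight matrix (non-negative)
    n = 8;                    % number of samples, drawn with replacement
    total = sum(M(:));        % step 1: sum of the weights

    % Step 2: n sorted random numbers in (0, total)
    randIndexes = sort(rand(n, 1) * total);

    % Step 3: hand-written cumulative sum. Walk through M once and record
    % the linear index at which the running sum first reaches each number.
    list = zeros(n, 1);
    ixM = 1;            % next element of M to add
    ixNextList = 1;     % next random number to place
    curSum = 0;
    while ixNextList <= n
        while curSum < randIndexes(ixNextList) && ixM <= numel(M)
            curSum = curSum + M(ixM);
            ixM = ixM + 1;
        end
        list(ixNextList) = ixM - 1;   % the element just added crossed the threshold
        ixNextList = ixNextList + 1;
    end
    [i_list, j_list] = ind2sub(size(M), list);

end
toc; % 0.216 sec. on my computer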

Compare this with the code from the original question:

tic
for ixTiming = 1:1000
    M = abs(randn(50));
    M_size = size(M, 2);
    n = 8;

    for m = 1:M_size
        xMean(m) = mean(M(:, m));
    end

    [~, j_list] = histc(rand(n, 1), cumsum([0; xMean'./sum(xMean)']));
    for c = 1:n
        [~, i_list(c)] = ...
            histc(rand(1, 1), cumsum([0; M(:, j_list(c))./sum(M(:, j_list(c)))]));
    end
end
toc;  %1.10 sec on my computer

Caveats and optimizations:

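- I have not tested this extensively. It is hard to verify that random-number code behaves correctly, so run test cases over many Monte Carlo sets to confirm the behavior is as expected. Watch out especially for off-by-one errors.
- Profile it, then look for further improvements in whichever steps are slow. Some possibilities:
  - Keep track of `total` as you change M, so you don't need to recompute it.
  - Compare the lowest and highest values of `randIndexes` with 0 and `total`. If `randIndexes(1)` is larger than `total - randIndexes(end)`, walk `ixM` down from `numel(M)` to 1 instead of up from 1 to `numel(M)`.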
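Following the advice to run Monte Carlo tests, here is a minimal sketch. It repeats the sorted-threshold walk many times on a small hypothetical matrix and compares the empirical frequencies with M / total. It records ixM - 1, the element whose addition pushed the running sum past the threshold; recording ixM instead would shift every sample by one element, which is exactly the kind of off-by-one error this test catches. The trial count and tolerance are arbitrary choices.

```matlab
% Monte Carlo sanity check of the sorted-threshold walk.
M = [1 2; 3 4];                      % hypothetical weights; expected probs are M / 10
n = 5;                               % samples per trial
nTrials = 20000;
counts = zeros(size(M));
total = sum(M(:));
for t = 1:nTrials
    r = sort(rand(n, 1) * total);
    ixM = 1;
    curSum = 0;
    for k = 1:n
        while curSum < r(k) && ixM <= numel(M)
            curSum = curSum + M(ixM);
            ixM = ixM + 1;
        end
        counts(ixM - 1) = counts(ixM - 1) + 1;   % element that crossed r(k)
    end
end
empirical = counts / (n * nTrials);
expected = M / total;
maxErr = max(abs(empirical(:) - expected(:)));
disp(maxErr)                         % should be small with 100000 samples
```

With 100,000 samples, the standard error of each frequency here is roughly 0.0015. A difference much larger than that points to a bug rather than noise.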

3 answers

Answer 1
1 2012-02-24 15:52:37

Answer 2
0 2012-02-24 16:08:09

Answer 3
0 2012-02-24 18:10:48
