
Drawing a random non-zero element from a sparse matrix

I have a sparse logical matrix, which is quite large. I would like to draw random non-zero elements from it without storing all of its non-zero elements in a separate vector (e.g. by using the find command). Is there an easy way to do this?

Currently I am implementing rejection sampling, that is, drawing a random element and checking whether it is non-zero. But this is inefficient when the ratio of non-zero elements is small.

A sparse logical matrix is not a very practical representation of your data if you want to pick random locations. Rejection sampling and find are the only two ways that make sense to me. Here's how you can do them efficiently (assuming you want to get 4 random locations):

%# using find
idx = find(S);
%# draw 4 without replacement
fourRandomIdx = idx(randperm(length(idx),4));
%# or draw 4 with replacement (randi takes the upper bound first, then the output size)
fourRandomIdx = idx(randi(numel(idx),4,1));
%# get row, column values
[row,col] = ind2sub(size(S),fourRandomIdx);



%# using rejection sampling
density = nnz(S)/numel(S);
%# the expected number of draws needed for 4 hits is 4/density;
%# multiply by 2 (or 3) as a safety margin
n = ceil(4/density) * 2;
%# random linear indices into S, w/ replacement
randIdx = randi(numel(S),n,1);
%# positions within randIdx of the first four non-zero hits
%# (redraw if fewer than four were found)
hits = find(S(randIdx),4,'first');
%# map those hits back to row, column values in S
[row,col] = ind2sub(size(S),randIdx(hits));

An n-by-m sparse matrix with nnz non-zero elements requires nnz + m + 1 integers to store the locations of its non-zero entries (nnz row indices plus m + 1 column pointers). For a logical matrix there is no need to store the values of the non-zero entries: they are all true. Correspondingly, you would do best to convert your logical sparse matrix into a list of the linear indices of its non-zero entries, together with n and m, which requires only nnz + 2 integers of storage. From these (and ind2sub) you can readily reconstruct the subscripts of any non-zero entry that you choose at random using randi over the range 1..nnz.
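
As a rough sketch of that compact representation (the example matrix and the names nzIdx, sz and k are made up for illustration, not taken from the answer above), it might look like this:

```matlab
% Hypothetical example data: a 1000-by-1000 sparse logical matrix, about 1% dense
S = sprand(1000, 1000, 0.01) ~= 0;

% Compact representation: linear indices of the non-zeros plus the matrix size
nzIdx = find(S);      % nnz(S)-by-1 vector of linear indices
sz    = size(S);      % [n m]
% From here on only nzIdx and sz are needed; S itself could be cleared

% Draw one non-zero entry uniformly at random and recover its subscripts
k = randi(numel(nzIdx));
[row, col] = ind2sub(sz, nzIdx(k));
```

Passing a size to randi, e.g. randi(numel(nzIdx), 4, 1), would draw several entries at once with replacement.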

find is the standard interface to get the non-zero elements of a sparse matrix. Have a look here: http://www.mathworks.se/help/techdoc/math/f6-9182.html#f6-13040

[i,j,s] = find(S)

find returns the row indices of the nonzero values in vector i, the column indices in vector j, and the nonzero values themselves in vector s.

No need to get s. Just pick a random index into i and j.

By representing the entries in a 3-column format, a.k.a. a coordinate list (i, j, value), you can simply select items from the list. To get this, you can either use your original method for creating the sparse matrix (i.e. the precursor to sparse()), or use the find command, as in [i,j,s] = find(S);

If you don't need the entries, and it seems you don't, you can just extract i and j.
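
A minimal sketch of selecting from the coordinate list, assuming a made-up example matrix and drawing 4 entries with replacement (the names k, rows and cols are illustrative):

```matlab
% Hypothetical example data: a sparse logical matrix
S = sprand(500, 800, 0.02) ~= 0;

% Coordinate list of the non-zeros; the values are all true, so skip them
[i, j] = find(S);

% Draw 4 entries with replacement by indexing both vectors with the same k
k = randi(numel(i), 4, 1);
rows = i(k);
cols = j(k);
```

Using the same k for both vectors is what keeps each row index paired with its column index.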

If, for some reason, your matrix is massive and your RAM limitations are severe, you can simply divide the matrix into regions, and let the probability of selecting a given sub-matrix be proportional to the number of non-zero elements (using nnz) in that sub-matrix. You could go so far as to divide the matrix into individual columns, and the rest of the calculation is trivial. NB: by applying sum to the matrix, you can get the per-column counts (assuming your entries are just 1s).

In this way, you need not even bother with rejection sampling (which seems pointless to me in this case, since Matlab knows where all of the non-zero entries are).
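
Here is one way the column-wise version could look; treat it as a sketch under assumptions rather than a definitive implementation. The example matrix and all variable names are hypothetical, and the matrix is assumed to contain at least one non-zero. Picking the r-th non-zero in column order is equivalent to choosing a column with probability proportional to its count and then a uniform entry within it.

```matlab
% Hypothetical example data: a sparse logical matrix
S = sprand(2000, 3000, 0.001) ~= 0;

% Number of non-zeros per column and their running total
colCounts = full(sum(S, 1));      % 1-by-m
cumCounts = cumsum(colCounts);    % cumCounts(end) equals nnz(S)

% Pick the r-th non-zero in column-major order, uniformly at random
r = randi(cumCounts(end));

% Locate its column: the first column whose running total reaches r
col = find(cumCounts >= r, 1, 'first');

% Rank of the chosen entry within that column
rankInCol = r - (cumCounts(col) - colCounts(col));

% Only this one column's row indices are materialized
rowsInCol = find(S(:, col));
row = rowsInCol(rankInCol);
```

The extra storage is one count per column plus the row indices of a single column, instead of all nnz indices at once.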


 