Randomly selecting histogram data in MATLAB

Question

I have an input 2D histogram that I want to do 2-fold cross-validation with. The problem is that I don't know how to extract two mutually exclusive random samples of the data from a histogram. If the data were a pair of lists holding the positional information of each data point, that would be easy - shuffle both lists in the same way, and split the lists equally.

So for a list I would do this:

list1 = [1,2,3,3,5,6,1];
list2 = [1,3,6,6,5,2,1];

idx = randperm(length(list1)); % ie. idx = [4 3 1 5 6 2 7]
shlist1 = list1(idx); % shlist1 = [3,3,1,5,6,2,1]
shlist2 = list2(idx); % shlist2 = [6,6,1,5,2,3,1]

slist1 = shlist1(1:3);   % slist1 = [3,3,1]
elist1 = shlist1(4:end); % elist1 = [5,6,2,1]
slist2 = shlist2(1:3);   % slist2 = [6,6,1]
elist2 = shlist2(4:end); % elist2 = [5,2,3,1]

But if this same data were presented to me as a histogram:

hist = [2 0 0 0 0 0]
       [0 0 0 0 0 1]
       [0 1 0 0 0 0]
       [0 0 0 0 0 0]
       [0 0 0 0 1 0]
       [0 0 2 0 0 0]

I want the result to be something like this:

hist1 = [0 0 0 0 0 0]
        [0 0 0 0 0 1]
        [0 1 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 1 0 0 0]

hist2 = [2 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 1 0]
        [0 0 1 0 0 0]

so that different halves of the data are randomly and equally assigned to two new histograms.

Would this be equivalent to taking a random integer between 0 and the height of each bin hist(i,j), adding that to the corresponding bin hist1(i,j), and adding the difference to hist2(i,j)?

% hist as shown above
hist1 = zeros(6);
hist2 = zeros(6);
for i = 1:numel(hist)   % linear index over every bin
    randNum = rand;
    hist1(i) = round(hist(i)*randNum);
    hist2(i) = hist(i) - hist1(i);
end

And if that is equivalent, is there a better or built-in way of doing it?

My actual histogram is 300x300 bins, and contains about 6,000,000 data points, and it needs to be fast.

Thanks for any help :)

EDIT: The code I suggested above is not equivalent to taking a random sample of positional points from a list, as it does not maintain the overall probability density function of the data. Halving the histograms should be fine for my 6,000,000 points, but I was hoping for a method that would still work for a small number of points.

Answer 1

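You can use rand or randi to generate two histograms. The first method is vectorized and therefore more efficient. The second is more uniformly random: randi picks each integer from 0 to h(k) with equal probability, whereas round(rand*h) gives the end values 0 and h(k) half the weight of the values in between.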

h    = [[2 0 0 0 0 0]
       [0 0 0 0 0 1]
       [0 1 0 0 0 0]
       [0 0 0 0 0 0]
       [0 0 0 0 1 0]
       [0 0 2 0 0 0]];

%using rand    
h1 = round(rand(size(h)).*h);
h2 = h - h1;

%using randi
h1 = zeros(size(h));
for k = 1:numel(h)
    h1(k) = randi([0 h(k)]);
end
h2 = h - h1;
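
Neither variant above reproduces the list approach from the question (shuffle the points, then split them in half), because each bin is split independently of the others. A minimal sketch of the list approach applied to a histogram follows. It assumes it is acceptable to expand the histogram into one linear bin index per data point (roughly 6,000,000 indices for the real data) and that repelem is available in your MATLAB release. The names binIdx and nHalf are illustrative and do not come from the original post.

```matlab
% Example 6x6 histogram from the question
h = [2 0 0 0 0 0
     0 0 0 0 0 1
     0 1 0 0 0 0
     0 0 0 0 0 0
     0 0 0 0 1 0
     0 0 2 0 0 0];

% One linear bin index per data point: a bin with count 2 appears twice
binIdx = repelem((1:numel(h))', h(:));

% Shuffle the points and decide how many go into the first half
n = numel(binIdx);
binIdx = binIdx(randperm(n));
nHalf = floor(n/2);

% Re-bin the first half into a histogram of the original size;
% the second half is whatever remains
h1 = reshape(accumarray(binIdx(1:nHalf), 1, [numel(h) 1]), size(h));
h2 = h - h1;
```

With the 7 points of the example, h1 receives 3 points and h2 receives 4, and h1 + h2 always equals h.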

Answer 2

Suppose H is your 2D histogram. The following code extracts a single random index with a probability proportional to the count at that index - which I think is what you want.

cc = cumsum(H(:));          % cumulative counts over the linear bin indices
m = cc(end);                % total number of data points
ix = find(cc > m*rand, 1);  % first bin whose cumulative count exceeds the draw

To extract multiple samples, you need to write your own find function (preferably a binary search, for efficiency) that extracts n samples in one call. This will give you a vector of indices (call it ix_vec) chosen with probability proportional to the histogram count at each index.

Then, if we denote by X the numerical values corresponding to each location in the histogram, your random sample is:

R1 = X(ix_vec);

Repeat for the second random sample set.
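
As a rough sketch of the multi-sample step, the search over the cumulative counts can be delegated to discretize instead of a hand-written binary search, assuming that function is available in your MATLAB release. The sample count n and the names nz, edges and k are illustrative. Note that this draws with replacement, so two sample sets produced this way are not guaranteed to be mutually exclusive.

```matlab
H = [2 0 0 0 0 0
     0 0 0 0 0 1
     0 1 0 0 0 0
     0 0 0 0 0 0
     0 0 0 0 1 0
     0 0 2 0 0 0];
n = 1000;                      % number of samples to draw (illustrative value)

nz = find(H(:) > 0);           % linear indices of the non-empty bins
edges = [0; cumsum(H(nz))];    % strictly increasing cumulative counts
m = edges(end);                % total number of data points

% discretize returns k such that edges(k) <= u < edges(k+1)
k = discretize(m*rand(n, 1), edges);
ix_vec = nz(k);                % sampled linear bin indices (with replacement)

% Convert the linear indices to row/column positions in H
[row, col] = ind2sub(size(H), ix_vec);
```

Empty bins are dropped before building the edges so that the edge vector has no repeated values.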

Answer 1
0 2017-03-22 05:40:59

Answer 2
0 2017-03-22 19:59:49
