
Randomly select histogram data in MATLAB

I have an input 2D histogram that I want to do 2-fold cross-validation with. The problem is that I don't know how to extract two mutually exclusive random samples of the data from a histogram. If the data were a couple of lists holding the positional information of each data point, it would be easy: shuffle the lists in the same way and split them equally.

So for a list I would do this:

list1 = [1,2,3,3,5,6,1];
list2 = [1,3,6,6,5,2,1];

idx = randperm(length(list1)); % ie. idx = [4 3 1 5 6 2 7]
shlist1 = list1(idx); % shlist1 = [3,3,1,5,6,2,1]
shlist2 = list2(idx); % shlist2 = [6,6,1,5,2,3,1]

slist1 = shlist1(1:3);   % slist1 = [3,3,1]
elist1 = shlist1(4:end); % elist1 = [5,6,2,1]
slist2 = shlist2(1:3);   % slist2 = [6,6,1]
elist2 = shlist2(4:end); % elist2 = [5,2,3,1]

But if this same data were presented to me as a histogram

hist = [2 0 0 0 0 0]
       [0 0 0 0 0 1]
       [0 1 0 0 0 0]
       [0 0 0 0 0 0]
       [0 0 0 0 1 0]
       [0 0 2 0 0 0]

I want the result to be something like this

hist1 = [0 0 0 0 0 0]
        [0 0 0 0 0 1]
        [0 1 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 1 0 0 0]

hist2 = [2 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 0 0]
        [0 0 0 0 1 0]
        [0 0 1 0 0 0]

so that different halves of the data are randomly and equally assigned to two new histograms.

Would this be equivalent to taking a random integer fraction of the height of each bin hist(i,j), adding that to the corresponding bin hist1(i,j), and adding the remainder to hist2(i,j)?

% hist as shown above
hist1 = zeros(6);
hist2 = zeros(6);
for i = 1:length(hist(:,1))*length(hist(1,:))
    randNum = rand;
    hist1(i) = round(hist(i)*randNum);
    hist2(i) = hist(i) - hist1(i);
end

And if that is equivalent, is there a better or built-in way of doing it?

My actual histogram is 300x300 bins and contains about 6,000,000 data points, so it needs to be fast.

Thanks for any help :)

EDIT: The bit of code I suggested is not equivalent to taking a random sample of positional points from a list, as it does not maintain the overall probability density function of the data. Halving the histogram should be fine for my 6,000,000 points, but I was hoping for a method that would still work with only a few points.

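You can use rand or randi to generate two histograms. The first method is vectorized and therefore faster. The second draws each bin's count uniformly from 0 to h(k), whereas rounding a scaled rand makes the counts 0 and h(k) half as likely as the values in between (for h(k) of 2 or more).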

h    = [[2 0 0 0 0 0]
       [0 0 0 0 0 1]
       [0 1 0 0 0 0]
       [0 0 0 0 0 0]
       [0 0 0 0 1 0]
       [0 0 2 0 0 0]];

%using rand    
h1 = round(rand(size(h)).*h);
h2 = h - h1;

%using randi
h1 = zeros(size(h));
for k = 1:numel(h)
    h1(k) = randi([0 h(k)]);
end
h2 = h - h1;
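As the question's edit points out, a per-bin random fraction does not reproduce shuffling the list of points. A minimal sketch of a split that does, assuming the counts are non-negative integers and that one index per data point fits in memory (about 6,000,000 doubles for the histogram described), is to expand the histogram back into point indices, shuffle them, and re-bin the first half. The names pts and nHalf are introduced here for illustration only; repelem requires R2015a or later.

```matlab
h = [2 0 0 0 0 0
     0 0 0 0 0 1
     0 1 0 0 0 0
     0 0 0 0 0 0
     0 0 0 0 1 0
     0 0 2 0 0 0];

% One linear bin index per data point (bin k is repeated h(k) times).
pts = repelem((1:numel(h))', h(:));

% Shuffle the points, exactly as one would shuffle the original lists.
pts = pts(randperm(numel(pts)));

% The first half of the shuffled points goes to h1, the rest to h2.
nHalf = floor(numel(pts) / 2);
h1 = reshape(accumarray(pts(1:nHalf), 1, [numel(h) 1]), size(h));
h2 = h - h1;
```

With the 7 points above, h1 receives 3 points and h2 receives 4, and h1 + h2 always equals h.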

Suppose H is your 2D histogram. The following code extracts a single random linear index with probability proportional to the count at that index, which I think is what you want.

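cc = cumsum(H(:));          % running total of counts over the linear indices
m = cc(end);                % total number of data points
ix = find(cc > m*rand, 1);  % first bin whose cumulative count exceeds the draw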

To extract multiple samples, you need to write your own find function (preferably a binary search, for efficiency) that extracts n samples in one call. This gives you a vector of indices (call it ix_vec) chosen with probability proportional to the histogram count at each index.

Then, if we denote by X the numerical values corresponding to each location in the histogram, your random sample is:

R1 = X(ix_vec);

Repeat for the second random sample set.
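One possible way to draw many indices in a single call without a hand-written search is sketched below. It assumes discretize is available (R2015a or later) and drops empty bins first so that the bin edges are strictly increasing. The names n, nz, edges, u and bin are illustrative, not part of the answer above. Note that this draws with replacement, so two sample sets produced this way are not guaranteed to be mutually exclusive.

```matlab
H = [2 0 0 0 0 0
     0 0 0 0 0 1
     0 1 0 0 0 0
     0 0 0 0 0 0
     0 0 0 0 1 0
     0 0 2 0 0 0];

n = 1000;                     % number of samples to draw (arbitrary choice)

nz = find(H(:) > 0);          % linear indices of the non-empty bins
edges = [0; cumsum(H(nz))];   % cumulative counts, strictly increasing
u = edges(end) * rand(n, 1);  % uniform draws between 0 and the total count
bin = discretize(u, edges);   % interval containing each draw
ix_vec = nz(bin);             % map back to linear indices into H
```

Each entry of ix_vec is a linear index into H, chosen with probability H(k)/sum(H(:)), so X(ix_vec) gives the corresponding sample values.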

