
MATLAB: combining and normalizing histograms with different sample sizes

I have four data sets whose distributions I would like to show in a single MATLAB figure. My current code is:

[n1,x1]=hist([dataset1{:}]);
[n2,x2]=hist([dataset2{:}]);
[n3,x3]=hist([dataset3{:}]);
[n4,x4]=hist([dataset4{:}]);
bar(x1,n1,'hist'); 
hold on; h1=bar(x1,n1,'hist'); set(h1,'facecolor','g')
hold on; h2=bar(x2,n2,'hist'); set(h2,'facecolor','g')
hold on; h3=bar(x3,n3,'hist'); set(h3,'facecolor','g')
hold on; h4=bar(x4,n4,'hist'); set(h4,'facecolor','g')
hold off 

My issue is that each group has a different sample size: dataset1 has an n of 69, dataset2 has an n of 23, and dataset3 and dataset4 each have an n of 10. So how do I normalize the distributions when showing these four groups together?

Is there some way to, for example, divide the count in each bin by the sample size of that group?

You can normalize your histograms by dividing by the total number of elements:

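% bin counts (n) and bin edges (x) for four random samples of different sizes
[n1,x1] = histcounts(randn(69,1));
[n2,x2] = histcounts(randn(23,1));
[n3,x3] = histcounts(randn(10,1));
[n4,x4] = histcounts(randn(10,1));
hold on
% smallest samples first; x(1:end-1) are the left bin edges,
% and n./sum(n) is the fraction of the sample in each bin
bar(x4(1:end-1),n4./sum(n4),'histc');
bar(x3(1:end-1),n3./sum(n3),'histc');
bar(x2(1:end-1),n2./sum(n2),'histc');
bar(x1(1:end-1),n1./sum(n1),'histc');
hold off
ax = gca;
% one color per histogram, all of them semi-transparent
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))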

As the code above shows, a few more changes make your code simpler and shorter:

  1. You only need to call hold on once.
  2. Instead of collecting all the bar handles, use the axes handle.
  3. Plot the bars in ascending order of the number of elements in each dataset, so that all histograms stay clearly visible.
  4. With the axes handle, set each property for all the bars in a single command.

As a side note, it's better to use histcounts than hist.

Here is the result:

[Figure: the four normalized histograms overlaid]
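
If the histograms are meant to be compared bar by bar, it may also help to give all datasets the same bin edges, because histcounts otherwise picks the edges separately for each dataset. A minimal sketch, assuming synthetic randn data like the code above and an arbitrary choice of 10 shared bins; the variable names (data, edges, centers, p) are illustrative only. It relies on the 'Normalization','probability' option of histcounts, which divides each bin count by the number of elements:

```matlab
% Sketch: normalized histograms on shared bin edges (synthetic data, assumed).
rng(0);                                    % reproducible random data
data = {randn(69,1), randn(23,1), randn(10,1), randn(10,1)};
allValues = vertcat(data{:});
% 10 shared bins spanning all four datasets (the bin count is an arbitrary choice)
edges = linspace(min(allValues), max(allValues), 11);
centers = edges(1:end-1) + diff(edges)/2;  % bar positions
colors = lines(numel(data));
p = zeros(numel(data), numel(centers));    % one row of bin fractions per dataset
hold on
for k = 1:numel(data)
    % 'probability' returns count / number of elements for each bin
    p(k,:) = histcounts(data{k}, edges, 'Normalization', 'probability');
    bar(centers, p(k,:), 1, 'FaceColor', colors(k,:), 'FaceAlpha', 0.5);
end
hold off
legend('dataset1','dataset2','dataset3','dataset4')
```

Each row of p sums to 1 here because the shared edges cover every value, so the bar heights are directly comparable across groups regardless of sample size.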


EDIT:

If you also want to plot the pdf line from histfit, you can save it first and then plot it normalized:

dataset = {randn(69,1),randn(23,1),randn(10,1),randn(10,1)};
fits = zeros(100,2,numel(dataset));
hold on
for k = numel(dataset):-1:1
    total = numel(dataset{k}); % for normalizing
    f = histfit(dataset{k}); % draw the histogram and fit
    % collect the curve data and normalize it:
    fits(:,:,k) = [f(2).XData; f(2).YData./total].';
    x = f(1).XData; % collect the bar positions
    n = f(1).YData; % collect the bar counts
    f.delete % delete the histogram and the fit
    bar(x,n./total,'histc'); % plot the bar
end
ax = gca; % get the axis handle
% set all color and transparency for the bars:
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))
% plot all the curves:
plot(squeeze(fits(:,1,:)),squeeze(fits(:,2,:)),'LineWidth',3)
hold off

Again, there are some other improvements you can make to your code:

  1. Put everything in a loop so that things are easier to change later.
  2. Collect all the curve data in one variable so you can plot all the curves together very easily.

The new result is:

[Figure: the normalized histograms with their fitted curves]
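
To see why dividing the fitted curve by the number of elements puts it on the same scale as the normalized bars: a bar height is the fraction of the sample in a bin, and for a fitted density that fraction is approximately the density times the bin width. The following is only a sketch under stated assumptions, not what histfit does internally: it uses synthetic data, an arbitrary 9 bins, and writes out the normal density by hand (the names binWidth, pdfVals and curve are illustrative):

```matlab
% Sketch: a normal fit drawn on the same scale as a normalized histogram.
rng(1);                                    % reproducible random data
x = randn(69,1);                           % synthetic stand-in for one dataset
edges = linspace(min(x), max(x), 10);      % 9 equal-width bins (arbitrary choice)
binWidth = edges(2) - edges(1);
centers = edges(1:end-1) + binWidth/2;
p = histcounts(x, edges) / numel(x);       % fraction of the sample in each bin
mu = mean(x);                              % parameters of the fitted normal
sigma = std(x);
xx = linspace(min(x), max(x), 100);
% normal density evaluated on a fine grid
pdfVals = exp(-(xx - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
curve = binWidth * pdfVals;                % density * bin width = expected bin fraction
bar(centers, p, 1, 'FaceAlpha', 0.7);
hold on
plot(xx, curve, 'LineWidth', 3);
hold off
```

The bin width factor matters when the datasets end up with different bin widths; a curve scaled this way follows the tops of its own bars.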
