How to normalize a histogram in MATLAB?
How can I normalize a histogram so that the area under the probability density function equals 1?
My answer to this is the same as in an answer to your earlier question. For a probability density function, the integral over the entire space is 1. Dividing by the sum will not give you the correct density. To get the right density, you must divide by the area. To illustrate my point, try the following example.
[f, x] = hist(randn(10000, 1), 50); % Create histogram from a normal distribution.
g = 1 / sqrt(2 * pi) * exp(-0.5 * x .^ 2); % pdf of the normal distribution
% METHOD 1: DIVIDE BY SUM
figure(1)
bar(x, f / sum(f)); hold on
plot(x, g, 'r'); hold off
% METHOD 2: DIVIDE BY AREA
figure(2)
bar(x, f / trapz(x, f)); hold on
plot(x, g, 'r'); hold off
You can see for yourself which method agrees with the correct answer (red curve).
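The difference between the two methods can also be checked numerically, without looking at the plots. The following is only a sketch: the variable names dx, area_method1 and area_method2 are mine, not part of the answer, and the exact numbers change with every random sample.

```matlab
[f, x] = hist(randn(10000, 1), 50);   % counts and equally spaced bin centers
dx = x(2) - x(1);                     % width of every bar

% Area under the bars = sum of heights times the common width
area_method1 = sum(f / sum(f)) * dx;        % heights sum to 1, so the area is just dx
area_method2 = sum(f / trapz(x, f)) * dx;   % close to 1

fprintf('bin width dx          : %.4f\n', dx);
fprintf('area, divide by sum   : %.4f\n', area_method1);
fprintf('area, divide by area  : %.4f\n', area_method2);
```

Dividing by the sum leaves an area equal to the bin width, which is why the bars in figure 1 sit well below the red curve unless the bin width happens to be 1.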
Another method (more straightforward than method 2) to normalize the histogram is to divide by sum(f * dx), which expresses the integral of the probability density function, i.e.
% METHOD 3: DIVIDE BY AREA USING sum()
figure(3)
dx = diff(x(1:2)); % bin width (hist returns equally spaced centers)
bar(x, f / sum(f * dx)); hold on
plot(x, g, 'r'); hold off
Since 2014b, MATLAB has these normalization routines embedded natively in the histogram function (see the help file for the 6 routines this function offers). Here is an example using the PDF normalization (the total area of all the bars is 1).
data = 2*randn(5000,1) + 5; % generate normal random (m=5, std=2)
h = histogram(data,'Normalization','pdf') % PDF normalization
The corresponding PDF is
Nbins = h.NumBins;
edges = h.BinEdges;
x = zeros(1,Nbins);
for counter=1:Nbins
midPointShift = abs(edges(counter)-edges(counter+1))/2;
x(counter) = edges(counter)+midPointShift;
end
mu = mean(data);
sigma = std(data);
f = exp(-(x-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
The two together give
hold on;
plot(x,f,'LineWidth',1.5)
An improvement that might very well be due to the success of the actual question and accepted answer!
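The midpoint loop above can also be written without a loop, and the pdf normalization can be checked by summing height times width over the bins. This is a sketch only: it uses histcounts (the non-plotting counterpart of histogram) so that no figure is needed, and the names p, widths, centers and total_area are my own.

```matlab
data = 2*randn(5000, 1) + 5;                             % same kind of sample as above
[p, edges] = histcounts(data, 'Normalization', 'pdf');   % pdf heights and bin edges
widths  = diff(edges);                                   % width of each bin
centers = edges(1:end-1) + widths/2;                     % vectorized bin midpoints
total_area = sum(p .* widths);                           % 1 up to rounding error

fprintf('%d bins, total area = %.6f\n', numel(centers), total_area);
```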
EDIT - The use of hist and histc is not recommended now, and histogram should be used instead. Beware that none of the 6 ways of creating bins with this new function will produce the bins hist and histc produce. There is a MATLAB script to update former code to fit the way histogram is called (bin edges instead of bin centers - link). By doing so, one can compare the pdf normalization methods of @abcd (trapz and sum) and MATLAB (pdf).
The 3 pdf normalization methods give nearly identical results (within the range of eps).
TEST:
A = randn(10000,1);
centers = -6:0.5:6;
d = diff(centers)/2;
edges = [centers(1)-d(1), centers(1:end-1)+d, centers(end)+d(end)];
edges(2:end) = edges(2:end)+eps(edges(2:end));
figure;
subplot(2,2,1);
hist(A,centers);
title('HIST not normalized');
subplot(2,2,2);
h = histogram(A,edges);
title('HISTOGRAM not normalized');
subplot(2,2,3)
[counts, centers] = hist(A,centers); %get the count with hist
bar(centers,counts/trapz(centers,counts))
title('HIST with PDF normalization');
subplot(2,2,4)
h = histogram(A,edges,'Normalization','pdf')
title('HISTOGRAM with PDF normalization');
dx = diff(centers(1:2))
normalization_difference_trapz = abs(counts/trapz(centers,counts) - h.Values);
normalization_difference_sum = abs(counts/sum(counts*dx) - h.Values);
max(normalization_difference_trapz)
max(normalization_difference_sum)
The maximum difference between the new PDF normalization and the former one is 5.5511e-17.
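One likely reason the trapz and sum normalizations agree so closely in this test: with equally spaced centers, trapz(centers, counts) equals sum(counts*dx) minus dx times half of the first and last counts, and the outermost bins (centered at -6 and 6) are almost certainly empty for 10000 standard normal samples. A small sketch with made-up counts, purely for illustration:

```matlab
centers = 1:5;                 % equally spaced bin centers (made-up values)
dx = diff(centers(1:2));       % bin width, here 1

% Case 1: empty outer bins, as in the TEST above
counts_a = [0 3 10 5 0];
gap_a = sum(counts_a * dx) - trapz(centers, counts_a);

% Case 2: non-empty outer bins
counts_b = [2 3 10 5 4];
gap_b = sum(counts_b * dx) - trapz(centers, counts_b);
half_ends_b = dx * (counts_b(1) + counts_b(end)) / 2;

fprintf('gap with empty outer bins : %g\n', gap_a);
fprintf('gap with filled outer bins: %g (half of the end counts: %g)\n', gap_b, half_ends_b);
```

When the outer bins hold data, the two normalizations no longer coincide, so the agreement seen here depends on the chosen range of centers.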
hist can not only plot a histogram but also return the count of elements in each bin, so you can get that count, normalize it by dividing each bin by the total, and plot the result using bar. Example:
Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)
or if you want a one-liner:
bar(hist(Y) ./ sum(hist(Y)))
Edit: This solution answers the question How to have the sum of all bins equal to 1. This approximation is valid only if your bin size is small relative to the variance of your data. The sum used here corresponds to a simple quadrature formula; more complex ones can be used, like trapz as proposed by RM.
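To make the distinction concrete: dividing by the sum gives a probability per bin, and dividing that by the bin width turns it into a density. The sketch below assumes uniform data on [0, 1], whose true density is 1 everywhere; the names p, d and dx are mine.

```matlab
Y = rand(10000, 1);          % uniform samples on [0, 1]
[C, x] = hist(Y);            % 10 equally spaced bins by default
dx = x(2) - x(1);            % bin width, roughly 0.1

p = C ./ sum(C);             % probability per bin: sums to 1, heights near 0.1
d = p ./ dx;                 % density estimate: heights near 1, area under bars is 1

fprintf('sum of p   = %.4f, typical height = %.3f\n', sum(p), mean(p));
fprintf('area of d  = %.4f, typical height = %.3f\n', sum(d) * dx, mean(d));
```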
[f,x]=hist(data)
The area for each individual bar is height*width. Since MATLAB will choose equidistant points for the bars, the width is:
delta_x = x(2) - x(1)
Now if we sum up all the individual bars the total area will come out as
A=sum(f)*delta_x
So the correctly scaled plot is obtained by
bar(x, f/sum(f)/(x(2)-x(1)))
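The same height*width reasoning carries over to bins of unequal width, which hist with a bin count never produces. The sketch below is an assumption-laden illustration: the edges are hand-picked, histcounts is used instead of hist because it accepts bin edges, and the variable names are mine.

```matlab
data   = randn(10000, 1);
edges  = [-10 -2 -1 -0.5 0 0.5 1 2 10];       % hand-picked, deliberately unequal bins
counts = histcounts(data, edges);             % number of samples in each bin
widths = diff(edges);                         % one width per bin

% Choose each height so that height*width equals the fraction of samples in the bin
density = counts ./ (sum(counts) .* widths);
area    = sum(density .* widths);             % 1 up to rounding error

fprintf('area under the bars = %.6f\n', area);
```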
The area of abcd's PDF is not one, which is impossible, as pointed out in many comments. Assumptions made in many answers here: the probability under the pdf should be 1. The normalization should be done as Normalization with probability, not as Normalization with pdf, in histogram() and hist().
Fig. 1 Output of hist() approach, Fig. 2 Output of histogram() approach
The max amplitude differs between the two approaches, which suggests that there is some mistake in hist()'s approach, because histogram()'s approach uses the standard normalization. I assume the mistake with hist()'s approach here is that the normalization is done partially as pdf, not completely as probability.
Some remarks
sum(f)/N gives 1 if Nbins is manually set.
The width of the bin (dx) is included in the graph g.
Code
%http://stackoverflow.com/a/5321546/54964
N=10000;
Nbins=50;
[f,x]=hist(randn(N,1),Nbins); % create histogram from ND
%METHOD 4: Count Densities, not Sums!
figure(3)
dx=diff(x(1:2)); % width of bin
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND with dx
% 1.0000
bar(x, f/sum(f));hold on
plot(x,g,'r');hold off
Output is in Fig. 1.
Some remarks
a) sum(f) is 1 if Nbins is adjusted with histogram()'s Normalization as probability, b) sum(f)/N is 1 if Nbins is manually set without normalization.
The width of the bin (dx) is included in the graph g.
Code
%%METHOD 5: with histogram()
% http://stackoverflow.com/a/38809232/54964
N=10000;
figure(4);
h = histogram(randn(N,1), 'Normalization', 'probability') % hist() deprecated!
Nbins=h.NumBins;
edges=h.BinEdges;
x=zeros(1,Nbins);
f=h.Values;
for counter=1:Nbins
midPointShift=abs(edges(counter)-edges(counter+1))/2; % same constant for all
x(counter)=edges(counter)+midPointShift;
end
dx=diff(x(1:2)); % constant for all
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND
% Use if Nbins manually set
%new_area=sum(f)/N % diff of consecutive edges constant
% Use if histogram() Normalization probability
new_area=sum(f)
% 1.0000
% No bar() needed here with histogram() Normalization probability
hold on;
plot(x,g,'r');hold off
Output in Fig. 2 and expected output is met: area 1.0000.
Matlab: 2016a
System: Linux Ubuntu 16.04 64 bit
Linux kernel 4.6
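For reference, the probability and pdf normalizations differ only by the bin width: each probability value is the pdf value times the width of its bin. A sketch that checks this on one sample, assuming histcounts accepts the same Normalization options as histogram (the variable names are mine):

```matlab
data = randn(10000, 1);
[p_prob, edges] = histcounts(data, 'Normalization', 'probability');  % heights sum to 1
p_pdf = histcounts(data, edges, 'Normalization', 'pdf');             % area sums to 1
w = diff(edges);                                                     % bin widths

max_gap = max(abs(p_prob - p_pdf .* w));   % zero up to rounding error

fprintf('sum of probability heights = %.6f\n', sum(p_prob));
fprintf('area under pdf heights     = %.6f\n', sum(p_pdf .* w));
fprintf('max |prob - pdf*width|     = %.3g\n', max_gap);
```

This is why multiplying the normal pdf g by dx, as in the code above, lines it up with probability-normalized bars.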
There is a very good three-part guide to histogram adjustments in MATLAB (broken original link, archive.org link); the first part is on histogram stretching.
For some distributions, Cauchy I think, I have found that trapz will overestimate the area, and so the pdf will change depending on the number of bins you select. In which case I do
[N,h]=hist(q_f./theta,30000); % there Is a large range but most of the bins will be empty
plot(h,N/(sum(N)*mean(diff(h))),'+r')
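Since q_f and theta are not defined in the snippet above, here is a self-contained sketch of the same sum-based normalization on stand-in data. The assumptions are mine: standard Cauchy samples drawn through the inverse CDF, a sample size of 100000, and the names q, dx, pdf_est and area.

```matlab
n = 100000;
q = tan(pi * (rand(n, 1) - 0.5));   % standard Cauchy samples via the inverse CDF (stand-in data)
[N, h] = hist(q, 30000);            % very wide range, so most bins are empty
dx = mean(diff(h));                 % bin width

pdf_est = N / (sum(N) * dx);        % sum-based normalization, as in the plot call above
area = sum(pdf_est) * dx;           % 1 up to rounding error

fprintf('empty bins: %.1f%%, area under the estimate = %.6f\n', 100 * mean(N == 0), area);
```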