
Updated title: Scipy.stats pdf bug?

I have a simple plot of a 2D Gaussian distribution.

import numpy as np
from scipy.stats import multivariate_normal
from matplotlib import pyplot as plt

means = [ 1.03872615e+00, -2.66927843e-05]
cov_matrix =  [[3.88809050e-03, 3.90737359e-06], [3.90737359e-06, 4.28819569e-09]]

# This works
a_lims = [0.7, 1.3]
b_lims = [-5, 5]

# This does not work
a_lims = [0.700006488869478, 1.2849292618191401]
b_lims =[-5.000288311285968, 5.000099437047633]

dist = multivariate_normal(mean=means, cov=cov_matrix)
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1e-2, b_lims[0]:b_lims[1]:0.1]
pos = np.empty(a_plot.shape + (2,))
pos[:, :, 0] = a_plot
pos[:, :, 1] = b_plot
z = dist.pdf(pos)

plt.figure()
plt.contourf(a_plot, b_plot, z, cmap='coolwarm', levels=100)
plt.show()

If I use the limits marked "This works", I get the following (correct) plot.

[Image: contour plot with the "This works" limits]

However, if I use slightly adjusted limits, the plot is completely wrong: the distribution appears to be centered at different values (below).

[Image: contour plot with the adjusted limits]

I guess it is a bug in mgrid. Does anyone have any ideas? More specifically, why does the maximum of the distribution move?

Focusing just on the x-axis:

In [443]: a_lims = [0.7, 1.3] 
In [444]: np.mgrid[a_lims[0]:a_lims[1]:1e-2]                                                   
Out[444]: 
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
       0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
       0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  , 1.01, 1.02,
       1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
       1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24,
       1.25, 1.26, 1.27, 1.28, 1.29, 1.3 ])
In [445]: a_lims = [0.700006488869478, 1.2849292618191401]                                     
In [446]: np.mgrid[a_lims[0]:a_lims[1]:1e-2]                                                   
Out[446]: 
array([0.70000649, 0.71000649, 0.72000649, 0.73000649, 0.74000649,
       0.75000649, 0.76000649, 0.77000649, 0.78000649, 0.79000649,
       0.80000649, 0.81000649, 0.82000649, 0.83000649, 0.84000649,
       0.85000649, 0.86000649, 0.87000649, 0.88000649, 0.89000649,
       0.90000649, 0.91000649, 0.92000649, 0.93000649, 0.94000649,
       0.95000649, 0.96000649, 0.97000649, 0.98000649, 0.99000649,
       1.00000649, 1.01000649, 1.02000649, 1.03000649, 1.04000649,
       1.05000649, 1.06000649, 1.07000649, 1.08000649, 1.09000649,
       1.10000649, 1.11000649, 1.12000649, 1.13000649, 1.14000649,
       1.15000649, 1.16000649, 1.17000649, 1.18000649, 1.19000649,
       1.20000649, 1.21000649, 1.22000649, 1.23000649, 1.24000649,
       1.25000649, 1.26000649, 1.27000649, 1.28000649])
In [447]: _444.shape                                                                           
Out[447]: (61,)
In [449]: _446.shape                                                                           
Out[449]: (59,)

When given a range like a:b:c, mgrid uses np.arange(a, b, c). With a float step, arange is unreliable with regard to the end point: note that 1.3 is included in the first result even though arange's stop value is nominally exclusive, and the two sets of limits give 61 and 59 points respectively.
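A minimal sketch reproducing the two x-axis grids from the session above with np.arange directly (the names a1, a2 and step are just for illustration). NumPy documents the result length as ceil((stop - start) / step), so with a float step the rounding of that quotient decides whether the stop value slips in.

```python
import numpy as np

step = 1e-2
# Both x-axis limit pairs from the question, with the same float step
a1 = np.arange(0.7, 1.3, step)                               # "This works"
a2 = np.arange(0.700006488869478, 1.2849292618191401, step)  # "This does not work"

# Length is ceil((stop - start) / step); rounding in (1.3 - 0.7) / 0.01
# pushes the quotient just above 60, so the "exclusive" stop 1.3 appears.
print(a1.shape, a1[-1])  # last value is (approximately) 1.3
print(a2.shape, a2[-1])
```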

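mgrid also lets you use np.linspace-style ranges, which are better suited to floating-point grids: if the step is a complex number such as 61j, its magnitude is taken as the number of points, and the stop value is included. For example, with the first set of limits: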

In [453]: a_lims = [0.7, 1.3]                                                                  
In [454]: np.mgrid[a_lims[0]:a_lims[1]:61j]                                                    
Out[454]: 
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
       0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
       0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  , 1.01, 1.02,
       1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
       1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24,
       1.25, 1.26, 1.27, 1.28, 1.29, 1.3 ])
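Applied to the question's adjusted limits, the same idea might look like the sketch below. The point counts 61j and 101j are illustrative choices, not values from the original post; each axis then matches np.linspace with the same number of points.

```python
import numpy as np

# Adjusted limits from the question (the set that "does not work")
a_lims = [0.700006488869478, 1.2849292618191401]
b_lims = [-5.000288311285968, 5.000099437047633]

# Complex step = number of points, both endpoints included, so the grid
# shape no longer depends on floating-point rounding.
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:61j, b_lims[0]:b_lims[1]:101j]

print(a_plot.shape)                                                  # (61, 101)
print(np.allclose(a_plot[:, 0], np.linspace(a_lims[0], a_lims[1], 61)))
```

This only makes the point count and endpoints predictable; it does not make the vertical spacing (about 0.1 here) any finer.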

=== ===

By narrowing b_lims considerably and generating a finer mesh, I get a nice tilted ellipse:

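means = [1, 0]
a_lims = [0.7, 1.3]
b_lims = [-.0002, .0002]

dist = multivariate_normal(mean=means, cov=cov_matrix)
# Complex steps (1001j) give 1001 points per axis, endpoints included
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1001j, b_lims[0]:b_lims[1]:1001j]
pos = np.dstack((a_plot, b_plot))
plt.contourf(a_plot, b_plot, dist.pdf(pos), cmap='coolwarm', levels=100)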

So I think the difference between your plots is an artifact of an excessively coarse mesh in the vertical direction: the b step of 0.1 is huge compared with the spread of b (its standard deviation, sqrt(4.28819569e-09), is about 6.5e-5). That potentially affects both the pdf evaluation and the contouring.
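To put numbers on "excessively coarse", here is a small sketch that counts how many of the question's b levels fall within three standard deviations of the b mean. The helper levels_near_mean is hypothetical, written for this illustration; it uses np.arange with step 0.1, which should match the b values mgrid generates up to rounding.

```python
import numpy as np

# b-component of the question's mean and covariance
mean_b = -2.66927843e-05
sd_b = np.sqrt(4.28819569e-09)  # about 6.5e-5


def levels_near_mean(b_lims, step_b=0.1, n_sd=3.0):
    """Hypothetical helper: count b grid levels within n_sd SDs of mean_b."""
    # Float step, as in the question's mgrid call
    b = np.arange(b_lims[0], b_lims[1], step_b)
    inside = int(np.sum(np.abs(b - mean_b) < n_sd * sd_b))
    nearest = b[np.argmin(np.abs(b - mean_b))]
    # Also return how far the nearest level is from the mean, in SDs
    return inside, nearest, (nearest - mean_b) / sd_b


print(levels_near_mean([-5, 5]))                                  # "This works"
print(levels_near_mean([-5.000288311285968, 5.000099437047633]))  # adjusted limits
```

With the "This works" limits one level (b about 0) lands inside that band; with the adjusted limits none does, and the closest level, b about -2.9e-4, is roughly 4 standard deviations from the mean.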

[Image: higher-resolution plot]

[Image: plotted with the original grid points]

High-resolution plot with the original grid points. Only one b level intersects the high-probability region. Since the ellipse is tilted, the two grids sample different parts of it, hence the seemingly different pdfs.
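Why the maximum appears to move can be sketched with the conditional mean of a bivariate Gaussian: along a horizontal row b = b0, the pdf peaks at E[a | b0] = mu_a + cov_ab / var_b * (b0 - mu_b). The helper names below are hypothetical; the mean and covariance are the question's, and the b rows are the single level nearest the mean in each of the question's grids.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Mean and covariance from the question
means = np.array([1.03872615e+00, -2.66927843e-05])
cov = np.array([[3.88809050e-03, 3.90737359e-06],
                [3.90737359e-06, 4.28819569e-09]])
dist = multivariate_normal(mean=means, cov=cov)


def peak_a_on_row(b_row):
    """Closed form: a where the pdf peaks along the line b = b_row."""
    # Conditional mean E[a | b] of a bivariate Gaussian
    return means[0] + cov[0, 1] / cov[1, 1] * (b_row - means[1])


def numeric_peak_a_on_row(b_row, a_lims=(0.7, 1.3), n=6001):
    """Brute-force check: evaluate the pdf on a fine a grid at fixed b."""
    a = np.linspace(a_lims[0], a_lims[1], n)
    pos = np.column_stack([a, np.full_like(a, b_row)])
    return a[np.argmax(dist.pdf(pos))]


# b level nearest the mean in each grid of the question:
# about 0 for the "This works" limits, about -2.9e-4 for the adjusted ones
rows = {"works": 0.0, "adjusted": -5.000288311285968 + 50 * 0.1}
for name, b_row in rows.items():
    print(name, b_row, peak_a_on_row(b_row), numeric_peak_a_on_row(b_row))
```

With these numbers the peak lands near a = 1.06 for the "This works" row and near a = 0.80 for the adjusted row: the correlation between a and b is about 0.96, so a tiny shift in the sampled b level becomes a large shift in a.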
