
Updated title: Scipy.stats pdf bug?

I have a simple plot of a 2D Gaussian distribution.

import numpy as np
from scipy.stats import multivariate_normal
from matplotlib import pyplot as plt

means = [ 1.03872615e+00, -2.66927843e-05]
cov_matrix =  [[3.88809050e-03, 3.90737359e-06], [3.90737359e-06, 4.28819569e-09]]

# This works
a_lims = [0.7, 1.3]
b_lims = [-5, 5]

# This does not work
a_lims = [0.700006488869478, 1.2849292618191401]
b_lims =[-5.000288311285968, 5.000099437047633]

dist = multivariate_normal(mean=means, cov=cov_matrix)
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1e-2, b_lims[0]:b_lims[1]:0.1]
pos = np.empty(a_plot.shape + (2,))
pos[:, :, 0] = a_plot
pos[:, :, 1] = b_plot
z = dist.pdf(pos)

plt.figure()
plt.contourf(a_plot, b_plot, z, cmap='coolwarm',  levels=100)

If I use the limits marked as "this works", I get the following plot (correct).

(image: contour plot with the first set of limits)

However, if I use nearly the same limits, only slightly adjusted, the plot comes out completely wrong: the distribution is localized at different values (below).

(image: contour plot with the adjusted limits)

I guess it is a bug in mgrid. Does anyone have any ideas? More specifically, why does the maximum of the distribution move?

Focusing just on the x axis:

In [443]: a_lims = [0.7, 1.3] 
In [444]: np.mgrid[a_lims[0]:a_lims[1]:1e-2]                                                   
Out[444]: 
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
       0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
       0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  , 1.01, 1.02,
       1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
       1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24,
       1.25, 1.26, 1.27, 1.28, 1.29, 1.3 ])
In [445]: a_lims = [0.700006488869478, 1.2849292618191401]                                     
In [446]: np.mgrid[a_lims[0]:a_lims[1]:1e-2]                                                   
Out[446]: 
array([0.70000649, 0.71000649, 0.72000649, 0.73000649, 0.74000649,
       0.75000649, 0.76000649, 0.77000649, 0.78000649, 0.79000649,
       0.80000649, 0.81000649, 0.82000649, 0.83000649, 0.84000649,
       0.85000649, 0.86000649, 0.87000649, 0.88000649, 0.89000649,
       0.90000649, 0.91000649, 0.92000649, 0.93000649, 0.94000649,
       0.95000649, 0.96000649, 0.97000649, 0.98000649, 0.99000649,
       1.00000649, 1.01000649, 1.02000649, 1.03000649, 1.04000649,
       1.05000649, 1.06000649, 1.07000649, 1.08000649, 1.09000649,
       1.10000649, 1.11000649, 1.12000649, 1.13000649, 1.14000649,
       1.15000649, 1.16000649, 1.17000649, 1.18000649, 1.19000649,
       1.20000649, 1.21000649, 1.22000649, 1.23000649, 1.24000649,
       1.25000649, 1.26000649, 1.27000649, 1.28000649])
In [447]: _444.shape                                                                           
Out[447]: (61,)
In [449]: _446.shape                                                                           
Out[449]: (59,)

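When given a range like a:b:c, mgrid uses np.arange(a, b, c). With a floating-point step, arange is not reliable with regard to the end point: rounding decides whether the last value lands on b or falls short of it, so the number of points can change.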
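A quick check of the point counts, using the limits from the question, illustrates this. It is only a sketch: the variable names are made up here, and the choice of 61 points for linspace is an assumption taken from the first output above.

```python
import numpy as np

step = 1e-2

# Limits from the question: the "works" pair and the slightly adjusted pair.
round_lims = (0.7, 1.3)
odd_lims = (0.700006488869478, 1.2849292618191401)

# In exact arithmetic arange excludes the stop value, but rounding in
# (stop - start) / step decides how many points are actually produced.
round_grid = np.arange(round_lims[0], round_lims[1], step)
odd_grid = np.arange(odd_lims[0], odd_lims[1], step)
print(len(round_grid), round_grid[-1])  # 61 points, the last one is ~1.3
print(len(odd_grid), odd_grid[-1])      # 59 points, the last one is ~1.28

# linspace takes the number of points instead, so the end point is hit exactly.
fixed_grid = np.linspace(odd_lims[0], odd_lims[1], 61)
print(len(fixed_grid), fixed_grid[-1] == odd_lims[1])
```

The counts match the shapes shown in the session above, (61,) and (59,).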

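mgrid also accepts a complex step such as 61j, which is interpreted as a number of points with the end point included, like np.linspace. That is better for floating-point ranges. For example, with the first set of limits: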

In [453]: a_lims = [0.7, 1.3]                                                                  
In [454]: np.mgrid[a_lims[0]:a_lims[1]:61j]                                                    
Out[454]: 
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
       0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
       0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  , 1.01, 1.02,
       1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
       1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24,
       1.25, 1.26, 1.27, 1.28, 1.29, 1.3 ])
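The same idea carries over to the 2D grid from the question. Below is a minimal sketch, assuming 61 by 101 points (counts chosen here to roughly match the original steps of 1e-2 and 0.1). The helper `make_grid` and its parameters `n_a` and `n_b` are hypothetical names, not part of NumPy.

```python
import numpy as np

def make_grid(a_lims, b_lims, n_a=61, n_b=101):
    """Return (a_plot, b_plot, pos) with a fixed number of points per axis.

    make_grid, n_a and n_b are illustrative names, not part of NumPy.
    """
    # A complex "step" such as 61j tells mgrid to produce 61 points with
    # both end points included, like np.linspace.
    a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:n_a * 1j,
                              b_lims[0]:b_lims[1]:n_b * 1j]
    # Stack into shape (n_a, n_b, 2), the layout passed to dist.pdf.
    pos = np.dstack((a_plot, b_plot))
    return a_plot, b_plot, pos

a1, b1, pos1 = make_grid([0.7, 1.3], [-5, 5])
a2, b2, pos2 = make_grid([0.700006488869478, 1.2849292618191401],
                         [-5.000288311285968, 5.000099437047633])
print(pos1.shape, pos2.shape)  # the same shape for both sets of limits
```

A fixed point count only stabilizes the grid shape. It does not by itself make the mesh fine enough to resolve a narrow distribution.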

===

By narrowing b_lims considerably and generating a finer mesh, I get a nice tilted ellipse.

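# cov_matrix is the one defined in the question
means = [1, 0]
a_lims = [0.7, 1.3]
b_lims = [-.0002, .0002]

dist = multivariate_normal(mean=means, cov=cov_matrix)
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1001j, b_lims[0]:b_lims[1]:1001j]
# Evaluate and plot as in the question
z = dist.pdf(np.dstack((a_plot, b_plot)))
plt.contourf(a_plot, b_plot, z, cmap='coolwarm', levels=100)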

So I think the difference in your plots is an artifact of an excessively coarse mesh in the vertical direction. That potentially affects both the pdf generation and the contouring.
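To get a feel for how coarse the vertical mesh is, one can count how many b grid values fall within a few standard deviations of the mean. This is a rough sketch: it uses the marginal standard deviation of b from the question's cov_matrix, a 5-sigma cutoff picked arbitrarily, and np.arange as a 1D stand-in for the b axis of mgrid. The helper `rows_near_mean` is a hypothetical name.

```python
import numpy as np

mean_b = -2.66927843e-05
std_b = np.sqrt(4.28819569e-09)  # marginal standard deviation of b, about 6.5e-5

def rows_near_mean(b_values, n_sigma=5):
    """Count grid values within n_sigma standard deviations of mean_b.

    rows_near_mean and the 5-sigma cutoff are illustrative choices.
    """
    return int(np.sum(np.abs(b_values - mean_b) < n_sigma * std_b))

coarse_1 = np.arange(-5, 5, 0.1)                                  # "works" limits
coarse_2 = np.arange(-5.000288311285968, 5.000099437047633, 0.1)  # adjusted limits
fine = np.linspace(-0.0002, 0.0002, 1001)                         # narrowed limits

print(rows_near_mean(coarse_1), rows_near_mean(coarse_2), rows_near_mean(fine))
```

With a step of 0.1 against a standard deviation of about 6.5e-5, each coarse grid has a single b value inside that band, while every value of the narrowed grid falls inside it.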

Higher-resolution plot

Plotted with the original grid points

High-resolution plot with the original grid points. Only one b level intersects the high-probability region. Since the ellipse is tilted, the two grids sample different parts of it, hence the seemingly different pdfs.
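Locating the largest pdf value on each of the two coarse grids from the question makes the shift visible. This is a sketch under the question's own parameters (steps of 1e-2 and 0.1); `peak_location` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy.stats import multivariate_normal

means = [1.03872615e+00, -2.66927843e-05]
cov_matrix = [[3.88809050e-03, 3.90737359e-06],
              [3.90737359e-06, 4.28819569e-09]]
dist = multivariate_normal(mean=means, cov=cov_matrix)

def peak_location(a_lims, b_lims):
    """Return (a, b) of the grid point with the largest pdf value.

    peak_location is an illustrative helper, not part of SciPy.
    """
    a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1e-2,
                              b_lims[0]:b_lims[1]:0.1]
    z = dist.pdf(np.dstack((a_plot, b_plot)))
    i, j = np.unravel_index(np.argmax(z), z.shape)
    return a_plot[i, j], b_plot[i, j]

peak_1 = peak_location([0.7, 1.3], [-5, 5])
peak_2 = peak_location([0.700006488869478, 1.2849292618191401],
                       [-5.000288311285968, 5.000099437047633])
print(peak_1)  # b is the grid row at (or next to) 0
print(peak_2)  # b is about -2.9e-4, and a sits noticeably lower
```

On the first grid the row nearest the mean is b = 0, and the peak along it lands near a = 1.06. On the second grid the nearest row is about b = -2.9e-4, which cuts the tilted ellipse further down, so the peak appears near a = 0.80.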


 