Normality test of a distribution in Python

I have some data that I sampled from a radar satellite image, and I want to perform some statistical tests on it. Before doing so, I wanted to run a normality test to be sure the data are normally distributed. The data appear to be normally distributed, but when I run the test I get a p-value of 0, suggesting that they are not.

I have attached my code along with the output and a histogram of the distribution (I'm relatively new to Python, so apologies if my code is clunky in any way). Can anyone tell me if I'm doing something wrong? Judging from the histogram, I find it hard to believe that my data are not normally distributed.

import h5py
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

values = 'inputfile.h5'
f = h5py.File(values, 'r')
dset = f['/DATA/DATA']
array = dset[..., 0]
print('normality =', scipy.stats.normaltest(array))
# named vmax/vmin so the built-in max() and min() are not shadowed
vmax = np.amax(array)
vmin = np.amin(array)

histo = np.histogram(array, bins=100, range=(vmin, vmax))
freqs = histo[0]
# left edge of each bin; always the same length as freqs
newbins = histo[1][:-1]
histogram = plt.bar(newbins, freqs, width=0.2, color='gray')
plt.show()

This prints: (41099.095955202931, 0.0). The first element is a chi-square value and the second is a p-value.

I have attached a graph of the data. I thought that dealing with negative values might be causing the problem, so I normalised the values, but the problem persists.

[Figure: histogram of the values in the array]

This question explains why you're getting such a small p-value. Essentially, normality tests almost always reject the null hypothesis on very large samples (in yours, for example, you can see some skew on the left side, which at your enormous sample size is more than enough).
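The sample-size effect can be reproduced with synthetic data. The sketch below is only an illustration under assumptions that are not in the question: it draws from a mildly skewed distribution (scipy.stats.skewnorm with a hypothetical shape parameter of 1) and runs the same scipy.stats.normaltest at three sample sizes. The helper name normaltest_pvalue is made up for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def normaltest_pvalue(n, shape=1.0):
    """Draw n points from a mildly skewed distribution and return the normaltest p-value."""
    # skewnorm with shape=1 looks close to a bell curve but is not exactly normal
    sample = stats.skewnorm.rvs(shape, size=n, random_state=rng)
    return stats.normaltest(sample).pvalue

# Same distribution, same test; only the sample size changes
p_values = {n: normaltest_pvalue(n) for n in (200, 20_000, 2_000_000)}
for n, p in p_values.items():
    print(f"n = {n:>9,d}  p-value = {p:.3g}")
```

The amount of skew is identical in all three runs, so a p-value that collapses as n grows reflects the test's increasing sensitivity rather than a larger departure from normality.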

What would be much more practically useful in your case is to plot a normal curve fitted to your data. Then you can see how the data actually differ from the normal curve (for example, you can see whether the tail on the left side is indeed too long). For example:

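from matplotlib import pyplot as plt
import numpy as np
from scipy.stats import norm

# density=True rescales the histogram so that its total area is 1
n, bins, patches = plt.hist(array, 50, density=True)
mu = np.mean(array)
sigma = np.std(array)
# overlay the normal density with the same mean and standard deviation
plt.plot(bins, norm.pdf(bins, mu, sigma))
plt.show()

(Note the density=True argument: it normalizes the histogram to a total area of 1, which makes it comparable to a density such as the normal distribution. Older Matplotlib releases spelled this argument normed=1 and provided the normal density as mlab.normpdf; both have since been removed, so scipy.stats.norm.pdf is used here.)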
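The visual comparison can also be backed by a number. As a rough sketch (not part of the original answer), the hypothetical helper tail_fractions below measures what fraction of the data lies more than k standard deviations below or above the mean and compares it with the fraction a normal distribution would put there. The data here are a synthetic stand-in, since the real array is not available.

```python
import numpy as np
from scipy.stats import norm

def tail_fractions(data, k=2.0):
    """Return (left, right, expected) tail fractions beyond k standard deviations."""
    data = np.asarray(data, dtype=float)
    mu, sigma = data.mean(), data.std()
    left = float(np.mean(data < mu - k * sigma))   # observed left-tail fraction
    right = float(np.mean(data > mu + k * sigma))  # observed right-tail fraction
    expected = float(norm.cdf(-k))                 # fraction in each tail under normality
    return left, right, expected

# Synthetic stand-in for the real array (assumed values, for illustration only)
rng = np.random.default_rng(1)
demo = rng.normal(loc=-12.0, scale=3.0, size=100_000)

left, right, expected = tail_fractions(demo)
print(f"left = {left:.4f}, right = {right:.4f}, expected under normality = {expected:.4f}")
```

A left fraction clearly above the expected value would point to a heavier left tail than the normal curve allows.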

In general, when the number of samples is less than 50, you should be careful about using tests of normality. These tests need enough evidence to reject the null hypothesis, which is "the data are normally distributed", and when the number of samples is small they are not able to find that evidence.

Keep in mind that failing to reject the null hypothesis does not mean that the null hypothesis is true; it only means that the test did not find enough evidence against it.
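A small simulation makes the small-sample problem concrete. The sketch below assumes a deliberately non-normal source (a uniform distribution) and counts how often scipy.stats.normaltest rejects at the 5% level for samples of 20 and of 2000 points. The function name rejection_rate, the trial count, and the sample sizes are choices made for this illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def rejection_rate(n, trials=200, alpha=0.05):
    """Fraction of uniform samples of size n that normaltest flags as non-normal."""
    samples = rng.uniform(size=(trials, n))           # one non-normal sample per row
    p = stats.normaltest(samples, axis=1).pvalue      # one p-value per row
    return float(np.mean(p < alpha))

small_rate = rejection_rate(20)
large_rate = rejection_rate(2000)
print(f"rejection rate at n=20:   {small_rate:.2f}")
print(f"rejection rate at n=2000: {large_rate:.2f}")
```

Every sample here is non-normal by construction, so each non-rejection at n=20 is a case where the test simply lacked the evidence to notice.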

There is another possibility: some implementations of statistical tests for normality compare the distribution of your data to the standard normal distribution. To avoid this, I suggest that you standardize the data and then apply the normality test.
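One test that behaves this way is scipy.stats.kstest(x, "norm"), which compares the data with a normal distribution of mean 0 and standard deviation 1 unless other parameters are passed. The sketch below uses synthetic data with an assumed mean of 50 and standard deviation of 10 to show the effect of standardizing first. It also checks scipy.stats.normaltest, which is built from the sample skewness and kurtosis and so should not change when the data are shifted or rescaled.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
raw = rng.normal(loc=50.0, scale=10.0, size=500)   # normal, but not standard normal
standardized = (raw - raw.mean()) / raw.std(ddof=1)

# kstest against "norm" means: compare with N(0, 1)
p_raw = stats.kstest(raw, "norm").pvalue
p_std = stats.kstest(standardized, "norm").pvalue
print(f"kstest p-value, raw data:          {p_raw:.3g}")
print(f"kstest p-value, standardized data: {p_std:.3g}")

# normaltest depends only on skewness and kurtosis, so rescaling leaves it unchanged
stat_raw = stats.normaltest(raw).statistic
stat_std = stats.normaltest(standardized).statistic
print(f"normaltest statistic, raw vs standardized: {stat_raw:.6f} vs {stat_std:.6f}")
```

Two caveats: the Kolmogorov-Smirnov p-value is only approximate (it tends to be too large) when the mean and standard deviation are estimated from the same data, and because normaltest is unaffected by standardizing, this step would not be expected to change the result reported in the question.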
