Filtering elements in a numpy array with a certain minimum frequency

I have an array of 5000 counts and need to run a chi-square test on it. However, the test is only valid when the expected frequency is greater than 5 for every value. I found the frequency of each value in the dataset using collections.Counter(x), and I can see that some values have a frequency of 1 or 2. I would now like to remove every value with a frequency below 5 from my original dataset x, but I don't know how to do this.

Once I have removed these points, I need to create an expected Poisson distribution to use in the chi-square test, again making sure that the expected frequency is greater than 5. I've generated some distributions using stats.poisson.rvs, but is there a way to make sure that the frequency is always above 5? Or would it be best to create the distribution and then go through the steps in the first part of my question?

One way to filter your array down to values with a certain frequency (e.g. >5) is (a is your original array):

import numpy as np
# This method assumes a consists of non-negative integers (required by np.bincount)
a[np.isin(a, np.where(np.bincount(a) > 5)[0])]
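
As a minimal sketch of the bincount approach on a small array, the helper below looks up each element's own count directly instead of going through a membership test. The name filter_by_min_count and its min_count parameter are hypothetical, introduced only for illustration. It uses >= so that min_count=5 drops exactly the values with a frequency below 5, as the question asks; the one-liner above, with >5, keeps only values that occur six or more times.

```python
import numpy as np

def filter_by_min_count(a, min_count):
    """Keep the elements of a whose value occurs at least min_count times.

    Assumes a is a 1-D array of non-negative integers (a requirement of np.bincount).
    """
    a = np.asarray(a)
    counts = np.bincount(a)           # counts[v] = number of times v occurs in a
    return a[counts[a] >= min_count]  # counts[a] gives each element's own count

a = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3])
filtered = filter_by_min_count(a, 3)
print(filtered)  # [2 2 2 3 3 3 3]
```

Note that np.bincount allocates an array of length a.max() + 1, so this is best suited to count data whose largest value is moderate.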

Another way is:

# Works for non-integer arrays as well
values, counts = np.unique(a, return_counts=True)
a[np.isin(a, values[counts > 5])]
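
The np.unique variant can be sketched in the same style. Here return_inverse=True is used (a choice made for this sketch, not part of the answer above) so that each element's count can be looked up by index rather than through a membership test. The function name filter_by_min_count_unique is hypothetical.

```python
import numpy as np

def filter_by_min_count_unique(a, min_count):
    """Keep the elements of a whose value occurs at least min_count times.

    Works for any 1-D array that np.unique can sort, including floats and strings.
    """
    a = np.asarray(a)
    # inverse[i] is the position of a[i] in values, so counts[inverse][i] is a[i]'s count
    values, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    return a[counts[inverse] >= min_count]

x = np.array([0.5, 1.5, 1.5, 2.5, 2.5, 2.5])
filtered_x = filter_by_min_count_unique(x, 2)
print(filtered_x)  # [1.5 1.5 2.5 2.5 2.5]
```

Because np.unique sorts the input, this version does not depend on the values being small non-negative integers, at the cost of an O(n log n) sort.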

I would guess the bincount solution is faster.
