Row or column wise most frequent elements in 2-D numpy array
I'm trying to find the most frequent elements in a two-dimensional numpy array. I want them row-wise or column-wise. I searched the docs and the web but couldn't find exactly what I'm looking for. Let me explain with an example. Assume I have an arr as follows:
import numpy as np
arr = np.random.randint(0, 2, size=(5, 2))
arr
# Output
array([[1, 1],
       [0, 0],
       [0, 1],
       [1, 1],
       [1, 0]])
The expected output is an array that contains the most frequent elements in columns or rows, depending on the given axis input. I know that np.unique() can return the count of each unique value in the input array for a given axis. With an axis argument, though, it counts whole unique rows or columns of the 2-D array:
np.unique(arr, return_counts=True, axis=0)
# Output
(array([[0, 0],
        [0, 1],
        [1, 0],
        [1, 1]]), array([1, 1, 1, 2]))
So, it tells me that the unique rows [0, 0], [0, 1] and [1, 0] occur once whereas [1, 1] occurs twice in arr. This does not work for me, because I need to see the most frequent elements within each row (or column). So my expected output is as follows:
array([[1, 1],   # --> 1
       [0, 0],   # --> 0
       [0, 1],   # --> 0 or 1 since they have same frequency
       [1, 1],   # --> 1
       [1, 0]])  # --> 0 or 1 since they have same frequency
Consequently, the result can be array([1, 0, 0, 1, 0]) or array([1, 0, 1, 1, 1]) with shape (5, ).
PS: I know that the solution can be found by iterating over columns or rows and finding the most frequent elements using np.unique(). However, I want to find the most efficient way of doing this, because numpy is generally used for vectorized calculations on huge arrays, and in my case the input array arr has too many elements. The computation will be costly with a for loop.
I would appreciate any explanatory answer.
EDIT:
To be clearer, I added a loop-based solution. Since arr can contain not only 0s and 1s but also varying elements, I decided to use a different randomized arr:
arr = np.random.randint(1, 4, size=(10, 3)) * 10
# arr:
array([[30, 30, 30],
       [10, 20, 30],
       [30, 30, 30],
       [30, 10, 20],
       [20, 20, 10],
       [20, 30, 20],
       [20, 30, 10],
       [10, 30, 10],
       [20, 10, 10],
       [20, 30, 30]])
most_freq_elem_in_rows = []
for row in arr:
    elements, counts = np.unique(row, return_counts=True)
    most_freq_elem_in_rows.append(elements[np.argmax(counts)])
# most_freq_elem_in_rows:
# [30, 10, 30, 10, 20, 20, 10, 10, 10, 30]
most_freq_elem_in_cols = []
for col in arr.T:
    elements, counts = np.unique(col, return_counts=True)
    most_freq_elem_in_cols.append(elements[np.argmax(counts)])
# most_freq_elem_in_cols:
# [20, 30, 10]
Then, most_freq_elem_in_rows and most_freq_elem_in_cols can be converted to numpy arrays simply by using np.array().
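The Python-level loop can be avoided with plain NumPy. What follows is only a minimal sketch, under two assumptions: the values in arr are sortable by np.unique, and a dense count table of shape (n_rows, n_unique_values) fits in memory. The name rowwise_mode is a hypothetical helper, not a NumPy function. It encodes every element as the index of its unique value, shifts each row's codes into its own range, and counts all rows with a single np.bincount call. Ties resolve to the smallest value, the same as the np.argmax loop.

```python
import numpy as np

def rowwise_mode(arr):
    """Most frequent value in each row of a 2-D array (ties -> smallest value).

    Hypothetical helper for illustration; not part of NumPy.
    """
    # values: sorted unique values; inverse: index into values for every element.
    values, inverse = np.unique(arr, return_inverse=True)
    # The shape of inverse differs between NumPy versions, so normalise it.
    inverse = inverse.reshape(arr.shape)
    n_rows, n_vals = arr.shape[0], values.size
    # Shift each row's codes into a separate range so one bincount covers all rows.
    offsets = np.arange(n_rows)[:, None] * n_vals
    counts = np.bincount((inverse + offsets).ravel(), minlength=n_rows * n_vals)
    counts = counts.reshape(n_rows, n_vals)
    # argmax returns the first maximum, i.e. the smallest value on ties.
    return values[counts.argmax(axis=1)]

# The 10 x 3 example array from the EDIT above, fixed instead of random.
arr = np.array([[30, 30, 30], [10, 20, 30], [30, 30, 30], [30, 10, 20],
                [20, 20, 10], [20, 30, 20], [20, 30, 10], [10, 30, 10],
                [20, 10, 10], [20, 30, 30]])

print(rowwise_mode(arr))    # [30 10 30 10 20 20 10 10 10 30]
print(rowwise_mode(arr.T))  # column-wise: [20 30 10]
```

The column-wise result comes from passing arr.T, so one helper covers both axes. If arr has a very large number of distinct values, the count table grows accordingly and a sort-based approach may be a better fit.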
If you can add a scipy dependency, then scipy.stats.mode achieves that:
import numpy as np
from scipy.stats import mode
arr = np.random.randint(0, 2, size=(5, 2))
mode(arr, 0)
ModeResult(mode=array([[0, 0]]), count=array([[3, 3]]))
mode(arr, 1)
ModeResult(mode=array([[0],
       [1],
       [0],
       [0],
       [0]]),
count=array([[1],
       [2],
       [2],
       [2],
       [1]]))
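The question asks for a flat result of shape (5, ), while the output shown here keeps a length-1 axis. As far as I know, newer SciPy releases accept a keepdims argument and the default output shape has changed between versions, so flattening the mode field is a version-agnostic way to get the flat array. This is a small sketch under that assumption. The fixed arr is the example array from the question, used so the result is reproducible:

```python
import numpy as np
from scipy.stats import mode

# Example array from the question, fixed instead of random.
arr = np.array([[1, 1],
                [0, 0],
                [0, 1],
                [1, 1],
                [1, 0]])

# np.ravel flattens the result whether or not SciPy kept the reduced axis.
row_modes = np.ravel(mode(arr, axis=1).mode)  # one value per row
col_modes = np.ravel(mode(arr, axis=0).mode)  # one value per column

print(row_modes)  # [1 0 0 1 0]
print(col_modes)  # [1 1]
```

On ties such as [0, 1], mode reports the smallest value, which is one of the two results the question accepts.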