Most frequent elements in the rows or columns of a 2D numpy array

Question

I'm trying to find the most frequent elements in a two-dimensional numpy array, either row-wise or column-wise. I searched the docs and the web but couldn't find exactly what I'm looking for. Let me explain with an example. Assume I have an arr as follows:

import numpy as np
arr = np.random.randint(0, 2, size=(5, 2))
arr

# Output
array([[1, 1],
       [0, 0],
       [0, 1],
       [1, 1],
       [1, 0]])

The expected output is an array that contains the most frequent element of each column or row, depending on the given axis input. I know that np.unique() with return_counts=True returns the count of each unique value in the input array. When axis is given, however, it counts unique rows or columns of the 2-D array:

np.unique(arr, return_counts=True, axis=0)

# Output
(array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]]), array([1, 1, 1, 2]))

So it tells me that the unique rows [0, 0], [0, 1] and [1, 0] each occur once, whereas [1, 1] occurs twice in arr. This is not what I need, because I want the most frequent element within each row (or column). My expected output is as follows:

array([[1, 1],    # --> 1
       [0, 0],    # --> 0
       [0, 1],    # --> 0 or 1 since they have same frequency
       [1, 1],    # --> 1
       [1, 0]])   # --> 0 or 1 since they have same frequency

Consequently, the row-wise result can be array([1, 0, 0, 1, 0]) or array([1, 0, 1, 1, 1]), with shape (5,).

PS:

I know that the solution can be found by iterating over the columns or rows and finding the most frequent element of each with np.unique(). However, I want the most efficient way of doing this: numpy is generally used for vectorized calculations on huge arrays, and in my case the input array arr has a very large number of elements, so the computation would be costly with a for loop.

I would appreciate any explanatory answer.

EDIT:

To be clearer, I added a loop-based solution. Since arr can contain values other than 0s and 1s, I decided to use a different randomized arr:

arr = np.random.randint(1, 4, size=(10, 3)) * 10

# arr:
array([[30, 30, 30],
       [10, 20, 30],
       [30, 30, 30],
       [30, 10, 20],
       [20, 20, 10],
       [20, 30, 20],
       [20, 30, 10],
       [10, 30, 10],
       [20, 10, 10],
       [20, 30, 30]])

most_freq_elem_in_rows = []
for row in arr:
  elements, counts = np.unique(row, return_counts=True)
  most_freq_elem_in_rows.append(elements[np.argmax(counts)])

# most_freq_elem_in_rows:
# [30, 10, 30, 10, 20, 20, 10, 10, 10, 30]

most_freq_elem_in_cols = []
for col in arr.T:
  elements, counts = np.unique(col, return_counts=True)
  most_freq_elem_in_cols.append(elements[np.argmax(counts)])

# most_freq_elem_in_cols:
# [20, 30, 10]

Then most_freq_elem_in_rows and most_freq_elem_in_cols can be converted to numpy arrays simply by using np.array().
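For comparison, the two loops above can probably be replaced by a broadcasting-based count. The following is only a minimal sketch, not a tuned solution: the helper name most_frequent is hypothetical, it assumes a 2-D array and axis given as 0 or 1, and it builds a temporary boolean array of shape (rows, cols, k) for k distinct values, so it is only reasonable when the number of distinct values is small.

```python
import numpy as np

def most_frequent(arr, axis):
    """Hypothetical helper: most frequent value along `axis` (0 or 1) of a 2-D array."""
    values = np.unique(arr)  # sorted distinct values, shape (k,)
    # Compare every element with every distinct value, then count matches along `axis`.
    counts = (arr[..., None] == values).sum(axis=axis)  # shape (remaining_dim, k)
    # argmax returns the first maximum, so ties go to the smallest value,
    # which matches the np.unique + np.argmax loops.
    return values[counts.argmax(axis=-1)]

arr = np.array([[30, 30, 30],
                [10, 20, 30],
                [30, 30, 30],
                [30, 10, 20],
                [20, 20, 10],
                [20, 30, 20],
                [20, 30, 10],
                [10, 30, 10],
                [20, 10, 10],
                [20, 30, 30]])

most_freq_elem_in_rows = most_frequent(arr, axis=1)
most_freq_elem_in_cols = most_frequent(arr, axis=0)
print(most_freq_elem_in_rows)  # [30 10 30 10 20 20 10 10 10 30]
print(most_freq_elem_in_cols)  # [20 30 10]
```

On the example arr this gives the same values as the loop-based lists, including the tie cases, because both approaches pick the smallest value among equally frequent ones.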

Answer 1

If you can add a scipy dependency, then scipy.stats.mode achieves that:

import numpy as np
from scipy.stats import mode

arr = np.random.randint(0, 2, size=(5, 2))

mode(arr, 0)
ModeResult(mode=array([[0, 0]]), count=array([[3, 3]]))

mode(arr,1)
ModeResult(mode=array([[0],
                       [1], 
                       [0],
                       [0],
                       [0]]), 
           count=array([[1],
                        [2],
                        [2],
                        [2],
                        [1]]))
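Note that the 2-D mode and count arrays shown above come from an older SciPy release. As far as I know, newer releases accept a keepdims argument and drop the reduced axis by default, so the shapes may differ on your installation. A small sketch that should give a flat result either way is to flatten the mode field with np.ravel; the fixed arr below is made up for illustration and has no ties.

```python
import numpy as np
from scipy.stats import mode

# Small fixed example with no ties, so the result does not depend on tie-breaking.
arr = np.array([[1, 1, 0],
                [1, 0, 0],
                [1, 1, 0]])

# np.ravel flattens the result whether SciPy returns shape (1, n), (n, 1) or (n,).
col_modes = np.ravel(mode(arr, axis=0).mode)  # most frequent value in each column
row_modes = np.ravel(mode(arr, axis=1).mode)  # most frequent value in each row

print(col_modes)  # [1 1 0]
print(row_modes)  # [1 0 1]
```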

Answer 1
1 2020-05-10 13:19:13
