Find the most frequent row (mode) of a matrix of vectors - Python / NumPy

Question

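I have a numpy array of shape (?, n) that represents a collection of n-dimensional vectors, one vector per row.

I want to find the most frequent row.

So far, it seems the best way is to iterate over all the entries and store a count, but it seems odd that neither numpy nor scipy has anything built in to perform this task.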
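For reference, the iterate-and-count approach described above can be written with collections.Counter by turning each row into a hashable tuple. This is a minimal sketch: mode_rows_counter is a hypothetical name, and because the counting runs in pure Python it is mainly useful as a baseline for checking the vectorized answers below.

```python
from collections import Counter

import numpy as np


def mode_rows_counter(a):
    """Return the most frequent row of a 2-D array using a plain Python count.

    Hypothetical helper: each row is converted to a tuple so it can serve
    as a dictionary key in the Counter.
    """
    counts = Counter(map(tuple, a))
    row, _count = counts.most_common(1)[0]  # (row_tuple, occurrences)
    return np.array(row, dtype=a.dtype)


a = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])
print(mode_rows_counter(a))  # [1 2]
```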

Answer 1

Here's an approach using NumPy views, which should be pretty efficient -

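import numpy as np

def mode_rows(a):
    # Make sure each row occupies one contiguous block of memory
    a = np.ascontiguousarray(a)
    # A void dtype as wide as one full row, so each row becomes one opaque item
    void_dt = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    # Unique rows, index of each one's first occurrence, and occurrence counts
    _, ids, count = np.unique(a.view(void_dt).ravel(),
                              return_index=True, return_counts=True)
    # Position in a of the first occurrence of the most frequent row
    largest_count_id = ids[count.argmax()]
    most_frequent_row = a[largest_count_id]
    return most_frequent_row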

Sample run -

In [45]: # Let's take a random array with three rows (2, 4, 8) and two rows (1, 7)
    ...: # duplicated. Thus, the most frequent row must be row 2 here.
    ...: a = np.random.randint(0,9,(9,5))
    ...: a[4] = a[8]
    ...: a[2] = a[4]
    ...: 
    ...: a[1] = a[7]
    ...: 

In [46]: a
Out[46]: 
array([[8, 8, 7, 0, 7],
       [7, 8, 2, 6, 1],
       [2, 2, 5, 7, 6],
       [6, 5, 8, 8, 5],
       [2, 2, 5, 7, 6],
       [5, 7, 3, 6, 3],
       [2, 8, 7, 2, 0],
       [7, 8, 2, 6, 1],
       [2, 2, 5, 7, 6]])

In [47]: mode_rows(a)
Out[47]: array([2, 2, 5, 7, 6])
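On NumPy 1.13 or later, np.unique also accepts an axis argument, which treats each row as a single element and so gives the same kind of result without building the void view by hand. A minimal sketch, where mode_rows_unique is a hypothetical name and the test data reuses rows from the sample run:

```python
import numpy as np


def mode_rows_unique(a):
    """Return the most frequent row of a 2-D array and its count.

    With axis=0, np.unique compares whole rows, so no void view is needed.
    On ties, argmax picks the first maximum in np.unique's sorted order.
    """
    uniq_rows, counts = np.unique(a, axis=0, return_counts=True)
    best = counts.argmax()
    return uniq_rows[best], counts[best]


a = np.array([[8, 8, 7, 0, 7],
              [7, 8, 2, 6, 1],
              [2, 2, 5, 7, 6],
              [7, 8, 2, 6, 1],
              [2, 2, 5, 7, 6],
              [2, 2, 5, 7, 6]])
row, n = mode_rows_unique(a)
print(row, n)  # [2 2 5 7 6] 3
```

Returning the count alongside the row costs nothing extra, since np.unique already computes it.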

Answer 2

The numpy_indexed package (disclaimer: I am its author) has functionality to do exactly this, and it works for any number of dimensions:

import numpy_indexed as npi
row = npi.mode(arr)

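Under the hood, it is essentially the same as Divakar's solution in terms of algorithm and complexity, with a few more bells and whistles; see the 'weights' and 'return_indices' kwargs.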
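To illustrate the idea of a weighted mode in plain NumPy, here is a sketch under the assumption that each row contributes its weight, rather than 1, to the total for its group of identical rows. weighted_mode_rows is a hypothetical helper, not the numpy_indexed API, and this is not necessarily how npi.mode handles its 'weights' kwarg.

```python
import numpy as np


def weighted_mode_rows(a, weights):
    """Return the row of `a` whose occurrences have the largest total weight.

    Hypothetical helper for illustration; with all weights equal to 1 it
    reduces to the ordinary mode of the rows.
    """
    uniq_rows, inverse = np.unique(a, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # keep the group labels 1-D across NumPy versions
    # Sum the weights of all rows that fall into each unique-row group
    totals = np.bincount(inverse, weights=weights, minlength=len(uniq_rows))
    return uniq_rows[totals.argmax()]


a = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 4]])
w = np.array([5.0, 1.0, 5.0, 1.0, 1.0])
print(weighted_mode_rows(a, w))  # [1 2]
```

Without weights, [3, 4] would win (3 occurrences vs 2); with these weights, [1, 2] has the larger total (10 vs 3).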

Answer 3

If you are able to use Pandas, here is an approach that borrows heavily from this answer:

import numpy as np
import pandas as pd

# generate sample data
ncol = 5
nrow = 20000
matrix = np.random.randint(0,ncol,ncol*nrow).reshape(nrow,ncol)
df = pd.DataFrame(matrix)

df.head()
   0  1  2  3  4
0  3  0  4  4  4
1  4  0  0  2  0
2  3  3  2  0  0
3  0  3  4  3  3
4  1  1  3  3  3

# count duplicated rows
(df.groupby(df.columns.tolist())
   .size()
   .sort_values(ascending=False))

Output:

0  1  2  3  4
4  2  2  1  1    17
2  2  4  2  3    16
3  2  1  2  2    15
   1  2  4  3    15
                 ..
4  1  3  0  1     1
1  2  3  0  4     1

The most common row is the first row of this output, and its frequency count is in the rightmost column.
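On pandas 1.1 or later, DataFrame.value_counts counts duplicate rows directly and by default sorts them by frequency in descending order, so the first index entry is the most common row (as a tuple). A minimal sketch on a small, made-up frame:

```python
import pandas as pd

df = pd.DataFrame([[4, 2, 2, 1, 1],
                   [0, 3, 4, 3, 3],
                   [4, 2, 2, 1, 1],
                   [1, 1, 3, 3, 3]])

# Series indexed by row tuples, most frequent row first
counts = df.value_counts()
most_common_row = counts.index[0]  # tuple of column values, here (4, 2, 2, 1, 1)
frequency = counts.iloc[0]         # how many times that row occurs, here 2
print(most_common_row, frequency)
```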


3 solutions

Solution 1
3 (accepted) 2017-04-22 05:37:58

Solution 2
1 2017-04-22 07:42:54

Solution 3
0 2017-04-22 04:03:56
