How do I efficiently build a large sparse matrix in Python?

Question

1. I am trying to create a NumPy array with shape (6962341, 268148) and dtype np.uint8.

2. My data consists of lists such as [x1,x2,x3,x4], [x2,x1], [x4,x5,x3]...

3. I want to assign array[x1,x2] += 1, array[x1,x3] += 1, array[x1,x4] += 1, array[x2,x3] += 1, ...

4. So I tried a function with the following structure.

import numpy as np
from itertools import combinations

# row_size, col_size and data are defined elsewhere
base_array = np.zeros((row_size, col_size), dtype=np.uint8)

for each_list in data:
  for (x, y) in combinations(each_list, 2):
    # always write into the upper triangle (row index <= column index)
    if x > y:
      base_array[y, x] += 1
    else:
      base_array[x, y] += 1

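This essentially computes the upper triangle of the matrix, and I will use the upper-triangle values. You can also think of it as similar to building the base matrix A for a co-occurrence matrix. But this function is too slow, and I think it could be made faster. What should I do?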

Answer 1

Assuming your data are integers (since they represent rows and columns), or that you can hash your data x1, x2, ... to integers 1, 2, ..., here is a fast solution:

import numpy as np
from itertools import combinations

# list of pairwise combinations in your data
comb_list = []
for each_list in data:
  comb_list += list(combinations(each_list, 2))

# convert combination ints to indices (NumPy uses 0-based indexing)
comb_list = np.array(comb_list) - 1

# make array of flat indices
flat = np.ravel_multi_index((comb_list[:, 0], comb_list[:, 1]), (row_size, col_size))

# count the number of duplicates of each index using np.bincount
base_array = np.bincount(flat, minlength=row_size * col_size).reshape((row_size, col_size)).astype(np.uint8)
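At the shape given in the question, 6962341 × 268148 is roughly 1.87 × 10^12 cells, so a dense array needs about 1.87 TB even at one byte per cell, and np.bincount typically builds a 64-bit integer array before the astype call. A possible alternative, sketched here under the assumption that SciPy is available, is to keep the same pair list but store the counts in a scipy.sparse matrix. The helper name pair_counts_sparse is hypothetical, and int32 counts are an illustrative choice rather than the asker's np.uint8.

```python
from itertools import combinations

import numpy as np
from scipy import sparse


def pair_counts_sparse(data, shape):
    """Count pairwise combinations into a sparse matrix (hypothetical helper)."""
    # Collect every pair, as in the answer above; IDs are assumed to be 1-based ints.
    pairs = [p for each_list in data for p in combinations(each_list, 2)]
    pairs = np.asarray(pairs, dtype=np.int64).reshape(-1, 2) - 1
    ones = np.ones(len(pairs), dtype=np.int32)
    # COO accepts repeated (row, col) entries; converting to CSR sums them.
    coo = sparse.coo_matrix((ones, (pairs[:, 0], pairs[:, 1])), shape=shape)
    return coo.tocsr()


counts = pair_counts_sparse([[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]], (5, 5))
print(counts.toarray())
```

Only the nonzero cells are stored, so memory grows with the number of distinct pairs rather than with row_size * col_size.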

Sample data:

[[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]

Corresponding output:

[[0 1 1 1 0]
 [1 0 1 1 0]
 [0 0 0 2 0]
 [0 0 1 1 1]
 [0 0 1 1 0]]
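The sample output above is not upper triangular: the pair (2, 1) lands at row 2, column 1, whereas the loop in the question would fold it into row 1, column 2. If the upper-triangle behaviour is what you need, one option is to sort each pair before flattening. This is a minimal sketch on the sample data, assuming a 5 × 5 matrix; the name upper is illustrative.

```python
from itertools import combinations

import numpy as np

data = [[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
row_size = col_size = 5  # assumed: IDs run from 1 to 5 in the sample

pairs = np.array([p for each_list in data for p in combinations(each_list, 2)]) - 1
# Sort each pair so the smaller index is the row: (1, 0) becomes (0, 1).
pairs = np.sort(pairs, axis=1)

flat = np.ravel_multi_index((pairs[:, 0], pairs[:, 1]), (row_size, col_size))
upper = np.bincount(flat, minlength=row_size * col_size).reshape(row_size, col_size)
print(upper)
```

Everything below the diagonal stays zero. The diagonal entry comes from the repeated 4 in the last list, which the original loop would also count.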

Edit: in response to the explanation in the comments:

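data = [[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
# one row per list, one column per item ID (IDs are 1-based)
n_cols = max(max(each_list) for each_list in data)
base_array = np.zeros((len(data), n_cols), dtype=np.uint8)

for i, each_list in enumerate(data):
  for j in each_list:
    base_array[i, j - 1] = 1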

Output:

[[1 1 1 1 0]
 [1 1 0 0 0]
 [0 0 1 1 1]]
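With millions of rows this 0/1 matrix is mostly zeros, so it may be worth building it directly in sparse form. The sketch below assumes SciPy is available and uses illustrative names (rows, cols, incidence, cooccurrence). It also shows one way to get item-by-item co-occurrence counts from the base matrix A as A.T @ A, which counts the lists containing both items and therefore differs from the pair counts earlier in the answer.

```python
import numpy as np
from scipy import sparse

data = [[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]

# Row index i repeated once per item in data[i]; column index is the 0-based item ID.
rows = np.repeat(np.arange(len(data)), [len(each_list) for each_list in data])
cols = np.concatenate([np.asarray(each_list) for each_list in data]) - 1
n_cols = int(cols.max()) + 1  # assumed: the largest ID defines the width

incidence = sparse.csr_matrix(
    (np.ones(len(rows), dtype=np.int32), (rows, cols)), shape=(len(data), n_cols)
)
# Repeated items in one list (the second 4 in the last list) were summed; reset to 0/1.
incidence.data[:] = 1
print(incidence.toarray())

# Item-by-item co-occurrence: number of lists that contain both items.
cooccurrence = (incidence.T @ incidence).toarray()
print(cooccurrence)
```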

Solution 1
0 2020-04-17 12:37:12
