Memory-efficient sparse symmetric matrix calculations
I have to perform a large number of calculations of the form:
X.dot(Y).dot(Xt)
X = 1 x n matrix
Y = symmetric n x n matrix, each element one of 5 values (e.g. 0, 0.25, 0.5, 0.75, 1)
Xt = n x 1 matrix, the transpose of X, i.e. X[np.newaxis].T
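For concreteness, the calculation is a scalar quadratic form; a minimal sketch with a small illustrative n (the sizes and values here are placeholders, not the real problem size):

```python
import numpy as np

n = 4
X = np.random.random(n)                       # 1 x n (as a 1-D array)
Y = np.random.choice([0, 0.25, 0.5, 0.75, 1], size=(n, n))
Y = np.triu(Y) + np.triu(Y, 1).T              # symmetrize: mirror the upper triangle
Xt = X[np.newaxis].T                          # n x 1, transpose of X

result = X.dot(Y).dot(Xt)                     # 1-element array holding the quadratic form
assert np.allclose(result, X @ Y @ X)         # same scalar either way
```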
X and Y are dense. The problem I have is that for large n, I cannot store and use matrix Y due to memory issues. I am limited to using one machine, so distributed calculations are not an option.
It occurred to me that Y has two features which theoretically can reduce the amount of memory required to store it: it is symmetric, and each element is one of only 5 distinct values.
How can I implement this in practice? I have looked up storage of symmetric matrices, but as far as I am aware all numpy matrix multiplications require "unpacking" the symmetry to produce a regular n x n matrix.
I understand numpy is designed for in-memory calculations, so I'm open to alternative Python-based solutions not reliant on numpy. I'm also open to sacrificing speed for memory efficiency.
UPDATE: I found that using a format that crams 3 matrix elements into one byte is actually quite fast. In the example below the speed penalty is less than 2x compared to direct multiplication using @, while the space saving is more than 20x.
>>> import numpy as np
>>> from timeit import timeit
>>> x = np.random.random(3000)
>>> Y = np.random.randint(0, 5, (3000, 3000), dtype = np.int8)
>>> i, j = np.triu_indices(3000, 1)
>>> Y[i, j] = Y[j, i]
>>> values = np.array([0.3, 0.5, 0.6, 0.9, 2.0])
>>> Ycmp = (np.reshape(Y, (1000, 3, 3000)) * np.array([25, 5, 1], dtype=np.int8)[None, :, None]).sum(axis=1, dtype=np.int8)
>>>
>>> full = values[Y]
>>> x @ full @ x
1972379.8153972814
>>>
>>> vtable = values[np.transpose(np.unravel_index(np.arange(125), (5,5,5)))]
>>> np.dot(np.concatenate([(vtable * np.bincount(row, x, minlength=125)[:, None]).sum(axis=0) for row in Ycmp]), x)
1972379.8153972814
>>>
>>> timeit('x @ full @ x', globals=globals(), number=100)
0.7130507210385986
>>> timeit('np.dot(np.concatenate([(vtable * np.bincount(row, x, minlength=125)[:, None]).sum(axis=0) for row in Ycmp]), x)', globals=globals(), number=100)
1.3755558349657804
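To see that the packing in the transcript is lossless, each byte of Ycmp can be unpacked back into its three base-5 digits with divmod (a sanity-check sketch of my own, using the same [25, 5, 1] packing; symmetrization is skipped since it doesn't affect the round trip):

```python
import numpy as np

# pack three rows' base-5 digits into one int8 each, as in the transcript;
# the maximum packed value is 25*4 + 5*4 + 4 = 124, which fits in int8
Y = np.random.randint(0, 5, (3000, 3000), dtype=np.int8)
Ycmp = (np.reshape(Y, (1000, 3, 3000)) *
        np.array([25, 5, 1], dtype=np.int8)[None, :, None]).sum(axis=1, dtype=np.int8)

# unpack: divide the base-5 digits back out, most significant first
hi, rest = np.divmod(Ycmp, 25)
mid, lo = np.divmod(rest, 5)
Y_back = np.stack([hi, mid, lo], axis=1).reshape(3000, 3000)
assert np.array_equal(Y_back, Y)
```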
The solutions below are slower and less memory-efficient. I'll leave them in merely for reference.
If you can afford half a byte per matrix element, then you can use np.bincount like so:
>>> Y = np.random.randint(0, 5, (1000, 1000), dtype = np.int8)
>>> i, j = np.triu_indices(1000, 1)
>>> Y[i, j] = Y[j, i]
>>> values = np.array([0.3, 0.5, 0.6, 0.9, 2.0])
>>> full = values[Y]
>>> # full would correspond to your original matrix,
>>> # Y is the 'compressed' version
>>>
>>> x = np.random.random((1000,))
>>>
>>> # direct method for reference
>>> x @ full @ x
217515.13954751115
>>>
>>> # memory saving version; minlength=5 guards against rows missing some values
>>> np.dot([(values * np.bincount(row, x, minlength=5)).sum() for row in Y], x)
217515.13954751118
>>>
>>> # to save another almost 50% exploit symmetry
>>> upper = Y[i, j]
>>> diag = np.diagonal(Y)
>>>
>>> boundaries = np.r_[0, np.cumsum(np.arange(999, 0, -1))]
>>> (values*np.bincount(diag, x*x)).sum() + 2 * np.dot([(values*np.bincount(upper[boundaries[i]:boundaries[i+1]], x[i+1:],minlength=5)).sum() for i in range(999)], x[:-1])
217515.13954751115
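The boundaries arithmetic above relies on np.triu_indices returning the strictly-upper triangle row by row, so that upper[boundaries[i]:boundaries[i+1]] is exactly row i's part to the right of the diagonal. A small check of my own (with a toy n, not the sizes in the answer):

```python
import numpy as np

n = 6
Y = np.random.randint(0, 5, (n, n))
i, j = np.triu_indices(n, 1)      # row-major order: (0,1), (0,2), ..., (n-2, n-1)
upper = Y[i, j]

# row k contributes n-1-k strictly-upper elements; cumulative sums give offsets
boundaries = np.r_[0, np.cumsum(np.arange(n - 1, 0, -1))]
for k in range(n - 1):
    assert np.array_equal(upper[boundaries[k]:boundaries[k + 1]], Y[k, k + 1:])
```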
Each row of Y, if represented as a numpy.array of datatype int as suggested in @PaulPanzer's answer, can be compressed to occupy less memory: in fact, you can store 27 elements in 64 bits, because 64 / log2(5) = 27.56...
First, compress:
import numpy as np

row = np.random.randint(5, size=100)

# pad with zeros to a length that is a multiple of 27
if len(row) % 27:
    row_pad = np.append(row, np.zeros(27 - len(row) % 27, dtype=int))
else:
    row_pad = row

row_compr = []
y_compr = 0
for i, y in enumerate(row_pad):
    if i > 0 and i % 27 == 0:
        row_compr.append(y_compr)
        y_compr = 0
    y_compr *= 5
    y_compr += y
# append the last block
row_compr.append(y_compr)
row_compr = np.array(row_compr, dtype=np.int64)
Then, decompress:
row_decompr = []
for y_compr in row_compr:
    y_block = np.zeros(shape=27, dtype=np.uint8)
    for i in range(27):
        y_block[26 - i] = y_compr % 5
        y_compr = int(y_compr // 5)
    row_decompr.append(y_block)
row_decompr = np.array(row_decompr).flatten()[:len(row)]
The decompressed array coincides with the original row of Y:
assert np.allclose(row, row_decompr)
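The Python loops above become slow for long rows; as a rough sketch (an alternative of my own, not part of the answer), both steps can be vectorized with a precomputed table of powers of 5, since all 27 base-5 digits fit in an int64:

```python
import numpy as np

POWERS = 5 ** np.arange(26, -1, -1, dtype=np.int64)  # [5**26, ..., 5**0]

def compress(row):
    """Pack base-5 digits, 27 per int64, most significant digit first."""
    pad = (-len(row)) % 27
    blocks = np.append(row, np.zeros(pad, dtype=np.int64)).reshape(-1, 27)
    return blocks @ POWERS            # one packed int64 per block of 27 digits

def decompress(row_compr, length):
    """Recover the base-5 digits from the packed int64 blocks."""
    digits = (row_compr[:, None] // POWERS) % 5
    return digits.reshape(-1)[:length]

row = np.random.randint(5, size=100)
assert np.array_equal(decompress(compress(row), len(row)), row)
```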