Sparse 3D matrix/array in Python?

In SciPy, we can construct a sparse matrix using scipy.sparse.lil_matrix() and similar classes, but those matrices are 2D.

I am wondering whether there is an existing data structure for a sparse 3D matrix/array (tensor) in Python?

P.S. I have lots of sparse 3D data and need a tensor to store it and perform multiplication. If there is no existing data structure, are there any suggestions for implementing such a tensor?

Happy to suggest a (possibly obvious) implementation of this. It can be written in pure Python, or in C/Cython if you need it to be faster and have the time and room for new dependencies.

A sparse matrix in N dimensions can assume most elements are empty, so we use a dictionary keyed on tuples:

class NDSparseMatrix:
  def __init__(self):
    # maps an index tuple such as (i, j, k) to its stored value
    self.elements = {}

  def addValue(self, index, value):
    self.elements[index] = value

  def readValue(self, index):
    # missing entries read as zero (use 0.0 if you are storing floats)
    return self.elements.get(index, 0)

You would use it like so:

sparse = NDSparseMatrix()
sparse.addValue((1,2,3), 15.7)
should_be_zero = sparse.readValue((1,5,13))

You could make this implementation more robust by verifying that the input is in fact a tuple and that it contains only integers, but that will just slow things down, so I wouldn't worry unless you're releasing your code to the world later.
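If you do want the checks, here is a minimal sketch of what they might look like. The class name CheckedSparse, the ndim argument and the validate flag are my own illustrative choices, not part of the answer above; the flag lets you switch the checks off once the calling code is trusted.

```python
class CheckedSparse:
    """Dict-backed N-d sparse container with optional index validation."""

    def __init__(self, ndim, validate=True):
        self.ndim = ndim            # expected number of indices per key
        self.validate = validate    # set to False to skip the checks
        self.elements = {}

    def _check(self, index):
        if not isinstance(index, tuple) or len(index) != self.ndim:
            raise TypeError(f"index must be a tuple of length {self.ndim}")
        if not all(isinstance(i, int) and i >= 0 for i in index):
            raise ValueError("index entries must be non-negative integers")

    def addValue(self, index, value):
        if self.validate:
            self._check(index)
        self.elements[index] = value

    def readValue(self, index):
        if self.validate:
            self._check(index)
        return self.elements.get(index, 0)


sparse = CheckedSparse(ndim=3)
sparse.addValue((1, 2, 3), 15.7)
should_be_zero = sparse.readValue((1, 5, 13))
```

Passing a list such as [1, 2, 3] raises TypeError here, where the unchecked version would fail later with a less obvious "unhashable type" error.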

EDIT - a Cython implementation of the matrix multiplication problem, assuming the other tensor is an N-dimensional NumPy array (numpy.ndarray), might look like this:

#cython: boundscheck=False
#cython: wraparound=False

import numpy as np
cimport numpy as np

def sparse_mult(object sparse, np.ndarray[double, ndim=3] u):
  cdef unsigned int i, j, k

  # zero-initialised so the boundary entries skipped below are well defined
  out = np.zeros((u.shape[0], u.shape[1], u.shape[2]), dtype=np.float64)

  # interior points only, which leaves room for offset indices such as i-1 or k+1
  for i in range(1, u.shape[0]-1):
    for j in range(1, u.shape[1]-1):
      for k in range(1, u.shape[2]-1):
        # note, here you must define your own rank-3 multiplication rule, which
        # is, in general, nontrivial, especially if LxMxN tensor...

        # loop over a dummy variable (or two) and perform some summation:
        out[i,j,k] = u[i,j,k] * sparse.readValue((i,j,k))

  return out

You will always need to hand-roll this for the problem at hand, because (as mentioned in the code comment) you need to define which indices you're summing over, and you must be careful about the array lengths or things won't work!
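To make "define which indices you're summing over" concrete, here is one possible rule written in plain Python with NumPy: contract the last axis of the sparse tensor with a dense vector, out[i, j] = sum over k of T[i, j, k] * v[k]. The function name contract_last_axis and the explicit shape argument are assumptions for this sketch, and the rule itself is only an example; your problem may need a different contraction.

```python
import numpy as np

def contract_last_axis(elements, v, shape):
    """Compute out[i, j] = sum_k T[i, j, k] * v[k] for a dict-keyed sparse T.

    elements maps (i, j, k) tuples to values, as in NDSparseMatrix.elements;
    shape is the full (L, M, N) shape of T, which the dict does not record.
    """
    out = np.zeros(shape[:2])
    # only the stored (non-zero) entries contribute to the sum
    for (i, j, k), value in elements.items():
        out[i, j] += value * v[k]
    return out

elements = {(0, 0, 1): 2.0, (0, 0, 2): 3.0, (1, 2, 0): 4.0}
v = np.array([1.0, 10.0, 100.0])
result = contract_last_axis(elements, v, shape=(2, 3, 3))
```

The loop runs over stored entries rather than over every (i, j, k), so the cost scales with the number of non-zeros rather than with L*M*N.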

EDIT 2 - if the other matrix is also sparse, then you don't need the three-way loop:

def sparse_mult(sparse, other_sparse):

  out = NDSparseMatrix()

  # visit only the stored entries of the first tensor
  for key in sparse.elements:
    i, j, k = key
    # note, here you must define your own rank-3 multiplication rule, which
    # is, in general, nontrivial, especially if LxMxN tensor...

    # loop over a dummy variable (or two) and perform some summation
    # (example indices shown):
    out.addValue(key, out.readValue(key) +
      other_sparse.readValue((i,j,k+1)) * sparse.readValue((i-3,j,k)))

  return out
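As one concrete instance of such a rule, the sketch below contracts the last index of one dict-keyed tensor with the first index of another: C[i, j, l, m] = sum over k of A[i, j, k] * B[k, l, m]. This particular contraction and the name contract_sparse are assumptions chosen for illustration; the point is that grouping one operand by the summed index avoids looping over every index combination.

```python
from collections import defaultdict

def contract_sparse(a, b):
    """C[i, j, l, m] = sum_k A[i, j, k] * B[k, l, m], both dict-keyed."""
    # Group B's entries by the summed index k so each A entry only
    # visits the B entries it can actually pair with.
    b_by_k = defaultdict(list)
    for (k, l, m), b_val in b.items():
        b_by_k[k].append((l, m, b_val))

    out = defaultdict(float)
    for (i, j, k), a_val in a.items():
        for l, m, b_val in b_by_k.get(k, ()):
            out[(i, j, l, m)] += a_val * b_val
    return dict(out)

a = {(0, 0, 1): 2.0, (0, 1, 1): 3.0, (1, 0, 2): 5.0}
b = {(1, 0, 0): 10.0, (1, 2, 3): 1.0, (0, 0, 0): 7.0}
c = contract_sparse(a, b)
```

Entries of A whose k never appears in B (here the one with k=2) contribute nothing and are skipped without any work.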

My suggestion for a C implementation would be to use a simple struct to hold the indices and the values:

typedef struct {
  int index[3];
  float value;
} entry_t;

You'll then need some functions to allocate and maintain a dynamic array of such structs, and to search them as fast as you need; but you should test the performance of the Python implementation before worrying about that stuff.
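To show what "maintain and search" could mean before committing to C, here is a Python sketch of the same layout: entries kept sorted by index tuple and located by binary search. The class name SortedEntries and its methods are hypothetical; in C the analogous pieces would be a realloc-grown array of entry_t and a binary search over it.

```python
import bisect

class SortedEntries:
    """Sorted array of (index, value) pairs, searched with bisect."""

    def __init__(self):
        self.indices = []  # sorted list of (i, j, k) tuples
        self.values = []   # values aligned with self.indices

    def _find(self, index):
        # position where index is, or where it would be inserted
        pos = bisect.bisect_left(self.indices, index)
        found = pos < len(self.indices) and self.indices[pos] == index
        return pos, found

    def set(self, index, value):
        pos, found = self._find(index)
        if found:
            self.values[pos] = value
        else:
            self.indices.insert(pos, index)
            self.values.insert(pos, value)

    def get(self, index, default=0.0):
        pos, found = self._find(index)
        return self.values[pos] if found else default

entries = SortedEntries()
entries.set((2, 0, 1), 3.5)
entries.set((0, 4, 4), 1.0)
entries.set((2, 0, 1), 4.5)   # overwrites the earlier value
```

Lookups are O(log n), but inserts shift the tail of the array, so this layout suits data that is built once and read many times.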

An alternative answer as of 2017 is the sparse package. According to the package itself, it implements sparse multidimensional arrays on top of NumPy and scipy.sparse by generalizing the scipy.sparse.coo_matrix layout.

Here's an example taken from the docs:

import numpy as np
n = 1000
ndims = 4
nnz = 1000000
coords = np.random.randint(0, n - 1, size=(ndims, nnz))
data = np.random.random(nnz)

import sparse
x = sparse.COO(coords, data, shape=((n,) * ndims))
x
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>

x.nbytes
# 16000000

y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))

y
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>

Have a look at sparray - sparse n-dimensional arrays in Python (by Jan Erik Solem). It is also available on GitHub.

Rather than writing everything from scratch, it may be nicer to use SciPy's sparse module as far as possible. This may lead to (much) better performance. I had a somewhat similar problem, but I only had to access the data efficiently, not perform any operations on them. Furthermore, my data were only sparse in two out of three dimensions.

I have written a class that solves my problem and could (as far as I can tell) easily be extended to satisfy the OP's needs. It may still hold some potential for improvement, though.

import scipy.sparse as sp
import numpy as np

class Sparse3D():
    """
    Class to store and access 3 dimensional sparse matrices efficiently
    """
    def __init__(self, *sparseMatrices):
        """
        Constructor
        Takes a stack of sparse 2D matrices with the same dimensions
        """
        self.data = sp.vstack(sparseMatrices, "dok")
        self.shape = (len(sparseMatrices), *sparseMatrices[0].shape)
        self._dim1_jump = np.arange(0, self.shape[1]*self.shape[0], self.shape[1])
        self._dim1 = np.arange(self.shape[0])
        self._dim2 = np.arange(self.shape[1])

    def __getitem__(self, pos):
        if not type(pos) == tuple:
            if not hasattr(pos, "__iter__") and not type(pos) == slice: 
                return self.data[self._dim1_jump[pos] + self._dim2]
            else:
                return Sparse3D(*(self[self._dim1[i]] for i in self._dim1[pos]))
        elif len(pos) > 3:
            raise IndexError("too many indices for array")
        else:
            if (not hasattr(pos[0], "__iter__") and not type(pos[0]) == slice or
                not hasattr(pos[1], "__iter__") and not type(pos[1]) == slice):
                if len(pos) == 2:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]]]
                else:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]], pos[2]].T
                    if hasattr(pos[2], "__iter__") or type(pos[2]) == slice:
                        result = result.T
                return result
            else:
                if len(pos) == 2:
                    return Sparse3D(*(self[i, self._dim2[pos[1]]] for i in self._dim1[pos[0]]))
                else:
                    if not hasattr(pos[2], "__iter__") and not type(pos[2]) == slice:
                        return sp.vstack([self[self._dim1[pos[0]], i, pos[2]]
                                          for i in self._dim2[pos[1]]]).T
                    else:
                        return Sparse3D(*(self[i, self._dim2[pos[1]], pos[2]] 
                                          for i in self._dim1[pos[0]]))

    def toarray(self):
        return np.array([self[i].toarray() for i in range(self.shape[0])])
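The core trick in the class above is that layer i of the 3D array occupies a contiguous block of rows in the stacked 2D matrix. Here is a stripped-down sketch of just that row-offset idea, using CSR slicing rather than the class's DOK storage; the helper names get_layer and get_element are mine.

```python
import numpy as np
import scipy.sparse as sp

# a small dense reference array with a few non-zeros, shape (3, 4, 5)
dense = np.zeros((3, 4, 5))
dense[0, 1, 2] = 1.5
dense[1, 0, 0] = 7.0
dense[2, 3, 4] = -2.0

layers = [sp.csr_matrix(dense[i]) for i in range(dense.shape[0])]
stacked = sp.vstack(layers, format="csr")   # shape (3 * 4, 5)
n_rows = dense.shape[1]

def get_layer(stacked, i, n_rows):
    # layer i lives in rows i*n_rows .. (i+1)*n_rows - 1
    return stacked[i * n_rows:(i + 1) * n_rows]

def get_element(stacked, i, j, k, n_rows):
    # (i, j, k) in 3D maps to (i*n_rows + j, k) in the stacked matrix
    return stacked[i * n_rows + j, k]

layer_1 = get_layer(stacked, 1, n_rows)
value = get_element(stacked, 2, 3, 4, n_rows)
```

Everything stays a regular scipy.sparse matrix, so the usual arithmetic and conversion methods keep working on each layer.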

I also need a 3D sparse matrix for solving the 2D heat equation (the 2 spatial dimensions are dense, but the time dimension is a diagonal plus one off-diagonal on each side). I found this link to guide me. The trick is to create an array Number that maps the 2D grid of nodes to a 1D linear vector. Then build the sparse 2D matrix by building a list of data and indices. Later the Number array is used to arrange the answer back into a 2D array.
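As a small illustration of that mapping, the sketch below builds a Number array with the same column-by-column numbering as the code further down (Number[j, i] == i * nY + j), then uses it to scatter a 2D field into a 1D vector and gather it back. The arange/reshape construction and the variable names grid, vec and back are my own shorthand, not taken from the linked guide.

```python
import numpy as np

nX, nY = 4, 3

# Number[j, i] is the 1D position of grid node (j, i), numbered column by column
Number = np.arange(nX * nY).reshape(nX, nY).T

rng = np.random.default_rng(0)
grid = rng.random((nY, nX))     # some 2D field living on the grid

vec = np.empty(nX * nY)
vec[Number] = grid              # scatter: vec[Number[j, i]] = grid[j, i]
back = vec[Number]              # gather: back[j, i] = vec[Number[j, i]]
```

The same gather step is what turns the solution vector of the linear system back into a 2D picture of the grid.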

[edit] It occurred to me after my initial post that this could be handled better with the .reshape(-1) method. After some research, reshape is preferable to flatten because it returns a view of the original array where possible, whereas flatten always copies the array. The code below still uses the original Number array. I will try to update it later. [end edit]
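A quick way to see the view-versus-copy difference for yourself is np.shares_memory. The snippet is only a demonstration with made-up array names; note that reshape(-1) falls back to a copy when the data is not laid out contiguously, as with a transposed array.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

as_view = a.reshape(-1)   # a view here, because a is C-contiguous
as_copy = a.flatten()     # always a new array

as_view[0] = 99           # writes through to a
view_shares = np.shares_memory(a, as_view)
copy_shares = np.shares_memory(a, as_copy)

# a transposed array is not C-contiguous, so reshape(-1) has to copy
transposed_shares = np.shares_memory(a.T, a.T.reshape(-1))
```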

I test it by creating a random 1D vector and solving for a second vector. Then I multiply that solution by the sparse 2D matrix and get the original vector back.

Note: I repeat this many times in a loop with exactly the same matrix M, so you might think it would be more efficient to compute inverse(M) once. But the inverse of M is not sparse, so I think (but have not tested) that using spsolve is the better solution. "Best" probably depends on how large your matrix is.
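One middle ground worth testing, which is my suggestion rather than something the post measured, is to factorise M once with scipy.sparse.linalg.splu and reuse the factorisation for every right-hand side. The tridiagonal test matrix below is an assumed stand-in, not the stencil from the script.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
r = 0.1
# stand-in sparse system: a simple tridiagonal matrix in CSC format
M = sp.diags([-r, 1 + 2 * r, -r], offsets=[-1, 0, 1], shape=(n, n), format="csc")

rng = np.random.default_rng(0)
b = rng.random(n)

lu = spla.splu(M)              # sparse LU factorisation, done once
x_lu = lu.solve(b)             # cheap to repeat for each new b
x_direct = spla.spsolve(M, b)  # refactorises on every call
```

Both calls solve the same system; whether the reuse pays off depends on the matrix size and the number of solves, so it is worth timing on your own problem.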

#!/usr/bin/env python3
# testSparse.py
# profhuster

import numpy as np
import scipy.sparse as sM
import scipy.sparse.linalg as spLA
from array import array
from numpy.random import rand, seed
seed(101520)

nX = 4
nY = 3
r = 0.1

def loadSpNodes(nX, nY, r):
    # Matrix to map 2D array of nodes to 1D array
    Number = np.zeros((nY, nX), dtype=int)

    # Map each element of the 2D array to a 1D array
    iM = 0
    for i in range(nX):
        for j in range(nY):
            Number[j, i] = iM
            iM += 1
    print(f"Number = \n{Number}")

    # Now create a sparse matrix of the "stencil"
    diagVal = 1 + 4 * r
    offVal = -r
    d_list = array('f')
    i_list = array('i')
    j_list = array('i')
    # Loop over the 2D nodes matrix
    for i in range(nX):
        for j in range(nY):
            # Recall the 1D number
            iSparse = Number[j, i]
            # populate the diagonal
            d_list.append(diagVal)
            i_list.append(iSparse)
            j_list.append(iSparse)
            # Now, for each rectangular neighbor, add the
            # off-diagonal entries.
            # Use a try-except so boundary nodes work; negative indices
            # wrap around in NumPy, hence the explicit >= 0 check below
            for (jj,ii) in ((j+1,i),(j-1,i),(j,i+1),(j,i-1)):
                try:
                    iNeigh = Number[jj, ii]
                    if jj >= 0 and ii >=0:
                        d_list.append(offVal)
                        i_list.append(iSparse)
                        j_list.append(iNeigh)
                except IndexError:
                    pass
    spNodes = sM.coo_matrix((d_list, (i_list, j_list)), shape=(nX*nY,nX*nY))
    return spNodes


MySpNodes = loadSpNodes(nX, nY, r)
print(f"Sparse Nodes = \n{MySpNodes.toarray()}")
b = rand(nX*nY)
print(f"b=\n{b}")
x = spLA.spsolve(MySpNodes.tocsr(), b)
print(f"x=\n{x}")
print(f"Multiply back together=\n{MySpNodes @ x}")

I needed a 3D lookup table for x, y, z and came up with this solution.
Why not use one of the dimensions as a divisor of the third dimension? That is, use x and 'yz' as the matrix dimensions.

For example, if x has 80 potential members, y has 100 and z has 20, you make the sparse matrix 80 by 2000 (i.e. yz = 100 x 20).
The x dimension is as usual.
The yz dimension: the first 100 elements represent z=0, y=0 to 99;
the second 100 represent z=1, y=0 to 99, and so on.
So an element located at (x, y, z) is stored in the sparse matrix at (x, z*100 + y).
If you need to use negative numbers, design an arbitrary offset into your matrix translation. The solution could be expanded to n dimensions if necessary.
from scipy import sparse

m = sparse.lil_matrix((100, 2000), dtype=float)

def add_element(pos, element):
    # pos is an (x, y, z) triple; y and z are folded into one column index
    x, y, z = pos
    m[x, y + z*100] = float(element)

def get_element(x, y, z):
    return m[x, y + z*100]

add_element([3, 2, 4], 2.2)
add_element([20, 15, 7], 1.2)
print(get_element(0, 0, 0))
print(get_element(3, 2, 4))
print(get_element(20, 15, 7))
print(" This is m sparse:")
print(m)

====================
OUTPUT (from the original Python 2 run):
0.0
2.2
1.2
 This is m sparse:
  (3, 402L)	2.2
  (20, 715L)	1.2
====================
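The index translation can also be written out on its own, together with its inverse. This is a small sketch; the helper names flatten_index and unflatten_index and the ny parameter are mine, with ny=100 matching the example above.

```python
def flatten_index(x, y, z, ny=100):
    """Map a 3D index (x, y, z) to the 2D position (row, col) = (x, z*ny + y)."""
    return x, z * ny + y

def unflatten_index(row, col, ny=100):
    """Inverse of flatten_index: recover (x, y, z) from (row, col)."""
    z, y = divmod(col, ny)
    return row, y, z

pos_a = flatten_index(3, 2, 4)
pos_b = flatten_index(20, 15, 7)
round_trip = unflatten_index(*pos_a)
```

The mapping is only unambiguous while 0 <= y < ny, which is why negative coordinates need an offset before flattening.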
