
Efficient tensor contraction in python

I have a list L of tensors (ndarray objects), each with several indices. I need to contract these indices according to a graph of connections.

The connections are encoded in a list of tuples of the form ((m,i),(n,j)), signifying "contract the i-th index of the tensor L[m] with the j-th index of the tensor L[n]".
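For concreteness, a hypothetical instance of this encoding might look like the following (shapes chosen arbitrarily):

```python
import numpy as np

# two tensors sharing a bond dimension of size 3 (hypothetical shapes)
L = [np.random.rand(2, 3), np.random.rand(3, 4)]

# contract the 1st index of L[0] with the 0th index of L[1]
connections = [((0, 1), (1, 0))]
```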

How can I handle non-trivial connectivity graphs? The first problem is that as soon as I contract a pair of indices, the result is a new tensor that does not belong to the list L. But even if I solved this (e.g. by giving a unique identifier to all the indices of all the tensors), there is the issue that one can pick any order in which to perform the contractions, and some choices yield unnecessarily enormous beasts in mid-computation (even if the final result is small). Suggestions?

Memory considerations aside, I believe you can do the contractions in a single call to einsum, although you'll need some preprocessing. I'm not entirely sure what you mean by "as I contract a pair of indices, the result is a new tensor that does not belong to the list L", but I think doing the contraction in a single step would exactly solve this problem.

I suggest using the alternative, numerically indexed syntax of einsum:

einsum(op0, sublist0, op1, sublist1, ..., [sublistout])

So what you need to do is encode the indices to contract as integer sequences. First you'll need to set up a range of unique indices initially, and keep another copy to be used as sublistout. Then, iterating over your connectivity graph, you need to set contracted indices to the same index where necessary, and at the same time remove the contracted index from sublistout.
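To illustrate the numeric sublist syntax on its own, here is a minimal sketch (toy shapes): a repeated integer label plays the role of a repeated letter in the string syntax, so the shared label is summed over.

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)

# equivalent of np.einsum('ij,jk->ik', A, B): the repeated
# label 1 is summed over, labels 0 and 2 remain in the output
C = np.einsum(A, [0, 1], B, [1, 2], [0, 2])
```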

import numpy as np

def contract_all(tensors,conns):
    '''
    Contract the tensors inside the list tensors
    according to the connectivities in conns

    Example input:
    tensors = [np.random.rand(2,3),np.random.rand(3,4,5),np.random.rand(3,4)]
    conns = [((0,1),(2,0)), ((1,1),(2,1))]
    returned shape in this case is (2,3,5)
    '''

    ndims = [t.ndim for t in tensors]
    totdims = sum(ndims)
    dims0 = np.arange(totdims)
    # keep track of sublistout throughout
    sublistout = set(dims0.tolist())
    # cut up the index array according to tensors
    # (throw away empty list at the end)
    inds = np.split(dims0,np.cumsum(ndims))[:-1]
    # we also need to convert to a list, otherwise einsum chokes
    inds = [ind.tolist() for ind in inds]

    # if there were no contractions, we'd flatten the (tensor, indices)
    # pairs and call np.einsum(*args, sorted(sublistout))

    # instead we need to loop over the connectivity graph
    # and manipulate the indices
    for (m,i),(n,j) in conns:
        # tensors[m][i] contracted with tensors[n][j]

        # remove the old indices from sublistout which is a set
        sublistout -= {inds[m][i],inds[n][j]}

        # contract the indices
        inds[n][j] = inds[m][i]

    # zip and flatten the tensors and indices
    args = [subarg for arg in zip(tensors,inds) for subarg in arg]

    # assuming there are no multiple contractions, we're done here;
    # sort sublistout for a deterministic output index order
    # (einsum expects a sequence, and sets don't guarantee an order)
    return np.einsum(*args, sorted(sublistout))

A trivial example:

>>> tensors = [np.random.rand(2,3), np.random.rand(4,3)]
>>> conns = [((0,1),(1,1))]
>>> contract_all(tensors,conns)
array([[ 1.51970003,  1.06482209,  1.61478989,  1.86329518],
       [ 1.16334367,  0.60125945,  1.00275992,  1.43578448]])
>>> np.einsum('ij,kj',tensors[0],tensors[1])
array([[ 1.51970003,  1.06482209,  1.61478989,  1.86329518],
       [ 1.16334367,  0.60125945,  1.00275992,  1.43578448]])

In case there are multiple contractions, the logistics in the loop become a bit more complex, because we need to handle all the duplicates. The logic, however, is the same. Furthermore, the above is obviously missing checks to ensure that the corresponding indices can be contracted (i.e. that their dimensions agree).

In hindsight I realized that the default sublistout doesn't have to be specified: einsum uses that order anyway. I decided to leave that variable in the code, because in case we want a non-trivial output index order, we'll have to handle that variable appropriately, and it might come in handy.


As for optimization of the contraction order, you can effect internal optimization in np.einsum as of version 1.12 (as noted by @hpaulj in a now-deleted comment). This version introduced the optimize optional keyword argument to np.einsum, allowing a contraction order to be chosen that cuts down on computation time at the cost of memory. Passing 'greedy' or 'optimal' as the optimize keyword makes numpy choose a contraction order in roughly decreasing order of the sizes of the dimensions.
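As a small sketch (toy shapes, assuming numpy >= 1.12): for a chain of matrix products, the optimized order contracts pairwise instead of building the full multi-way sum at once.

```python
import numpy as np

a = np.random.rand(10, 20)
b = np.random.rand(20, 30)
c = np.random.rand(30, 5)

# with optimize set, numpy may contract a with b first, then the
# result with c, rather than evaluating the naive triple loop
res = np.einsum('ij,jk,kl->il', a, b, c, optimize='greedy')
```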

The options available for the optimize keyword come from the apparently undocumented (as far as online documentation goes; help() fortunately works) function np.einsum_path:

einsum_path(subscripts, *operands, optimize='greedy')

Evaluates the lowest cost contraction order for an einsum expression by
considering the creation of intermediate arrays.

The output contraction path from np.einsum_path can also be used as input for the optimize argument of np.einsum. In your question you were worried about too much memory being used, so I suspect the default of no optimization (with potentially longer runtime and a smaller memory footprint) may be what you want.
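A short sketch of that workflow (toy shapes again): compute the path once, then feed it back to einsum, which is useful when the same contraction pattern is evaluated many times.

```python
import numpy as np

a = np.random.rand(10, 20)
b = np.random.rand(20, 30)
c = np.random.rand(30, 5)

# precompute a contraction path once...
path, info = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='optimal')
# ...and reuse it in subsequent einsum calls
res = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)
```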

Maybe helpful: take a look at https://arxiv.org/abs/1402.0939 , which describes an efficient framework for contracting so-called tensor networks in a single function ncon(...). As far as I can see, implementations are directly available for Matlab (linked within the paper) and for Python 3 ( https://github.com/mhauru/ncon ).

