Fastest way to chain multiple indexing operations on a list?

Question

I want to classify some data I have, and for that I want to chain indexing operations on a Python list. Simplified, I have a nested list:

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]

I want to iterate over the product of the first two indices while keeping the third one fixed:

from itertools import product

for index1, index2 in product(range(3), range(2)):
    print(lst[index1][index2][0])

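However, I want to make this more general, without knowing in advance how deep the substructure goes (I want the number of ranges passed to itertools.product to be variable).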

I am struggling with how to generalize [index1][index2][0] so that it accepts an arbitrary number of indices. The best I could come up with is functools.reduce:

from functools import reduce

for indices in product(range(3), range(2)):
    print(reduce(list.__getitem__, indices, lst)[0])

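This looks awfully complicated (and is much slower than manual indexing), so I wonder whether there is a better and faster way to do it. I use both Python 2.x and 3.x, and external libraries are definitely fine (but it should not require NumPy or packages based on NumPy).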

Answer 1

I propose a recursive approach.

def theshape(lst):
    l = lst
    shape = []
    while isinstance(l, list):
        shape.append(len(l))
        l = l[0]
    return shape

This function finds the shape of the structure, which is assumed to be regular up to the last dimension.
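
As a quick check of what theshape returns, here is a small sketch applied to the list from the question. The function is repeated so that the snippet runs on its own. Note that it only inspects the first element at each level, so ragged inner lists such as [3, 3] go undetected.

```python
def theshape(lst):
    # Follow the first element of each level, recording each length.
    l = lst
    shape = []
    while isinstance(l, list):
        shape.append(len(l))
        l = l[0]
    return shape

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6, 6, 6]]]
print(theshape(lst))  # [3, 2, 1]
```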

def browse(lst):
    shape=theshape(lst)
    ndim=len(shape)
    def level(l,k):
        if k==ndim:
            print(l)
        else:
            for i in range(shape[k]):
                level(l[i],k+1)
    level(lst,0)

This one browses all levels recursively. It minimizes the number of pointer changes.

A simple example:

from numpy import arange  # NumPy is used here only to build the test data
u=arange(2**6).reshape(4,2,1,2,2,1,1,2).tolist()
browse(u)
0
2
.
.
.
62

Some tests on big structures (printing is discarded with print = lambda _ : None):

from functools import reduce
from itertools import product

def go(lst):
    for x in product(*[range(k) for k in theshape(lst)]):
        print(reduce(lambda result, index: result[index], x, lst))

In [1]: u=arange(2**21).reshape([2]*21).tolist()

In [2]: %time go(u)
Wall time: 14.8 s

In [3]: %time browse(u)
Wall time: 3.5 s

In [5]: u=arange(2**21).reshape([1]*30+[2**21]+[1]).tolist()

In [6]: %time go(u)
Wall time: 18 s

In [7]: %time browse(u)
Wall time: 3.48 s

In [8]: u=arange(2**21).reshape([1]+[2**21]+[1]*30).tolist()

In [9]: %time go(u)
Wall time: 14 s

In [10]: %time browse(u)
Wall time: 58.1 s

This shows that performance depends heavily on the data structure.

Edit:

In the end, the simplest is the fastest. theshape is not necessary.

def browse2(lst):
    if isinstance(lst, list):
        for l in lst:
            browse2(l)
    else:
        print(lst)

It is often 30% faster than browse, and it works whatever the structure of the list.
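
If the values are needed for something other than printing, the same recursion can be written as a generator. This is a sketch rather than part of the original answer: the name iter_leaves is hypothetical, and the explicit inner loop (instead of yield from) is an assumption made so the code stays usable on both Python 2 and 3.

```python
def iter_leaves(nested):
    """Yield every non-list element of an arbitrarily nested list, depth first."""
    if isinstance(nested, list):
        for item in nested:
            # Re-yield whatever the recursive call produces.
            for leaf in iter_leaves(item):
                yield leaf
    else:
        yield nested

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6, 6, 6]]]
print(list(iter_leaves(lst)))  # [1, 2, 3, 3, 4, 5, 6, 6, 6]
```

Unlike the indexed versions, this visits every leaf, including the repeated values in the last dimension.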

Answer 2

I would use Python's built-in reduce for this. It does not seem that complicated, and in my tests it is not that much slower:

from functools import reduce  # reduce is a builtin only on Python 2
from itertools import product

for x in product(range(3), range(2)):
    rg = reduce(lambda result, index: result[index], x, lst)
    value = rg[0]
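
One possible variation, not from the original answer: operator.getitem can stand in for the lambda, which avoids a Python-level function call per index. Whether that is measurably faster depends on the interpreter and the data, so it is something to time rather than a guarantee. The helper name get_nested is hypothetical.

```python
from functools import reduce
from itertools import product
from operator import getitem

def get_nested(nested, indices):
    """Apply each index in turn: nested[i0][i1]...[in]."""
    return reduce(getitem, indices, nested)

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6, 6, 6]]]
values = [get_nested(lst, idx)[0] for idx in product(range(3), range(2))]
print(values)  # [1, 2, 3, 4, 5, 6]
```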

If you are worried about the time penalty of reduce, you can just use a for loop:

for x in product(range(3), range(2)):
    value = lst
    for index in x:
        value = value[index]
    value = value[0]

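In all cases these will be slower than manual indexing, because the for loop needs extra operations to determine its stop condition. As always, the question is whether the speed optimization is worth more to you than the flexibility of specifying an arbitrary depth.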
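
To make the number of ranges variable, as the question asks, the ranges can be derived from the data itself. The sketch below assumes the first depth levels are regular (every sublist at a given level has the same length); first_items and its depth parameter are hypothetical names.

```python
from itertools import product

def first_items(nested, depth):
    """Yield nested[i0]...[i_(depth-1)][0] for every index combination."""
    # Build one range per level by following the first element downwards.
    ranges = []
    level = nested
    for _ in range(depth):
        ranges.append(range(len(level)))
        level = level[0]
    for indices in product(*ranges):
        value = nested
        for index in indices:
            value = value[index]
        yield value[0]

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6, 6, 6]]]
print(list(first_items(lst, 2)))  # [1, 2, 3, 4, 5, 6]
```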

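As for why you would use reduce versus for, there is an ongoing debate in the JavaScript community about whether one should use the reduce, map, and filter functions on Array or the for-loop versions, since the latter are faster. You may want to refer to that debate to choose which side you land on.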

Timing with the for loop:

In [22]: stmt = '''
    ...: from itertools import product
    ...: def go():
    ...:   lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
    ...:   for x in product(range(3), range(2)):
    ...:     # rg = reduce(lambda result, index: result[index], x, lst)
    ...:     value = lst
    ...:     for index in x:
    ...:         value = value[index]
    ...:     value = value[0]
    ...:     # value = lst[x[0]][x[1]][0]
    ...: '''

In [23]: timeit(setup=stmt, stmt='go()', number=1000000)
Out[23]: 4.003296852111816

Timing with reduce:

In [18]: stmt = '''
    ...: from itertools import product
    ...: def go():
    ...:   lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
    ...:   for x in product(range(3), range(2)):
    ...:     rg = reduce(lambda result, index: result[index], x, lst)
    ...:     value = rg[0]
    ...:     # value = lst[x[0]][x[1]][0]
    ...: '''

In [19]: timeit(setup=stmt, stmt='go()', number=1000000)
Out[19]: 6.164631128311157

Timing with manual indexing:

In [16]: stmt = '''
    ...: from itertools import product
    ...: def go():
    ...:   lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
    ...:   for x in product(range(3), range(2)):
    ...:     # rg = reduce(lambda result, index: result[index], x, lst)
    ...:     value = lst[x[0]][x[1]][0]
    ...: '''

In [17]: timeit(setup=stmt, stmt='go()', number=1000000)
Out[17]: 3.633723020553589
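
The three timings above can be reproduced from a single script. This is a sketch: the function names are hypothetical, the list is built once at module level rather than inside each function, and the repeat count is reduced to keep the run short. Absolute numbers vary by machine and Python version, so no expected output is shown.

```python
import timeit
from functools import reduce
from itertools import product

LST = [[[1], [2]], [[3, 3], [4]], [[5], [6, 6, 6]]]

def with_loop():
    for x in product(range(3), range(2)):
        value = LST
        for index in x:
            value = value[index]
        value = value[0]
    return value  # last value seen, returned only so it can be checked

def with_reduce():
    for x in product(range(3), range(2)):
        value = reduce(lambda result, index: result[index], x, LST)[0]
    return value

def with_manual():
    for x in product(range(3), range(2)):
        value = LST[x[0]][x[1]][0]
    return value

for func in (with_loop, with_reduce, with_manual):
    # timeit.timeit accepts a zero-argument callable as the statement.
    seconds = timeit.timeit(func, number=100000)
    print("%-12s %.3f s" % (func.__name__, seconds))
```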

Answer 3

How about dynamically creating hard-coded indexing?

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]

from itertools import product

for index1, index2 in product(range(3), range(2)):
    print(lst[index1][index2][0])


# need depth info from somewhere to create hard coded indexing

prod_gen = product(range(3), range(2))

first = next(prod_gen)

indx_depth = len(first) + 1

exec( ('def IndexThisList(lst, indx_itrbl):\n' +
       '        return lst' + ''.join(('[indx_itrbl[' + str(i) + ']]'
                                           for i in range(indx_depth)))))

# just to see what it exec'd:
print(("def IndexThisList(lst, indx_itrbl):\n" +
       "        return lst" + ''.join(('[indx_itrbl[' + str(i) + ']]' 
                                       for i in range(indx_depth)))))
# the exec is only invoked again when changing the indexing depth
# for accessing the list with its currently instantiated depth of indexing
# just use the current instance of the generated function

print(IndexThisList(lst, first + (0,)))
for itpl in prod_gen: 
    print (IndexThisList(lst, itpl + (0,)))

1
2
3
4
5
6
def IndexThisList(lst, indx_itrbl):
        return lst[indx_itrbl[0]][indx_itrbl[1]][indx_itrbl[2]]
1
2
3
4
5
6

I'm just a beginner; it seems my exec should be wrapped in another function that takes the index depth as a parameter, but that eludes me for now.
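
One way such a wrapper might look is sketched below. It is an assumption about what the answer was aiming for, not the author's code: make_indexer and index_this_list are hypothetical names, and the generated function is executed into its own dictionary instead of the surrounding scope.

```python
from itertools import product

def make_indexer(depth):
    """Build a function that returns lst[idx[0]][idx[1]]...[idx[depth-1]]."""
    chain = ''.join('[idx[%d]]' % i for i in range(depth))
    source = 'def index_this_list(lst, idx):\n    return lst' + chain + '\n'
    namespace = {}
    # exec into an explicit dict so the generated function neither leaks
    # into nor depends on the caller's scope.
    exec(source, namespace)
    return namespace['index_this_list']

lst = [[[1], [2]], [[3, 3], [4]], [[5], [6, 6, 6]]]
indexer = make_indexer(3)  # regenerate only when the depth changes
result = [indexer(lst, idx + (0,)) for idx in product(range(3), range(2))]
print(result)  # [1, 2, 3, 4, 5, 6]
```

The generated function performs the same hard-coded indexing as the manual version, at the cost of one exec call per distinct depth.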
