Flatten a nested list of variable-sized sublists into a SciPy array

Question

How can I use numpy/scipy to flatten a nested list with sublists of different sizes? Speed is very important and the lists are large.

 lst = [[1, 2, 3, 4],[2, 3],[1, 2, 3, 4, 5],[4, 1, 2]]

Is anything faster than this?

 vec = sp.array(list(chain(*lst)))

Answer 1

How about np.fromiter:

In [49]: %timeit np.hstack(lst*1000)
10 loops, best of 3: 25.2 ms per loop

In [50]: %timeit np.array(list(chain.from_iterable(lst*1000)))
1000 loops, best of 3: 1.81 ms per loop

In [52]: %timeit np.fromiter(chain.from_iterable(lst*1000), dtype='int')
1000 loops, best of 3: 1 ms per loop
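For reference, here is a minimal sketch of the np.fromiter call outside of %timeit, applied to the list from the question. The explicit np.int64 dtype is an assumption; the timing above uses dtype='int', which maps to the platform's default integer.

```python
from itertools import chain

import numpy as np

lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]

# chain.from_iterable lazily yields every element of every sublist in order;
# np.fromiter consumes that stream directly into a 1-D array without
# building an intermediate Python list.
vec = np.fromiter(chain.from_iterable(lst), dtype=np.int64)
print(vec)
```

np.fromiter requires a dtype, so this only fits when all elements share one numeric type.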

Answer 2

You can try numpy.hstack

>>> lst = [[1, 2, 3, 4],[2, 3],[1, 2, 3, 4, 5],[4, 1, 2]]
>>> np.hstack(lst)
array([1, 2, 3, 4, 2, 3, 1, 2, 3, 4, 5, 4, 1, 2])
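As a side note that is not part of this answer: for flat sublists like these, np.concatenate should give the same result as np.hstack, since both join 1-D inputs end to end. A small sketch to check this on the example list:

```python
import numpy as np

lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]

# Both calls convert each sublist to a 1-D array and join them end to end.
flat_h = np.hstack(lst)
flat_c = np.concatenate(lst)

print(flat_h)
print(np.array_equal(flat_h, flat_c))
```

Whether either is fast enough for large inputs is a separate question; the timings in the other answers suggest hstack is the slowest of the options measured.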

Answer 3

The fastest way to create a numpy array from an iterator is to use numpy.fromiter:

>>> %timeit numpy.fromiter(itertools.chain.from_iterable(lst), numpy.int64)
100000 loops, best of 3: 3.76 us per loop
>>> %timeit numpy.array(list(itertools.chain.from_iterable(lst)))
100000 loops, best of 3: 14.5 us per loop
>>> %timeit numpy.hstack(lst)
10000 loops, best of 3: 57.7 us per loop

As you can see, this is faster than converting to a list, and much faster than hstack.
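If the total number of elements is cheap to compute, np.fromiter also accepts a count argument so the output array can be allocated once up front. The helper below is a sketch under that assumption; the name flatten_to_array is hypothetical, and whether count gives a measurable gain on your data is something to time yourself.

```python
from itertools import chain

import numpy as np


def flatten_to_array(nested, dtype=np.int64):
    """Flatten a list of flat sublists into a 1-D array (hypothetical helper)."""
    # Total element count, so fromiter can preallocate instead of growing.
    total = sum(len(sub) for sub in nested)
    return np.fromiter(chain.from_iterable(nested), dtype=dtype, count=total)


lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]
vec = flatten_to_array(lst)
print(vec)
```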

Answer 4

How about trying:

np.hstack(lst)

Answer 5

Use chain.from_iterable:

vec = sp.array(list(chain.from_iterable(lst)))

This avoids using *, which is quite expensive to handle if the iterable has many sublists.

Another option might be to sum the lists:

vec = sp.array(sum(lst, []))

Note, however, that this will cause quadratic reallocation. Something like this performs much better:

def sum_lists(lst):
    if len(lst) < 2:
        return sum(lst, [])
    else:
        half_length = len(lst) // 2
        return sum_lists(lst[:half_length]) + sum_lists(lst[half_length:])

On my machine I get:

>>> L = [[random.randint(0, 500) for _ in range(x)] for x in range(10, 510)]
>>> timeit.timeit('sum(L, [])', 'from __main__ import L', number=1000)
168.3029818534851
>>> timeit.timeit('sum_lists(L)', 'from __main__ import L,sum_lists', number=1000)
10.248489141464233
>>> 168.3029818534851 / 10.248489141464233
16.422223757114615

As you can see, a 16x speed-up. chain.from_iterable is even faster:

>>> timeit.timeit('list(itertools.chain.from_iterable(L))', 'import itertools; from __main__ import L', number=1000)
1.905594825744629
>>> 10.248489141464233 / 1.905594825744629
5.378105042586658

Another speed-up of more than 5x.
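To rerun this comparison yourself, something like the following sketch could work. It uses smaller inputs and fewer repetitions than the session above (both are assumptions chosen so it finishes quickly), so the absolute numbers and ratios will differ from the ones quoted.

```python
import itertools
import random
import timeit


def sum_lists(lst):
    # Divide-and-conquer concatenation, as in the answer above.
    if len(lst) < 2:
        return sum(lst, [])
    half_length = len(lst) // 2
    return sum_lists(lst[:half_length]) + sum_lists(lst[half_length:])


# 100 sublists of lengths 10..109 (smaller than the original test data).
L = [[random.randint(0, 500) for _ in range(x)] for x in range(10, 110)]

candidates = {
    "sum(L, [])": lambda: sum(L, []),
    "sum_lists(L)": lambda: sum_lists(L),
    "chain.from_iterable": lambda: list(itertools.chain.from_iterable(L)),
}

# Check that all three approaches agree before timing them.
results = {name: fn() for name, fn in candidates.items()}

timings = {name: timeit.timeit(fn, number=20) for name, fn in candidates.items()}
for name, seconds in timings.items():
    print(f"{name:22s} {seconds:.4f} s")
```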

I looked for a "pure-python" solution, not knowing numpy. I believe ~~Abhijit~~ unutbu/senderle's solution is the way to go in your case.

Answer 6

Use a function to flatten the list

>>> flatten = lambda x: [y for l in x for y in flatten(l)] if type(x) is list else [x]
>>> flatten(lst)
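The lambda above recurses into every element, so it also handles lists nested more than one level deep. A generator-based sketch of the same idea follows; the name iter_flatten is hypothetical, and using isinstance instead of type(x) is list is a deliberate change so that list subclasses are flattened too.

```python
def iter_flatten(x):
    """Yield the non-list leaves of an arbitrarily nested list, in order."""
    if isinstance(x, list):
        for item in x:
            # Recurse into each element; leaves are yielded as-is.
            yield from iter_flatten(item)
    else:
        yield x


lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]
flat = list(iter_flatten(lst))
deep = list(iter_flatten([1, [2, [3, [4]]]]))
print(flat)
print(deep)
```

Being a generator, it can be passed to anything that accepts an iterable. Given the timings in the other answers, expect this kind of per-element Python recursion to be slower than chain.from_iterable when the nesting is only one level deep.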

6 answers

Answer 1
13 accepted 2013-03-12 16:11:28

Answer 2
8 2013-03-12 16:07:59

Answer 3
5 2013-03-12 16:11:37

Answer 4
3 2013-03-12 16:08:22

Answer 5
1 2013-03-12 16:07:08

Answer 6
0 2016-05-19 02:28:39