Python: split a string at every 3rd value into a nested format

Question

I have a list like this:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

I want it to look like this:

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

What is the most efficient way to do this?

Edit: what about going the other way?

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

->

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

Answer 1

You can do what you want with a simple list comprehension.

>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> [a[i:i+3] for i in range(0, len(a), 3)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

If you want to pad the last sublist, you can do this before the list comprehension:

>>> padding = 0
>>> a += [padding]*(-len(a)%3)  # adds nothing when len(a) is already a multiple of 3

Putting these together into a function:

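def group(sequence, group_length, padding=None):
    # Copy so the caller's list is not modified in place.
    sequence = list(sequence)
    if padding is not None:
        # -len % n is 0 when the length is already a multiple of n.
        sequence += [padding]*(-len(sequence)%group_length)
    return [sequence[i:i+group_length] for i in range(0, len(sequence), group_length)]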
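
As a quick check of the padding edge cases, here is a minimal self-contained sketch. The name `group_padded` is hypothetical (it is not from the answer). It assumes the input can be copied into a list, and it uses `-len(items) % group_length` so that no padding is added when the length is already a multiple of the group length.

```python
def group_padded(sequence, group_length, padding=None):
    """Split a sequence into lists of group_length items, optionally padding the last one."""
    items = list(sequence)  # work on a copy; the caller's list is untouched
    if padding is not None:
        # -len % n is the number of items missing from the last group (0 if none)
        items += [padding] * (-len(items) % group_length)
    return [items[i:i + group_length] for i in range(0, len(items), group_length)]


letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
nine = group_padded(letters, 3, padding='-')
ten = group_padded(list(range(1, 11)), 3, padding=0)
print(nine)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
print(ten)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 0, 0]]
```

With nine items nothing is padded, and with ten items the last group is filled up to three.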

Going the other way:

def flatten(sequence):
    return [item for sublist in sequence for item in sublist]

>>> a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> flatten(a)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
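
An alternative sketch for the reverse direction uses the standard library's `itertools.chain.from_iterable`, which flattens exactly one level of nesting. The helper name `flatten_chain` is an assumption made for illustration, not part of the answer.

```python
from itertools import chain


def flatten_chain(nested):
    """Flatten one level of nesting using itertools.chain.from_iterable."""
    return list(chain.from_iterable(nested))


nested = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
flat = flatten_chain(nested)
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```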

Answer 2

If you can use numpy, try x.reshape(-1, 3)

In [1]: import numpy as np
In [2]: x = ['a','b','c','d','e','f','g','h','i']
In [3]: x = np.array(x)
In [4]: x.reshape(-1, 3)
Out[4]: 
array([['a', 'b', 'c'],
       ['d', 'e', 'f'],
       ['g', 'h', 'i']], 
      dtype='|S1')

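If the data is large enough, this code is more efficient. Note that reshape(-1, 3) requires the number of elements to be a multiple of 3.

Update

Attaching cProfile results to show why it is more efficient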

import cProfile
import numpy as np

a = range(10000000*3)

def impl_a():
    x = [a[i:i+3] for i in range(0, len(a), 3)]

def impl_b():
    x = np.array(a)
    x = x.reshape(-1, 3)

print("cProfile result of impl_a()")
cProfile.run("impl_a()")
print("cProfile result of impl_b()")
cProfile.run("impl_b()")

The output is

cProfile result of impl_a()
      5 function calls in 15.614 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.499    0.499   15.614   15.614 <string>:1(<module>)
     1   14.968   14.968   15.114   15.114 impla.py:6(impl_a)
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.146    0.146    0.146    0.146 {range}


cProfile result of impl_b()
     5 function calls in 3.142 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    3.142    3.142 <string>:1(<module>)
     1    0.000    0.000    3.142    3.142 impla.py:9(impl_b)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    3.142    3.142    3.142    3.142 {numpy.core.multiarray.array}

Answer 3

You can use the grouper recipe from itertools together with a list comprehension:

from itertools import izip_longest # or zip_longest for Python 3.x

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args) # see note above

in_ = [1, 2, 3, 4, 5, 6, 7, 8, 9]

out = [list(t) for t in grouper(in_, 3)]
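
For Python 3, the same recipe might look like the sketch below, where `zip_longest` replaces `izip_longest`. The `ragged` example is an added illustration of how the last group is padded with `fillvalue` when the length is not a multiple of `n`.

```python
from itertools import zip_longest  # Python 3 name of izip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


exact = [list(t) for t in grouper([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)]
ragged = [list(t) for t in grouper(range(1, 11), 3)]
print(exact)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(ragged)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, None, None]]
```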

Answer 4

My solution

>>> lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # avoid shadowing the built-in list
>>> list(map(lambda i: lst[i:i+3], range(0, len(lst), 3)))
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

Answer 5

Use itertools, more specifically the grouper function mentioned under Recipes:

from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print [list(x) for x in grouper(a, 3)]

This prints

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
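
If padding the last group is not wanted, one possible variation is to pull items with `itertools.islice` instead. This is a sketch of a different technique than the grouper recipe above, and the generator name `chunks` is hypothetical. It works on any iterable and leaves a short final chunk unpadded.

```python
from itertools import islice


def chunks(iterable, n):
    """Yield successive lists of up to n items; the last one may be shorter."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # take the next n items, fewer at the end
        if not chunk:
            return  # iterator exhausted
        yield chunk


result = list(chunks(range(1, 11), 3))
print(result)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```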

Answer 6

I ran all of the answered methods through a benchmark to find the fastest one.

Sample size: 999999 (1 <= x <= 258962)

Python: Python 2.7.5 | Anaconda 1.8.0 (32-bit) (IPython)

OS: Windows 7 32-bit @ Core i5 / 4GB RAM

Sample generation code

import random as rd
lst = [rd.randrange(1,258963) for n in range(999999)]

Solution from @Scorpion_God:

>>> %timeit x = [lst[i:i+3] for i in range(0, len(lst), 3)]
10 loops, best of 3: 114 ms per loop

Solution from @mskimm:

>>> %timeit array = np.array(lst)
10 loops, best of 3: 127 ms per loop
>>> %timeit array.reshape(-1,3)
1000000 loops, best of 3: 679 ns per loop

Solution from @jonrsharpe / @Carsten:

>>> %timeit out = [list(t) for t in grouper(lst, 3)]
10 loops, best of 3: 158 ms per loop

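So it seems that, on IPython (Anaconda), the list comprehension is about 30% faster than the itertools / izip_longest / grouper approach.

P.S. I think this result may differ on a plain CPython runtime, and I hope to add those numbers as well.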
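
Outside IPython, a comparable measurement could be sketched with the standard library's `timeit` module. This is an illustration under assumptions, not the setup used above: the function names `by_slicing` and `by_grouper` are hypothetical, the code targets Python 3 (`zip_longest`), and the sample is smaller than 999999 so that it runs quickly. Timings will vary by machine and interpreter, so none are shown.

```python
import random
import timeit
from itertools import zip_longest


def by_slicing(lst, n=3):
    """List-comprehension approach: slice the list every n items."""
    return [lst[i:i + n] for i in range(0, len(lst), n)]


def by_grouper(lst, n=3):
    """grouper approach: zip n references to one iterator."""
    args = [iter(lst)] * n
    return [list(t) for t in zip_longest(*args)]


# Smaller sample than the benchmark above (assumption, to keep the run short)
random.seed(0)
lst = [random.randrange(1, 258963) for _ in range(99999)]

# Both approaches agree when the length is a multiple of 3
assert by_slicing(lst) == by_grouper(lst)

timings = {}
for func in (by_slicing, by_grouper):
    # best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(lambda: func(lst), number=10, repeat=3)) / 10
    timings[func.__name__] = best
    print("%s: %.2f ms per call" % (func.__name__, best * 1000))
```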
