Python split list into n chunks

I know this question has been covered many times, but my requirement is different.

I have a list like range(1, 26). I want to divide this list into a fixed number n of chunks. Assume n = 6.

>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
>>> l = [ x [i:i + 6] for i in range(0, len(x), 6) ]
>>> l
[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25]]

As you can see, I didn't get 6 chunks (six sublists containing the elements of the original list). How do I divide a list so that I get exactly n chunks, which may be even or uneven in size?

Use numpy:

>>> import numpy
>>> x = range(25)
>>> l = numpy.array_split(numpy.array(x),6)

or

>>> import numpy
>>> x = numpy.arange(25)
>>> l = numpy.array_split(x, 6)

You can also use numpy.split, but it raises an error if the length is not exactly divisible by the number of chunks.

If order doesn't matter:

def chunker_list(seq, size):
    return (seq[i::size] for i in range(size))

print(list(chunker_list([1, 2, 3, 4, 5], 2)))
>>> [[1, 3, 5], [2, 4]]

print(list(chunker_list([1, 2, 3, 4, 5], 3)))
>>> [[1, 4], [2, 5], [3]]

print(list(chunker_list([1, 2, 3, 4, 5], 4)))
>>> [[1, 5], [2], [3], [4]]

print(list(chunker_list([1, 2, 3, 4, 5], 5)))
>>> [[1], [2], [3], [4], [5]]

print(list(chunker_list([1, 2, 3, 4, 5], 6)))
>>> [[1], [2], [3], [4], [5], []]

The solution(s) below have several advantages:

  • Uses a generator to yield the result.
  • No imports.
  • Lists are balanced (you never end up with 4 lists of size 4 and one list of size 1 if you split a list of length 17 into 5).

def chunks(l, n):
    """Yield n number of striped chunks from l."""
    for i in range(0, n):
        yield l[i::n]

The code above produces the following output for l = range(16) and n = 6:

[0, 6, 12]
[1, 7, 13]
[2, 8, 14]
[3, 9, 15]
[4, 10]
[5, 11]

If you need the chunks to be sequential instead of striped, use this:

def chunks(l, n):
    """Yield n number of sequential chunks from l."""
    d, r = divmod(len(l), n)
    for i in range(n):
        si = (d+1)*(i if i < r else r) + d*(0 if i < r else i - r)
        yield l[si:si+(d+1 if i < r else d)]

Which, for l = range(16) and n = 6, produces:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10, 11]
[12, 13]
[14, 15]
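
To check the balance claim from the list above, the following sketch runs both variants on 17 items split into 5 chunks. The names striped_chunks and sequential_chunks are used here only so that both functions can coexist in one file; they are not part of the original answer, and the start index is written in an equivalent but shorter form.

```python
def striped_chunks(l, n):
    """Yield n striped chunks from l."""
    for i in range(n):
        yield l[i::n]


def sequential_chunks(l, n):
    """Yield n sequential chunks from l; the first r chunks get one extra item."""
    d, r = divmod(len(l), n)
    for i in range(n):
        start = i * d + min(i, r)
        yield l[start:start + d + (1 if i < r else 0)]


items = list(range(17))
striped_sizes = [len(c) for c in striped_chunks(items, 5)]
sequential_sizes = [len(c) for c in sequential_chunks(items, 5)]
print(striped_sizes)     # [4, 4, 3, 3, 3]
print(sequential_sizes)  # [4, 4, 3, 3, 3]
```

In both cases the chunk sizes differ by at most one.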

See this stackoverflow link for more information on the advantages of generators.

more_itertools.divide is one approach to solve this problem:

import more_itertools as mit


iterable = range(1, 26)
[list(c) for c in mit.divide(6, iterable)]

Output

[[ 1,  2,  3,  4, 5],                       # remaining item
 [ 6,  7,  8,  9],
 [10, 11, 12, 13],
 [14, 15, 16, 17],
 [18, 19, 20, 21],
 [22, 23, 24, 25]]

As shown, if the iterable is not evenly divisible, the remaining items are distributed from the first to the last chunk.
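
For readers who prefer not to install more_itertools, the same distribution rule can be sketched with the standard library. The function name divide_sketch is hypothetical, and this is not the library's implementation; it only reproduces the behaviour described above for sequences that support len() and slicing.

```python
def divide_sketch(n, seq):
    """Split seq into n consecutive parts; earlier parts absorb the remainder."""
    q, r = divmod(len(seq), n)
    parts = []
    start = 0
    for i in range(n):
        # The first r parts receive one extra item each.
        stop = start + q + (1 if i < r else 0)
        parts.append(list(seq[start:stop]))
        start = stop
    return parts


parts = divide_sketch(6, range(1, 26))
print([len(p) for p in parts])  # [5, 4, 4, 4, 4, 4]
```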

See more about the more_itertools library here.

My answer is to simply use Python's built-in slice:

# Assume x is the list we wish to slice
x = list(range(1, 26))
# Slice it into chunks of 6 items each
result = []
for i in range(0, len(x), 6):
    slice_item = slice(i, i + 6, 1)
    result.append(x[slice_item])

# result is equal to (five chunks of up to 6 items, not 6 chunks):
# [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25]]

I came up with the following solution:

l = [x[i::n] for i in range(n)]

For example:

n = 6
x = list(range(26))

l = [x[i::n] for i in range(n)]
print(l)

Output:

[[0, 6, 12, 18, 24], [1, 7, 13, 19, 25], [2, 8, 14, 20], [3, 9, 15, 21], [4, 10, 16, 22], [5, 11, 17, 23]]

As you can see, the output consists of n chunks, which have roughly the same number of elements.

How does it work?

The trick is to use the list slice step (the number after the two colons) and to increment the offset of the stepped slice. First, it takes every n-th element starting from the first, then every n-th element starting from the second, and so on. This completes the task.
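
A small sketch of the stepped slices described above, using a shorter list so that each offset is easy to follow. The variable names are illustrative only.

```python
x = list(range(10))
n = 3

# x[i::n] starts at index i and then takes every n-th element.
chunks = [x[i::n] for i in range(n)]
print(chunks)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# Every element lands in exactly one chunk, although the original order is not kept.
flattened = sorted(item for chunk in chunks for item in chunk)
print(flattened == x)  # True
```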

Try this:

from __future__ import division

import math

def chunked(iterable, n):
    """ Split iterable into ``n`` iterables of similar size

    Examples::
        >>> l = [1, 2, 3, 4]
        >>> list(chunked(l, 4))
        [[1], [2], [3], [4]]

        >>> l = [1, 2, 3]
        >>> list(chunked(l, 4))
        [[1], [2], [3], []]

        >>> l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        >>> list(chunked(l, 4))
        [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

    """
    chunksize = int(math.ceil(len(iterable) / n))
    return (iterable[i * chunksize:i * chunksize + chunksize]
            for i in range(n))

It returns an iterator instead of a list for efficiency (I'm assuming you want to loop over the chunks), but you can replace that with a list comprehension if you want. When the number of items is not divisible by the number of chunks, the last chunk is smaller than the others.

EDIT: Fixed the second example to show that it doesn't handle one edge case: with fewer items than chunks, the trailing chunk comes out empty.
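
The edge case mentioned in the edit can be seen by comparing chunk sizes. The sketch below repeats the ceil-based approach under the hypothetical name ceil_chunked and contrasts it with a divmod-based variant (balanced_chunked, also hypothetical) that spreads the remainder over the first chunks. It assumes Python 3 and a sequence that supports len() and slicing.

```python
import math


def ceil_chunked(seq, n):
    """Ceil-based split: may leave trailing chunks empty."""
    size = math.ceil(len(seq) / n)
    return [seq[i * size:(i + 1) * size] for i in range(n)]


def balanced_chunked(seq, n):
    """divmod-based split: chunk sizes differ by at most one."""
    q, r = divmod(len(seq), n)
    # bounds[i] is the start index of chunk i; bounds[n] equals len(seq).
    bounds = [i * q + min(i, r) for i in range(n + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n)]


data = [1, 2, 3, 4, 5]
print(ceil_chunked(data, 4))      # [[1, 2], [3, 4], [5], []]
print(balanced_chunked(data, 4))  # [[1, 2], [3], [4], [5]]
```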

Here, take my 2 cents:

from math import ceil

size = 3
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

chunks = [
    seq[i * size:(i * size) + size]
    for i in range(ceil(len(seq) / size))
]

# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11]]

One way would be to make the last list uneven and the rest even. This can be done as follows:

>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
>>> m = len(x) // 6
>>> test = [x[i:i+m] for i in range(0, len(x), m)]
>>> test[-2:] = [test[-2] + test[-1]]
>>> test
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24, 25]]

Assuming you want to divide into n chunks:

n = 6
num = float(len(x))/n
l = [ x [i:i + int(num)] for i in range(0, (n-1)*int(num), int(num))]
l.append(x[(n-1)*int(num):])

This method simply divides the length of the list by the number of chunks and, if the length is not a multiple of that number, puts the extra elements in the last list.
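
How uneven the last chunk becomes depends on the remainder. A minimal sketch of the same idea, written with integer division and a hypothetical helper name split_extras_last, assuming the list has at least n items:

```python
def split_extras_last(seq, n):
    """Split seq into n chunks of len(seq) // n items; the last chunk takes the extras."""
    size = len(seq) // n
    chunks = [seq[i * size:(i + 1) * size] for i in range(n - 1)]
    chunks.append(seq[(n - 1) * size:])
    return chunks


sizes_25 = [len(c) for c in split_extras_last(list(range(25)), 6)]
sizes_29 = [len(c) for c in split_extras_last(list(range(29)), 6)]
print(sizes_25)  # [4, 4, 4, 4, 4, 5]
print(sizes_29)  # [4, 4, 4, 4, 4, 9]
```

With 29 items the last chunk is more than twice as long as the others, which is the trade-off of this approach.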

If you want the chunks to be as evenly sized as possible:

from typing import List, Tuple


def chunk_ranges(items: int, chunks: int) -> List[Tuple[int, int]]:
    """
    Split the items by best effort into equally-sized chunks.
    
    If there are fewer items than chunks, each chunk contains an item and 
    there are fewer returned chunk indices than the argument `chunks`.

    :param items: number of items in the batch.
    :param chunks: number of chunks
    :return: list of (chunk begin inclusive, chunk end exclusive)
    """
    assert chunks > 0, \
        "Unexpected non-positive chunk count: {}".format(chunks)

    result = []  # type: List[Tuple[int, int]]
    if items <= chunks:
        for i in range(0, items):
            result.append((i, i + 1))
        return result

    chunk_size, extras = divmod(items, chunks)

    start = 0
    for i in range(0, chunks):
        if i < extras:
            end = start + chunk_size + 1
        else:
            end = start + chunk_size

        result.append((start, end))
        start = end

    return result

Test case:

def test_chunk_ranges(self):
    self.assertListEqual(chunk_ranges(items=8, chunks=1),
                         [(0, 8)])

    self.assertListEqual(chunk_ranges(items=8, chunks=2),
                         [(0, 4), (4, 8)])

    self.assertListEqual(chunk_ranges(items=8, chunks=3),
                         [(0, 3), (3, 6), (6, 8)])

    self.assertListEqual(chunk_ranges(items=8, chunks=5),
                         [(0, 2), (2, 4), (4, 6), (6, 7), (7, 8)])

    self.assertListEqual(chunk_ranges(items=8, chunks=6),
                         [(0, 2), (2, 4), (4, 5), (5, 6), (6, 7), (7, 8)])

    self.assertListEqual(
        chunk_ranges(items=8, chunks=7),
        [(0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)])

    self.assertListEqual(
        chunk_ranges(items=8, chunks=9),
        [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)])

Hint:

  • x is the string to be split.
  • k is the number of chunks.

    n = len(x) // k
    [x[i:i+n] for i in range(0, len(x), n)]

You can just take the excess and append it to the last list, unless I am misunderstanding the question:

import pprint
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

def group(lst, div):
    step = len(lst) // div  # Integer division, so the slice bounds stay ints in Python 3.
    lst = [lst[i:i + step] for i in range(0, len(lst), step)]  # Subdivide list.
    if len(lst) > div: # If it is an uneven list.
        lst[div-1].extend(sum(lst[div:],[])) # Take the last part of the list and append it to the last equal division.
    return lst[:div] #Return the list up to that point.

l = group(x, 6)

pprint.pprint(l)

Prints:

[[1, 2, 3, 4],
 [5, 6, 7, 8],
 [9, 10, 11, 12],
 [13, 14, 15, 16],
 [17, 18, 19, 20],
 [21, 22, 23, 24, 25]]

Note: You can use faster methods than sum(l, []) to flatten a list, but for brevity I am using that.
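
One commonly used alternative to sum(l, []) is itertools.chain.from_iterable, which avoids building a new intermediate list for every addition. A minimal sketch with made-up data:

```python
from itertools import chain

nested = [[21, 22, 23, 24], [25]]

# sum() concatenates pairwise and creates a new list at each step.
flat_sum = sum(nested, [])

# chain.from_iterable walks the sublists once without intermediate lists.
flat_chain = list(chain.from_iterable(nested))

print(flat_sum)    # [21, 22, 23, 24, 25]
print(flat_chain)  # [21, 22, 23, 24, 25]
```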

If your list contains elements of different types, or iterable objects that store values of different types (e.g. some elements are integers and some are strings), then splitting it with the array_split function from the numpy package gives you chunks whose elements have all been converted to the same type:

import numpy as np

data1 = [(1, 2), ('a', 'b'), (3, 4), (5, 6), ('c', 'd'), ('e', 'f')]
chunks = np.array_split(data1, 3)
print(chunks)
# [array([['1', '2'],
#        ['a', 'b']], dtype='<U11'), array([['3', '4'],
#        ['5', '6']], dtype='<U11'), array([['c', 'd'],
#        ['e', 'f']], dtype='<U11')]

data2 = [1, 2, 'a', 'b', 3, 4, 5, 6, 'c', 'd', 'e', 'f']
chunks = np.array_split(data2, 3)
print(chunks)
# [array(['1', '2', 'a', 'b'], dtype='<U11'), array(['3', '4', '5', '6'], dtype='<U11'),
#  array(['c', 'd', 'e', 'f'], dtype='<U11')]

If you would like the chunks to keep the original element types after splitting, you can modify the source code of the array_split function from the numpy package or use this implementation:

from itertools import accumulate

def list_split(input_list, num_of_chunks):
    n_total = len(input_list)
    n_each_chunk, extras = divmod(n_total, num_of_chunks)
    chunk_sizes = ([0] + extras * [n_each_chunk + 1] + (num_of_chunks - extras) * [n_each_chunk])
    div_points = list(accumulate(chunk_sizes))
    sub_lists = []
    for i in range(num_of_chunks):
        start = div_points[i]
        end = div_points[i + 1]
        sub_lists.append(input_list[start:end])
    return (sub_list for sub_list in sub_lists)

result = list(list_split(data1, 3))
print(result)
# [[(1, 2), ('a', 'b')], [(3, 4), (5, 6)], [('c', 'd'), ('e', 'f')]]

result = list(list_split(data2, 3))
print(result)
# [[1, 2, 'a', 'b'], [3, 4, 5, 6], ['c', 'd', 'e', 'f']]

This solution is based on the zip "grouper" pattern from the Python 3 docs. The small addition is that if N does not divide the list length evenly, all the extra items are placed into the first chunk.

import itertools

def segment_list(l, N):
    chunk_size, remainder = divmod(len(l), N)
    first, rest = l[:chunk_size + remainder], l[chunk_size + remainder:]
    return itertools.chain([first], zip(*[iter(rest)] * chunk_size))

Example usage:

>>> my_list = list(range(10))
>>> list(segment_list(my_list, 2))
[[0, 1, 2, 3, 4], (5, 6, 7, 8, 9)]
>>> list(segment_list(my_list, 3))
[[0, 1, 2, 3], (4, 5, 6), (7, 8, 9)]

The advantages of this solution are that it preserves the order of the original list, and that it is written in a functional style that lazily evaluates the list only once when called.

Note that because it returns an iterator, the result can only be consumed once. If you want the convenience of a non-lazy list, you can wrap the result in list:

>>> x = list(segment_list(my_list, 2))
>>> x
[[0, 1, 2, 3, 4], (5, 6, 7, 8, 9)]
>>> x
[[0, 1, 2, 3, 4], (5, 6, 7, 8, 9)]
>>>
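
The single-use behaviour is easy to demonstrate. The sketch repeats segment_list from above so that it runs on its own; the variable names are illustrative.

```python
import itertools


def segment_list(l, N):
    chunk_size, remainder = divmod(len(l), N)
    first, rest = l[:chunk_size + remainder], l[chunk_size + remainder:]
    return itertools.chain([first], zip(*[iter(rest)] * chunk_size))


segments = segment_list(list(range(10)), 3)
first_pass = list(segments)
second_pass = list(segments)  # the iterator is already exhausted

print(first_pass)   # [[0, 1, 2, 3], (4, 5, 6), (7, 8, 9)]
print(second_pass)  # []
```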

I would simply do (let's say you want n chunks):

import numpy as np

k = int(np.ceil(len(x) / n))
l = [x[i:i + k] for i in range(0, len(x), k)]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
chunk = len(x) // 6

l = []
i = 0
while i < len(x):
    if len(l) <= 4:
        l.append(x[i:i + chunk])
    else:
        # Sixth chunk: take everything that is left.
        l.append(x[i:])
        break
    i += chunk

print(l)

# output=[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24, 25]]

arr1 = [-20, 20, -10, 0, 4, 8, 10, 6, 15, 9, 18, 35, 40, -30, -90, 99]
n = 4
final = [arr1[i * n:(i + 1) * n] for i in range((len(arr1) + n - 1) // n)]
print(final)

Output:

[[-20, 20, -10, 0], [4, 8, 10, 6], [15, 9, 18, 35], [40, -30, -90, 99]]

This function returns a list of lists, each holding at most chunk_size values.

def chuncker(list_to_split, chunk_size):
    list_of_chunks =[]
    start_chunk = 0
    end_chunk = start_chunk+chunk_size
    while start_chunk < len(list_to_split):
        chunk_ls = list_to_split[start_chunk: end_chunk]
        list_of_chunks.append(chunk_ls)
        start_chunk = start_chunk +chunk_size
        end_chunk = end_chunk+chunk_size    
    return list_of_chunks

Example:

ls = list(range(20))

chuncker(list_to_split = ls, chunk_size = 6)

Output:

[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19]]

This accepts generators without consuming them all at once. If we know the size of the generator, the bin size can be calculated as max(1, size // n_chunks).

from time import sleep

def chunks(items, binsize):
    lst = []
    for item in items:
        lst.append(item)
        if len(lst) == binsize:
            yield lst
            lst = []
    if len(lst) > 0:
        yield lst


def g():
    for item in [1, 2, 3, 4, 5, 6, 7]:
        print("accessed:", item)
        sleep(1)
        yield item


for a in chunks(g(), 3):
    print("chunk:", list(a), "\n")
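
Note that a bin size of max(1, size // n_chunks) rounds down, so the generator can yield more than n_chunks chunks when the size is not evenly divisible. The sketch below counts the chunks for a few sizes; count_chunks is a hypothetical helper, and the chunks generator is repeated so that the example is self-contained.

```python
def chunks(items, binsize):
    """Yield lists of up to binsize items from any iterable."""
    lst = []
    for item in items:
        lst.append(item)
        if len(lst) == binsize:
            yield lst
            lst = []
    if lst:
        yield lst


def count_chunks(size, n_chunks):
    """Return how many chunks come out for a generator of known size."""
    binsize = max(1, size // n_chunks)
    return sum(1 for _ in chunks((i for i in range(size)), binsize))


print(count_chunks(6, 3))  # 3
print(count_chunks(7, 3))  # 4, one more than requested
print(count_chunks(2, 3))  # 2, fewer items than chunks
```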

For people looking for an answer in Python 3(.6) without imports:
x is the list to be split.
n is the length of the chunks.
L is the new list.

n = 6
L = [x[i:i + n] for i in range(0, len(x), n)]

# [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25]]
