Elegant and optimized solution for a function that groups sublists under a threshold length in Python

Given a list L, for instance [[1,1,1,1], [1,1,1], [1,1], [1]], and max_len=8, I would like to create a new list LN like this: [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]].

So I have a list of lists. I want to group the lists so that the sum of the lengths of the lists in each group is <= max_len. The lists must be kept as they are, in the same order, so only consecutive lists can be grouped.

I've been trying to do it in the most Pythonic and efficient way. It should be O(n). With someone's help, this is the code I have so far:

def chunks(list_to_chunck, max_len):
    if any(len(sub_list) > max_len for sub_list in list_to_chunck):
        return None

    new_list = []
    while list_to_chunck:
        copy_list = [list_to_chunck.pop(0)]
        while list_to_chunck:
            if len(list_to_chunck[0]) + sum(len(sub_list) for sub_list in copy_list) <= max_len:
                copy_list.append(list_to_chunck.pop(0))
            else:
                break
        new_list.append(copy_list)

    return new_list

You can use a variable size to keep track of the current size of the last sub-list in the output list. Whenever adding the current sub-list in the iteration would push it past max_len, append a new sub-list to the output. Initialize size with a value greater than max_len so that a new sub-list is always added in the first iteration. With this approach the time complexity is O(n):

def chunks(lst, max_len):
    output = []
    size = max_len + 1
    for s in lst:
        size += len(s)
        if size > max_len:
            output.append([])
            size = len(s)
            if size > max_len:
                return
        output[-1].append(s)
    return output

so that chunks([[1, 1, 1, 1], [1, 1, 1], [1, 1], [1]], 8) returns:

[[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]]
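To sanity-check a result like the one above, a small helper can test the two requirements stated in the question: the sublists stay in their original order, and no group exceeds max_len in total. This is only a sketch; the name is_valid_grouping is hypothetical and not part of any answer, and it does not check that the grouping is the greedy one.

```python
from itertools import chain


def is_valid_grouping(original, grouped, max_len):
    """Return True if `grouped` keeps the sublists of `original` in order
    and every group holds at most `max_len` elements in total."""
    # Flattening one level must give back the original sublists, in order.
    same_order = list(chain.from_iterable(grouped)) == list(original)
    # Each group's combined sublist length must stay within the threshold.
    within_limit = all(
        sum(len(sub) for sub in group) <= max_len for group in grouped
    )
    return same_order and within_limit


data = [[1, 1, 1, 1], [1, 1, 1], [1, 1], [1]]
print(is_valid_grouping(data, [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]], 8))  # True
print(is_valid_grouping(data, [[[1, 1, 1, 1], [1, 1, 1], [1, 1]], [[1]]], 8))  # False
```

The second call fails because the first group would hold 9 elements.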

Here is a sketch of an approach that will be O(N). It creates a new list and doesn't modify the original. It doesn't handle all edge cases, but this should get you going:

In [1]: data = [[1,1,1,1], [1,1,1], [1,1], [1]]
   ...:

In [2]: def chunks(nested, maxlen):
   ...:     total = 0
   ...:     result = []
   ...:     piece = []
   ...:     for sub in nested:
   ...:         length = len(sub)
   ...:         if total + length > maxlen:
   ...:             result.append(piece)
   ...:             piece = [sub]
   ...:             total = length
   ...:         else:
   ...:             piece.append(sub)
   ...:             total += length
   ...:     if piece:
   ...:         result.append(piece)
   ...:     return result
   ...:

In [3]: chunks(data, 8)
Out[3]: [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]]
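One edge case the sketch above leaves open is a sublist longer than maxlen: chunks([[1, 1, 1]], 2) would return [[], [[1, 1, 1]]], with an empty leading piece and an oversized group. A possible way to close that gap is shown below. The name chunks_checked is hypothetical, and raising ValueError is an assumption on my part (the question's own code returns None in that situation).

```python
def chunks_checked(nested, maxlen):
    """Group consecutive sublists so each group's total length is <= maxlen."""
    result = []
    piece = []
    total = 0
    for sub in nested:
        length = len(sub)
        if length > maxlen:
            # Assumption: treat an oversized sublist as an error.
            raise ValueError(f"sublist of length {length} exceeds maxlen={maxlen}")
        # Close the current piece only if it is non-empty and would overflow.
        if piece and total + length > maxlen:
            result.append(piece)
            piece, total = [], 0
        piece.append(sub)
        total += length
    if piece:
        result.append(piece)
    return result


data = [[1, 1, 1, 1], [1, 1, 1], [1, 1], [1]]
print(chunks_checked(data, 8))  # [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]]
print(chunks_checked([], 8))    # []
```

It still makes a single pass over the input and leaves the original list untouched.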

You don't seem to have any constraints on the total number of lists, so a greedy approach should be fine:

def chunks(items, max_len):
    ret = [[]]
    remaining = max_len
    for i in items:
        if len(i) > remaining:
            ret.append([])
            remaining = max_len
            if len(i) > remaining:
                return None  # Could raise on impossible
        ret[-1].append(i)
        remaining -= len(i)
    return ret

Using your example:

items = [[1,1,1,1], [1,1,1], [1,1], [1]]
assert chunks(items, 8) == [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]]

Seeing the other answers, this is pretty much in line with them, so I wanted to toss in a less readable option, without the length guarantee =)

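import itertools

def chunks(items, max_len):
    count = [0, 0]  # [current group index, running length of the current group]
    def group(item):
        count[1] += len(item)
        if count[1] > max_len:
            # This item would overflow the group, so it starts the next one.
            count[0] += 1
            count[1] = len(item)
        return count[0]
    # A sublist longer than max_len is not rejected; it ends up alone in its group.
    return [list(v) for k, v in itertools.groupby(items, key=group)]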

A generator-based solution:

def group_subseqs(seq, max_len):
    curr_size = 0
    result = []
    for subseq in seq:
        len_subseq = len(subseq)
        if curr_size + len_subseq <= max_len:
            result.append(subseq)
            curr_size += len_subseq
        else:
            if result:
                yield result
                if len_subseq <= max_len:
                    result = [subseq]
                    curr_size = len_subseq
                else:
                    return
            else:
                return
    if result:
        yield result

This works as expected (sort of: it stops yielding if a sublist is larger than max_len, instead of not yielding anything at all):

a = [[1,1,1,1], [1,1,1], [1,1], [1]]
print(list(group_subseqs(a, 8)))
# [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]]

print(list(group_subseqs(a, 4)))
# [[[1, 1, 1, 1]], [[1, 1, 1]], [[1, 1], [1]]]

print(list(group_subseqs(a, 3)))
# []

print(list(group_subseqs(a[::-1], 3)))
# [[[1], [1, 1]], [[1, 1, 1]]]
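If silently stopping part-way is not what you want, one alternative is a generator that raises as soon as it meets an oversized sublist. The sketch below is an assumption-laden variant, not the code above: iter_groups is a hypothetical name, and ValueError is my choice of signal. Because it is lazy, groups that come before the oversized sublist can still be consumed.

```python
from itertools import islice


def iter_groups(seq, max_len):
    """Lazily yield groups of consecutive sublists whose lengths sum to <= max_len."""
    group, size = [], 0
    for subseq in seq:
        n = len(subseq)
        if n > max_len:
            # Assumption: fail loudly instead of ending the iteration quietly.
            raise ValueError(f"sublist of length {n} exceeds max_len={max_len}")
        if size + n > max_len:
            # The current group is full; hand it out and start a new one.
            yield group
            group, size = [], 0
        group.append(subseq)
        size += n
    if group:
        yield group


a = [[1, 1, 1, 1], [1, 1, 1], [1, 1], [1]]
print(list(iter_groups(a, 8)))
# [[[1, 1, 1, 1], [1, 1, 1]], [[1, 1], [1]]]

# Only the first group is requested, so the oversized sublist at the end
# of a[::-1] is never reached and no error is raised.
print(list(islice(iter_groups(a[::-1], 3), 1)))
# [[[1], [1, 1]]]
```

Calling list(iter_groups(a[::-1], 3)) without islice would raise ValueError once the length-4 sublist is reached.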
