
Itertools zip_longest with the first item of each sub-list as the padding value instead of the default None

I have this list of lists:

cont_det = [['TASU 117000 0', "TGHU 759933 - 0", 'CSQU3054383', 'BMOU 126 780-0', "HALU 2014 13 3"], ['40HS'], ['Ha2ardous Materials', 'Arm5 Maehinery']]

In practice cont_det is a huge list containing many sub-lists of irregular length. This is just a sample case for demonstration. I want to get the following output:

[['TASU 117000 0', '40HS', 'Ha2ardous Materials'], 
 ['TGHU 759933 - 0', '40HS', 'Arm5 Maehinery'], 
 ['CSQU3054383', '40HS', 'Ha2ardous Materials'], 
 ['BMOU 126 780-0', '40HS', 'Ha2ardous Materials'], 
 ['HALU 2014 13 3', '40HS', 'Ha2ardous Materials']]

The logic behind this is to zip_longest the list of lists, but whenever a sub-list is shorter than the longest sub-list (length 5 here, for the first sub-list), the first item of that sub-list is used for padding instead of the default fillvalue=None. In the second sub-list every filled value is therefore the same, and in the third the last three values are filled with its first value.

I got the result with this code:

from itertools import zip_longest as zilo
from more_itertools import padded as pad
max_ = len(max(cont_det, key=len))
for i, cont_row in enumerate(cont_det):
    if len(cont_row) != max_:
        cont_det[i] = list(pad(cont_row, cont_row[0], max_))
cont_det = list(map(list, list(zilo(*cont_det))))

This gives me the expected result. Had I instead done list(zilo(*cont_det, fillvalue='')), I would have gotten this:

[('TASU 117000 0', '40HS', 'Ha2ardous Materials'), 
 ('TGHU 759933 - 0', '', 'Arm5 Maehinery'), 
 ('CSQU3054383', '', ''), 
 ('BMOU 126 780-0', '', ''), 
 ('HALU 2014 13 3', '', '')]

Is there any other way (such as mapping a function) to supply the fillvalue parameter of zip_longest, so that I don't have to iterate through the list and pad each sub-list up to the length of the longest one beforehand, and the whole thing can be done in one line with only zip_longest?

You can peek into each of the iterators via next in order to extract the first item ("head"), then create a sentinel object that marks the end of the iterator, and finally chain everything back together in the following way: head -> remainder_of_iterator -> sentinel -> it.repeat(head).
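The chaining order described above can be tried on a single iterable. This is only a minimal sketch: the helper name with_replayed_head and the END marker are made up for illustration, and the iterable is assumed to be non-empty.

```python
import itertools as it


def with_replayed_head(iterable, sentinel):
    """Chain: head -> rest of the iterator -> sentinel -> head repeated forever."""
    iterator = iter(iterable)
    head = next(iterator)  # assumption: the iterable has at least one item
    return it.chain((head,), iterator, (sentinel,), it.repeat(head))


END = object()  # hypothetical end marker, unique by identity
stream = with_replayed_head(["Ha2ardous Materials", "Arm5 Maehinery"], END)

# The stream is infinite, so only a slice of it can be materialised.
first_five = list(it.islice(stream, 5))
print(first_five[:2])        # the original items
print(first_five[2] is END)  # the sentinel marks where the input ended
print(first_five[3:])        # from here on, the head is replayed
```

Because the resulting stream never ends on its own, something else has to decide when to stop, which is what the rest of the answer deals with.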

This uses it.repeat to replay the first item ad infinitum once the end of the iterator has been reached, so we need a way to stop that process once the last iterator hits its sentinel object. For this we can (ab)use the fact that map stops iterating if the mapped function raises (or leaks) a StopIteration, such as from next invoked on an already exhausted iterator. Alternatively we can use the 2-argument form of iter to stop on a sentinel object (see below).
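Both stopping mechanisms can be seen in isolation. The names budget, passthrough and source below are illustrative only and are not part of the solution itself.

```python
import itertools as it

# 1) A StopIteration leaking out of the mapped function ends the map iterator.
budget = it.repeat(None, 2)  # allows exactly two successful next() calls


def passthrough(x):
    next(budget)  # the third call raises StopIteration, which map lets through
    return x


leaky = list(map(passthrough, "abcde"))
print(leaky)  # only the first two items survive

# 2) The two-argument form iter(callable, sentinel) calls the callable
#    repeatedly and stops as soon as it returns the sentinel.
sentinel = object()
source = iter(["a", "b", sentinel, "c"])
until_sentinel = list(iter(source.__next__, sentinel))
print(until_sentinel)
```

The first form relies on map not catching exceptions raised by the mapped function, so the leaked StopIteration looks like normal exhaustion to whoever consumes the map object.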

So we can map the chained iterators over a function that checks for each item whether it is sentinel and performs the following steps:

  1. if item is sentinel, then consume (via next) a dedicated iterator that yields one item fewer than the total number of iterators (hence leaking StopIteration for the last sentinel) and replace the sentinel with the corresponding head.
  2. else just return the original item.

Finally we can just zip the iterators together. The zip stops when the last iterator hits its sentinel object, i.e. it performs a "zip-longest".

In summary, the following function performs the steps described above:

import itertools as it


def solution(*iterables):
    iterators = [iter(i) for i in iterables]  # make sure we're operating on iterators
    heads = [next(i) for i in iterators]  # requires each of the iterables to be non-empty
    sentinel = object()
    iterators = [it.chain((head,), iterator, (sentinel,), it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    # Create a dedicated iterator object that will be consumed each time a 'sentinel' object is found.
    # For the sentinel corresponding to the last iterator in 'iterators' this will leak a StopIteration.
    running = it.repeat(None, len(iterators) - 1)
    iterators = [map(lambda x, h: next(running) or h if x is sentinel else x,  # StopIteration causes the map to stop iterating
                     iterator, it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    return zip(*iterators)

If leaking StopIteration from the mapped function in order to terminate the map iterator feels too awkward, we can slightly modify the definition of running to yield an additional sentinel and use the 2-argument form of iter in order to stop on sentinel:

running = it.chain(it.repeat(None, len(iterators) - 1), (sentinel,))
iterators = [...]  # here the conversion to map objects remains unchanged
return zip(*[iter(i.__next__, sentinel) for i in iterators])
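Put together with the unchanged parts of the function above, the variant might look like the following sketch. The name solution_with_iter is chosen here only for illustration, and each iterable is still assumed to be non-empty.

```python
import itertools as it


def solution_with_iter(*iterables):
    iterators = [iter(i) for i in iterables]
    heads = [next(i) for i in iterators]  # assumption: every iterable is non-empty
    sentinel = object()
    iterators = [it.chain((head,), iterator, (sentinel,), it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    # One None per iterator except the last to finish, then the sentinel itself.
    running = it.chain(it.repeat(None, len(iterators) - 1), (sentinel,))
    # "None or h" gives the head; "sentinel or h" gives the sentinel (it is truthy).
    iterators = [map(lambda x, h: (next(running) or h) if x is sentinel else x,
                     iterator, it.repeat(head))
                 for iterator, head in zip(iterators, heads)]
    # iter(callable, sentinel) ends each stream once the sentinel comes through.
    return zip(*[iter(i.__next__, sentinel) for i in iterators])


rows = list(solution_with_iter([1, 2, 3], [9], [4, 5]))
print(rows)
```

Here no exception escapes the lambda: the last iterator to finish passes the sentinel through, and the 2-argument iter wrapper turns that into a normal end of iteration.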

If the name resolution for sentinel and running from inside the mapped function is a concern, they can be passed as arguments to that function:

iterators = [map(lambda x, h, s, r: next(r) or h if x is s else x,
                 iterator, it.repeat(head), it.repeat(sentinel), it.repeat(running))
             for iterator, head in zip(iterators, heads)]

That looks like some sort of "matrix rotation".

I've done it without any libraries to keep it clear for everybody. It seems pretty easy to me.

from pprint import pprint

cont_det = [
    ['TASU 117000 0', "TGHU 759933 - 0", 'CSQU3054383', 'BMOU 126 780-0', "HALU 2014 13 3"],
    ['40HS'],
    ['Ha2ardous Materials', 'Arm5 Maehinery'],
]


def rotate_matrix(source):
    result = []

    # let's find the longest sub-list length
    length = max((len(row) for row in source))

    # for every column in sub-lists create a new row in the resulting list
    for column_id in range(0, length):
        result.append([])

        # let's fill the new created row using source row columns data.
        for row_id in range(0, len(source)):
            # use the value at column_id if the source row has one; otherwise fall back to the row's first value
            if len(source[row_id]) > column_id:
                result[column_id].append(source[row_id][column_id])
            else:
                try:
                    result[column_id].append(source[row_id][0])
                except IndexError:
                    result[column_id].append(None)

    return result


pprint(rotate_matrix(cont_det))

And, of course, the script output:


> python test123.py
[['TASU 117000 0', '40HS', 'Ha2ardous Materials'],
 ['TGHU 759933 - 0', '40HS', 'Arm5 Maehinery'],
 ['CSQU3054383', '40HS', 'Ha2ardous Materials'],
 ['BMOU 126 780-0', '40HS', 'Ha2ardous Materials'],
 ['HALU 2014 13 3', '40HS', 'Ha2ardous Materials']]

I don't quite understand the zip_longest part. Is using it a requirement, or do you just need a solution that works? :) I ask because zip_longest doesn't appear to support any kind of callback that would let us return the required value per cell of the matrix.

If you want to do this in a general way for arbitrary iterators, you can use a sentinel value as the fill value and replace it with the first value for that column. This has the advantage that it works without requiring you to expand anything up front or know the lengths.

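from itertools import zip_longest


def zip_longest_special(*iterables):
    sentinel = object()

    def fill(items, defaults):
        # replace each sentinel with the default for its position
        return tuple(d if i is sentinel else i for i, d in zip(items, defaults))

    rows = zip_longest(*iterables, fillvalue=sentinel)
    try:
        first = next(rows)
    except StopIteration:  # no input, or every iterable is empty
        return
    # an empty iterable has no first item, so it is padded with None
    first = fill(first, [None] * len(first))
    yield first
    for row in rows:
        yield fill(row, first)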

The answer is no. The fillvalue argument has only one meaning: a single value used to fill every gap. There was another, quite nice answer here, but it was suddenly deleted. The code below is pretty close to it, but it uses itertools instead of list methods.

from itertools import chain, repeat
def zilo(data):
    try:
        i1 = next(it := iter(data))
    except StopIteration:
        return zip()
    return zip(chain(i1, repeat(i1[0], len(max(data, key=len))-len(i1))),
               *(chain(i, repeat(i[0])) for i in it))
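The same chain/repeat idea can also be written without the walrus operator by cutting every padded row to the longest length with islice. This is only a sketch of that alternative: pad_with_first is a hypothetical name, and every non-empty input is assumed to contain only non-empty sub-lists.

```python
from itertools import chain, islice, repeat


def pad_with_first(rows):
    """Transpose rows, padding short rows with their own first item."""
    rows = [list(r) for r in rows]
    longest = max(map(len, rows), default=0)
    # chain(r, repeat(r[0])) is infinite; islice trims it to the longest length.
    # Assumption: no row is empty, otherwise r[0] raises IndexError.
    padded = (islice(chain(r, repeat(r[0])), longest) for r in rows)
    return [list(t) for t in zip(*padded)]


cont_det = [
    ['TASU 117000 0', "TGHU 759933 - 0", 'CSQU3054383', 'BMOU 126 780-0', "HALU 2014 13 3"],
    ['40HS'],
    ['Ha2ardous Materials', 'Arm5 Maehinery'],
]
result = pad_with_first(cont_det)
print(result[1])  # second output row
```

Unlike zilo, this version needs the lengths of all rows up front, so it only suits sequences, not one-shot iterators of unknown length.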

Adding another variation:

def zipzag(fill, *cols):
    # size of each individual list in the nested list
    sizes = [len(col) for col in cols]
    # max(sizes) rather than max(*sizes), which raises TypeError for a single column
    longest = max(sizes)
    return [[xs[i] if i < sizes[j] else fill(xs) for j, xs in enumerate(cols)]
            for i in range(longest)]

cont_det = [['TASU 117000 0', "TGHU 759933 - 0", 'CSQU3054383', 'BMOU 126 780-0', "HALU 2014 13 3"], ['40HS'], ['Ha2ardous Materials', 'Arm5 Maehinery']]

print(zipzag(lambda xs: xs[0], *cont_det))

produces:

[['TASU 117000 0', '40HS', 'Ha2ardous Materials'], ['TGHU 759933 - 0', '40HS', 'Arm5 Maehinery'], ['CSQU3054383', '40HS', 'Ha2ardous Materials'], ['BMOU 126 780-0', '40HS', 'Ha2ardous Materials'], ['HALU 2014 13 3', '40HS', 'Ha2ardous Materials']]

[Program finished]

fill is a function that receives a list and should return the value used to pad it, so that the lengths of the lists match up and the zip works. The example I gave returns the first element of the column.
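Since fill is an ordinary callable, other padding rules can be swapped in. The sketch below repeats the function (using max(sizes)) so that it runs on its own; the sample data and the alternative callbacks are illustrative assumptions, not part of the original answer.

```python
def zipzag(fill, *cols):
    sizes = [len(col) for col in cols]
    longest = max(sizes)
    return [[xs[i] if i < sizes[j] else fill(xs) for j, xs in enumerate(cols)]
            for i in range(longest)]


cols = [[1, 2, 3], [9], [4, 5]]

pad_first = zipzag(lambda xs: xs[0], *cols)   # pad with the first element
pad_last = zipzag(lambda xs: xs[-1], *cols)   # pad with the last element
pad_none = zipzag(lambda xs: None, *cols)     # behaves like plain zip_longest

print(pad_first)
print(pad_last)
print(pad_none)
```

Note that fill is called once per missing cell, so an expensive callback would be re-evaluated for every gap.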
