
Join Python iterators on key

I'm looking for a way to join Python iterators the way itertools.izip_longest() does, but I would like to join elements that have the same "key" (as defined by a parameter) and output None in the positions of iterators that do not contain that key. I'm assuming the iterators are sorted ascending by "key".

Example:

iter1 = iter((1, 3, 4, 9))
iter2 = iter((3, 5, 6))
iter3 = iter((1, 3, 10))

zipjoiner(iter1, iter2, iter3)

should give:

iter(((1, None, 1), (3, 3, 3), (4, None, None), (None, 5, None), (None, 6, None), (9, None, None), (None, None, 10)))

(in this case key is the default identity, lambda x: x)

I've tried modifying the izip_longest() implementation found in the Python documentation, and it works (at least on my example), but I'm looking for a more elegant solution. Any ideas?

This is my code:

import itertools


class ZipExhausted(Exception):
    """Raised by the sentinel when the last input iterator runs out."""


def zipjoiner(*args, **kwds):
    # Adapted from the izip_longest() recipe (zip_longest() in Python 3):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    key = kwds.get('key', lambda x: x)
    counter = [len(args) - 1]

    def sentinel():
        # Runs once per input, at the moment that input is exhausted.
        if not counter[0]:
            raise ZipExhausted
        counter[0] -= 1
        yield fillvalue

    fillers = itertools.repeat(fillvalue)
    iterators = [itertools.chain(it, sentinel(), fillers) for it in args]

    def getkey(x):
        return None if x is None else key(x)

    try:
        while iterators:
            elements = tuple(map(next, iterators))
            keys = tuple(map(getkey, elements))
            minkey = min(k for k in keys if k is not None)
            while not all(k == minkey for k in keys):
                # Emit only the elements that hold the smallest key ...
                yield tuple(v if k == minkey else None
                            for k, v in zip(keys, elements))
                # ... then advance just the iterators that supplied them.
                elements = tuple(next(it) if k == minkey else v
                                 for k, it, v in zip(keys, iterators, elements))
                keys = tuple(map(getkey, elements))
                minkey = min(k for k in keys if k is not None)
            yield elements

    except ZipExhausted:
        pass

If you don't need to preserve order, you can turn each input into a set, then loop over the sorted values from all inputs and yield a tuple containing the value or None, depending on whether the value is in the corresponding set. Note that this reads every input fully into memory and drops duplicates:

def join_iterators(*iterators):
    # Materialize each input once; an iterator cannot be read a second time.
    sets = [set(iterator) for iterator in iterators]

    # Take the union of the sets rather than chaining the inputs again,
    # because the iterators are already exhausted at this point.
    values = set().union(*sets)
    for value in sorted(values):
        yield tuple(value if value in s else None for s in sets)

This does not handle your key function, but I think you can figure out how to apply that ;)
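One way the key function could be applied to the set-based approach above is to replace each set with a dict that maps key(value) to value. This is only a sketch under stated assumptions: the name join_iterators_by_key is hypothetical, keys must be hashable and mutually comparable, and every input is still read fully into memory.

```python
def join_iterators_by_key(*iterators, key=lambda x: x):
    # One dict per input, mapping key(value) -> value. A key that repeats
    # within one input keeps only its last value (an assumption of this sketch).
    tables = [{key(value): value for value in iterator}
              for iterator in iterators]

    # Every key seen in any input, visited in ascending order.
    all_keys = set().union(*tables)
    for k in sorted(all_keys):
        # dict.get returns None for inputs that lack this key.
        yield tuple(table.get(k) for table in tables)


rows = join_iterators_by_key([(1, 'a'), (3, 'b')],
                             [(3, 'c'), (5, 'd')],
                             key=lambda pair: pair[0])
print(list(rows))
# [((1, 'a'), None), ((3, 'b'), (3, 'c')), (None, (5, 'd'))]
```

Because the lookup is by key, the inputs do not have to be sorted for this variant to work.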

While you could make something more readable, I don't think you could make something more efficient.

From a readability standpoint, I would change the header and the first few lines:

def zipjoiner(*iters, fillvalue=None, key=lambda x: x):
    # drop first two lines dealing with fillvalue and key

as there is no reason to deal with **kwds.
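For comparison, here is a sketch of a complete function that uses the keyword-only signature shown above. It does not reuse the asker's loop; it swaps in heapq.merge and itertools.groupby from the standard library, which may read more easily. The name zipjoiner_merge and the helper tagged are hypothetical, and the sketch assumes Python 3.5 or later, inputs sorted ascending by key, and at most one element per key in each input.

```python
import heapq
import itertools


def zipjoiner_merge(*iters, fillvalue=None, key=lambda x: x):
    def tagged(index, iterable):
        # Decorate each element with its key and the position of its source.
        for value in iterable:
            yield key(value), index, value

    # heapq.merge lazily merges the already-sorted inputs. Comparing only
    # (key, index) keeps the elements themselves out of the comparison.
    merged = heapq.merge(*(tagged(i, it) for i, it in enumerate(iters)),
                         key=lambda t: t[:2])

    # Consecutive entries with the same key belong to the same output row.
    for _, group in itertools.groupby(merged, key=lambda t: t[0]):
        row = [fillvalue] * len(iters)
        for _, index, value in group:
            row[index] = value
        yield tuple(row)


result = zipjoiner_merge(iter((1, 3, 4, 9)), iter((3, 5, 6)), iter((1, 3, 10)))
print(list(result))
# [(1, None, 1), (3, 3, 3), (4, None, None), (None, 5, None),
#  (None, 6, None), (9, None, None), (None, None, 10)]
```

Like the original, this stays lazy: it holds only one pending element per input at a time. Unlike the original, fillvalue is used for the missing positions, so elements that are themselves None are not confused with gaps.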
