I think this should raise an error, but it doesn't

Below is a simple function to remove duplicates in a list while preserving order. I've tried it and it actually works, so the problem here is my understanding. It seems to me that the second time you run uniq.remove(item) for a given item, it should raise an error (KeyError or ValueError, I think?) because that item has already been removed from the unique set. Is this not the case?

def unique(seq):
    uniq = set(seq)  
    return [item for item in seq if item in uniq and not uniq.remove(item)]

There's a check, if item in uniq, which is executed before the item is removed. The and operator is nice in that it "short circuits". This means that if the condition on the left evaluates to something False-like, the condition on the right doesn't get evaluated: we already know the expression can't be True-like.
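
To see the short-circuit behaviour in isolation, here is a minimal sketch. The names short_circuit_demo and noisy are made up for this illustration; noisy only records whether it was called.

```python
def short_circuit_demo():
    calls = []

    def noisy(value):
        """Record that we were called, then return value unchanged."""
        calls.append(value)
        return value

    # Left operand is falsy: `and` stops there and never calls noisy().
    skipped = False and noisy("right side")
    # Left operand is truthy: `and` has to evaluate the right operand too.
    evaluated = True and noisy("right side")
    return skipped, evaluated, calls

skipped, evaluated, calls = short_circuit_demo()
print(skipped)    # False
print(evaluated)  # right side
print(calls)      # ['right side']  (noisy ran only once)
```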

set.remove is an in-place operation. This means that it does not return anything (well, it returns None), and bool(None) is False.
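
A quick check of both points, as a sketch with a throwaway set: set.remove returns None, and calling it for an element that is no longer present raises KeyError (ValueError is what list.remove raises). The variable names are only for illustration.

```python
uniq = {1, 2, 3}

returned = uniq.remove(2)   # mutates the set in place
print(returned)             # None
print(not returned)         # True, because bool(None) is False
print(sorted(uniq))         # [1, 3]

# Removing the same element again fails, because it is already gone.
try:
    uniq.remove(2)
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)           # KeyError
```

So the error the question expects is real; the comprehension just never reaches the second remove call.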

So your list comprehension is effectively this:

answer = []
for item in seq:
    if item in uniq and not uniq.remove(item):
        answer.append(item)

and since Python short-circuits conditionals (as others have pointed out), this is effectively:

answer = []
for item in seq:
    if item in uniq:
        if not uniq.remove(item):
            answer.append(item)

Of course, since uniq.remove(item) returns None (the bool of which is False), not uniq.remove(item) is True every time it is evaluated. So for each item, either both conditions hold or the second one is never evaluated at all.

The reason the second condition exists is to remove item from uniq. This way, if/when you encounter item again (as a duplicate in seq), it will not be found in uniq, because it was deleted from uniq the last time it was found there.
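
To watch that happen step by step, here is a sketch that uses the same condition as the question but also records what is left in the set after each item. The function name unique_traced and the trace list are additions for illustration, not part of the original code.

```python
def unique_traced(seq):
    uniq = set(seq)
    answer = []
    trace = []  # (item, kept?, contents of uniq after this step)
    for item in seq:
        # Same condition as in the list comprehension.
        kept = item in uniq and not uniq.remove(item)
        if kept:
            answer.append(item)
        trace.append((item, kept, sorted(uniq)))
    return answer, trace

answer, trace = unique_traced([1, 2, 1, 3, 2])
print(answer)  # [1, 2, 3]
for step in trace:
    print(step)
# (1, True, [2, 3])
# (2, True, [3])
# (1, False, [3])
# (3, True, [])
# (2, False, [])
```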

Now, keep in mind that this is fairly dangerous: conditions that modify variables are considered bad style (imagine debugging such a conditional when you aren't fully familiar with what it does). Conditionals should really not modify the variables they check; they should only read the variables, not write to them as well.
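
One way to keep the same idea without mutating inside the condition is to make the removal its own statement. This is only a sketch; the names unique_explicit and remaining are made up here.

```python
def unique_explicit(seq):
    remaining = set(seq)   # items we have not emitted yet
    result = []
    for item in seq:
        if item in remaining:        # the condition only reads the set
            remaining.remove(item)   # the mutation is a separate statement
            result.append(item)
    return result

print(unique_explicit([1, 2, 3, 3, 4, 3, 6]))  # [1, 2, 3, 4, 6]
```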

Hope this helps.

mgilson and others have answered this question nicely, as usual. I thought I might point out what is probably the canonical way of doing this in Python, namely the unique_everseen recipe from the recipes section of the itertools docs, quoted below:

from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def unique_with_order(seq):
    final = []
    for item in seq:
        if item not in final:
            final.append(item)
    return final


print(unique_with_order([1, 2, 3, 3, 4, 3, 6]))

Break it down, make it simple :) Not everything has to be a list comprehension these days.
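
A side note on this version, as a sketch: because it tests membership against a list, it only needs == and so also works for unhashable items such as lists, where a set-based approach raises TypeError. The price is a linear scan of final for every item. The sample data below is made up.

```python
def unique_with_order(seq):
    final = []
    for item in seq:
        if item not in final:   # linear scan of the list, needs only ==
            final.append(item)
    return final

pairs = [[1, 2], [3, 4], [1, 2]]
print(unique_with_order(pairs))  # [[1, 2], [3, 4]]

try:
    set(pairs)                   # lists are unhashable
    set_worked = True
except TypeError:
    set_worked = False
print(set_worked)  # False
```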

@mgilson's answer is the right one, but here, for your information, is a possible lazy (generator) version of the same function. This means it will work for iterables that don't fit in memory, including infinite iterators, as long as the set of their distinct elements does.

def unique(iterable):
    uniq = set()
    for item in iterable:
        if item not in uniq:
            uniq.add(item)
            yield item
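
As a usage sketch, the generator can be driven by an endless iterator and cut off with itertools.islice. The function is repeated so the snippet runs on its own; the endless input is made up for the example.

```python
from itertools import count, islice

def unique(iterable):
    uniq = set()
    for item in iterable:
        if item not in uniq:
            uniq.add(item)
            yield item

# An infinite stream with repeats: 0, 0, 1, 1, 2, 2, ...
endless = (n // 2 for n in count())

# Only as many items are consumed as are needed for five unique values.
first_five = list(islice(unique(endless), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

Note that asking for more unique values than the stream can ever produce would loop forever, so this only helps when new values keep arriving.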

The first time you run the list comprehension, you will get the de-duplicated list (for example [1,2,3,4]) and the set uniq will be emptied. If you ran the same comprehension a second time against that same set, you would get [], because uniq would be empty. (Calling unique() again builds a fresh set, so the function itself keeps returning the full result.) The reason you don't get any errors on the second run is that Python's and short-circuits: it sees that the first clause (item in uniq) is false and doesn't bother to run the second clause.
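
A small sketch of that scenario, running the comprehension twice against one shared set. The wrapper function run_twice and the sample list are made up for illustration.

```python
def run_twice(seq):
    uniq = set(seq)
    first = [item for item in seq if item in uniq and not uniq.remove(item)]
    # uniq is now empty, so `item in uniq` is False for every item and
    # uniq.remove() is never called on the second pass.
    second = [item for item in seq if item in uniq and not uniq.remove(item)]
    return first, second, uniq

first, second, leftover = run_twice([1, 2, 3, 3, 4, 3])
print(first)     # [1, 2, 3, 4]
print(second)    # []
print(leftover)  # set()
```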
