Subtracting two lists in Python

In Python, how can one subtract two non-unique, unordered lists? Say we have a = [0,1,2,1,0] and b = [0, 1, 1]. I'd like to do something like c = a - b and have c be [2, 0] or [0, 2]; order doesn't matter to me. This should throw an exception if a does not contain all elements in b.

Note this is different from sets! I'm not interested in finding the difference of the sets of elements in a and b; I'm interested in the difference between the actual collections of elements in a and b.

I can do this with a for loop, looking up the first element of b in a and then removing the element from b and from a, etc. But this doesn't appeal to me: it would be very inefficient (on the order of O(n^2) time), while it should be no problem to do this in O(n log n) time.
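
One way the O(n log n) bound mentioned above could be reached is to sort both lists and walk them in step. This is only a sketch: the name subtract_sorted is made up for illustration, it assumes the elements can be compared with <, and it returns the result in sorted order rather than the original order.

```python
def subtract_sorted(a, b):
    """Multiset difference a - b via sorting (hypothetical helper).

    Raises ValueError if b contains an element that a cannot cover.
    """
    a_sorted = sorted(a)  # O(n log n)
    b_sorted = sorted(b)
    result = []
    j = 0  # index of the next element of b_sorted still to be cancelled
    for x in a_sorted:
        if j < len(b_sorted) and b_sorted[j] == x:
            j += 1  # x cancels one occurrence from b
        elif j < len(b_sorted) and b_sorted[j] < x:
            # b holds a value that a has already run out of
            raise ValueError("b is not contained in a")
        else:
            result.append(x)
    if j < len(b_sorted):
        raise ValueError("b is not contained in a")
    return result


print(subtract_sorted([0, 1, 2, 1, 0], [0, 1, 1]))  # [0, 2]
```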

I know "for" is not what you want, but it's simple and clear:

for x in b:
  a.remove(x)

Or, if members of b might not be in a, use:

for x in b:
  if x in a:
    a.remove(x)
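
Since the question asks for an exception when a does not contain everything in b, the first loop already does that: list.remove raises ValueError for a missing element. A minimal sketch that also leaves the caller's list alone (subtract_strict is an illustrative name, not something from the answer):

```python
def subtract_strict(a, b):
    """Return a copy of a with one occurrence removed per element of b."""
    result = list(a)  # copy so the caller's list is left untouched
    for x in b:
        result.remove(x)  # list.remove raises ValueError if x is absent
    return result


print(subtract_strict([0, 1, 2, 1, 0], [0, 1, 1]))  # [2, 0]
```

Each remove call scans the list, so this stays quadratic in the worst case.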

Python 2.7 and 3.1 added the collections.Counter class, a dictionary subclass that maps elements to the number of occurrences of the element. This can be used as a multiset. You can do something like this:

from collections import Counter
a = Counter([0, 1, 2, 1, 0])
b = Counter([0, 1, 1])
c = a - b  # ignores items in b missing in a

print(list(c.elements()))  # -> [0, 2]

As well, if you want to check that every element in b is in a:

# a[key] returns 0 if key not in a, instead of raising an exception
assert all(a[key] >= b[key] for key in b)
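
The subtraction and the containment check can be folded into one function that raises instead of asserting. This is a sketch only; subtract_multiset is a hypothetical name, and it assumes the elements are hashable.

```python
from collections import Counter


def subtract_multiset(a, b):
    """Return a - b as a list, raising ValueError if b is not contained in a."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, so anything left
    # in (count_b - count_a) is an element that a cannot cover.
    missing = count_b - count_a
    if missing:
        raise ValueError("elements of b not in a: %r" % list(missing.elements()))
    return list((count_a - count_b).elements())


print(sorted(subtract_multiset([0, 1, 2, 1, 0], [0, 1, 1])))  # [0, 2]
```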

But since you are stuck with 2.5, you could try importing it and define your own version if that fails. That way you will be sure to get the latest version if it is available, and fall back to a working version if not. You will also benefit from speed improvements if it gets converted to a C implementation in the future.

try:
    from collections import Counter
except ImportError:
    class Counter(dict):
        ...

You can find the current Python source here.

I would do it in an easier way:

a_b = [e for e in a if e not in b]

...as wich wrote, this is wrong: it works only if the items are unique in the lists. And if they are, it's better to use

a_b = list(set(a) - set(b))
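
To make the problem with duplicates concrete, here is a small comparison on the lists from the question. The variable names are illustrative; the Counter line is borrowed from the other answer as the multiset reference.

```python
from collections import Counter

a = [0, 1, 2, 1, 0]
b = [0, 1, 1]

# Membership test: drops every 0 and 1 from a, however many b contains.
by_membership = [e for e in a if e not in b]
# Set difference: same loss, and duplicates in a are collapsed as well.
by_set = list(set(a) - set(b))
# Multiset difference: removes one occurrence per element of b.
by_count = list((Counter(a) - Counter(b)).elements())

print(by_membership)     # [2]
print(by_set)            # [2]
print(sorted(by_count))  # [0, 2]
```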

I'm not sure what the objection to a for loop is: there is no multiset in Python so you can't use a builtin container to help you out.

Seems to me anything on one line (if possible) will probably be hellishly complex to understand. Go for readability and KISS. Python is not C :)

Python 2.7+ and 3.1+ have collections.Counter (aka multiset). The documentation links to Recipe 576611: Counter class for Python 2.5:

from operator import itemgetter
from heapq import nlargest
from itertools import repeat, ifilter

class Counter(dict):
    '''Dict subclass for counting hashable objects.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    >>> Counter('zyzygy')
    Counter({'y': 3, 'z': 2, 'g': 1})

    '''

    def __init__(self, iterable=None, **kwds):
        '''Create a new, empty Counter object.  And if given, count elements
        from an input iterable.  Or, initialize the count from another mapping
        of elements to their counts.

        >>> c = Counter()                           # a new, empty counter
        >>> c = Counter('gallahad')                 # a new counter from an iterable
        >>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
        >>> c = Counter(a=4, b=2)                   # a new counter from keyword args

        '''        
        self.update(iterable, **kwds)

    def __missing__(self, key):
        return 0

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.

        >>> Counter('abracadabra').most_common(3)
        [('a', 5), ('r', 2), ('b', 2)]

        '''        
        if n is None:
            return sorted(self.iteritems(), key=itemgetter(1), reverse=True)
        return nlargest(n, self.iteritems(), key=itemgetter(1))

    def elements(self):
        '''Iterator over elements repeating each as many times as its count.

        >>> c = Counter('ABCABC')
        >>> sorted(c.elements())
        ['A', 'A', 'B', 'B', 'C', 'C']

        If an element's count has been set to zero or is a negative number,
        elements() will ignore it.

        '''
        for elem, count in self.iteritems():
            for _ in repeat(None, count):
                yield elem

    # Override dict methods where the meaning changes for Counter objects.

    @classmethod
    def fromkeys(cls, iterable, v=None):
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

    def update(self, iterable=None, **kwds):
        '''Like dict.update() but add counts instead of replacing them.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.update('witch')           # add elements from another iterable
        >>> d = Counter('watch')
        >>> c.update(d)                 # add elements from another counter
        >>> c['h']                      # four 'h' in which, witch, and watch
        4

        '''        
        if iterable is not None:
            if hasattr(iterable, 'iteritems'):
                if self:
                    self_get = self.get
                    for elem, count in iterable.iteritems():
                        self[elem] = self_get(elem, 0) + count
                else:
                    dict.update(self, iterable) # fast path when counter is empty
            else:
                self_get = self.get
                for elem in iterable:
                    self[elem] = self_get(elem, 0) + 1
        if kwds:
            self.update(kwds)

    def copy(self):
        'Like dict.copy() but returns a Counter instance instead of a dict.'
        return Counter(self)

    def __delitem__(self, elem):
        'Like dict.__delitem__() but does not raise KeyError for missing values.'
        if elem in self:
            dict.__delitem__(self, elem)

    def __repr__(self):
        if not self:
            return '%s()' % self.__class__.__name__
        items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
        return '%s({%s})' % (self.__class__.__name__, items)

    # Multiset-style mathematical operations discussed in:
    #       Knuth TAOCP Volume II section 4.6.3 exercise 19
    #       and at http://en.wikipedia.org/wiki/Multiset
    #
    # Outputs guaranteed to only include positive counts.
    #
    # To strip negative and zero counts, add-in an empty counter:
    #       c += Counter()

    def __add__(self, other):
        '''Add counts from two counters.

        >>> Counter('abbb') + Counter('bcc')
        Counter({'b': 4, 'c': 2, 'a': 1})


        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] + other[elem]
            if newcount > 0:
                result[elem] = newcount
        return result

    def __sub__(self, other):
        ''' Subtract count, but keep only results with positive counts.

        >>> Counter('abbbc') - Counter('bccd')
        Counter({'b': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] - other[elem]
            if newcount > 0:
                result[elem] = newcount
        return result

    def __or__(self, other):
        '''Union is the maximum of value in either of the input counters.

        >>> Counter('abbb') | Counter('bcc')
        Counter({'b': 3, 'c': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        _max = max
        result = Counter()
        for elem in set(self) | set(other):
            newcount = _max(self[elem], other[elem])
            if newcount > 0:
                result[elem] = newcount
        return result

    def __and__(self, other):
        ''' Intersection is the minimum of corresponding counts.

        >>> Counter('abbb') & Counter('bcc')
        Counter({'b': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        _min = min
        result = Counter()
        if len(self) < len(other):
            self, other = other, self
        for elem in ifilter(self.__contains__, other):
            newcount = _min(self[elem], other[elem])
            if newcount > 0:
                result[elem] = newcount
        return result


if __name__ == '__main__':
    import doctest
    print doctest.testmod()

Then you can write

a = Counter([0,1,2,1,0])
b = Counter([0, 1, 1])
c = a - b
print list(c.elements())  # [0, 2]

To use a list comprehension:

[i for i in a if not i in b or b.remove(i)]

would do the trick. It would change b in the process, though. But I agree with jkp and Dyno Fu that using a for loop would be better.

Perhaps someone can create a better example that uses a list comprehension but still is KISS?
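
For comparison, here is a plain-loop sketch that avoids mutating b by counting it first. It is not a list comprehension, and the name subtract_keep_order is made up for illustration; it assumes hashable elements and keeps the surviving items in the order they appear in a.

```python
from collections import Counter


def subtract_keep_order(a, b):
    """Drop the first occurrences in a that are matched by elements of b."""
    remaining = Counter(b)  # how many of each value still need cancelling
    result = []
    for x in a:
        if remaining[x] > 0:
            remaining[x] -= 1  # this x cancels one occurrence from b
        else:
            result.append(x)
    return result


a = [0, 1, 2, 1, 0]
b = [0, 1, 1]
print(subtract_keep_order(a, b))  # [2, 0]
print(b)                          # [0, 1, 1], left unchanged
```

Like the comprehension, this silently ignores elements of b that are missing from a.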

To prove jkp's point that 'anything on one line will probably be hellishly complex to understand', I created a one-liner. Please do not mod me down, because I understand this is not a solution that you should actually use. It is just for demonstration purposes.

The idea is to add the values in a one by one, as long as the number of times you have added that value is smaller than the number of times the value occurs in a minus the number of times it occurs in b:

[ value for counter,value in enumerate(a) if a.count(value) >= b.count(value) + a[counter:].count(value) ]

The horror! But perhaps someone can improve on it? Is it even bug free?

Edit: Seeing Devin Jeanpierre's comment about using a dictionary data structure, I came up with this one-liner:

sum([ [value]*count for value,count in {value:a.count(value)-b.count(value) for value in set(a)}.items() ], [])

Better, but still unreadable.
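
One way to probe the "is it even bug free?" question is to compare both one-liners against collections.Counter on random inputs. The wrapper names below are hypothetical, and the check only compares results as multisets. Neither one-liner raises when b is not contained in a, so that requirement from the question is not covered here.

```python
import random
from collections import Counter


def one_liner_count(a, b):
    # First one-liner, wrapped in a function.
    return [value for counter, value in enumerate(a)
            if a.count(value) >= b.count(value) + a[counter:].count(value)]


def one_liner_dict(a, b):
    # Second one-liner, wrapped in a function.
    return sum([[value] * count for value, count in
                {value: a.count(value) - b.count(value)
                 for value in set(a)}.items()], [])


random.seed(0)  # fixed seed so the run is repeatable
mismatches = 0
for _ in range(500):
    a = [random.randrange(5) for _ in range(random.randrange(12))]
    b = [random.randrange(5) for _ in range(random.randrange(12))]
    expected = Counter(a) - Counter(b)  # multiset reference result
    if (Counter(one_liner_count(a, b)) != expected
            or Counter(one_liner_dict(a, b)) != expected):
        mismatches += 1

print(mismatches)  # 0
```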

You can try something like this:

class mylist(list):

    def __sub__(self, b):
        result = self[:]
        b = b[:]
        while b:
            try:
                result.remove(b.pop())
            except ValueError:
                raise Exception("Not all elements found during subtraction")
        return result


a = mylist([0, 1, 2, 1, 0] )
b = mylist([0, 1, 1])

>>> a - b
[2, 0]

You have to define what [1, 2, 3] - [5, 6] should output, though. I guess you want [1, 2, 3]; that's why I ignore the ValueError.

Edit: Now I see you wanted an exception if a does not contain all elements, so I added it instead of passing the ValueError.

I attempted to find a more elegant solution, but the best I could do was basically the same thing that Dyno Fu said:

from copy import copy

def subtract_lists(a, b):
    """
    >>> a = [0, 1, 2, 1, 0]
    >>> b = [0, 1, 1]
    >>> subtract_lists(a, b)
    [2, 0]

    >>> import random
    >>> size = 10000
    >>> a = [random.randrange(100) for _ in range(size)]
    >>> b = [random.randrange(100) for _ in range(size)]
    >>> c = subtract_lists(a, b)
    >>> assert all((x in a) for x in c)
    """
    a = copy(a)
    for x in b:
        if x in a:
            a.remove(x)
    return a

Here's a relatively long but efficient and readable solution. It's O(n).

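def list_diff(list1, list2):
    # Count how many times each element occurs in list1.
    counts = {}
    for x in list1:
        try:
            counts[x] += 1
        except KeyError:
            counts[x] = 1
    # Subtract one per element of list2; fail if list1 runs short.
    for x in list2:
        try:
            counts[x] -= 1
        except KeyError:
            raise ValueError('Not all elements of list2 are in list1')
        if counts[x] < 0:
            raise ValueError('Not all elements of list2 are in list1')
    # Expand the remaining counts back into a list.
    result = []
    for k, v in counts.items():
        result += v * [k]
    return result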

a = [0, 1, 1, 2, 0]
b = [0, 1, 1]
%timeit list_diff(a, b)
%timeit list_diff(1000*a, 1000*b)
%timeit list_diff(1000000*a, 1000000*b)
100000 loops, best of 3: 4.8 µs per loop
1000 loops, best of 3: 1.18 ms per loop
1 loops, best of 3: 1.21 s per loop
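
The %timeit lines above only work inside IPython. Outside it, a rough comparison can be sketched with the standard timeit module. The helper names by_remove and by_counter are illustrative, and the input sizes are arbitrary; absolute numbers will differ from machine to machine.

```python
import timeit
from collections import Counter


def by_remove(a, b):
    # Repeated list.remove: each call scans the list.
    result = list(a)
    for x in b:
        result.remove(x)
    return result


def by_counter(a, b):
    # Counting approach: a single pass over each list.
    return list((Counter(a) - Counter(b)).elements())


a = [0, 1, 1, 2, 0] * 200
b = [0, 1, 1] * 200

for func in (by_remove, by_counter):
    seconds = timeit.timeit(lambda: func(a, b), number=20)
    print("%s: %.4f s for 20 runs" % (func.__name__, seconds))
```

Growing the multipliers should show the remove loop falling behind, since its cost grows roughly with the product of the list lengths.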

You can use the map construct to do this. It looks quite OK, but beware that the map line itself will return a list of Nones. (That is the Python 2 behaviour; in Python 3 map is lazy, so the removals only happen once its result is consumed, for example with list(...).)

a = [1, 2, 3]
b = [2, 3]

map(lambda x:a.remove(x), b)
a
c = [i for i in b if i not in a]
list(set([x for x in a if x not in b]))
  • Leaves a and b untouched.
  • Is a unique set of "a - b".
  • Done.
