
Python: Symmetric Difference Sorted List

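Is there a good way to get the symmetric difference of two sorted lists in Python and return a sorted list as the result? My current version seems inefficient (convert to sets, find the symmetric difference, convert back to a list, then re-sort).

A solution using NumPy is fine; the data being sorted is of type int.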

def sorted_symdiff(list1, list2):
    """Each list is already sorted, so this seems inefficient."""
    s1,s2 = set(list1),set(list2)
    diff = list(s1.symmetric_difference(s2))
    diff.sort()
    return diff

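Yes, there is a way. You have to take advantage of the fact that both sequences are sorted. Walk through both sequences at the same time, comparing elements one by one, and build the symmetric difference as you advance along each sequence.

If you are familiar with big-O notation, the code below has complexity O(m+n), where m = len(seq1) and n = len(seq2).

Your algorithm, by contrast, has complexity O((m+n) log(m+n)), because you need to sort the resulting set.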

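Caveat:

This answer is mostly an exercise to demonstrate how to take advantage of sorted inputs.

Despite its better complexity, for most inputs it runs slower than the original poster's code, which uses Python's built-in set methods. In Python, sets are implemented in C under the hood, and pure Python has a hard time beating that. It would take very large inputs to see any advantage, if one is visible at all. This algorithm is the most efficient in the asymptotic sense, but that does not mean it will be faster, nor that you should use it. The built-in set methods also make the code easier to write, read, understand, debug, and maintain.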
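Since the question says a NumPy solution would be acceptable, one compiled alternative worth measuring is `numpy.setxor1d`, which returns the sorted, unique values that occur in only one of the two inputs. The sketch below assumes NumPy is installed and that the inputs are integer lists; the wrapper name `numpy_symdiff` is chosen here for illustration and does not come from the original thread. NumPy sorts internally, so this does not exploit the pre-sorted input either, and whether it beats the set version depends on input size.

```python
import numpy as np


def numpy_symdiff(list1, list2):
    """Return the sorted symmetric difference of two int lists as a list."""
    # setxor1d de-duplicates each input (assume_unique defaults to False)
    # and returns a sorted array of values found in exactly one input.
    return np.setxor1d(list1, list2).tolist()


print(numpy_symdiff([-1, 1, 2, 6, 22], [-2, -1, 5, 6, 22]))
```

If the data already lives in NumPy arrays, calling `np.setxor1d` directly avoids the list conversions in both directions.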

Code:

def get_symmetric_difference(seq1, seq2):
    """
    computes the symmetric difference of unique elements of seq1 & seq2 
    as a new sorted list, without mutating the parameters.

    seq1: a sorted sequence of int
    seq2: a sorted sequence of int

    return: a new sorted list containing the symmetric difference 
            of unique elements of seq1 & seq2
    """

    # No special case is needed for empty inputs: the general logic below
    # handles them and also removes duplicates, which a plain copy would not.

    symmetric_difference = []

    idx = 0
    jdx = 0  
    last_insert = None
    last_seen = None

    while idx < len(seq1) and jdx < len(seq2):
        s1 = seq1[idx]
        s2 = seq2[jdx]
        if s1 == s2:
            idx += 1
            jdx += 1
            last_seen = s1
        elif s1 < s2:
            if last_insert != s1 and last_seen != s1:
                symmetric_difference.append(s1)
                last_insert = s1
            idx += 1
        elif s2 < s1:
            if last_insert != s2 and last_seen != s2:
                symmetric_difference.append(s2)
                last_insert = s2
            jdx += 1

    # At most one of the two sequences still has unprocessed elements here.
    remainder = seq1[idx:] if idx < len(seq1) else seq2[jdx:]
    for elt in remainder:
        if last_insert != elt and last_seen != elt:
            symmetric_difference.append(elt)
            last_insert = elt

    return symmetric_difference

Tests:

def test_get_symmetric_difference():

    seq1 = []
    seq2 = []
    assert get_symmetric_difference(seq1, seq2) == []

    seq1 = [1]
    seq2 = []
    assert get_symmetric_difference(seq1, seq2) == [1]

    seq1 = [1, 2, 3, 4]
    seq2 = [-2, -1, 5, 6, 7, 8]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 5, 6, 7, 8]

    seq1 = [    -1, 1, 2, 3, 4,    6,       9,  22, 34]
    seq2 = [-2, -1,             5, 6, 7, 8, 19, 22,    43]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 5, 7, 8, 9, 19, 34, 43]

    seq1 = [-2, -1,             5, 6, 7, 8, 19, 22,    43]
    seq2 = [    -1, 1, 2, 3, 4,    6,       9,  22, 34]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 5, 7, 8, 9, 19, 34, 43]

    seq1 = [-2, -1, 0,            5,       22, 34]
    seq2 = [-2, -1,   1, 2, 3, 4,    6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [0, 1, 2, 3, 4, 5, 6, 9]

    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == []

    seq1 = [7, 7, 7, 7, 7, 7]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 6, 7, 9, 22, 34]

    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [7, 7, 7, 7, 7, 7]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 6, 7, 9, 22, 34]

    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-1, -1, 7, 7, 43, 43, 43]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 6, 7, 9, 22, 34, 43]

    seq1 = [34, 34, 34, 34]
    seq2 = [7, 34]
    assert get_symmetric_difference(seq1, seq2) == [7]

    seq1 = [7, 34]
    seq2 = [34, 34, 34, 34]
    assert get_symmetric_difference(seq1, seq2) == [7]

    seq1 = [7, 34]
    seq2 = [7, 7, 7, 7, 7]
    assert get_symmetric_difference(seq1, seq2) == [34]

    seq1 = [7, 7, 7, 7, 34]
    seq2 = [7, 7]
    assert get_symmetric_difference(seq1, seq2) == [34]

    print("***all tests pass***")


test_get_symmetric_difference()

Output:

***all tests pass***
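For comparison, the same linear merge idea can be sketched more compactly with the standard library. This is an illustrative alternative, not the code from the answer above: `heapq.merge` lazily merges the two sorted inputs, and `itertools.groupby` both removes duplicates within each input and detects values that appear in both. The function name `symdiff_merge` is hypothetical.

```python
from heapq import merge
from itertools import groupby


def symdiff_merge(seq1, seq2):
    """Sorted symmetric difference of two sorted sequences, in O(m + n)."""
    # Collapse runs of equal values so each input contributes a value once.
    unique1 = (value for value, _ in groupby(seq1))
    unique2 = (value for value, _ in groupby(seq2))
    result = []
    # After merging, a value present in both inputs shows up exactly twice.
    for value, run in groupby(merge(unique1, unique2)):
        if len(list(run)) == 1:
            result.append(value)
    return result


print(symdiff_merge([-2, -1, 0, 5, 22, 34], [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]))
```

The trade-off is the same as for the hand-written loop: the asymptotic cost is linear, but the per-element overhead of pure-Python iteration may still make it slower than the set-based version on typical inputs.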

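Never trust a set to be sorted. Always sort after converting your set to a list object when you expect a sorted list to be returned. I am unsure about the behavior observed in the notes below.

Earlier note (do not rely on it): after converting back to a list, there seemed to be no need to sort, because the list was already sorted, and removing the redundant sort would make it more efficient.

If list1 and list2 are guaranteed to be sorted lists of positive int objects, then in Python 3.5 the set returned by symmetric_difference seems to yield its elements in sorted order. That is an implementation detail, not a guarantee. If list1 or list2 contains any negative int or float values, the result will need to be sorted again.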

def sorted_symdiff(list1, list2):
    """Each list is already sorted; the final sort has been dropped here."""
    s1, s2 = set(list1), set(list2)
    # Caution: set iteration order is not guaranteed, so diff may be unsorted.
    diff = list(s1.symmetric_difference(s2))
    return diff
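If a sorted result is required regardless of Python version or element type, a minimal sketch is to let `sorted()` do the final pass instead of relying on set iteration order. The name `sorted_symdiff_safe` is made up for this illustration; `set.symmetric_difference` accepts any iterable, so the second list does not need to be converted first.

```python
def sorted_symdiff_safe(list1, list2):
    """Symmetric difference of two lists, always returned in sorted order."""
    # sorted() builds a new list, so no separate list() and sort() are needed.
    return sorted(set(list1).symmetric_difference(list2))


print(sorted_symdiff_safe([-2, -1, 1, 2.5], [-1, 5]))
```

This keeps the O((m+n) log(m+n)) sort from the question but gives the same result for negative values and floats as for positive ints.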

