
Find non-common elements in lists

I'm trying to write a piece of code that can automatically factor an expression. For example, if I have two lists [1,2,3,4] and [2,3,5], the code should be able to find the common elements in the two lists, [2,3], and combine the rest of the elements together in a new list, [1,4,5].

From this post: How to find list intersection? I see that the common elements can be found by

set([1,2,3,4]) & set([2,3,5])

Is there an easy way to retrieve non-common elements from each list, in my example being [1,4] and [5]?

I can go ahead and do a for loop:

lists = [[1,2,3,4],[2,3,5]]
nonCommon = []
common = [2,3]
for eachList in lists:
    for elem in eachList:
        # keep only the elements that are not shared by both lists
        if elem not in common:
            nonCommon.append(elem)

But this seems redundant and inefficient. Does Python provide any handy function that can do that? Thanks in advance!!

Use the symmetric difference operator for sets (aka the XOR operator):

>>> set([1,2,3]) ^ set([3,4,5])
set([1, 2, 4, 5])
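Applied to the two lists from the question, this could look like the minimal sketch below. The variable names are my own, and sorted() is only there to get a deterministic list back, since sets are unordered.

```python
list_one = [1, 2, 3, 4]
list_two = [2, 3, 5]

# Elements that appear in exactly one of the two lists.
non_common = sorted(set(list_one) ^ set(list_two))
print(non_common)  # [1, 4, 5]
```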

Old question, but it looks like Python has a built-in set method that provides exactly what you're looking for: .difference().

Example

list_one = [1,2,3,4]
list_two = [2,3,5]

one_not_two = set(list_one).difference(list_two)
# set([1, 4])

two_not_one = set(list_two).difference(list_one)
# set([5])

This could also be written as:

one_not_two = set(list_one) - set(list_two)

Timing

I ran some timing tests on both, and .difference() appears to have a slight edge, on the order of 10-15%. However, each method took about an eighth of a second to filter 1M items (random integers between 500 and 100,000), so unless you're very time sensitive, it's probably immaterial.
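For anyone who wants to repeat the comparison, here is a rough sketch using timeit. It is not the script used for the numbers above: the list size, the seed, and the function names are my own assumptions, and results will vary by machine and Python version.

```python
import random
import timeit

random.seed(0)  # fixed seed, only so repeated runs use the same data
# Hypothetical data size: 200,000 items instead of 1M to keep the run short.
list_one = [random.randint(500, 100_000) for _ in range(200_000)]
list_two = [random.randint(500, 100_000) for _ in range(200_000)]

def with_method():
    # .difference() accepts the second list directly
    return set(list_one).difference(list_two)

def with_operator():
    # the - operator needs both operands to be sets
    return set(list_one) - set(list_two)

# Both spellings must agree on the result.
assert with_method() == with_operator()

t_method = timeit.timeit(with_method, number=5)
t_operator = timeit.timeit(with_operator, number=5)
print(f".difference(): {t_method:.3f}s, '-' operator: {t_operator:.3f}s")
```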

Other Notes

It appears the OP is looking for a solution that provides two separate lists (or sets): one containing the items of the first list that are not in the second, and vice versa. Most of the previous answers return a single list or set that includes all of the items.

There is also the question as to whether items that may be duplicated in the first list should be counted multiple times, or just once.

If the OP wants to maintain duplicates, a list comprehension could be used, for example:

one_not_two = [ x for x in list_one if x not in list_two ]
two_not_one = [ x for x in list_two if x not in list_one ]

...which is roughly the same solution as posed in the original question, only a little cleaner. This method maintains duplicates from the original list but is considerably slower (multiple orders of magnitude) for larger data sets.
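Most of that slowdown comes from the `x not in list_two` test scanning a list each time. One possible variation, assuming the elements are hashable, is to build a set from the other list once and test membership against it. The helper name not_in_other is hypothetical, and the sample data has duplicates added for illustration.

```python
def not_in_other(items, other):
    """Return the elements of items that do not occur in other,
    keeping the original order and any duplicates."""
    other_set = set(other)  # built once; membership tests are O(1) on average
    return [x for x in items if x not in other_set]

list_one = [1, 1, 2, 3, 4, 4]
list_two = [2, 3, 5]

one_not_two = not_in_other(list_one, list_two)
two_not_one = not_in_other(list_two, list_one)
print(one_not_two)  # [1, 1, 4, 4]
print(two_not_one)  # [5]
```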

You can use the intersection concept to deal with this kind of problem.

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
set(b1).intersection(b2)
Out[22]: {4, 5}

The best thing about this code is that it also works quite fast on large data. With 607139 elements in b1 and 296029 elements in b2, I get my results in 2.9 seconds.
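The intersection only yields the shared elements. To get from there to the non-common elements the question asks for, one option is to subtract the intersection from each set, as in this sketch. The names only_b1 and only_b2 are my own, and sorted() is used only to make the printed order predictable.

```python
b1 = [1, 2, 3, 4, 5, 9, 11, 15]
b2 = [4, 5, 6, 7, 8]

common = set(b1).intersection(b2)

# Remove the shared elements from each side to get the per-list leftovers.
only_b1 = sorted(set(b1) - common)
only_b2 = sorted(set(b2) - common)

print(sorted(common))  # [4, 5]
print(only_b1)         # [1, 2, 3, 9, 11, 15]
print(only_b2)         # [6, 7, 8]
```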

You can use the .__xor__ method.

set([1,2,3,4]).__xor__(set([2,3,5]))

or

a = set([1,2,3,4])
b = set([2,3,5])
a.__xor__(b)

You can use the symmetric_difference method:

x = {1,2,3}
y = {2,3,4}

z = set.symmetric_difference(x, y)

Output will be: z = {1, 4}
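One practical difference between the named method and the ^ operator is that symmetric_difference() accepts any iterable, while ^ requires both operands to be sets. A small sketch to illustrate (the flag name operator_accepts_list is mine):

```python
x = {1, 2, 3}

# The named method accepts any iterable, here a plain list.
print(sorted(x.symmetric_difference([2, 3, 4])))  # [1, 4]

# The ^ operator only works between sets, so a list raises TypeError.
try:
    x ^ [2, 3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```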

This should get the common and remaining elements:

lis1=[1,2,3,4,5,6,2,3,1]
lis2=[4,5,8,7,10,6,9,8]

common = list(dict.fromkeys([l1 for l1 in lis1 if l1 in lis2]))
remaining = list(filter(lambda i: i not in common, lis1+lis2))

common = [4, 5, 6]

remaining = [1, 2, 3, 2, 3, 1, 8, 7, 10, 9, 8]

All the good solutions, from basic DSA style to using built-in functions:

# Time: O(n + m)
def solution1(arr1, arr2):
  # value -> [seen in arr1, seen in arr2]
  seen = {}
  for x in arr1:
    seen.setdefault(x, [False, False])[0] = True
  for x in arr2:
    seen.setdefault(x, [False, False])[1] = True

  # keep the values that appear in only one of the two lists
  res = []
  for key, value in seen.items():
    if not (value[0] and value[1]):
      res.append(key)

  return res

def solution2(arr1, arr2):
  return set(arr1) ^ set(arr2)

def solution3(arr1, arr2):
  return (set(arr1).difference(arr2), set(arr2).difference(arr1))

def solution4(arr1, arr2):
  return set(arr1).__xor__(set(arr2))

print(solution1([1,2,3], [2,4,6]))
print(solution2([1,2,3], [2,4,6]))
print(solution3([1,2,3], [2,4,6]))
print(solution4([1,2,3], [2,4,6]))
