
Comparing lists and extracting unique values

I have two lists:

l1: 38510 entries
l2: 6384 entries

I want to extract only the values that are present in both lists.

So far, this has been my approach:

equals = []

# Compare every quote in l2 against every quote in l1
for quote in l2:
    for quote2 in l1:
        if quote == quote2:
            equals.append(quote)

len(equals)       # 4999
len(set(equals))  # 4452

First of all, I have the feeling this approach is pretty inefficient, because I am checking every value in l1 several times.

Furthermore, it seems that I still get duplicates. Is this due to the inner loop over l1?

Thank you!!

You can use a list comprehension and the in operator:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 8, 0]

[x for x in a if x in b]
#[2, 4, 6, 8]
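One caveat: x in b scans the whole list b for every element of a, which gets slow for lists the size of yours. A minimal sketch, assuming the values are hashable: build a set from the second list once and keep the comprehension, which preserves the order (and any repeats) of the first list. The helper name common_in_order is made up for illustration.

```python
def common_in_order(first, second):
    # Hypothetical helper: build the lookup set once; set membership is O(1) on average
    lookup = set(second)
    # Keeps the order of `first`, including any repeated values
    return [x for x in first if x in lookup]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 8, 0]
print(common_in_order(a, b))  # [2, 4, 6, 8]
```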

You were on the right track by using sets. One of the coolest features of sets is that you can get the intersection of two sets, that is, the values that occur in both of them. You can read more about it in the docs.

Here is my example:

l1_set = set(l1)
l2_set = set(l2)

equals = l1_set & l2_set

# If you really want it as a list
equals = list(equals)

print(equals)

The & operator tells Python to return a new set containing only the values that are in both sets. At the end, I converted equals back to a list because that's what your original example used. You can omit that if you don't need it.
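To see where the duplicates in the question come from, here is a small sketch with made-up sample data (the strings are illustrative only). The nested loop appends a value once for every matching pair, so a value that appears twice in one list and once in the other is appended twice; the set intersection keeps each common value exactly once.

```python
# Hypothetical sample data: "b" appears twice in l1, "c" twice in l2
l1 = ["a", "b", "b", "c"]
l2 = ["b", "c", "c", "d"]

# The nested loop from the question appends once per matching (quote, quote2) pair
equals = []
for quote in l2:
    for quote2 in l1:
        if quote == quote2:
            equals.append(quote)
print(equals)  # ['b', 'b', 'c', 'c']

# The set intersection keeps each common value once (sorted for a stable display)
print(sorted(set(l1) & set(l2)))  # ['b', 'c']
```

So the repeats come from repeated values in the input lists; the inner loop simply reports every matching pair.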

1. This is the simplest method, and it doesn't use any built-in functions.

# Intersection of two lists, computed in the simplest way
def intersection(list_one, list_two):
    temp_list = [value for value in list_one if value in list_two]
    return temp_list
  
# Illustrate the intersection
list_one = [4, 9, 1, 17, 11, 26, 28, 54, 69]
list_two = [9, 9, 74, 21, 45, 11, 63, 28, 26]
print(intersection(list_one, list_two))

# [9, 11, 26, 28]

2. You can use Python's built-in set() type together with the & operator.

# Intersection of two lists using set()
def intersection(list_one, list_two):
    return list(set(list_one) & set(list_two))
  
# Illustrate the intersection
list_one = [15, 13, 123, 23, 31, 10, 3, 311, 738, 25, 124, 19]
list_two = [12, 14, 1,  15, 36, 123, 23, 3, 315, 87]
print(intersection(list_one, list_two))

# e.g. [123, 3, 23, 15] (a set has no guaranteed order)
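Because a set has no guaranteed iteration order, the list returned by the set-based version can come out in any order. If you need a predictable result, a small sketch: either sort it, or keep the order in which values first appear in list_one (dict.fromkeys preserves insertion order in Python 3.7+ and drops repeats). Both function names are hypothetical.

```python
def intersection_sorted(list_one, list_two):
    # Duplicate-free intersection, returned in ascending order
    return sorted(set(list_one) & set(list_two))

def intersection_first_seen(list_one, list_two):
    lookup = set(list_two)
    # dict.fromkeys keeps only the first occurrence of each value, in insertion order
    return list(dict.fromkeys(v for v in list_one if v in lookup))

list_one = [15, 13, 123, 23, 31, 10, 3, 311, 738, 25, 124, 19]
list_two = [12, 14, 1, 15, 36, 123, 23, 3, 315, 87]
print(intersection_sorted(list_one, list_two))      # [3, 15, 23, 123]
print(intersection_first_seen(list_one, list_two))  # [15, 123, 23, 3]
```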

3. In this technique, we use the set method intersection() to compute the intersected list.

First, we convert the larger list with set(), then compute its intersection with the other list. intersection() accepts any iterable, so the smaller list can be passed as-is.

# Two lists using set() and intersection()
def intersection_list(list_one, list_two):
    return list(set(list_one).intersection(list_two))
      
# Illustrate the intersection
list_one = [15, 13, 123, 23, 31, 10, 3, 311, 738, 25, 124, 19]
list_two = [12, 14, 1,  15, 36, 123, 23, 3, 315, 87, 978, 4, 13, 19, 20, 11]

if len(list_one) < len(list_two):
    list_one, list_two = list_two, list_one
    
print(intersection_list(list_one, list_two))

# [3, 13, 15, 19, 23, 123] (order not guaranteed, since it comes from a set)
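Since intersection() accepts any number of iterables, the same idea extends to more than two lists without converting each one to a set yourself. A minimal sketch; common_values is a hypothetical helper name, and the result is sorted only to make the output predictable.

```python
def common_values(first, *others):
    # set.intersection accepts any number of iterables (lists, tuples, ...)
    return sorted(set(first).intersection(*others))

print(common_values([1, 2, 3, 4], [2, 3, 5], (3, 2, 9)))  # [2, 3]
```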

Additionally, you can follow the tutorials below:

  1. GeeksforGeeks
  2. docs.python.org
  3. LearnCodingFast

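Let's assume that all the entries in both of your lists are integers (more generally, the set approach works for any hashable values). If so, computing the intersection of the two lists as sets is much more efficient than using a list comprehension, because i in l2 scans the whole list for every element, while a set lookup takes constant time on average: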

import timeit
l1 = [i for i in range(0, 38510)]
l2 = [i for i in range(0, 6384)]

st1 = timeit.default_timer()
# Using list comprehension
l3 = [i for i in l1 if i in l2]
ed1 = timeit.default_timer()

# Using set
st2 = timeit.default_timer()
l4 = list(set(l1) & set(l2))
ed2 = timeit.default_timer()

print(ed1 - st1)  # e.g. 5.7621682 secs
print(ed2 - st2)  # e.g. 0.004478600000000554 secs
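A single default_timer() measurement can be noisy. Here is a sketch using timeit.timeit, which runs each callable a fixed number of times. The list sizes are smaller than in the question (an assumption, to keep the slow version short), and the timings depend on the machine, so no numbers are shown. The three function names are hypothetical.

```python
import timeit

# Smaller than the question's lists so the list-scanning version finishes quickly
l1 = list(range(10_000))
l2 = list(range(1_000))

def with_list_scan():
    return [i for i in l1 if i in l2]        # scans l2 for every element of l1

def with_set_lookup():
    lookup = set(l2)
    return [i for i in l1 if i in lookup]    # O(1) average membership test

def with_set_intersection():
    return sorted(set(l1) & set(l2))         # set result sorted for comparison

# All three agree on these inputs (no duplicates, l1 already sorted)
assert with_list_scan() == with_set_lookup() == with_set_intersection()

for fn in (with_list_scan, with_set_lookup, with_set_intersection):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```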

Since your lists are so long, you might want to use NumPy, which is specialized in efficient array processing for Python.

NumPy functions operate on whole arrays in compiled code, which makes them fast. For your case, you can use numpy.intersect1d() to get the sorted, unique values that are in both input arrays, as follows:

import numpy as np

l1 = [1, 3, 5, 10, 11, 12]
l2 = [2, 3, 4, 10, 12, 14, 16, 18]

l_uniques = np.intersect1d(l1, l2)

print(l_uniques)
# [ 3 10 12]

You can keep the result as a NumPy array for further fast processing, or convert it back to a Python list with:

l_uniques2 = l_uniques.tolist()
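Two related NumPy options, as a sketch: intersect1d() can also return the positions of the common values via return_indices=True (available since NumPy 1.15), and numpy.isin() builds a boolean mask when you want to keep the order and repeats of the first array instead of sorted unique values. The array a below is made-up sample data.

```python
import numpy as np

l1 = [1, 3, 5, 10, 11, 12]
l2 = [2, 3, 4, 10, 12, 14, 16, 18]

# Common values plus the index of each one's first occurrence in l1 and in l2
common, idx1, idx2 = np.intersect1d(l1, l2, return_indices=True)
print(common, idx1, idx2)  # [ 3 10 12] [1 3 5] [1 3 4]

# isin keeps a's order and duplicates, like the list comprehension approach
a = np.array([12, 3, 7, 3])
print(a[np.isin(a, l2)])  # [12  3  3]
```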

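Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.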

 
© 2020-2024 STACKOOM.COM