Comparing lists and extracting unique values

I have two lists:

l1: 38510 entries l2: 6384 entries

I want to extract only the values that are present in both lists.

This is my approach so far:

equals = []

for quote in l2:
   for quote2 in l1:
      if quote == quote2:
         equals.append(quote)

len(equals)       # 4999
len(set(equals))  # 4452

First of all, I have the feeling this approach is pretty inefficient, because I am checking every value in l1 several times.

Furthermore, it seems that I still get duplicates. Is this due to the inner loop over l1?

Thank you!!

You can use a list comprehension and the in operator.

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 8, 0]

[x for x in a if x in b]
#[2, 4, 6, 8]
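
A possible variation on the above, offered as a sketch rather than as the answerer's own code: with lists as long as the ones in the question, x in b rescans b for every element of a. Building a set from b once keeps the order of a while making each lookup cheap. The names b_lookup and common are illustrative only.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 6, 8, 0]

# Build the lookup set once; membership tests on a set are cheap
b_lookup = set(b)

# Keeps the order (and any duplicates) of a
common = [x for x in a if x in b_lookup]
print(common)
# [2, 4, 6, 8]
```

Note that this still repeats a value if it occurs more than once in a; wrap the result in set() if you need unique values only.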

You were on the right track by using sets. One of set's coolest features is that you can get the intersection of two sets. The intersection is simply the set of values that occur in both sets. You can read more about it in the docs.

Here is my example:

l1_set = set(l1)
l2_set = set(l2)

equals = l1_set & l2_set

# If you really want it as a list
equals = list(equals)

print(equals)

The & operator tells Python to return a new set that contains only the values present in both sets. At the end, I converted equals back to a list because that's what your original example used. You can omit that step if you don't need a list.
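
To see where the duplicates in the question come from, here is a small sketch with made-up data (the two short lists are assumptions, not the asker's real data). The nested loop appends once for every matching pair, so a value that is repeated in either list gets appended several times, while the set intersection keeps each value once.

```python
# Made-up sample data with repeated values in both lists
l1 = ["a", "b", "b", "c"]
l2 = ["b", "c", "c", "d"]

# The nested loop from the question
equals = []
for quote in l2:
    for quote2 in l1:
        if quote == quote2:
            equals.append(quote)

print(equals)
# ['b', 'b', 'c', 'c']

# The set intersection contains each common value exactly once
unique_equals = sorted(set(l1) & set(l2))
print(unique_equals)
# ['b', 'c']
```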

1. This is the simplest method, a plain list comprehension that uses no set operations.

# Intersection of two lists in the simplest way
def intersection(list_one, list_two):
    temp_list = [value for value in list_one if value in list_two]
    return temp_list
  
# Illustrate the intersection
list_one = [4, 9, 1, 17, 11, 26, 28, 54, 69]
list_two = [9, 9, 74, 21, 45, 11, 63, 28, 26]
print(intersection(list_one, list_two))

# [9, 11, 26, 28]

2. You can use Python's built-in set() type.

# Two lists using set() method
def intersection(list_one, list_two):
    return list(set(list_one) & set(list_two))
  
# Illustrate the intersection
list_one = [15, 13, 123, 23, 31, 10, 3, 311, 738, 25, 124, 19]
list_two = [12, 14, 1,  15, 36, 123, 23, 3, 315, 87]
print(intersection(list_one, list_two))

# [123, 3, 23, 15] (the order may differ, because sets are unordered)

3. In this technique, we use the set method intersection() to compute the intersected list.

First, convert the larger list to a set with set(), then compute the intersection with the other list.

# Two lists using set() and intersection()
def intersection_list(list_one, list_two):
    return list(set(list_one).intersection(list_two))
      
# Illustrate the intersection
list_one = [15, 13, 123, 23, 31, 10, 3, 311, 738, 25, 124, 19]
list_two = [12, 14, 1,  15, 36, 123, 23, 3, 315, 87, 978, 4, 13, 19, 20, 11]

if len(list_one) < len(list_two):
    list_one, list_two = list_two, list_one
    
print(intersection_list(list_one, list_two))

# [3, 13, 15, 19, 23, 123]
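
As a small sketch of how intersection() behaves (the sample lists below are made up): it accepts any iterable, so only one of the lists has to be converted with set(), and the resulting values are the same whichever list is converted. sorted() is used here only to make the printed order predictable.

```python
list_one = [15, 13, 123, 23, 3, 19]
list_two = [12, 15, 123, 23, 3, 13, 19, 20]

# Only one side needs to be a set; the other can stay a list
result_a = sorted(set(list_one).intersection(list_two))
result_b = sorted(set(list_two).intersection(list_one))

print(result_a)
# [3, 13, 15, 19, 23, 123]
print(result_a == result_b)
# True
```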

Additionally, you can follow the tutorials below:

  1. Geeksforgeeks
  2. docs.python.org
  3. LearnCodingFast

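Let's assume that all the entries in both of your lists are integers. If so, computing the set intersection of the two lists is far more efficient than using a list comprehension, because every membership test on a list scans the list from the start, while a lookup in a set takes roughly constant time: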

import timeit
l1 = list(range(38510))
l2 = list(range(6384))

st1 = timeit.default_timer()
# Using list comprehension
l3 = [i for i in l1 if i in l2]
ed1 = timeit.default_timer()

# Using set
st2 = timeit.default_timer()
l4 = list(set(l1) & set(l2))
ed2 = timeit.default_timer()

print(ed1-st1) # 5.7621682 secs
print(ed2-st2) # 0.004478600000000554 secs
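
A single start/stop measurement can be noisy. One hedged alternative is timeit.timeit, which runs each snippet several times. The sketch below uses smaller lists than the question (an assumption, to keep the slow version quick) and shows no fixed timings, since they depend on the machine. The helper names with_list_comprehension and with_sets are illustrative.

```python
import timeit

# Smaller than the lists in the question, so the slow version stays quick
l1 = list(range(2000))
l2 = list(range(500))

def with_list_comprehension():
    # Each "i in l2" scans the list l2
    return [i for i in l1 if i in l2]

def with_sets():
    # Set intersection relies on hash lookups
    return list(set(l1) & set(l2))

# Both approaches find the same values
assert sorted(with_list_comprehension()) == sorted(with_sets())

# Total time for 5 runs of each function
t_list = timeit.timeit(with_list_comprehension, number=5)
t_set = timeit.timeit(with_sets, number=5)
print(f"list comprehension: {t_list:.4f} s, sets: {t_set:.4f} s")
```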

As you have such long lists, you might want to use numpy, which specializes in efficient array processing for Python.

For your case, you can use numpy.intersect1d() to get the sorted, unique values that are in both of the input arrays, as follows:

import numpy as np

l1 = [1, 3, 5, 10, 11, 12]
l2 = [2, 3, 4, 10, 12, 14, 16, 18]

l_uniques = np.intersect1d(l1, l2)


print(l_uniques)

[ 3 10 12]

You can keep the result as a numpy array for further fast processing, or convert it back to a Python list with:

l_uniques2 = l_uniques.tolist()
