
Python: Check whether an item exists in a list while iterating through another list

I am trying to loop through two lists and only want to print an item if it DOES exist in the second list. I will be doing this with very large files, so I do not want to store them in memory as a list or dictionary. Is there a way to do this without storing the data in a list or dict?

I can use the code below to confirm that items are NOT in the list, but I am unsure why it does not work when I try to confirm that they ARE in the list by removing the "not".

Code to verify item DOES NOT exist in list_2.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Code to verify item DOES exist in list_2.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

for fruit_1 in list_1:
    if all(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

This is a solution that uses pandas.read_csv with memory_map=True, so the input files are memory-mapped while they are read:

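import pandas as pd

# header=None: the files have no header row, so the single column is named 0
list1 = pd.read_csv('list1.txt', dtype=str, header=None, memory_map=True)
list2 = pd.read_csv('list2.txt', dtype=str, header=None, memory_map=True)

# an inner merge keeps only the values present in both files
exists = pd.merge(list1, list2, how='inner', on=0)
for fruit in exists[0].tolist():
    print(fruit)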

The list1.txt and list2.txt files contain the strings from the question with one string on each line.

Output

pear
kiwi

I do not have any really large files to experiment with, so I do not have any performance measurements.
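If pandas is not an option, a plain-Python sketch of the same idea is below. It assumes the second file fits in memory as a set of distinct lines while the first file is streamed one line at a time; the helper name common_lines is hypothetical, not from the answer above.

```python
from typing import Iterator


def common_lines(path_1: str, path_2: str) -> Iterator[str]:
    """Yield lines of path_1 that also appear as whole lines in path_2.

    Hypothetical helper: path_2 is loaded into a set (one entry per
    distinct line); path_1 is read lazily, one line at a time.
    """
    with open(path_2, encoding="utf-8") as f2:
        # strip the trailing newline so "kiwi\n" and "kiwi" compare equal
        wanted = {line.rstrip("\n") for line in f2}
    with open(path_1, encoding="utf-8") as f1:
        for line in f1:
            item = line.rstrip("\n")
            if item in wanted:  # exact match, not a substring test
                yield item
```

Usage would be `for fruit in common_lines('list1.txt', 'list2.txt'): print(fruit)`. Only the second file's distinct lines are held in memory; the first file is never loaded as a whole.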

So this is how you get them:

exists = [item for item in list_1 if item in list_2]
does_not_exist = [item for item in list_1 if item not in list_2]
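Each `item in list_2` test scans list_2 from the start, so these comprehensions perform roughly len(list_1) × len(list_2) comparisons. A small variation, sketched here with the question's data, converts list_2 to a set once so each membership test becomes a hash lookup while list_1's order is preserved; the name lookup is just illustrative.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# Build the set once; each "in" check is then an average O(1) hash lookup
# instead of a linear scan of list_2.
lookup = set(list_2)

exists = [item for item in list_1 if item in lookup]
does_not_exist = [item for item in list_1 if item not in lookup]

print(exists)          # ['pear', 'kiwi']
print(does_not_exist)  # ['apple', 'orange', 'strawberry', 'banana']
```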

And to print them:

for item in exists:
    print(item)
for item in does_not_exist:
    print(item)

But if you just want to print them:

for item in list_1:
    if item in list_2:
        print(item)

You can use Python's sets to find the items that are in both lists:

set(list_1).intersection(set(list_2))

See https://docs.python.org/2/library/sets.html
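Note that intersection() returns a set, so the order of list_1 and any duplicates are lost. (The linked page documents the old Python 2 sets module; in current Python the built-in set type provides intersection().) If order matters, one option, sketched with the question's data, is to filter list_1 against the intersection:

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# intersection() accepts any iterable, so list_2 need not be converted first
common = set(list_1).intersection(list_2)
print(sorted(common))  # ['kiwi', 'pear'] (a set itself has no defined order)

# To keep list_1's order, filter list_1 against the set.
in_order = [fruit for fruit in list_1 if fruit in common]
print(in_order)  # ['pear', 'kiwi']
```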

I was able to accomplish the inverse by doing a True/False evaluation.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

# DOES exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is False:
        print(fruit_1)

print('\n')

# DOES NOT exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is True:
        print(fruit_1)
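The `is False` check works because, by De Morgan's law, `not all(x not in y for y in ys)` is the same as `any(x in y for y in ys)`. A quick sketch that checks this equivalence on the question's data (note that `in` between two strings is a substring test):

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

for fruit_1 in list_1:
    # negation of "not a substring of every item in list_2" ...
    via_all = not all(fruit_1 not in fruit_2 for fruit_2 in list_2)
    # ... equals "a substring of at least one item in list_2"
    via_any = any(fruit_1 in fruit_2 for fruit_2 in list_2)
    assert via_all == via_any
    print(fruit_1, via_any)
```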

I recommend pandas, which works well with large-scale data.

Use pip to install it:

pip install pandas

Then you can achieve it like this:

import pandas as pd

s1 = pd.Index(list_1)
s2 = pd.Index(list_2)

exists = s1.intersection(s2)
does_not_exist = s1.difference(s2)

Now, if you execute print(exists), you will see the items that are in both lists.

See the pandas documentation.

The problem with the code is how the all() function is being evaluated. To break it down more simply:

## DOES EXIST
print(all('kiwi' in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' in fruit_2 for fruit_2 in ['pear', 'kiwi']))

Evaluates to

False
False

Conversely, if you do something like this:

#DOES NOT EXIST
print(all('apple' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))

Evaluates to

True
False

The cause is that all() returns True only if every element of the iterable is true, and False otherwise. For the DOES EXIST check, fruit_1 would have to be found in every element of list_2, not just in one of them.

In any case, using any() instead of all() for the DOES EXIST part works:

print("DOES NOT EXIST")
for fruit_1 in list_1:
    # print(all(fruit_1 not in fruit_2 for fruit_2 in list_2))
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

print("\nDOES EXIST")
for fruit_1 in list_1:
    if any(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

DOES NOT EXIST
apple
orange
strawberry
banana

DOES EXIST
pear
kiwi
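To make the all()/any() difference concrete, here is a sketch of what the two built-ins do, written as plain loops; the names my_all and my_any are illustrative only. all() stops at the first false element, any() stops at the first true one.

```python
from typing import Iterable


def my_all(iterable: Iterable[object]) -> bool:
    # True only if every element is truthy (vacuously True when empty)
    for element in iterable:
        if not element:
            return False
    return True


def my_any(iterable: Iterable[object]) -> bool:
    # True as soon as one element is truthy (False when empty)
    for element in iterable:
        if element:
            return True
    return False


list_2 = ['kiwi', 'melon', 'grape', 'pear']
# 'pear' matches only the last item, so it passes any() but fails all()
print(my_all('pear' in fruit_2 for fruit_2 in list_2))  # False
print(my_any('pear' in fruit_2 for fruit_2 in list_2))  # True
```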

One issue with your code is that all() returns False if any single check returns False. Another is that fruit_1 in fruit_2 checks whether fruit_1 is a substring of fruit_2, because fruit_2 is a string. For your logic to work, the lists would have to look like this:

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'berry',
          'banana',
          'grape']

list_2 = ['grape',
          'grape',
          'grape',
          'grape',
          'grape']

but they could also be:

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'berry',
          'banana',
          'grape']

list_2 = ['strawberry',
          'strawberry',
          'strawberry',
          'strawberry',
          'strawberry',
          'strawberry']

since 'berry' is a substring of 'strawberry'. If we were to keep using iteration for this check, as opposed to an intersection of sets as @wrdeman suggested, then, using the dataset you provided, it would look like this:

for fruit_1 in list_1:
    if fruit_1 in list_2:
        print(fruit_1)

The other modification could be to change all to any, which returns True if any of the iterable's items is true, and to compare with == instead of in so that only exact matches count. Then your code would look like:

for fruit_1 in list_1:
    if any(fruit_1 == fruit_2 for fruit_2 in list_2):
        print(fruit_1)
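The substring problem is easy to see in isolation. Between two strings, `in` is a substring test, while `in` on a list compares whole elements for equality. A short sketch using the 'berry'/'strawberry' example from this answer:

```python
# "in" between two strings is a substring test ...
print('berry' in 'strawberry')    # True
# ... while "==" compares whole strings.
print('berry' == 'strawberry')    # False
# "in" on a list compares whole elements, so no substring match here.
print('berry' in ['strawberry'])  # False

list_2 = ['strawberry']
# The original all()/in version reports 'berry' as present:
print(all('berry' in fruit_2 for fruit_2 in list_2))  # True
# The any()/== version does not:
print(any('berry' == fruit_2 for fruit_2 in list_2))  # False
```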

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM