I am trying to loop through two lists and only want to print an item if it DOES exist in the second list. I will be doing this with very large files, so I do not want to store them in memory as a list or dictionary. Is there a way to do this without storing them in a list or dict?
I am able to do the below to confirm they are NOT in the list, but I am unsure why it does not work when I try to confirm they ARE in the list by removing the "not".
Code to verify item DOES NOT exist in list_2.
list_1 = ['apple',
'pear',
'orange',
'kiwi',
'strawberry',
'banana']
list_2 = ['kiwi',
'melon',
'grape',
'pear']
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)
Code to verify item DOES exist in list_2.
list_1 = ['apple',
'pear',
'orange',
'kiwi',
'strawberry',
'banana']
list_2 = ['kiwi',
'melon',
'grape',
'pear']
for fruit_1 in list_1:
    if all(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)
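This is a solution using pandas.read_csv with memory_map=True, which memory-maps the input files while they are parsed (the resulting DataFrames are still held in memory):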
import pandas as pd
list1 = pd.read_csv('list1.txt', dtype=str, header=None, memory_map=True)
list2 = pd.read_csv('list2.txt', dtype=str, header=None, memory_map=True)
exists = pd.merge(list1, list2, how='inner', on=0)
for fruit in exists[0].tolist():
    print(fruit)
The list1.txt and list2.txt files contain the strings from the question, with one string on each line.
Output
pear
kiwi
I do not have any really large files to experiment with, so I do not have any performance measurements.
So this is how you get them:
exists = [item for item in list_1 if item in list_2]
does_not_exist = [item for item in list_1 if item not in list_2]
And to print them:
for item in exists:
    print(item)
for item in does_not_exist:
    print(item)
But if you just want to print the items that exist, without building a list first:
for item in list_1:
    if item in list_2:
        print(item)
You can use Python's sets to work out the items in both lists:
set(list_1).intersection(set(list_2))
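For the large-file case from the question, the same set idea can be applied so that only one of the two files is held in memory while the other is streamed line by line. The following is a minimal sketch, assuming one string per line and assuming the second file is small enough to fit in a set; the function name `lines_in_both` and the temporary files are hypothetical and only there for the demonstration.

```python
import os
import tempfile


def lines_in_both(path_1, path_2):
    """Yield each line of path_1 that also appears as a line in path_2."""
    # Only the second file is held in memory, as a set for fast lookups.
    with open(path_2) as f2:
        wanted = {line.rstrip("\n") for line in f2}
    # The first file is streamed one line at a time.
    with open(path_1) as f1:
        for line in f1:
            item = line.rstrip("\n")
            if item in wanted:
                yield item


# Small demonstration with temporary files holding the question's data.
tmp_dir = tempfile.mkdtemp()
path_1 = os.path.join(tmp_dir, "list1.txt")
path_2 = os.path.join(tmp_dir, "list2.txt")
with open(path_1, "w") as f:
    f.write("\n".join(["apple", "pear", "orange", "kiwi", "strawberry", "banana"]) + "\n")
with open(path_2, "w") as f:
    f.write("\n".join(["kiwi", "melon", "grape", "pear"]) + "\n")

matches = list(lines_in_both(path_1, path_2))
print(matches)
```

Unlike `set.intersection`, this keeps the order of the first file. If neither file fits in memory, this sketch is not enough on its own.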
I was able to accomplish the inverse by doing a True/False evaluation.
list_1 = ['apple',
'pear',
'orange',
'kiwi',
'strawberry',
'banana']
list_2 = ['kiwi',
'melon',
'grape',
'pear']
# DOES exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is False:
        print(fruit_1)
print('\n')
# DOES NOT exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is True:
        print(fruit_1)
I recommend pandas, which works well on large-scale data.
Use pip to install it:
pip install pandas
One way to achieve it is like this:
import pandas as pd
s1 = pd.Index(list_1)
s2 = pd.Index(list_2)
exists = s1.intersection(s2)
does_not_exist = s1.difference(s2)
Running print(exists) then shows the items common to both lists.
See Pandas Docs
The problem with the code is how the all() function is being evaluated. To break it down a bit more simply:
## DOES EXIST
print(all('kiwi' in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' in fruit_2 for fruit_2 in ['pear', 'kiwi']))
Evaluates to
False
False
Conversely, if you do something like this:
# DOES NOT EXIST
print(all('apple' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))
Evaluates to
True
False
The reason is that all() returns True only if every element of the iterable is true, and False otherwise. The DOES exist check therefore asks whether fruit_1 is found in every item of list_2, which is never the case here.
Using any() instead of all() for the DOES exist part works:
print("DOES NOT EXIST")
for fruit_1 in list_1:
    # print(all(fruit_1 not in fruit_2 for fruit_2 in list_2))
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)
print("\nDOES EXIST")
for fruit_1 in list_1:
    if any(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)
DOES NOT EXIST
apple
orange
strawberry
banana
DOES EXIST
pear
kiwi
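To make the relationship between the two checks explicit, here is a small sketch using the lists from the question. It checks that negating the all() test gives the same answer as the any() test for every fruit; the variable names `negated_all` and `plain_any` are only illustrative.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

for fruit_1 in list_1:
    # "not every item lacks fruit_1" ...
    negated_all = not all(fruit_1 not in fruit_2 for fruit_2 in list_2)
    # ... is the same statement as "some item contains fruit_1".
    plain_any = any(fruit_1 in fruit_2 for fruit_2 in list_2)
    assert negated_all == plain_any

exists = [f for f in list_1 if any(f in g for g in list_2)]
does_not_exist = [f for f in list_1 if all(f not in g for g in list_2)]
print(exists)
print(does_not_exist)
```

Simply dropping the "not" inside all() does not negate the test; it changes the question to "is fruit_1 found in every item of list_2?".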
One issue with your code is that all() returns False if any single check returns False. Another is that fruit_1 in fruit_2 checks whether fruit_1 is a substring of fruit_2. If we were to modify the lists to make your logic work, they would look like:
list_1 = ['apple',
'pear',
'orange',
'kiwi',
'berry',
'banana',
'grape']
list_2 = ['grape',
'grape',
'grape',
'grape',
'grape']
but could be:
list_1 = ['apple',
'pear',
'orange',
'kiwi',
'berry',
'banana',
'grape']
list_2 = ['strawberry',
'strawberry',
'strawberry',
'strawberry',
'strawberry',
'strawberry']
since berry is in strawberry. If we were to continue to use iteration to make this check, as opposed to an intersection of sets, as @wrdeman suggested, then, using the dataset you provided, it would look like this:
for fruit_1 in list_1:
    if fruit_1 in list_2:
        print(fruit_1)
The other modification could be to change all to any, which returns True if any of the iterable's items is true. Then your code would look like:
for fruit_1 in list_1:
    if any(fruit_1 == fruit_2 for fruit_2 in list_2):
        print(fruit_1)
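The substring point can be checked directly. This is a short sketch with a made-up two-item list (not the question's data) showing how `in` behaves on a string versus a list, and how that carries over into any():

```python
list_2 = ['strawberry', 'melon']

# `in` on a string is a substring test.
substring_hit = 'berry' in 'strawberry'
# `in` on a list compares whole elements.
element_hit = 'berry' in list_2

# The same difference shows up inside any().
any_substring = any('berry' in fruit_2 for fruit_2 in list_2)
any_equal = any('berry' == fruit_2 for fruit_2 in list_2)

print(substring_hit, element_hit, any_substring, any_equal)
```

Comparing with == (or testing membership in the list itself) avoids false matches such as berry against strawberry.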