Python: check whether an item exists in a line while iterating over lists

Question

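I'm trying to iterate over two lists and print only the items that actually exist in the second list. I'll be doing this with very large files, so I don't want to hold them in memory as a list or dictionary. Is there a way to do this without storing them in a list or dict?

I can do the following to confirm that items are NOT in the list, but I'm not sure why it stops working when I remove the "not" to confirm that they ARE in the list.

Code to verify that an item DOES NOT exist in list_2.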

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Code to verify that an item DOES exist in list_2.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

for fruit_1 in list_1:
    if all(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Answer 1

Here is a solution that uses pandas.read_csv with memory-mapped files:

import pandas as pd

list1 = pd.read_csv('list1.txt', dtype=str, header=None, memory_map=True)
list2 = pd.read_csv('list2.txt', dtype=str, header=None, memory_map=True)

exists = pd.merge(list1, list2, how='inner', on=0)
for fruit in exists[0].tolist():
    print(fruit)

The files list1.txt and list2.txt contain the strings from the question, one string per line.

Output

pear
kiwi

I don't have any very large files to try this on, so I don't have any performance numbers.
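For comparison, here is a minimal standard-library sketch of the same file-based check. It assumes the distinct lines of the second file fit in memory as a set while the first file is streamed one line at a time; the function name lines_in_both and the temporary demo files are made up for illustration. As far as I know, memory_map=True in the pandas version only changes how the file is read, so the resulting DataFrames still live in memory there too.

```python
import os
import tempfile


def lines_in_both(path_1, path_2):
    """Yield each stripped line of path_1 that also occurs in path_2.

    Only the lines of path_2 are held in memory (as a set); path_1 is
    streamed one line at a time.
    """
    with open(path_2, encoding="utf-8") as handle:
        wanted = {line.strip() for line in handle}
    with open(path_1, encoding="utf-8") as handle:
        for line in handle:
            item = line.strip()
            if item in wanted:
                yield item


# Demo with two small temporary files standing in for the large ones.
with tempfile.TemporaryDirectory() as folder:
    path_1 = os.path.join(folder, "list1.txt")
    path_2 = os.path.join(folder, "list2.txt")
    with open(path_1, "w", encoding="utf-8") as handle:
        handle.write("apple\npear\norange\nkiwi\nstrawberry\nbanana\n")
    with open(path_2, "w", encoding="utf-8") as handle:
        handle.write("kiwi\nmelon\ngrape\npear\n")
    # Consume the generator while the files still exist.
    found = list(lines_in_both(path_1, path_2))

print(found)  # ['pear', 'kiwi']
```

If neither file fits in memory, this sketch is not enough; sorting both files on disk and merging them would be one alternative.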

Answer 2

So, this is how you get them:

exists = [item for item in list_1 if item in list_2]
does_not_exist = [item for item in list_1 if item not in list_2]

And to print them:

for item in exists:
    print(item)
for item in does_not_exist:
    print(item)

But if you only want to print:

for item in list_1:
    if item in list_2:
        print(item)
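Since the question is worried about memory, the two list comprehensions above can also be written as generator expressions, which produce matches one at a time instead of building result lists. This is only a sketch; the helper names existing and missing are made up for illustration.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']


def existing(items, reference):
    # Lazily yield the items that are also elements of reference.
    return (item for item in items if item in reference)


def missing(items, reference):
    # Lazily yield the items that are not elements of reference.
    return (item for item in items if item not in reference)


for item in existing(list_1, list_2):
    print(item)
```

Note that item in list_2 still scans list_2 on every check, so this saves memory for the results but not lookup time.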

Answer 3

You can use Python's sets to compute the items that are in both lists:

set(list_1).intersection(set(list_2))

See https://docs.python.org/2/library/sets.html
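One thing to keep in mind is that a set has no defined order, so the intersection may not come back in the order of list_1. A small sketch using the question's data shows both the plain intersection and an order-preserving variant; the names common, lookup and in_order are just illustrative.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# Plain intersection: correct members, but no guaranteed order.
common = set(list_1).intersection(list_2)
print(sorted(common))  # ['kiwi', 'pear']

# Keep list_1's order by using the set only for fast membership tests.
lookup = set(list_2)
in_order = [item for item in list_1 if item in lookup]
print(in_order)  # ['pear', 'kiwi']
```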

Answer 4

I was able to get the inverse by evaluating the result as True/False.

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'strawberry',
          'banana']

list_2 = ['kiwi',
          'melon',
          'grape',
          'pear']

# DOES exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is False:
        print(fruit_1)

print('\n')

# DOES NOT exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is True:
        print(fruit_1)
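The reason the "is False" version works is De Morgan's law: "not all items are absent" means the same as "some item is present", so it is equivalent to using any(). A short sketch to check that equivalence on the question's data (the two function names are made up for illustration):

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']


def exists_via_not_all(fruit, candidates):
    # The form used above: "not every candidate lacks the fruit".
    return not all(fruit not in candidate for candidate in candidates)


def exists_via_any(fruit, candidates):
    # The equivalent direct form: "some candidate contains the fruit".
    return any(fruit in candidate for candidate in candidates)


for fruit in list_1:
    # Both forms agree for every fruit, by De Morgan's law.
    assert exists_via_not_all(fruit, list_2) == exists_via_any(fruit, list_2)

matches = [fruit for fruit in list_1 if exists_via_any(fruit, list_2)]
print(matches)  # ['pear', 'kiwi']
```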

Answer 5

I'd suggest using pandas, which works well on large data.

Install it with pip:

pip install pandas

And you can do it like this:

import pandas as pd

s1 = pd.Index(list_1)
s2 = pd.Index(list_2)

exists = s1.intersection(s2)
does_not_exist = s1.difference(s2)

Now, if you run print(exists), you will see the magic.

See the pandas documentation.

Answer 6

The problem with the code is how the all() function is evaluated. To break it down more simply:

## DOES EXIST
print(all('kiwi' in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' in fruit_2 for fruit_2 in ['pear', 'kiwi']))

evaluates to

False
False

Whereas if you do

# DOES NOT EXIST
print(all('apple' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))

evaluates to

True
False

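I can't pinpoint exactly why this is, but it probably comes down to the fact that all() returns True only if every element of the iterable is true, and False otherwise.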
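To make that concrete, it can help to look at the individual checks that all() receives. In this small sketch (variable names are just illustrative) 'pear' matches only one of the two candidates, so all() is False even though the item is present, while any() is True:

```python
candidates = ['pear', 'kiwi']

# One boolean per candidate for the "in" test.
checks_in = ['pear' in fruit_2 for fruit_2 in candidates]
print(checks_in)       # [True, False]
print(all(checks_in))  # False
print(any(checks_in))  # True

# One boolean per candidate for the "not in" test.
checks_not_in = ['apple' not in fruit_2 for fruit_2 in candidates]
print(checks_not_in)       # [True, True]
print(all(checks_not_in))  # True
```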

In any case, I think using any() instead of all() for the DOES exist part works.

print("DOES NOT EXIST")
for fruit_1 in list_1:
    # print(all(fruit_1 not in fruit_2 for fruit_2 in list_2))
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

print("\nDOES EXIST")
for fruit_1 in list_1:
    if any(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

DOES NOT EXIST
apple
orange
strawberry
banana

DOES EXIST
pear
kiwi

Answer 7

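One problem with your code is that all() returns False if any single check returns False. Another problem is that the fruit_1 in fruit_2 part checks whether fruit_1 is a substring of fruit_2. If we were to modify the lists so that your logic works, they would look like this: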

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'berry',
          'banana',
          'grape']

list_2 = ['grape',
          'grape',
          'grape',
          'grape',
          'grape']

But they could also be:

list_1 = ['apple',
          'pear',
          'orange',
          'kiwi',
          'berry',
          'banana',
          'grape']

list_2 = ['strawberry',
          'strawberry',
          'strawberry',
          'strawberry',
          'strawberry',
          'strawberry']

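This second pair also works because berry is a substring of strawberry. If we keep checking by iteration instead of using set intersection as @wrdeman suggested, then with the dataset you provided it would look like this:

for fruit_1 in list_1:
    if fruit_1 in list_2:
        print(fruit_1)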

Another modification is to change all to any, which returns True if any item of the iterable is true. Your code would then look like this:

for fruit_1 in list_1:
    if any(fruit_1 == fruit_2 for fruit_2 in list_2):
        print(fruit_1)
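The substring point is easy to miss, so here is a small sketch of the difference between testing a string against another string and testing it against a list. It reuses the berry/strawberry lists from above; the result names are just illustrative.

```python
# "in" on a string is a substring test; "in" on a list is an element test.
print('berry' in 'strawberry')    # True
print('berry' in ['strawberry'])  # False
print('berry' == 'strawberry')    # False

list_1 = ['apple', 'pear', 'orange', 'kiwi', 'berry', 'banana', 'grape']
list_2 = ['strawberry'] * 6

# The question's original logic: substring test against every entry.
substring_matches = [f for f in list_1 if all(f in g for g in list_2)]
# Exact comparison against any entry.
exact_matches = [f for f in list_1 if any(f == g for g in list_2)]

print(substring_matches)  # ['berry']
print(exact_matches)      # []
```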

7 answers

Answer 1: score 1, 2017-04-26 17:23:36
Answer 2: score 0, 2017-04-26 15:53:54
Answer 3: score 0, 2017-04-26 15:54:14
Answer 4: score 0, 2017-04-26 16:20:06
Answer 5: score 0, 2017-04-26 16:21:43
Answer 6: score 0, 2017-04-26 17:15:19
Answer 7: score -1, 2017-04-26 16:10:47