Compare 2 txt files and create a new txt file based on content that is not present in one of the files

Question

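I have 2 txt files: one is a list of fruits, and the other is a list of data records in which every line has a fruit embedded in it, like this:

File 1: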

apple
orange
grape
banana
pear

File 2:

Brian b7890 apple orchard autumn
Sue c7623 grape vineyard summer
Richard z4501 grapefruit citrusGrove autumn
Mary m8123 pear orchard autumn

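I need to extract from File 2 the lines whose fruit does not appear in File 1, and write them to a new text file. In this example, the only line of File 2 that qualifies is: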

Richard z4501 grapefruit citrusGrove autumn

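Please note: I chose this example because the word "grape", which appears in File 1, is part of the word "grapefruit", which rules out some of the simpler extraction methods.

I started by reading each line of each file into a list: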

with open('ListOfFruits.txt') as f:
    listOfFruits = [line.strip() for line in f]

with open('AllFruitData.txt') as a:
    allFruitData = [line.strip() for line in a]

i=0
x=0

while x < len(listOfFruits):
    if listOfFruits[i] not in allFruitData[x]:
        i=i+1
        #then check against allFruitData again
        #continue until the end of listOfFruits
        #if no match is found then add the line allFruitData[x] to a new txt file
    x=x+1

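I have tried various approaches using for loops, while loops and if statements, but I always seem to get stuck on the syntax. I picture the code working like 2 wheels turning against each other, where one is stationary and the other keeps rotating until it finds a match. If a match is found, the stationary wheel moves on 1 position and the moving wheel resets. If the stationary wheel finds no match on the moving wheel, that data goes into a new basket. This continues until every position on the stationary wheel has been traversed by the moving wheel.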
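The two-wheel picture corresponds fairly directly to a nested loop using Python's `for ... else`. The following is only a minimal sketch, assuming the fields are separated by whitespace; the sample data is copied from the question, and the names `fruits`, `data_lines` and `unmatched` are illustrative.

```python
# Sample data copied from the question; the variable names are illustrative.
fruits = ["apple", "orange", "grape", "banana", "pear"]
data_lines = [
    "Brian b7890 apple orchard autumn",
    "Sue c7623 grape vineyard summer",
    "Richard z4501 grapefruit citrusGrove autumn",
    "Mary m8123 pear orchard autumn",
]

unmatched = []
for line in data_lines:      # the stationary wheel: one data line at a time
    words = line.split()     # whole words, so "grape" is not "grapefruit"
    for fruit in fruits:     # the moving wheel: every fruit in turn
        if fruit in words:   # list membership compares whole words
            break            # match found, move on to the next data line
    else:
        # runs only when the inner loop finished without hitting `break`
        unmatched.append(line)

print(unmatched)
```

With the sample data, only the grapefruit line ends up in `unmatched`.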

Answer 1

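How about using a set? Then you can use set difference. A simple implementation could be (if the fruit is always in the third position on each line of the second file):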

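# read the known fruits into a set for fast membership tests
with open('listOfFruits.txt', 'r') as f:
    fruits = set([line.rstrip() for line in f])

# map each fruit (third field) to the line it came from
with open('allFruitData.txt', 'r') as f:
    data = {}
    for line in f:
        fruit = line.rstrip().split()[2]
        data[fruit] = line

# fruits that appear in the data file but not in the fruit list
fruits_not_in_file = set(data.keys()) - fruits
outfile = 'outfile.txt'
with open(outfile, 'w') as f:
    for fruit in fruits_not_in_file:
        f.write(data[fruit])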
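One thing to watch in the snippet above: because `data` is keyed by fruit, two lines that name the same unlisted fruit collapse into one entry, and the output order follows the set rather than the file. A minimal sketch that filters line by line instead, still assuming the fruit is the third field; the helper name `lines_with_unlisted_fruit` and the extra "Anna" line are made up for illustration.

```python
fruits = {"apple", "orange", "grape", "banana", "pear"}
data_lines = [
    "Brian b7890 apple orchard autumn\n",
    "Richard z4501 grapefruit citrusGrove autumn\n",
    "Anna a1111 grapefruit citrusGrove winter\n",  # hypothetical extra line
    "Mary m8123 pear orchard autumn\n",
]

def lines_with_unlisted_fruit(lines, fruits, column=2):
    """Return the lines whose field at `column` is not in `fruits`, in input order."""
    kept = []
    for line in lines:
        fields = line.split()
        # skip blank or short lines instead of raising IndexError
        if len(fields) > column and fields[column] not in fruits:
            kept.append(line)
    return kept

result = lines_with_unlisted_fruit(data_lines, fruits)
print("".join(result), end="")
```

Both grapefruit lines are kept here, in the order they appear in the input.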

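Edit:

If the fruit can appear in any column, the problem is harder if you don't know which words are fruits. However, if you just want to write out the lines that do not contain any of the named fruits, it's not too bad: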

with open('listOfFruits.txt', 'r') as f:
    fruits = set([line.rstrip() for line in f])

with open('outfile.txt', 'w') as outf, open('allFruitData.txt', 'r') as inf:
    for line in inf:
        words = set(line.rstrip().split())
        # you can replace this `if` with `if fruits & words == set()`
        if not fruits & words:
            outf.write(line)

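First, read all the fruits into a set. Then, for each line in the data file, test whether any word on that line is in the fruit set. If the intersection is empty, the line is written to the output file. If the line contains a fruit somewhere, move on to the next line.

Note that this will not match 'grape' against 'grapefruit', because the line is split on your delimiter (which looks like a space or \t).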
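To make the difference concrete, here is a small sketch comparing a substring test on the raw line with a membership test on the split words. The line is taken from the question; `substring_hit` and `word_hit` are illustrative names.

```python
line = "Richard z4501 grapefruit citrusGrove autumn"

# Substring test on the raw string: "grape" is found inside "grapefruit".
substring_hit = "grape" in line

# Membership test on the split words: only whole-word matches count.
word_hit = "grape" in line.split()

print(substring_hit, word_hit)  # True False
```

Calling `str.split()` with no argument splits on any run of whitespace, so it handles both spaces and tabs.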

Answer 2

This should do the job (assuming the fruit in file2 is always the third item):

def compare_fruits():

    files = ["file1", "file2"]
    file_list = []

    for name in files:
        with open("filepath/%s.txt" % name, "r") as f:
            file_list.append(f.readlines())

    # fruits listed in file1, as a set for whole-word lookups
    fruits = {line.strip() for line in file_list[0]}

    # keep each line of file2 whose third field is not a listed fruit
    diff = []
    for line in file_list[1]:
        fields = line.split()
        if len(fields) > 2 and fields[2] not in fruits:
            diff.append(line)

    with open("filepath/file3.txt", "w") as f:  # creates file3 if it doesn't exist
        for line in diff:
            f.write(line)


Answer 1
0 accepted 2013-12-03 17:12:09

Answer 2
0 2013-12-03 17:12:43
