Is there a way of looking for words from one file in another file and outputting the words not found in the other file, in a new file?

I am trying to compare two files in Python, both of which contain some words. I would like the code to look for the words from file1 in file2 and write the words from file1 that are not found in file2 to a new file.

The code below is what I've tried, but it doesn't do anything. It doesn't even show an error, so I don't know what is going wrong or what should be different.

file1 = open('C:/Users/Atal/Desktop/School/Project datas/file1.txt')
file2 = open('C:/Users/Atal/Desktop/School/Project datas/file2.txt')

fileContent = file1.read();
fileContent2 = file2.read();

loglist = file1.readlines()

loglist2 = file2.readlines()
file2.close()

line = file1.readline()
file1.close()

found = False
for line in loglist:
if line in loglist2 :
    found = True

if not found:
file1 = open('C:/Users/Atal/Desktop/School/Project datas/file1.txt', 'w')
file1.write(line +"\n")
file1.close()

file1 looks like this: Peter Jan Richard

file2 looks like this: Floyd Richard Bob

The new file should look like this: Peter Jan

If there is any way to do this, please let me know. Thanks in advance.

Use a set and not in, like so:

list_1 = ['Peter', 'Jan', 'Richard']
list_2 = ['Floyd', 'Richard', 'Bob'] 

set_2 = set(list_2)  
main_list = [item for item in list_1 if item not in set_2]

main_list

Output:

['Peter', 'Jan']
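
The same idea can be applied directly to the two files. The following is only a minimal sketch and not part of the original answer: the function name write_missing_words and the temporary file paths are hypothetical, and it assumes the words in each file are separated by whitespace (spaces or newlines).

```python
import os
import tempfile


def write_missing_words(path1, path2, out_path):
    """Write the words of path1 that do not occur in path2, one per line."""
    with open(path1) as f1, open(path2) as f2:
        words1 = f1.read().split()       # split() handles spaces and newlines
        words2 = set(f2.read().split())  # a set gives fast membership tests
    missing = [word for word in words1 if word not in words2]
    with open(out_path, "w") as out:
        out.writelines(word + "\n" for word in missing)
    return missing


# Demonstration with temporary files standing in for file1.txt and file2.txt
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    p3 = os.path.join(tmp, "file3.txt")
    with open(p1, "w") as f:
        f.write("Peter Jan Richard\n")
    with open(p2, "w") as f:
        f.write("Floyd Richard Bob\n")
    result = write_missing_words(p1, p2, p3)
    with open(p3) as f:
        written = f.read()

print(result)  # ['Peter', 'Jan']
```

The list comprehension keeps the words in the order they appear in the first file, which a plain set difference would not guarantee.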

When writing code, you need to keep in mind exactly what you're expecting each variable to contain at every step of your program's execution. For example, this:

loglist = file1.readlines()
...
line = file1.readline()
...
for line in loglist:

Why do that middle statement at all, if you're just going to overwrite line immediately? And within your for loop:

for line in loglist:
    if line in loglist2:
        found = True

if not found:
    # save new file

So, if a line from loglist is found in loglist2, then set the variable found to True. And if that didn't happen (if found remains False), then output to file1. Note here that you're not doing anything else with line, and even if you were, the statement file1.write(line + "\n") only ever outputs one line and never repeats for the other lines (or so I surmise from the way you indented the code in your question).
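
One further detail explains why the script in the question does nothing at all without raising an error: read() consumes the whole file, so a later readlines() on the same file object returns an empty list and the for loop never runs. A small sketch of that behaviour, using io.StringIO as a stand-in for an open file (an assumption made so the example runs without any files on disk):

```python
import io

# Stand-in for an opened file1.txt with one word per line
fake_file = io.StringIO("Peter\nJan\nRichard\n")

content = fake_file.read()           # reads everything; position is now at the end
empty_list = fake_file.readlines()   # nothing is left to read

print(repr(content))   # 'Peter\nJan\nRichard\n'
print(empty_list)      # []

fake_file.seek(0)                    # rewind to the start of the file
loglist = fake_file.readlines()      # now the lines are available again
print(loglist)         # ['Peter\n', 'Jan\n', 'Richard\n']
```

In practice it is simpler to call only one of read() or readlines() per file than to rewind with seek().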


So, here's how you would do this more correctly. As you read through this, pay attention to what type (string, list, etc.) each variable is whenever it's used:

with open(".../file1.txt", "r") as file1, open(".../file2.txt", "r") as file2:
    logList1 = file1.readlines()
    logList2 = file2.readlines()
    # the with block will close the files automatically

for line in logList2:
    if line in logList1:
        logList1.remove(line)  # if the line from file2 is found in file1, remove that line from file1

with open(".../file3.txt", "w") as file3:
    file3.writelines(logList1)  # write the contents of file1, after we removed the lines that also appear in file2

@johnny1995, in his answer, did the middle step in a list comprehension:

logList3 = [line for line in logList1 if line not in logList2]

which is essentially shorthand for what I did above: "make a new list containing every line from logList1, but only if that line doesn't appear in logList2".
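
One caveat when comparing raw lines, sketched here under the assumption that each file holds one word per line: readlines() keeps the trailing newline, so a final line without one ("Richard") will not match "Richard\n" from the other file. Stripping each line before comparing avoids this. The two lists below are made-up stand-ins for what readlines() might return.

```python
# Made-up stand-ins for the results of readlines() on the two files
raw1 = ["Peter\n", "Jan\n", "Richard"]      # last line has no trailing newline
raw2 = ["Floyd\n", "Richard\n", "Bob\n"]

# Comparing raw lines: "Richard" != "Richard\n", so it is wrongly kept
naive = [line for line in raw1 if line not in raw2]

# Comparing stripped lines: whitespace differences no longer matter
words2 = {line.strip() for line in raw2}
cleaned = [line.strip() for line in raw1 if line.strip() not in words2]

print(naive)    # ['Peter\n', 'Jan\n', 'Richard']
print(cleaned)  # ['Peter', 'Jan']
```

When writing the stripped words to the output file, a newline has to be added back after each one.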
