Is there a way of looking for words from one file in another file and outputting the words not found in the other file, in a new file?
I am trying to compare two files in Python, both of which contain some words. I would like the code to look up each word from file1 in file2 and write the words from file1 that are not found in file2 to a new file.
The code below is what I've tried, but it doesn't do anything. It doesn't even show an error, so I don't know what is going wrong or what should be different.
file1 = open('C:/Users/Atal/Desktop/School/Project datas/file1.txt')
file2 = open('C:/Users/Atal/Desktop/School/Project datas/file2.txt')
fileContent = file1.read();
fileContent2 = file2.read();
loglist = file1.readlines()
loglist2 = file2.readlines()
file2.close()
line = file1.readline()
file1.close()
found = False
for line in loglist:
    if line in loglist2 :
        found = True
if not found:
    file1 = open('C:/Users/Atal/Desktop/School/Project datas/file1.txt', 'w')
    file1.write(line +"\n")
    file1.close()
file1 looks like this: Peter Jan Richard
file2 looks like this: Floyd Richard Bob
The new file should look like this: Peter Jan
If there is any way to do this, please let me know. Thanks in advance.
Use a set and not in, like so:
list_1 = ['Peter', 'Jan', 'Richard']
list_2 = ['Floyd', 'Richard', 'Bob']
set_2 = set(list_2)
main_list = [item for item in list_1 if item not in set_2]
print(main_list)
Output:
['Peter', 'Jan']
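The snippet above works on lists that are already in memory. Applied to the files from the question, a minimal sketch might look like the following. It assumes each file holds whitespace-separated words (spaces or one word per line), and the function names missing_words and write_missing_words are hypothetical helpers, not part of any library.

```python
def missing_words(text1, text2):
    """Return the words of text1 that do not occur in text2, in text1's order."""
    # split() with no arguments splits on any run of whitespace (spaces, tabs,
    # newlines), so "Richard\n" and "Richard" both become "Richard".
    words2 = set(text2.split())  # set: average O(1) membership tests
    return [word for word in text1.split() if word not in words2]


def write_missing_words(path1, path2, out_path):
    """Read two word files and write the words missing from path2 to out_path, one per line."""
    with open(path1) as f1, open(path2) as f2:
        missing = missing_words(f1.read(), f2.read())
    with open(out_path, "w") as out:
        out.writelines(word + "\n" for word in missing)
    return missing
```

With the sample files from the question, write_missing_words("file1.txt", "file2.txt", "new_file.txt") (placeholder paths) would write Peter and Jan to new_file.txt. Reading everything with split() sidesteps the trailing-newline mismatches that readlines() can cause.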
When writing code, you need to keep in mind exactly what you expect each variable to contain at every step of your program's execution. For example, this:
loglist = file1.readlines()
...
line = file1.readline()
...
for line in loglist:
Why make that middle statement at all, if you're just going to overwrite line immediately? And within your for loop:
for line in loglist:
    if line in loglist2:
        found = True
if not found:
    # save new file
So, if a line from loglist is found in loglist2, you set the variable found to True. And if that never happens (if found remains False), you write to file1. Since found is never reset to False, a single match anywhere is enough to skip the write entirely. Note also that you're not doing anything else with line, and even if you were, the statement file1.write(line + "\n") only ever outputs one line and never repeats for the other lines (or so I surmise from the way you indented your code in your question).
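There is also a concrete reason the original script seems to do nothing: file1.read() reads to the end of the file, so the readlines() and readline() calls that follow on the same file object return an empty list and an empty string. loglist is therefore empty, the for loop never runs, and found stays False; assuming the if not found: block sits outside the loop, the final write then opens file1 with 'w' and replaces its contents with a single newline. The demonstration below uses io.StringIO as a stand-in for a real file (an assumption made only so it needs no file on disk); seek(0) is one way to rewind, though reading each file only once is simpler.

```python
import io

# io.StringIO behaves like a text file opened for reading; it stands in for file1.
f = io.StringIO("Peter\nJan\nRichard\n")

content = f.read()      # consumes everything; the position is now at the end
lines = f.readlines()   # nothing left to read, so this is []
line = f.readline()     # also nothing left, so this is ''
print(repr(content), lines, repr(line))

f.seek(0)               # rewind to the start of the stream
rewound = f.readlines()
print(rewound)          # ['Peter\n', 'Jan\n', 'Richard\n']
```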
So, here's how you would do this more correctly. As you read through it, pay attention to what type (string, list, etc.) each variable is wherever it's used:
with open(".../file1.txt", "r") as file1, open(".../file2.txt", "r") as file2:
    logList1 = file1.readlines()
    logList2 = file2.readlines()
# the with block closes both files automatically

for line in logList2:
    while line in logList1:
        logList1.remove(line)  # if a line from file2 also appears in file1, remove it (every copy) from file1's list

with open(".../file3.txt", "w") as file3:
    file3.writelines(logList1)  # write what is left of file1: the lines that were not found in file2

@johnny1995, in his answer, did the middle step in a list comprehension:
logList3 = [line for line in logList1 if line not in logList2]
which is essentially shorthand for what I did above: "make a new list containing every line from logList1, but only if that line doesn't appear in logList2".
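One caveat applies to both the loop and the list comprehension: readlines() keeps each line's trailing "\n", and the last line of a file often has none. If file1 ends with "Richard" while file2 contains "Richard\n", the two strings are not equal and Richard is wrongly reported as missing. A minimal sketch of one fix is to strip whitespace from each line before comparing; the two lists below are hard-coded stand-ins for what readlines() might return, not output from the asker's files.

```python
# Stand-ins for file1.readlines() and file2.readlines(); note that the last
# line of each "file" has no trailing newline.
logList1 = ["Peter\n", "Jan\n", "Richard"]
logList2 = ["Floyd\n", "Richard\n", "Bob"]

# Naive comparison: "Richard" != "Richard\n", so all three lines look missing.
naive = [line for line in logList1 if line not in logList2]

# Strip surrounding whitespace (including "\n") before comparing.
# Blank lines are skipped so they don't end up in the output.
words2 = {line.strip() for line in logList2}
fixed = [line.strip() for line in logList1 if line.strip() and line.strip() not in words2]

print(naive)  # ['Peter\n', 'Jan\n', 'Richard']
print(fixed)  # ['Peter', 'Jan']
```

Because the stripped words no longer carry a newline, add one back when writing them, for example with file3.writelines(word + "\n" for word in fixed).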