Compare lines in two .txt files, print out new line for not contained words
I have the following piece of code that, for every line in textfile1, searches textfile2 and, if the line is contained in textfile2, prints out the corresponding line of textfile2. However, I also want to print out a new (empty) line for every line that is not contained in textfile2. Here is the code:
def readline():
    with open("textfile1.txt") as file, open("textfile2.txt") as file2:
        string = set(map(str.rstrip,file))
        for line in file2:
            spl = line.split(None, 1)[0]
            if spl in string:
                print(line.rstrip())
            else: ##if spl not in string print new line
                print("\n")
It doesn't work as I expect (it doesn't print out any new lines). What may be the problem, and are there any alternative solutions?
Sample Textfile1:
'
a
aa
ab
abandon
abandonaudiofocus
abandonsession
abort
abortablehttprequest
abortanimation
abortcaptures
abortconnection
abortpolicy
abortrequest
abs
Sample Textfile2:
' | 22624
a | 91
aa | 7
ab | 6
abort | 8
abortanimation | 5
abs | 131
abslistview | 115
absolutelayout | 50
absolutesizespan | 6
abstracthttpentity | 2
abstractlist | 1
abstractmap | 4
abstractselector | 1
abstractset | 2
Textfile1 includes many more words and it contains all the words in textfile2.
For every line in textfile2, this looks up the first part of the line in textfile1 and, if it is found there, prints out the corresponding line of textfile2; otherwise it prints an empty line.
def readline():
    file1_list = [line.rstrip() for line in open("textfile1.txt")]
    file2_list = [line.rstrip() for line in open("textfile2.txt")]
    fileo_list = [line if line.split(None, 1)[0] in file1_list else '' for line in file2_list]
    for line in fileo_list:
        print(line)
This will print out:
' | 22624
a | 91
aa | 7
ab | 6
abort | 8
abortanimation | 5
abs | 131
.....
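The list comprehension above tests membership against a list, which scans the list on every lookup. A minimal sketch of the same idea with a set is shown below; `keep_known` and the sample data are hypothetical names introduced only for illustration, and the function works on in-memory lines rather than on the actual files. It also treats blank lines as "not found" instead of raising an IndexError.

```python
def keep_known(words, counted_lines):
    """Return each counted line if its first token is in words, else ''."""
    # A set gives constant-time average lookups, unlike a list.
    known = set(word.strip() for word in words)
    result = []
    for line in counted_lines:
        parts = line.split(None, 1)
        # Blank lines have no first token; treat them as "not found".
        result.append(line.rstrip() if parts and parts[0] in known else "")
    return result


# Shortened, made-up stand-ins for textfile1 and textfile2.
sample_words = ["a", "aa", "ab", "abort", "abs"]
sample_counts = ["a | 91", "aa | 7", "abslistview | 115", ""]

for out_line in keep_known(sample_words, sample_counts):
    print(out_line)
```

With real files, the two arguments could simply be the open file objects, since both are only iterated line by line.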
According to your question -
for every line in textfile1, searches textfile2 and if the line is contained in textfile2 prints out the corresponding line of textfile2
And comment -
Textfile1 includes many more words and it contains all the words in textfile2
The logic you have right now is actually the opposite: it checks, for each line in file2 (textfile2.txt), whether that line's first part exists in file (textfile1.txt), which would always be true according to your comment. That is why the else branch never runs.
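To make the direction problem concrete, here is a small sketch that counts how often the "not found" branch would fire in each direction. The two lists are shortened, hypothetical stand-ins for the sample files, and `count_misses` is a helper name invented for this illustration.

```python
# Shortened, in-memory stand-ins for the two sample files.
textfile1_lines = ["a", "aa", "ab", "abandon", "abort", "abs"]
textfile2_lines = ["a | 91", "aa | 7", "ab | 6", "abort | 8", "abs | 131"]


def count_misses(candidates, known):
    """Count how many candidates are not present in known."""
    known_set = set(known)
    return sum(1 for item in candidates if item not in known_set)


# First token of every textfile2 line, e.g. "a | 91" -> "a".
first_tokens = [line.split(None, 1)[0] for line in textfile2_lines]

# Original direction: loop over textfile2, look each word up in textfile1.
misses_original = count_misses(first_tokens, textfile1_lines)

# Reversed direction: loop over textfile1, look each word up in textfile2.
misses_reversed = count_misses(textfile1_lines, first_tokens)

print(misses_original, misses_reversed)
```

Because every word of textfile2 also appears in textfile1, the original direction never misses, while the reversed direction misses once here (for "abandon").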
You need to put all the elements of file2 (the first part of each line) into the set, and then check each line of file against that set. Example -
def get_first(line):
    return line.split(None, 1)[0]

def readline():
    with open("textfile1.txt",'r') as file, open("textfile2.txt",'r') as file2:
        string = set(map(get_first,file2))
        file2.seek(0)
        file2_dict = {}
        for line in file2:
            # strip the trailing newline so print() does not add an extra blank line
            file2_dict[get_first(line)] = line.rstrip()
        for line in file:
            if line.strip() in string:
                print(file2_dict[line.strip()])
            else: ##if line not in string print new line
                print()
Also, you do not need "\n" inside your print() in the else part. print already appends a newline by itself, so you can just call print() to print an empty line.
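The difference between the two calls can be checked by capturing what print writes. This is only an illustrative sketch for Python 3; `captured` is a hypothetical helper, not part of the code in the question.

```python
import io


def captured(*args):
    """Return exactly what print(*args) would write, as a string."""
    buffer = io.StringIO()
    print(*args, file=buffer)
    return buffer.getvalue()


# print() writes a single newline, i.e. one empty line.
print(repr(captured()))

# print("\n") writes the given newline plus print's own one: two empty lines.
print(repr(captured("\n")))
```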
Example/Demo -
>>> def get_first(line):
...     return line.split(None, 1)[0]
...
>>> def readline():
...     with open("a.txt",'r') as file, open("b.txt",'r') as file2:
...         string = set(map(get_first,file2))
...         for line in file:
...             if line.strip() in string:
...                 print(line.rstrip())
...             else: ##if spl not in string print new line
...                 print()
...
>>> readline()
a
aa
ab
abort
abortanimation
abs
In the above example, a.txt contains the data from your example textfile1.txt and b.txt contains the data from your example textfile2.txt.
Sets make this pretty easy
with open("textfile1.txt") as file1:
    textfile_1_set = set(map(str.rstrip, file1))
with open("textfile2.txt") as file2:
    textfile_2_set = set([l.split()[0] for l in file2])
# remove all the lines that are in textfile2 from the
# set of lines from textfile1
in_1_but_not_2 = textfile_1_set - textfile_2_set
for line in in_1_but_not_2:
    print(line)
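Note that iterating over a set does not follow the order of the lines in textfile1. If the file order matters, one possible variation keeps a set only for the lookup and walks textfile1 in order. This is a sketch under that assumption; `missing_in_order` is a hypothetical name and the sample lists stand in for the real files.

```python
def missing_in_order(words, counted_lines):
    """List the words that have no entry in counted_lines, in original order."""
    # First token of each non-blank counted line, e.g. "abort | 8" -> "abort".
    first_tokens = {line.split()[0] for line in counted_lines if line.split()}
    return [word.rstrip() for word in words if word.rstrip() not in first_tokens]


# Made-up samples shaped like the lines read from the two files.
sample_words = ["a\n", "abandon\n", "abort\n", "abortcaptures\n"]
sample_counts = ["a | 91\n", "abort | 8\n", "\n"]

for word in missing_in_order(sample_words, sample_counts):
    print(word)
```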