Python: Compare two files
I have two input files. The first one:

    scandinavian t airline airline
    one n 0 flightnumber
    six n 0 flightnumber
    two n 0 flightnumber
    three n 0 flightnumber
    speedbird t airline airline
    one n 0 flightnumber
    six n 0 flightnumber
    eight n 0 flightnumber

My second input file:

    scandinavian t airline airli
    one n 0 flightnumber
    six n 0 flightnumber
    two n 0 flightnumber
    three n 0 flightnumber
    six n 0 flightnumber
    eight n 0 flightnumber

I have the following code:

    with open('output_ref.txt', 'r') as file1:
        with open('output_ref1.txt', 'r') as file2:
            same = set(file1).difference(file2)

    print(same)
    print("\n")

    same.discard('\n')

    with open('some_output_file.txt', 'w') as FO:
        for line in same:
            FO.write(line)

The output I am getting is:

    scandinavian t airline airline
    speedbird t airline airline

But the output I actually expect is:

    scandinavian t airline airline
    speedbird t airline airline
    one n 0 flightnumber

Can someone help me solve this issue?

First of all, if what you are trying to do is get the lines common to the two files (which the "same" variable name suggests), then you should use the intersection method instead of difference. Also, although both methods accept any iterable as their argument, I would go the extra step and turn the second file into a set too, so that the intent is explicit. So the new code should be:

    first = set(file1)
    second = set(file2)
    same = first.intersection(second)
    .....

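To see how the two set methods behave on data like the question's, here is a small sketch that uses in-memory lists in place of the two files. The names lines1 and lines2 and the shortened data are made up for illustration. It also hints at why sets cannot produce the expected output: a set keeps each distinct line only once, so the repeated "one n 0 flightnumber" line cannot survive.

```python
# In-memory stand-ins for the two files (shortened, illustrative data).
lines1 = [
    "scandinavian t airline airline\n",
    "one n 0 flightnumber\n",
    "six n 0 flightnumber\n",
    "speedbird t airline airline\n",
    "one n 0 flightnumber\n",
    "six n 0 flightnumber\n",
]
lines2 = [
    "scandinavian t airline airli\n",
    "one n 0 flightnumber\n",
    "six n 0 flightnumber\n",
    "six n 0 flightnumber\n",
]

# Lines present in both inputs; duplicates collapse to a single entry.
common = set(lines1).intersection(lines2)

# Lines present in the first input but not at all in the second.
only_in_first = set(lines1).difference(lines2)

print(sorted(common))
print(sorted(only_in_first))
```

Because "one n 0 flightnumber" appears at least once in the second input, the set difference drops it entirely, no matter how many times it occurs in the first.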
EDIT:
Reading some comments on my post convinced me that you actually want the difference, and that you want it computed on lists rather than on sets. I guess this should work for you:

    difference = list(file1)
    second = list(file2)
    for line in second:
        try:
            # remove one occurrence of this line from the first file's lines
            difference.remove(line)
        except ValueError as e:
            print(e)  # alternatively, you could just pass here

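The remove-based idea above can be tried without any files. The following is a minimal sketch, assuming the file contents are already available as lists of lines; the function name remove_matches and the variables first and second are hypothetical and chosen only for this example.

```python
def remove_matches(first, second):
    """Return the lines of `first` left after cancelling one occurrence
    for every line of `second`."""
    remaining = list(first)  # work on a copy so the caller's list is untouched
    for line in second:
        try:
            remaining.remove(line)  # removes only the first matching occurrence
        except ValueError:
            pass  # this line of `second` does not occur in `first`
    return remaining


# The two inputs from the question, as lists of lines.
first = [
    "scandinavian t airline airline\n",
    "one n 0 flightnumber\n",
    "six n 0 flightnumber\n",
    "two n 0 flightnumber\n",
    "three n 0 flightnumber\n",
    "speedbird t airline airline\n",
    "one n 0 flightnumber\n",
    "six n 0 flightnumber\n",
    "eight n 0 flightnumber\n",
]
second = [
    "scandinavian t airline airli\n",
    "one n 0 flightnumber\n",
    "six n 0 flightnumber\n",
    "two n 0 flightnumber\n",
    "three n 0 flightnumber\n",
    "six n 0 flightnumber\n",
    "eight n 0 flightnumber\n",
]

result = remove_matches(first, second)
print(result)
```

Since list.remove() deletes only one occurrence per call, the second "one n 0 flightnumber" line stays in the result, which is the behavior the question asks for.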
    def diff(a, b):
        # Collect the items of a that have no remaining match in b.
        # Note: b is modified in place; each match is consumed once.
        y = []
        for x in a:
            if x not in b:
                y.append(x)
            else:
                b.remove(x)
        return y

    with open('output_ref.txt', 'r') as file1:
        with open('output_ref1.txt', 'r') as file2:
            same = diff(list(file1), list(file2))

    print(same)
    print("\n")

    if '\n' in same:
        same.remove('\n')

    with open('some_output_file.txt', 'w') as FO:
        for line in same:
            FO.write(line)

    $ python compare.py
    ['scandinavian t airline airline\n', 'speedbird t airline airline\n', 'one n 0 flightnumber\n']

    $ cat some_output_file.txt
    scandinavian t airline airline
    speedbird t airline airline
    one n 0 flightnumber
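
For larger files, the same duplicate-aware difference could also be computed with collections.Counter from the standard library. This is only a sketch of an alternative, not the method used in the answers above; the function name multiset_diff and the sample lists are made up for illustration.

```python
from collections import Counter


def multiset_diff(a, b):
    """Return the lines of `a` that are not matched one-for-one by a line
    of `b`, keeping the order in which they appear in `a`."""
    available = Counter(b)  # how many copies of each line `b` can still cancel
    result = []
    for line in a:
        if available[line] > 0:
            available[line] -= 1  # cancel one occurrence
        else:
            result.append(line)  # no match left in `b`: keep the line
    return result


# Newlines stripped here for readability; file lines would normally keep them.
file1_lines = [
    "scandinavian t airline airline",
    "one n 0 flightnumber",
    "six n 0 flightnumber",
    "two n 0 flightnumber",
    "three n 0 flightnumber",
    "speedbird t airline airline",
    "one n 0 flightnumber",
    "six n 0 flightnumber",
    "eight n 0 flightnumber",
]
file2_lines = [
    "scandinavian t airline airli",
    "one n 0 flightnumber",
    "six n 0 flightnumber",
    "two n 0 flightnumber",
    "three n 0 flightnumber",
    "six n 0 flightnumber",
    "eight n 0 flightnumber",
]

leftover = multiset_diff(file1_lines, file2_lines)
print(leftover)
```

The list-based diff() searches and removes from b on every step, which is fine for small inputs like these but grows quadratically with file size. The Counter version does a dictionary lookup per line and leaves both input lists unmodified.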