Using Python to Compare Two Text Files Line by Line
I have two text files that I want to compare. The first file contains unique items, and the second file contains the same items, each repeated numerous times. I want to see how many times each line is repeated in the second file. This is what I wrote:
import os
import sys
f1 = open('file1.txt')  # this has the 27 unique lines
f1data = f1.readlines()
f2 = open('file2.txt')  # this has lines repeated various times, with a total of 11162 lines
f2data = f2.readlines()
sys.stdout = open("linecount.txt", "w")
for line1 in f1data:
    linecount = 0
    for line2 in f2data:
        if line1 in line2:
            linecount+=1
    print line2, crime
The problem is that when I add up the line counts, the total is 11586 instead of 11162. What is the reason for this increase in the line count?
Is there another way of getting a line frequency output using Python?
From https://docs.python.org/2.7/reference/expressions.html#in :
For the Unicode and string types, x in y is true if and only if x is a substring of y.
So line1 also matches every longer line2 that merely contains it, which inflates the total.
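To make the overcounting concrete, here is a minimal sketch that compares substring matching with exact matching. The sample lines and the helper name count_with are made up for illustration; they are not taken from the asker's files.

```python
# Hypothetical sample data: "theft\n" is a substring of "auto theft\n".
unique_lines = ["theft\n", "auto theft\n"]
repeated_lines = ["theft\n", "auto theft\n", "auto theft\n"]


def count_with(match, needle, haystack):
    """Count the entries of haystack for which match(needle, entry) is true."""
    return sum(1 for entry in haystack if match(needle, entry))


# Substring test, as in the question: "theft\n" also matches "auto theft\n".
substring_counts = [
    count_with(lambda a, b: a in b, u, repeated_lines) for u in unique_lines
]

# Equality test: each repeated line is counted for exactly one unique line.
exact_counts = [
    count_with(lambda a, b: a == b, u, repeated_lines) for u in unique_lines
]

print(substring_counts)  # [3, 2], total 5 for only 3 lines
print(exact_counts)      # [1, 2], total 3
```

With equality the counts add up to the number of lines in the second list, provided every repeated line appears among the unique ones.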
Instead of
    if line1 in line2:
I think you meant to write
    if line1 == line2:
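Or maybe replace the whole
    for line2 in f2data:
        if line1 in line2:
            linecount+=1
block with
    if line1 in f2data:
        linecount += 1
Note that this membership test adds at most 1 per unique line, so it only shows whether the line occurs at all; f2data.count(line1) returns the number of exact matches.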
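As for another way to get a line frequency output, one possible approach is collections.Counter, which tallies the second file in a single pass. This is only a sketch: the function names are hypothetical, and stripping surrounding whitespace before comparing is an assumption (it keeps a final line without a trailing newline from being missed) that may not suit every file.

```python
from collections import Counter


def line_frequencies(unique_lines, repeated_lines):
    """Return (line, count) pairs based on exact matches.

    Assumption: lines are compared after stripping surrounding whitespace.
    """
    counts = Counter(line.strip() for line in repeated_lines)
    # A Counter returns 0 for keys it has never seen.
    return [(line.strip(), counts[line.strip()]) for line in unique_lines]


def line_frequencies_from_files(unique_path, repeated_path):
    """Hypothetical wrapper that reads both files and counts the lines."""
    with open(unique_path) as f1, open(repeated_path) as f2:
        return line_frequencies(f1.readlines(), f2.readlines())
```

Calling line_frequencies_from_files('file1.txt', 'file2.txt') would return one pair per line of the first file, and the counts sum to the length of the second file whenever every one of its lines appears in the first.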
The code still did not work for me even after changing it a bit. I got better results from this code:
import os
import sys
f1 = open('hmd4.csv')
f2 = open('svm_words.txt')
# Read the second file once; a repeated f2.read() returns '' after the first pass.
words2 = f2.read().split("\n")
linecount = 0
for word1 in f1.read().split("."):
    for word2 in words2:
        if word1 in word2:
            linecount+=1
print (linecount)