Comparing two large files line by line
I was wondering if there is an efficient way to compare two large files line by line.
File 1
2
3
2
File 2
2 | haha
3 | hoho
4 | hehe
I am just taking the first character of each line in both files and comparing them. Currently I am using a very naive method of iterating through them in a double for loop.
Like:
for i in file 1:
    line number = 0
    for j in file 2:
        loop until line number == counter, else add 1 to line number
        compare the two lines
    increase counter
Reading both files into memory is not an option. I am using Python on Linux, but I am open to both bash and Python script solutions.
What about something like this:
diff <(cut -c 1 file1.txt) <(cut -c 1 file2.txt)
diff is the tool you use to compare files line by line. You can use process substitution (an anonymous pipe) to compare a version of each file that contains only the first character of each line (using cut).
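You could pass the two files to Python's zip() and iterate over them together. Because file objects are read lazily, only one line from each file is held in memory at a time.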
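# Iterate over both files in lockstep; zip() stops at the end of the shorter file.
with open('File 1') as f1, open('File 2') as f2:
    flag = True
    for file1_line, file2_line in zip(f1, f2):
        # Compare only the first character of each line.
        if file1_line[0] != file2_line[0]:
            flag = False
            break
print(flag)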
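If the files may differ in length, or if you want to know where the first difference is, a variant along these lines might help. This is only a sketch: the function name first_mismatch is made up for illustration, and it assumes that a line missing from the shorter file should count as a mismatch.

```python
from itertools import zip_longest


def first_mismatch(path1, path2):
    """Return the 1-based number of the first line whose first character
    differs between the two files, or None if no such line exists.

    Assumption: a line missing from the shorter file counts as a mismatch.
    """
    with open(path1) as f1, open(path2) as f2:
        # zip_longest pads the shorter file with None instead of stopping early,
        # and still reads both files lazily, one line at a time.
        pairs = zip_longest(f1, f2)
        for line_number, (line1, line2) in enumerate(pairs, start=1):
            if line1 is None or line2 is None:
                return line_number
            # Slicing with [:1] never raises, even on an empty string.
            if line1[:1] != line2[:1]:
                return line_number
    return None
```

Calling first_mismatch('File 1', 'File 2') on the sample files from the question should return 3, since the third lines start with 2 and 4 respectively.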