Python - reading and deleting the top line of a file without loading it into memory
I need to merge-sort text files that are about 150 MB each and together amount to about 5 GB.
The problem is that I can't do the merge sort with readlines(), since the last step would need to load 5 GB into memory, and with only the
for line1 in file1, line2 in file2:
while( line1 & line2 )...
command, I can't tell Python to get only the next line of file 1 while keeping the current line of file 2, and so I am unable to write a merge sort.
I read something about setting the read buffer very low on readlines() so that only a single line is loaded into memory, but then I can't delete the first line from the file.
Is there any other memory-efficient way to get only the first line of a file and delete it, or is there an existing function somewhere that merge-sorts two text files?
You wrote: "I can't tell Python to only get the next line of file 1, and keep the line of file 2, and thus am unable to make a merge sort."
Yes, you can. Each file object keeps its own position, so calling readline() on one file does not advance the other:
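line1 = file1.readline()
line2 = file2.readline()
# readline() returns '' at end of file, so both are truthy while data remains
while line1 and line2:
    if line1 < line2:
        file3.write(line1)
        line1 = file1.readline()
    else:
        file3.write(line2)
        line2 = file2.readline()
# copy whatever is left of file 1 into file 3
while line1:
    file3.write(line1)
    line1 = file1.readline()
# copy whatever is left of file 2 into file 3
while line2:
    file3.write(line2)
    line2 = file2.readline()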
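As for whether a ready-made function exists: the standard library's heapq.merge lazily merges any number of already-sorted iterables, and a file object yields one line at a time when iterated. The following is a minimal sketch, not a definitive implementation. The helper name merge_sorted_files and its parameters are made up for illustration, and it assumes that every input file is already sorted, that every line (including the last) ends with a newline, and that plain string comparison is the ordering you want.

```python
import heapq
from contextlib import ExitStack


def merge_sorted_files(input_paths, output_path):
    """Merge already-sorted text files into one sorted file.

    Hypothetical helper: assumes each input is sorted and that every
    line, including the last one, ends with a newline.
    """
    with ExitStack() as stack:
        # Iterating a file object yields one line at a time.
        inputs = [stack.enter_context(open(path, "r")) for path in input_paths]
        output = stack.enter_context(open(output_path, "w"))
        # heapq.merge is lazy: it keeps only the current line of each input.
        output.writelines(heapq.merge(*inputs))
```

Called as merge_sorted_files(["part1.txt", "part2.txt"], "merged.txt") (the file names are placeholders), this never needs to delete lines from the inputs, because reading a line simply advances that file's position.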