Python - reading and deleting the top line of a file without loading it into memory

I need to merge-sort text files that are about 150 MB each and together amount to about 5 GB.

The problem is that I can't implement the merge sort with readlines(), since the last step would need to load 5 GB into memory, and with only the

for line1 in file1, line2 in file2:
    while( line1 & line2 )...

command, I can't tell Python to fetch only the next line of file 1 while keeping the current line of file 2, so I am unable to write a merge sort.

I read something about setting the read buffer really low on readlines() so that only a single line is loaded into memory, but then I can't delete the first line from the file.

Is there any other memory-efficient way to get only the first line of a file and delete it, or is there an existing function somewhere that merge-sorts two text files?

"command, I can't tell Python to fetch only the next line of file 1 while keeping the current line of file 2, so I am unable to write a merge sort"

No, you can.

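line1 = file1.readline()
line2 = file2.readline()
# readline() returns '' at end of file, so an empty string means that file is exhausted
while line1 and line2:
    if line1 < line2:
        file3.write(line1)
        line1 = file1.readline()
    else:
        file3.write(line2)
        line2 = file2.readline()

# merge the rest of file 1 into file 3
while line1:
    file3.write(line1)
    line1 = file1.readline()
# merge the rest of file 2 into file 3
while line2:
    file3.write(line2)
    line2 = file2.readline()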
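
The merge loop above can be fleshed out into a self-contained sketch. The function names (merge_two_sorted_files, merge_many_sorted_files) and their path arguments are made up for illustration and do not come from the original answer. The sketch assumes every input file is already sorted, that lines are compared as plain strings, and that every line, including the last one, ends with a newline. The second function is a possible alternative rather than part of the answer: it relies on the standard library's heapq.merge, which consumes its inputs lazily, so it may suit merging many 150 MB files in one pass.

```python
import heapq


def merge_two_sorted_files(path1, path2, out_path):
    """Merge two line-sorted text files, holding one line per input in memory."""
    with open(path1) as file1, open(path2) as file2, open(out_path, "w") as file3:
        line1 = file1.readline()
        line2 = file2.readline()
        # readline() returns "" at end of file, so an empty string means "exhausted".
        while line1 and line2:
            if line1 <= line2:
                file3.write(line1)
                line1 = file1.readline()
            else:
                file3.write(line2)
                line2 = file2.readline()
        # At most one input still has lines left; copy them across unchanged.
        while line1:
            file3.write(line1)
            line1 = file1.readline()
        while line2:
            file3.write(line2)
            line2 = file2.readline()


def merge_many_sorted_files(paths, out_path):
    """k-way merge of sorted files via heapq.merge, which reads inputs lazily."""
    files = [open(p) for p in paths]
    try:
        with open(out_path, "w") as out:
            # File objects iterate line by line, so only a few lines are buffered.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
```

Neither function deletes anything from the input files. The open file object keeps track of its own read position, so removing the top line is not needed for the merge.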


