Python: after opening a large number of files, how can I write to them in no predetermined order?
My problem is that I have to read a big text file (several GB at least) and, while reading, write some portions of it to one of many output text files (about 5000) according to a pattern. If this or that pattern is present, I need to write to this or that file.
I can create all 5000 text files beforehand, but I don't know how to access a specific file later in order to write to it. Efficiency is also a big concern, but I am not even at that point yet.
To make it clearer: there are 5000 distinct patterns, but the total number of occurrences is in the hundreds of millions, maybe more. Whenever I come across a specific pattern, I will write it to its text file. However, there is no order, so I may need to write to the same output file again a million lines later, for example, or just 3 lines later, whenever I see the pattern.
Thanks in advance. (Note: I am also a beginner in Python, and I am using 3.6.)
The built-in function for opening files in Python is open().
In your case I would probably use it with mode="r" for the big file and mode="a" for all the other files. In append mode Python creates a file if it does not already exist, so there is no need to create them beforehand.
While reading the big file, you can specify the path of the file you want to write to as a string, so you can use string formatting on it.
with open(r"/BigFile.txt", mode="r") as InputFile:
    for row in InputFile:
        file_id = ...  # placeholder: whatever determines which file to write to
        file_to_write_to = r"/Subfiles/outputfile{}.txt".format(file_id)
        with open(file_to_write_to, mode="a") as OutputFile:
            # row already ends with a newline, so no extra "\n" is added
            OutputFile.write(row)
(The advantage of the with open() syntax is that you do not have to call the .close() method on the file object.)
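To make the two points above concrete, here is a minimal sketch showing that mode="a" creates a missing file and appends on later opens, and that the file is closed once the with block ends. The temporary directory and the file name outputfile1.txt are assumptions chosen only so the example does not touch real data.

```python
import os
import tempfile

# Hypothetical scratch directory so the example does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "outputfile1.txt")

    # mode="a" creates the file on first use and appends afterwards.
    with open(path, mode="a") as out:
        out.write("first line\n")
    was_closed = out.closed  # the with block has already closed the file

    # Reopening in append mode keeps what was written before.
    with open(path, mode="a") as out:
        out.write("second line\n")

    with open(path, mode="r") as check:
        contents = check.read()
```

After this runs, was_closed should be True and contents should hold both lines in the order they were written.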
This code has the disadvantage that there is one file open and close operation per input row. You might want to consider collecting several output operations in a list before writing them out as a batch, but that only saves time if a batch contains multiple writes to the same file.
BATCH_SIZE = 500
batch_dict = {}

def flush(batches):
    # Append each accumulated batch to its output file, then empty the dict.
    for batch_id, batch in batches.items():
        file_to_write_to = r"/Subfiles/outputfile{}.txt".format(batch_id)
        with open(file_to_write_to, mode="a") as OutputFile:
            OutputFile.write(batch)
    batches.clear()

with open(r"/BigFile.txt", mode="r") as InputFile:
    for index, row in enumerate(InputFile, start=1):
        file_id = ...  # placeholder: whatever determines which file to write to
        key = str(file_id)
        # row already ends with a newline, so rows are simply concatenated
        batch_dict[key] = batch_dict.get(key, "") + row
        if index % BATCH_SIZE == 0:
            flush(batch_dict)
flush(batch_dict)  # write whatever is left after the last full batch
(The code is untested, as I don't have Python 3 available right now.)
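For a version of the batching idea that can be run end to end, here is a minimal sketch. The function name split_by_key, its parameters, and the key_of callback are assumptions introduced for illustration; key_of stands in for whatever rule decides which file a line belongs to.

```python
from collections import defaultdict


def split_by_key(input_path, output_template, key_of, batch_size=500):
    """Append each input line to output_template.format(key_of(line)).

    Lines are collected per key and written out every batch_size input
    lines, so each output file is opened at most once per batch.
    """
    pending = defaultdict(list)  # key -> lines waiting to be written

    def flush():
        for key, lines in pending.items():
            with open(output_template.format(key), mode="a") as out:
                out.writelines(lines)  # lines keep their own newlines
        pending.clear()

    with open(input_path, mode="r") as source:
        for index, line in enumerate(source, start=1):
            pending[key_of(line)].append(line)
            if index % batch_size == 0:
                flush()
    flush()  # write the remainder after the last full batch
```

It could be called as split_by_key("BigFile.txt", "outputfile{}.txt", lambda line: line.split()[0]), assuming the first word of each line is the pattern. Collecting lines in lists instead of concatenating strings avoids repeatedly copying a growing string.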
You should open the file in append mode only when needed, write your data, and then close it.
with open('my-file-name', 'a+') as ff:
    ff.write('my-text' + '\n')
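If reopening a file for every write turns out to be too slow, one alternative that is not part of the answer above is to keep recently used handles in a dict keyed by pattern, which also addresses the question of how to reach a specific file later. The sketch below is an assumption-laden illustration: the class HandleCache, its path_template argument, and the max_open limit are hypothetical names, and the cap exists because 5000 simultaneously open files may exceed the per-process open-file limit on some systems.

```python
from collections import OrderedDict


class HandleCache:
    """Keep at most max_open append-mode handles, closing the least recently used."""

    def __init__(self, path_template, max_open=100):
        self.path_template = path_template  # hypothetical, e.g. "outputfile{}.txt"
        self.max_open = max_open
        self._handles = OrderedDict()  # key -> open file object, oldest first

    def write(self, key, text):
        handle = self._handles.get(key)
        if handle is None:
            if len(self._handles) >= self.max_open:
                # Evict the least recently used handle; closing flushes it.
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
            handle = open(self.path_template.format(key), mode="a")
            self._handles[key] = handle
        else:
            self._handles.move_to_end(key)  # mark as most recently used
        handle.write(text)

    def close(self):
        """Close every handle that is still open."""
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
```

Usage would look like cache = HandleCache("outputfile{}.txt"), then cache.write(pattern, line) for each matching line, and cache.close() at the end. Whether this is actually faster than reopening files depends on how the patterns are distributed, so it is worth measuring on real data.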