简体   繁体   English

Python,在打开大量文件后,我如何称呼它们以没有任何预先指定的顺序进行写入?

[英]Python, after opening large number of files, how can i call them to write without any pregiven order?

My problem is i have to read a big text file (several GBs at least) and then while reading, according to a pattern i will write some portion of it to one of many output text files (about 5000). 我的问题是我必须读取一个大文本文件(至少几个GB),然后在读取时,根据一种模式,我会将其一部分写入许多输出文本文件之一(约5000个)。 If this or that pattern is present, i need to write on this or that file. 如果存在该模式,则需要在该文件上写。

So I can create all 5000 text files beforehand, but I don't know how to access that specific text file later to write. 因此,我可以事先创建所有5000个文本文件,但是我不知道以后如何访问该特定文本文件。 Effiency is also a big problem, but i am not even there. 效率也是一个大问题,但我什至没有。

to make it more clear: there are 5000 patterns but total numbe rof them are hundreds of millions, maybe more. 更明确地说:有5000种模式,但总数为数亿个,也许更多。 So whenever i stumble upon a specific pattern, i will write it to its text file. 因此,每当我偶然发现一个特定的模式时,我都会将其写入其文本文件。 However there is no order, so i may need to call same outputfile 1million lines later for example or just after 3 lines, whenever i see it 但是没有顺序,因此我可能需要在以后每100万行调用同一输出文件,例如,每当我看到它时,就在3行之后

Thanks in advance (note: i am also a beginner in python language and i am using 3.6) 在此先感谢(注意:我也是python语言的初学者,我正在使用3.6)

The built-in for opening files in python is open() . 用于在python中open()文件的内置函数是open()

In Your case I would probably use it with mode = r for the big file and mode = a for all the other files. 在您的情况下,我可能将其与mode = r用于大文件,将mode = a用于所有其他文件。 Python will create a file if it is not already there, so no need to create them beforehand. 如果尚不存在,Python将会创建一个文件,因此无需事先创建它们。

While reading the big file you can just specify the the path to the file you want to write to as a string, so you can use string formatting on it. 在读取大文件时,您可以仅将要写入的文件的路径指定为字符串,因此可以在其上使用字符串格式。

with open(r"/BigFile.txt",mode=r) as InputFile:
    for row in InputFile:

        id = #what you want to have to determine which file to write to

        file_to_write_to = r"/Subfiles/outputfile{}.txt".format(id)

        with open(file_to_write_to,mode="a") as OutputFile:
            OutputFile.write(row + "\n")

(The advantage of the with open() syntax is that you do not have to call the .close() function on the file Object) with open()语法的优点是您不必在文件Object上调用.close()函数)

This code has the disadvantage that there is one file open and close operation per input block. 该代码的缺点是每个输入块只有一个文件打开和关闭操作。 You might want to consider building a list of several output operations before exporting them as a batch, but that will only give a time advantage if there are multiple output operations on the same file. 您可能需要考虑在将多个输出操作批量导出之前构建一个包含多个输出操作的列表,但这仅在同一文件上有多个输出操作时才具有时间优势。

BATCH_SIZE = 500
batch_dict = {}

with open(r"/BigFile.txt",mode=r) as InputFile:
    for index,row in enumerate(InputFile):

        id = #what you want to have to determine which file to write to
        if batch_dict.setdefault(str(id),row) is not None:
            batch_dict[str(id)] = batch_dict[str(id)] + row +"\n"

        if index % BATCH_SIZE = 0:
            for batch_id,batch in batch_dict:

                file_to_write_to = r"/Subfiles/outputfile{}.txt".format(id)
                with open(file_to_write_to,mode="a") as OutputFile:
                    OutputFile.write(batch + "\n")

            batch_dict = {}

(Code is untested as I dont have python 3 right now) (代码未经测试,因为我现在没有python 3)

You should open the file only when needed in appending mode, write your data and then close it. 您仅应在附加模式下需要时才打开文件,写入数据,然后将其关闭。

with open('my-file-name','a+') as ff:
    ff.write('my-text'+'\n')

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 在python上打开和关闭大量文件 - Opening and closing a large number of files on python 如何在python中写入大文件而不重载memory? - How to write large files in python without overloading memory? 如何编写可读取doc / docx文件并将其转换为txt的python脚本? - How do I write a python script that can read doc/docx files and convert them to txt? 如何复制不同 JSON 文件的正文并将它们全部写入一个带有 Python 的文件? - How can I copy the body of different JSON files and write them all to one file with Python? 在Python中,如何编写可以访问私有属性而不暴露它们的单元测试? - In Python, how do I write unit tests that can access private attributes without exposing them? Python UnitTest-如何访问subTests消息而不必手动编写它们? - Python UnitTest - How can I access subTests messages without having to write them manually? 如何使python的argparse接受任意数量的[-R ab],并将其聚合到列表中 - How can I make python's argparse accept any number of [-R a b]s, and aggregate them into a list 给定一个数据框,如何检查列的值按递增顺序排列而没有任何丢失的数字? - How can I check, given a data frame that the values of a column are in increasing order without any missing number? Python:读/写大量小文件 - Python: Read/write large number of small files 如何编写多个文件,每个文件中的文本都更改 - how can I write multiple files with text changing in each of them
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM