
Pythonic way to write a large number of lines to a file

I need to auto-generate a somewhat large Makefile using a Python script. The number of lines is expected to be relatively large. The routine that writes the file is composed of nested loops, a whole bunch of conditions, etc.

My options:

  1. Start with an empty string, keep appending lines to it, and finally write the huge string to the file with file.write (pro: only a single write operation; con: the huge string takes up memory)

  2. Start with an empty list, keep appending lines to it, and finally write it with file.writelines (pro: a single write operation (?); con: the huge list takes up memory)

  3. Write each line to the file as it is constructed (pro: no large memory consumption; con: a huge number of write operations); a sketch of all three options follows this list.
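A rough sketch of what the three options might look like in code; the file name "Makefile" and generate_lines() are hypothetical placeholders for the real output path and the nested-loop generation logic:

    def generate_lines():
        # placeholder for the nested loops and conditions that build each line
        for i in range(100):
            yield f"target_{i}: dep_{i}\n"

    # Option 1: accumulate one huge string, then a single write
    content = ""
    for line in generate_lines():
        content += line
    with open("Makefile", "w") as f:
        f.write(content)

    # Option 2: accumulate a list, then a single writelines call
    lines = list(generate_lines())
    with open("Makefile", "w") as f:
        f.writelines(lines)

    # Option 3: write each line as it is constructed
    with open("Makefile", "w") as f:
        for line in generate_lines():
            f.write(line)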

What is the idiomatic/recommended way of writing a large number of lines to a file?

Option 3: Write the lines as you generate them.

Writes are already buffered; you don't have to do it manually as in options 1 and 2.

Option #3 is usually the best; normal file objects are buffered, so you won't be performing excessive system calls by writing as you receive data to write.
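A minimal sketch of option #3, assuming a hypothetical file name and generation loop; the text file object buffers writes internally, and open()'s optional buffering argument can enlarge that buffer if you want fewer, larger flushes:

    # Minimal sketch of option #3; "Makefile" and the loop body are placeholders.
    # The buffered file object batches small writes, so each f.write() call does
    # not turn into a separate system call.
    with open("Makefile", "w", buffering=1024 * 1024) as f:  # buffering is optional
        for i in range(100):
            for dep in ("foo.c", "bar.c"):
                f.write(f"obj_{i}: {dep}\n")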

Alternatively, you can mix options #2 and #3: don't build intermediate lists and call .writelines on them; instead, make the code that would produce those lists a generator function (having it yield values as it goes) or a generator expression, and pass that to .writelines. It's functionally equivalent to #3 in most cases, but it pushes the work of iterating the generator to the C layer, removing a lot of Python bytecode processing overhead. That overhead is usually meaningless next to file I/O costs though.
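For example, a sketch of that generator-based variant (generate_lines() is again a hypothetical stand-in for the real Makefile-building code):

    # Mix of options #2 and #3: pass a generator to writelines so lines are
    # written as they are produced, without materializing the whole list.
    def generate_lines():
        for i in range(100):
            yield f"target_{i}: dep_{i}.c\n"
            yield f"\tcc -o target_{i} dep_{i}.c\n"

    with open("Makefile", "w") as f:
        # writelines iterates the generator and writes each yielded string;
        # it does not add newlines, so each line must include its own "\n".
        f.writelines(generate_lines())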
