
How to speed up printing line by line to a file in Python?

Say I'm printing numbers from two arrays into a file:

from numpy import random
number_of_points = 10000
a = random.rand(number_of_points)
b = random.rand(number_of_points)
fh = open('file.txt', 'w')
for i in range(number_of_points):
    for j in range(number_of_points):
        print('%f %f' % (a[i], b[j]), file=fh)

I feel this is making lots of calls to the system to print, whereas sending one call containing this information would be faster. Is this correct? If so, how could I do this? Are there faster ways to implement this?

print has a lot of bells and whistles you're not using, and you're using C-style looping with indexing instead of direct iteration; both add needless overhead on every one of the inner-loop iterations. You might be able to speed it up a bit by limiting the Python-level work and pushing as much as possible to the C layer.
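As an intermediate step, both overheads can be removed while keeping explicit loops. The following is a minimal sketch, not a benchmarked result: the function name write_pairs is made up for illustration, plain Python lists stand in for the NumPy arrays, and an in-memory io.StringIO buffer stands in for the real file.

```python
import io

def write_pairs(a, b, fh):
    """Write every (x, y) pair from a and b to fh, one pair per line."""
    write = fh.write  # look the bound method up once, outside the loops
    for x in a:       # iterate directly instead of indexing a[i]
        for y in b:
            # fh.write skips print's sep/end/flush handling; add '\n' by hand
            write('%f %f\n' % (x, y))

# Small in-memory demonstration; a real run would pass an open file instead.
buf = io.StringIO()
write_pairs([0.5, 1.0], [0.25, 2.0], buf)
```

This still runs the loop body in Python once per pair, so it only trims the per-iteration cost; it does not move the loop itself out of the interpreter.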

For example, in this case, you could replace the whole doubly-nested loop structure with:

import itertools

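# I find the modern format strings a little nicer, but note that '{} {}\n'
# writes each value in its default string form, not with six decimals.
# To keep the original '%f' output, use '{:f} {:f}\n'.format here, or
# map('%f %f\n'.__mod__, itertools.product(a, b)) (plain map, not starmap,
# because __mod__ takes the whole tuple as a single argument).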
fh.writelines(itertools.starmap('{} {}\n'.format, itertools.product(a, b)))

which uses product to produce the same pairs as your nested loops and indexing, starmap + str.format to create the lines, and fh.writelines to exhaust the iterator created by starmap, writing all of its outputs directly to the file with a single function call instead of 100,000,000 calls to print.

Aside from the fixed setup cost (independent of the number of items) of creating the iterators and passing the final one to fh.writelines, the actual iteration, formatting and I/O work takes place entirely at the C layer on the CPython reference interpreter, so it should run noticeably faster than the Python-level loops.
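For a self-contained check that this approach writes the same lines as the nested loops, here is a small sketch. It makes a few assumptions that are not in the original: the helper name write_pairs_fast is hypothetical, short Python lists replace the NumPy arrays, the file goes to a temporary directory, and '{:f}' is used so the output matches '%f' exactly.

```python
import itertools
import os
import tempfile

def write_pairs_fast(a, b, path):
    """Write the Cartesian product of a and b to path, one 'x y' line per pair."""
    # '{:f} {:f}\n' reproduces the six-decimal output of '%f %f'.
    lines = itertools.starmap('{:f} {:f}\n'.format, itertools.product(a, b))
    with open(path, 'w') as fh:  # the with block flushes and closes the file
        fh.writelines(lines)     # one call consumes the whole lazy iterator

a = [0.1, 0.2, 0.3]
b = [1.5, 2.5]
path = os.path.join(tempfile.mkdtemp(), 'file.txt')
write_pairs_fast(a, b, path)

with open(path) as fh:
    written = fh.read()

# Reference output built the slow way, in the same order as the nested loops.
expected = ''.join('%f %f\n' % (x, y) for x in a for y in b)
```

To see what the change buys on a given machine, time both versions with timeit on a reduced number_of_points before running the full 10000 x 10000 case.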
