
Writing a big file in Python faster and in a memory-efficient way

I am trying to create a big file containing the same text repeated many times, but my system hangs a while after I start the script.

the_text = "This is the text I want to copy 100's of time"
count = 0
while True:
    the_text += the_text  # doubles the string (and the memory it needs) on every pass
    count += 1
    if count > int(1e10):
        break

NOTE: The above is an oversimplified version of my code. I want to create a file containing the same text many times, and the file will be around 27GB.
I know the hang happens because RAM is being overloaded. What I want to know is how to do this in a fast and memory-efficient way in Python.
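Part of why memory runs out so fast: `the_text += the_text` doubles the string on every pass, so its size grows exponentially rather than by one copy per iteration. The hypothetical helper below counts how many doublings the 45-character string needs to pass ~27GB, assuming one byte per character:

```python
def doublings_needed(start_len: int, target_bytes: int) -> int:
    """Count how many times a string of start_len characters must be
    doubled (s += s) before it holds at least target_bytes characters."""
    length, steps = start_len, 0
    while length < target_bytes:
        length *= 2  # each `the_text += the_text` doubles the length
        steps += 1
    return steps


text = "This is the text I want to copy 100's of time"
print(len(text))                                    # 45
print(doublings_needed(len(text), 27_000_000_000))  # 30
```

So the loop needs a string of tens of gigabytes after roughly 30 iterations, long before `count` gets anywhere near 1e10.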

Don't accumulate the string in memory; instead, write it directly to the file:

the_text = "This is the text I want to copy 100's of time"
# Each write() goes straight to the file, so only one short string is ever in memory.
with open("largefile.txt", "wt") as output_file:
    for _ in range(10000000):
        output_file.write(the_text)

This took ~14s on my laptop (with an SSD) to create a file of ~450MB (45 characters × 10,000,000 writes).

The code above writes one string at a time. I'm sure it could be sped up by batching the strings together, but there doesn't seem much point speculating on that without any info about what your application can do.
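A minimal sketch of that batching idea, assuming all you need is N copies of one ASCII string (`write_repeated` and its `batch` parameter are illustrative names, not from the original). Only one batch-sized string is held in memory at a time:

```python
import os
import tempfile


def write_repeated(path: str, text: str, copies: int, batch: int = 1000) -> None:
    """Write `text` to `path` exactly `copies` times, `batch` copies per write() call."""
    chunk = text * batch  # built once and reused, so memory stays at ~len(text) * batch
    full_batches, remainder = divmod(copies, batch)
    with open(path, "wt") as output_file:
        for _ in range(full_batches):
            output_file.write(chunk)
        if remainder:  # copies left over after the last full batch
            output_file.write(text * remainder)


the_text = "This is the text I want to copy 100's of time"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "largefile.txt")
    write_repeated(demo_path, the_text, 10_050, batch=1000)
    demo_size = os.path.getsize(demo_path)

print(demo_size)  # 452250 (45 characters x 10,050 copies)
```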

Ultimately, this will be limited by disk speed: if your disk can sustain 50MiB/s writes, then writing ~450MB will take about 9s, which sounds like what my laptop is doing with the line-by-line writes.
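The disk-speed reasoning is just size divided by sustained throughput. A small sketch of that arithmetic (`estimated_seconds` is a hypothetical helper; it gives a best case and ignores Python overhead):

```python
MIB = 1024 * 1024  # bytes per mebibyte


def estimated_seconds(size_bytes: float, throughput_mib_s: float) -> float:
    """Best-case write time if sustained disk throughput is the only bottleneck."""
    return size_bytes / (throughput_mib_s * MIB)


print(round(estimated_seconds(450_000_000, 50), 1))    # 8.6  (~450MB at 50MiB/s)
print(round(estimated_seconds(27 * 1024 * MIB, 125)))  # 221  (27GiB at 125MiB/s)
```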

If I write 100 strings per call with write(the_text*100) and loop 100 times fewer, i.e. range(100000), it takes ~6s: a speedup of roughly 2.5x, writing at ~70MiB/s.

If I write 1000 strings at once using range(10000), it takes ~4s; my laptop is starting to top out at ~100MiB/s.

I get ~125MiB/s with write(the_text*100000).
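A rough sketch of how batch sizes like these could be compared. It's an assumption that this matches how the figures above were measured: the sketch uses `time.perf_counter`, calls `os.fsync` so the data actually reaches the disk rather than sitting in the OS cache, and writes a smaller ~45MB file to keep runs short. `time_batched_write`, `TOTAL_COPIES` and the batch sizes are illustrative; your numbers will differ.

```python
import os
import tempfile
import time

TEXT = "This is the text I want to copy 100's of time"
TOTAL_COPIES = 1_000_000  # ~45MB; the answer's test used 10,000,000 copies (~450MB)


def time_batched_write(path: str, batch: int) -> float:
    """Seconds to write TOTAL_COPIES copies of TEXT with `batch` copies per write() call."""
    chunk = TEXT * batch
    start = time.perf_counter()
    with open(path, "wt") as output_file:
        for _ in range(TOTAL_COPIES // batch):
            output_file.write(chunk)
        output_file.flush()
        os.fsync(output_file.fileno())  # push data to the disk, not just the OS cache
    return time.perf_counter() - start


results = {}
with tempfile.TemporaryDirectory() as tmp:
    bench_path = os.path.join(tmp, "bench.txt")
    for batch in (1, 100, 1000, 100_000):  # each divides TOTAL_COPIES evenly
        seconds = time_batched_write(bench_path, batch)
        results[batch] = seconds
        mib_per_s = len(TEXT) * TOTAL_COPIES / seconds / (1024 * 1024)
        print(f"batch={batch:>7}: {seconds:.2f}s, ~{mib_per_s:.0f}MiB/s")
```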

Increasing further to write(the_text*1000000) slows things down, presumably because Python's memory handling for such a large (~45MB) string starts to take appreciable time.
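To see how big that chunk actually is, the standard-library `tracemalloc` module can report the peak memory traced while building it. This only measures the string itself; it doesn't prove why the larger batch was slower. (A text-mode `write()` also has to encode the string to bytes, which plausibly adds another buffer of similar size, but that is an assumption not measured here.)

```python
import tracemalloc

TEXT = "This is the text I want to copy 100's of time"

tracemalloc.start()
chunk = TEXT * 1_000_000  # the same ~45MB string as write(the_text*1000000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(len(chunk))  # 45000000
print(f"peak traced allocation: {peak / 1e6:.1f}MB")
```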

Doing text I/O will be slowing things down a bit; I know that with Python I can get about 300MiB/s combined read+write on binary files.
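A minimal sketch of writing in binary mode instead, assuming UTF-8 output is acceptable: the batch is encoded to bytes once, and each `write()` on a file opened with `"wb"` passes those bytes through without a text-encoding step. `write_repeated_binary` is a hypothetical name, and whether this gets anywhere near ~300MiB/s depends on your disk; measure before relying on it.

```python
import os
import tempfile

TEXT = "This is the text I want to copy 100's of time"


def write_repeated_binary(path: str, text: str, copies: int, batch: int = 100_000) -> None:
    """Write `text` `copies` times in binary mode, encoding the batch only once."""
    unit = text.encode("utf-8")  # assumption: UTF-8 output is acceptable
    chunk = unit * batch         # bytes are written as-is: no per-write() encoding
    full_batches, remainder = divmod(copies, batch)
    with open(path, "wb") as output_file:
        for _ in range(full_batches):
            output_file.write(chunk)
        if remainder:
            output_file.write(unit * remainder)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "largefile.txt")
    write_repeated_binary(demo_path, TEXT, 250_000)
    demo_size = os.path.getsize(demo_path)

print(demo_size)  # 11250000 (45 bytes x 250,000 copies)
```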

SUMMARY: for a 27GiB file, my laptop running Python 3.9.5 on Windows 10 maxes out at about 125MiB/s, or roughly 8s/GiB, so it would take ~220s to create the file when writing strings in chunks of about 4.5MB (45 chars × 100,000). YMMV.
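Putting the summary together, a sketch that targets a byte size (27GiB in the question's case) instead of a copy count, writing ~4.5MB chunks (100,000 copies of the 45-character text) in binary mode. `fill_file` and `chunk_copies` are hypothetical names, and the text is assumed to be ASCII, so cutting the final chunk at a byte boundary can't split a character:

```python
import os
import tempfile

TEXT = "This is the text I want to copy 100's of time"


def fill_file(path: str, text: str, target_bytes: int, chunk_copies: int = 100_000) -> int:
    """Write repeated `text` until the file is exactly `target_bytes` long.

    Only one chunk (len(text) * chunk_copies bytes, ~4.5MB by default) is held in memory.
    Returns the number of bytes written.
    """
    chunk = (text * chunk_copies).encode("ascii")  # assumption: ASCII, 1 byte per char
    if not chunk:
        raise ValueError("text must not be empty")
    written = 0
    with open(path, "wb") as output_file:
        while written < target_bytes:
            remaining = target_bytes - written
            # The final write is cut short so the file hits the target size exactly.
            piece = chunk if remaining >= len(chunk) else chunk[:remaining]
            output_file.write(piece)
            written += len(piece)
    return written


# For the question's case this would be fill_file("largefile.txt", TEXT, 27 * 1024**3).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "largefile.txt")
    demo_written = fill_file(demo_path, TEXT, 10_000_000)
    demo_size = os.path.getsize(demo_path)

print(demo_written, demo_size)  # 10000000 10000000
```

Unless the target size is a multiple of len(text), the file ends with a partial copy of the text.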
