
Read a large text file and write to another file with Python

I am trying to convert a large text file (5 GB+) but got a MemoryError. From this post, I managed to convert the encoding of a text file into a readable format with this:

import os

path = 'path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
        t = open('{}/{}'.format(des_path, filename), 'w')
        string = f.read()
        t.write(string)
        t.close()

The problem is that when I try to convert a text file of a large size (5 GB+), I get this error:

Traceback (most recent call last):
  File "Desktop/convertfile.py", line 12, in <module>
    string = f.read()
  File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError

which I understand means it cannot read a file this large into memory at once. I found from several links that I can do it by reading line by line.

So, how can I adapt my code to read line by line? My understanding of reading line by line here is that I need to read a line from f and write it to t until the end of the file, right?

You can iterate over the lines of an open file.

for filename in os.listdir(path):
    inp, out = open_files(filename)  # helper that opens the source and destination files
    for line in inp:                 # iterates line by line, never loading the whole file
        out.write(line)
    inp.close()
    out.close()

Note that I've hidden the complexity of the different paths, encodings and modes in a function that I suggest you actually write...
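For illustration, here is a minimal sketch of what such a helper might look like. The name open_files and the target encoding (UTF-8) are just assumptions based on your original snippet; adapt them to your case.

import os

path = 'path/to/file'
des_path = 'path/to/store/file'

def open_files(filename):
    # Hypothetical helper: open the source file with the original encoding
    # and the destination file for writing (UTF-8 chosen here as an example target).
    src = open(os.path.join(path, filename), 'r', encoding='iso-8859-11')
    dst = open(os.path.join(des_path, filename), 'w', encoding='utf-8')
    return src, dst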

Regarding buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be noticeably slower than a more complex solution.
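If you do want to control the chunk size explicitly (for example, if some lines are extremely long), a possible sketch is to read fixed-size blocks of decoded text instead of lines; the 1 MiB chunk size below is only an example value.

def convert_in_chunks(src, dst, chunk_size=1024 * 1024):
    # Read the already-decoded text in fixed-size chunks so memory use
    # stays bounded regardless of line length or total file size.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)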
