
Writing to a .txt file (UTF-8), python

I want to save the output (contents) to a file, encoded as UTF-8. The original file shouldn't be overwritten; the output should be saved as a new file, e.g. file2.txt. So I first open file.txt, read it as UTF-8, do some processing, and then want to save the result to file2.txt in UTF-8. How do I do this?

import codecs
def openfile(filename):
    with codecs.open(filename, encoding="UTF-8") as F:
        contents = F.read()
        ...

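The short way (Python 3, where the built-in open() accepts an encoding; the old file() built-in no longer exists):

open('file2.txt', 'w', encoding='utf-8').write(open('file.txt', encoding='utf-8').read())

The long way:

# Read and decode the input as UTF-8
with open('file.txt', encoding='utf-8') as f:
    data = f.read()
# ... process data ...
# Write the result to a new file, encoded as UTF-8
with open('file2.txt', 'w', encoding='utf-8') as f:
    f.write(data)

And using 'codecs' explicitly:

# data is a text (str) object; the writer encodes it, so the file is opened in binary mode
codecs.getwriter('utf-8')(open('/tmp/bla3', 'wb')).write(data)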

I like to separate concerns in situations like this - I think it really makes the code cleaner, easier to maintain, and can be more efficient.

You have three concerns here: reading a UTF-8 file, processing the lines, and writing a UTF-8 file. Assuming your processing is line-based, this works well in Python, since opening a file and iterating over its lines is built into the language. As well as being clearer, this is more efficient, since it lets you process huge files that don't fit into memory. Finally, it gives you a good way to test your code: because processing is separated from file I/O, you can write unit tests, or just run the processing code on example text and review the output manually without fiddling around with files.

I'm converting the lines to upper case purely as an example; presumably your processing will be more interesting. I like using yield here: it makes it easy for the processing to remove or insert extra lines, although my trivial example doesn't do that.

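import codecs

def process(lines):
    # Generator: yields one processed line at a time, so the whole file is never held in memory
    for line in lines:
        yield line.upper()

# file1 and file2 are the input and output paths
with codecs.open(file1, 'r', 'utf-8') as infile:
    with codecs.open(file2, 'w', 'utf-8') as outfile:
        for line in process(infile):
            outfile.write(line)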
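
Because process() only needs an iterable of text lines, it can be exercised without touching the disk. Here is a minimal sketch, assuming Python 3 and the same upper-casing process() (repeated so the snippet is self-contained). The sample lines and the names sample, fake_file, result and streamed are made up for illustration:

```python
import io

def process(lines):
    # Same placeholder processing as in the answer: upper-case each line.
    for line in lines:
        yield line.upper()

# Any iterable of text lines works, for example a plain list...
sample = ["héllo\n", "wörld\n"]
result = list(process(sample))

# ...or an in-memory text stream standing in for an opened file.
fake_file = io.StringIO("first line\nsecond line\n")
streamed = list(process(fake_file))
```

In a unit test you would compare result and streamed against the lines you expect, with no temporary files involved.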

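Open a second file. Use contextlib.nested() if need be (it is only needed on Python 2.6 and earlier; from Python 2.7 and 3.1 a single with statement accepts several context managers, and nested() is gone from current Python 3). Use shutil.copyfileobj() to copy the contents.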
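
A minimal sketch of that suggestion, assuming Python 3, where one with statement can hold both files. The helper name copy_utf8 and its parameters are hypothetical. Note that this copies the text unchanged, so it only fits when no processing is needed between reading and writing:

```python
import shutil

def copy_utf8(src_path, dst_path):
    # Hypothetical helper: decode src as UTF-8 and write it to dst as UTF-8.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # copyfileobj reads and writes in chunks, so large files are fine.
        shutil.copyfileobj(src, dst)
```

Calling copy_utf8('file.txt', 'file2.txt') would leave file.txt untouched and create (or overwrite) file2.txt.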
