Writing to a .txt file (UTF-8), python

Question

I want to save the output (contents) to a file in UTF-8. The original file shouldn't be overwritten; the output should be saved as a new file, e.g. file2.txt. So I first open file.txt, read it as UTF-8, do some stuff, and then want to save the result to file2.txt in UTF-8. How do I do this?

import codecs
def openfile(filename):
    with codecs.open(filename, encoding="UTF-8") as F:
        contents = F.read()
        ...

Answer 1

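The short way (Python 3, where open() decodes and encodes for you when given an encoding):

open('file2.txt', 'w', encoding='utf-8').write(open('file.txt', encoding='utf-8').read())

The long way:

with open('file.txt', 'rb') as infile:
    data = infile.read().decode('utf-8')  # raw bytes -> text
# ... process data ...
data = data.encode('utf-8')  # text -> UTF-8 bytes
with open('file2.txt', 'wb') as outfile:
    outfile.write(data)

And using 'codecs' explicitly (the writer encodes for you, so data here must be the decoded text, before the .encode() step):

import codecs
codecs.getwriter('utf-8')(open('/tmp/bla3', 'wb')).write(data)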
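
To see what the codecs writer does without touching the disk, here is a minimal sketch that wraps an in-memory binary buffer instead of a file. The helper name write_utf8 and the sample string are made up for illustration; codecs.getwriter() and io.BytesIO are standard library.

```python
import codecs
import io

def write_utf8(raw_stream, text):
    """Encode text as UTF-8 and write it to a binary stream (hypothetical helper)."""
    # getwriter() returns a StreamWriter class; calling it wraps the binary stream
    writer = codecs.getwriter('utf-8')(raw_stream)
    writer.write(text)

# An in-memory buffer stands in for a file opened in 'wb' mode
buffer = io.BytesIO()
write_utf8(buffer, u'caf\u00e9')
encoded = buffer.getvalue()
```

The writer expects text, not already-encoded bytes, which is why the encoding step should happen in only one place.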

Answer 2

I like to separate concerns in situations like this - I think it makes the code cleaner and easier to maintain, and it can be more efficient.

Here you have three concerns: reading a UTF-8 file, processing the lines, and writing a UTF-8 file. Assuming your processing is line-based, this works perfectly in Python, since opening a file and iterating over its lines is built into the language. As well as being clearer, this is more memory-efficient, since only one line is held at a time, which lets you process huge files that don't fit into memory. Finally, it gives you a great way to test your code: because processing is separated from file I/O, you can write unit tests, or even just run the processing code on example text and manually review the output without fiddling around with files.

For the sake of example I'm converting the lines to upper case; presumably your processing will be more interesting. I like using yield here: it makes it easy for the processing to drop or insert lines, although my trivial example doesn't do that.

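import codecs

file1 = 'file.txt'   # input path, as in the question
file2 = 'file2.txt'  # output path, as in the question

def process(lines):
    # Generator: takes an iterable of lines, yields processed lines one at a time
    for line in lines:
        yield line.upper()

with codecs.open(file1, 'r', 'utf-8') as infile:
    with codecs.open(file2, 'w', 'utf-8') as outfile:
        for line in process(infile):
            outfile.write(line)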
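
Because process() only needs an iterable of lines, it can be exercised without any files at all. A rough sketch of such a check follows; the sample strings and variable names are made up for illustration, and io.StringIO stands in for an open text file.

```python
import io

def process(lines):
    # Same generator as above: upper-case each line
    for line in lines:
        yield line.upper()

# A plain list of strings works as input
sample = [u'first line\n', u'h\u00e9llo\n']
result = list(process(sample))

# So does any file-like object that yields lines when iterated
fake_file = io.StringIO(u'one\ntwo\n')
from_stream = list(process(fake_file))
```

Non-ASCII characters survive here because the processing works on decoded text; encoding only matters at the file boundary.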

Answer 3

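Open a second file. Use contextlib.nested() if need be (Python 2 only; it was removed in Python 3, where a single with statement accepts several comma-separated context managers). Use shutil.copyfileobj() to copy the contents.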
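
A minimal sketch of this approach in Python 3, assuming a plain copy with no processing in between. The function name copy_utf8 and the temporary paths are made up for illustration; shutil.copyfileobj() and tempfile are standard library.

```python
import os
import shutil
import tempfile

def copy_utf8(src_path, dst_path):
    """Copy src_path to dst_path, reading and writing both as UTF-8 text."""
    # One with statement opens both files, so contextlib.nested() is not needed
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        shutil.copyfileobj(src, dst)

# Small demonstration in a throwaway directory
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'file.txt')
dst_path = os.path.join(workdir, 'file2.txt')
with open(src_path, 'w', encoding='utf-8') as f:
    f.write('caf\u00e9\n')

copy_utf8(src_path, dst_path)

with open(dst_path, 'r', encoding='utf-8') as f:
    copied = f.read()
shutil.rmtree(workdir)
```

copyfileobj() moves the data in chunks, so the whole file never has to sit in memory, but there is no hook for processing in between.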

3 answers

solution1
16 ACCEPTED 2010-11-06 11:26:33

solution2
9

solution3
2 2010-11-06 11:27:57
