
Reading UTF-8 file with universal newlines in Python 2

I previously used os.open() to read and write text files. I have now switched to codecs.open() because I want UTF-8 support. This works well, but it returns a different result on Windows, because the source files use \r\n line breaks. It seems to me that universal newlines are not available with codecs.open() because it opens the file in binary mode.

My understanding of the problem is that os.open() and codecs.open() each have mutually exclusive features. os.open() in text mode has the nice feature of universal newline mode (which, when reading, means that any form of line break is converted to \n), whereas codecs.open() provides the UTF-8 support.
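
For illustration, a minimal sketch of that mismatch (the file name is made up, and the text-mode side uses the built-in open() with the "rU" flag, which is presumably what text mode refers to here):

import codecs

# Built-in open() in universal-newline mode ("rU") converts any line
# ending to "\n", but returns an undecoded byte string:
f = open("windows_file.txt", "rU")
raw_bytes = f.read()   # str -- "\r\n" already normalized to "\n"
f.close()

# codecs.open() decodes to unicode, but reads the file in binary mode,
# so Windows "\r\n" pairs survive in the result:
f = codecs.open("windows_file.txt", "r", encoding="utf-8")
text = f.read()        # unicode -- still contains u"\r\n"
f.close()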

My goal is to read (and preferably write) UTF-8 encoded files into a unicode string with universal line breaks. This implies that if I read two files with different line breaks, the resulting strings should be identical. I want to do this using only core libraries, with Python 2.6 compatibility. What is the most elegant way to do this?

io.open() combines the strengths of os.open() and codecs.open().

It provides full universal newline support and a TextIOWrapper for transparent decoding and encoding of strings. I believe it is the closest match to the Python 3 implementation of open().

The usage is the same as with codecs.open():

import io
my_file = io.open("myfile.txt", "w", encoding="utf-8")

Text mode and universal newlines are the default options.
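
For example, reading two files that differ only in their line endings should then give identical unicode strings (a small sketch with made-up file names):

import io

with io.open("unix_endings.txt", "r", encoding="utf-8") as f:
    unix_text = f.read()      # unicode, "\n" line breaks

with io.open("windows_endings.txt", "r", encoding="utf-8") as f:
    windows_text = f.read()   # unicode, "\r\n" translated to "\n"

assert unix_text == windows_text

Note that in Python 2, a text-mode file from io.open() expects unicode objects when writing, so pass u"..." strings rather than plain byte strings.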
